[HN Gopher] Botspam apocalypse
___________________________________________________________________
Botspam apocalypse
Author : panic
Score : 379 points
Date : 2022-08-04 04:17 UTC (18 hours ago)
(HTM) web link (memex.marginalia.nu)
(TXT) w3m dump (memex.marginalia.nu)
| SavageBeast wrote:
| I get paid over 92 Dollars per hour working from home with 2 kids
| at home. i never thought i'd be able to do it but my best friend
| earns over 15k a month doing this and she convinced me to try.
| the potential with this is endless... Simply go to the BELOW LINK
| and start your work..
|
| EDIT: bad joke but maybe someone will get a chuckle.
| lloydatkinson wrote:
| I found that my netlify site attracts a lot of spam specifically
| from the same spammer/group. The messages always start with some
| variation of "Hi my name is Eric".
|
| Netlify didn't seem to care when I reported it on their support
| forum. The spammer disables JS, so no client-side protection
| works. I've recently decided to (unfortunately) break the
| ability of JS-disabled browsers to submit the contact form. The
| form elements' attributes are deliberately wrong, so the form
| won't submit correctly; instead, some JS on page load sets the
| attributes to the correct values. I will wait a while and see if
| this solves it.
|
| While Netlify does correctly mark all of this as spam,
| legitimate messages sometimes get caught as false positives, so
| I have to sift through the vast pile of spam regularly.
| 1vuio0pswjnm7 wrote:
| "The rest are forced to build web services with no interactivity,
| or seek shelter behind something like Cloudflare, which
| discriminates against specific browser configurations and uses IP
| reputation to selectively filter traffic."
|
| Interactivity is not a must-have. The world's first general
| purpose computer, ENIAC, was not built for "interactivity". It
| was built to calculate ballistic trajectories, which were
| otherwise calculated manually. Computers exist to allow
| automation, to reduce manual labour.^1 "Tech" companies need
| interactivity to support collection of data about www users and
| paid services related to programmatic online advertising.
| Generally, users do not need interactivity. Generally, users do
| not need to spend excessive quantities of time "interacting" with
| networked computers.
|
| As a user, I want __non-interactive__ www services, whether for
| data/information retrieval or e-commerce. I want to use more
| automation, not less. Automation is not reserved for those
| providing "services". It also should be available to those using
| them.
|
| Provide bulk data access. Let others mirror it. Take advantage of
| "open datasets" hosting if necessary. For example, Common Crawl
| is hosted for free on Amazon. Upload the data to the Internet
| Archive.
|
| "The API gateway is another stab at this, you get to choose from
| either a public API with a common rate limit, or revealing your
| identity with an API key (and sacrificing anonymity)."
|
| Publish the rate limit for the public API. Do not make users
| guess. Do not require "sign-in" to use an API to retrieve public
| data.
|
| 1. Some folks consider having to "interact" with a computer as
| labour, not fun.
| lifeisstillgood wrote:
| >>> Automation is not reserved for those providing "services".
| It also should be available to those using them.
|
| Yes !
|
| I call this software literacy. And yes - no matter how cool the
| JS on a major site is, the site's goals (to keep me there and
| clicking) and my goals (to get what I want with minimal action)
| are in conflict.
|
| I would suggest that bots are actually not a problem. For most
| things I would like a bot _acting for me_. Telling me, as and
| when, that I need to visit the dentist, who has slots free next
| Weds and Friday. Friday is best because I am also WFH that day.
|
| The bot apocalypse is only an apocalypse because we are trying
| to make a "web for humans" when actually a "web for bots, and a
| bot for a human" is a much better idea :/)
| denton-scratch wrote:
| > Telling me as and when that I need to visit the dentist
|
| Isn't that simply your calendar? Sure, you want it automated;
| but it doesn't need internet access, it doesn't need to crawl
| or search, I don't know why you refer to it as a 'bot'.
|
| To my mind, the idea of personal 'bots' was that you could
| give it some general instructions such as "Let me know when
| the content at any of these URLs changes", and then leave it
| running. Were they also called agents?
| nicbou wrote:
| Some part of it is lost in the process.
|
| I run a website about immigration. I'd love to reinstate
| comments and get valuable feedback from people who just tried
| my advice. Bots just make it too time-consuming.
| fabianhjr wrote:
| It would be simpler to decentralize and implement webs of
| trust[1] (that locality would also help community-building /
| social cohesion).
|
| Secure Scuttlebutt[1] doesn't have a moderation/spam problem,
| and it is completely decentralized, with no monetary fees and no
| proof-of-work. Why can't centralized services do better?
|
| [1]: https://ssbc.github.io/scuttlebutt-protocol-guide/#follow-
| gr...
| jgalt212 wrote:
| > I can't afford to operate a datacenter to cater to traffic that
| isn't even human. This spam traffic is all from botnets with IPs
| all over the world.
|
| In our experience (we don't have a forum), almost all of our bot
| traffic has been SEO spiders (or claiming to be so).
| jart wrote:
| This kind of botspam is usually pretty easy to address with
| redbean using the finger https://redbean.dev/#finger and maxmind
| https://redbean.dev/#maxmind modules. The approach I usually
| recommend people isn't so much ip reputation, which can be
| unfair, but rather it allows you to find evidence of clients
| lying to you. For example, if the User-Agent says it's Windows
| with a language preference of English, but the TCP SYN packet
| says it's Linux and MaxMind says it's coming from China, then
| that means the client is lying (or being MiTM'd) and you can
| righteously hellban once it's fingered for the crime.
| [deleted]
| unglaublich wrote:
| What keeps bots from just cleaning up their act and reporting
| correct info instead?
| krageon wrote:
| Nothing. Once this sort of fingerprinting becomes common the
| common bot frameworks will bypass it.
| marginalia_nu wrote:
| To be fair, bot countermeasures are and have always been an
| arms race.
| jart wrote:
| But it's not common. So for the time being, redbean users
| have the advantage.
| krageon wrote:
| If taking away normal users' agency is an advantage to
| you, you go and use it.
| jart wrote:
| What's stopping them from clubbing you with a monkey wrench?
| With bots, to answer your question, it'd probably take
| another standard deviation in the IQ of the person using it.
| So you've ruled out all the script kiddies in the world by
| default. The purpose of this game isn't to have a perfect
| defense, which is impossible, but rather to make the list of
| people who can mess with you as short as possible.
| BiteCode_dev wrote:
| My laptop is lying all the time. I change my UA, preferred
| language, IP, MAC address and so on, because of tracking,
| terrible dev assumptions, and personal preferences.
|
| Yet, I'm a very good web citizen.
|
| Because of this, I often have to solve the same captcha many
| times before it thinks I'm human.
| jart wrote:
| I don't doubt it. Given how rare people like you are, I'm sure
| a good citizen like you would also be perfectly fine sending an
| email to the service asking to be whitelisted, or keeping a
| second browser that isn't your daily driver for situations like
| this - one that doesn't try to obfuscate its identity by
| behaving like a bot.
| krageon wrote:
| I won't lie, if you make an asshole system that bans me for
| doing perfectly normal things that make the internet work
| I'm going to assume I don't want to interact with it
| anyway.
| jart wrote:
| Then what would you propose that's better?
| BiteCode_dev wrote:
| The first solution isn't practical (so many services, each
| requiring you to manually find an email address to send a
| message to, then interact with a human that might not even
| exist), and when you do, they don't whitelist you. I tried.
| Either they don't answer, or they have "no way to have a
| specific whitelist for a single user in our system".
|
| So the second browser is the solution. But then the site will do
| all the bad things that I wanted it not to do in the first
| place. Like serving terrible French results instead of good
| English ones, or assuming Firefox doesn't work based on the UA
| while the site works fine with it. And of course tracking me to
| death, selling my data, and so on.
|
| The only solution that works is to choose services you pay money
| for: they have your card, so they know you are not a bot. For
| years now, I have been suspicious of anything free. But it
| doesn't solve the tracking problem.
| jart wrote:
| Yes I understand the desire for capitalism rather than
| surveillance capitalism, but that's a derailment. The OP
| appears to be someone who just wants to build something
| cool and share it with other human beings. In that case,
| it's really helpful to be able to have a free practical
| way to address abuse. Would you really tell someone like
| the OP to stop expressing themself and shut down their
| service and put a paid one in its place? How can you
| charge for search when Google gives it away for free?
| BiteCode_dev wrote:
| I understand all the causes and consequences of this problem,
| and I'm not implying there is an easy solution, only underlining
| that a "the user is lying" heuristic will lead to frustrating
| false positives.
| nottorp wrote:
| > The OP appears to be someone who just wants to build
| something cool and share it with other human beings.
|
| Only thing is, you don't _know_ if that statement is
| true. Or they could really be wanting to build something
| cool but take advantage of all those "free" services and
| basically sell you to Google and Facebook.
| viraptor wrote:
| That's actually a terrible heuristic. My requests often come
| from Windows proxied through Linux, with the language set to my
| preferred one in a non-matching country. And that's before I
| start travelling and using a hotspot with a faked TTL to work
| around telco limitations. And that's before you even get to
| people completely unaware of interesting routing applied to
| them (like corporate proxies and VPNs) and people with
| incorrect MaxMind entries.
| Test0129 wrote:
| Not totally unrelated but I had to turn off email alerts and come
| up with a way to summarize things because Fail2Ban and other
| alert systems were hit quite literally every 15 seconds with port
| scans/attempted entries on SSH and other ports. Reporting the
| abuse to ARIN/ICANN didn't help because almost a full 95% of the
| traffic originated from China, and 90% of the remaining 5% was
| Russia. The remainder were zombies inside America, typically on
| Digital Ocean, and I was able to get those handled quickly and
| efficiently. When I had a simple (secure) login system hosted on
| HTTPS, it was getting hit hard enough that my VPS provider was
| sending emails trying to figure out a way to stop it. There are
| literally 3 people that even know of the existence of these
| services.
|
| It is actually nuts just how much bot spam there is.
| elias94 wrote:
| > has been upwards of 15 queries per second from bots
|
| What type of queries are they generating? For what purpose are
| they querying Marginalia? Scraping and filling internal search
| engines?
|
| > If anyone could go ahead and find a solution to this mess
|
| I would maybe try to investigate why they are querying your
| search engine. Is it for the search results? Maybe from there
| you can create and sell an API service. Is it for the wiki? Is
| it for research purposes?
|
| I would love to see some data, raw or with some behavior derived
| from it.
| marginalia_nu wrote:
| Most of the queries don't seem to be tailored toward my search
| engine, they're ridiculously over-specified and typically don't
| return any results at all.
|
| As I've mentioned in another comment, my best guess is they're
| betting it's backed by google, and are attempting to poison
| their search term suggestions. The queries I've been getting
| are fairly long and highly specific, often within e-pharma or
| online casino or similarly sketchy areas.
|
| Like
|
| > cialis 50mg online pharmacy canada price
|
| Either that, or nonsense like the below, where they appear to
| be looking for CMSes to exploit (although I don't understand
| the appendage at the end)
|
| > "Please enter the email address associated with your User
| account. Your username will be emailed to the email address on
| file." Finestre Antirumore Torino
|
| > affordable local seo services "Din epostadress delas eller
| publiceras aldrig Obligatoriska fält är markerade med"
|
| > "You are not logged in. (Login)" Country "City/Town" "Web
| page" erst
|
| Point is, none of these queries actually return anything at
| all. I don't offer real full text search, for one. And the
| queries are much too long.
| shp0ngle wrote:
| The actual blog post aside: Marginalia search is the first of
| these "alternative search engines" that I actually like.
|
| Most of those have been either "worse Google" or "utter
| trash"... this one returns some interesting results for some
| queries I have tried.
| BiteCode_dev wrote:
| It's not that bad.
|
| First, of course, you have Cloudflare and reCAPTCHA, which are
| free and very efficient, as the author says.
|
| But even if you don't want to use them (some of my services
| don't), most bots are very dumb:
|
| - require JS, and you lose half of the bots
|
| - silly tricks like hidden input fields in forms that worked in
| 2000 still work in 2022. Use a bunch of them, and you can yet
| again halve the bot traffic.
|
| - many URLs should have impossible-to-guess paths. E.g. just by
| changing the /admin/ URL to a UUID in Django, or /wp-admin/ in
| WordPress, you save so many requests (see the sketch at the end
| of this comment).
|
| - bots are usually not tailored to your site, meaning that if
| you require JS, you can actually embed anti-bot measures in the
| client code and they will work. E.g. exponential backoff + some
| heavy calculations after too many fast consecutive AJAX requests.
|
| - fail2ban + a few iptables rules (mitigate syn flood, etc) will
| help
|
| - varnish + redis get you very far in shaving off excess dummy traffic
|
| It's not great, but it's not an apocalypse.
|
| Unless you are under targeted attack.
|
| Then it sucks and you die.
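|
| The Django trick, as a minimal sketch (the UUID here is just a
| made-up example):
|
|     # urls.py
|     from django.contrib import admin
|     from django.urls import path
|
|     urlpatterns = [
|         # serve the admin from an unguessable path instead of /admin/
|         path("3f2b8c1e-9d4a-4e7b-a1c6/admin/", admin.site.urls),
|     ]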
| sparkling wrote:
| Very nice list of countermeasures. I agree that doing these
| small things, like hidden input fields, really goes a long way.
|
| I would add to that:
|
| - block signups/comments from known throwaway email domains
|
| - block known datacenter IP ranges, at least for POST requests
| (see the sketch below). Honestly, on our sites 50% of spam was
| coming from AWS EC2 IPs
|
| - use a proxy/vpn/bot detection service like https://focsec.com
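|
| A rough sketch of the datacenter-range check (the CIDRs below
| are placeholders; AWS, GCP, etc. publish their real ranges):
|
|     import ipaddress
|
|     # placeholder ranges -- load the published lists in practice
|     DATACENTER_NETS = [ipaddress.ip_network(n) for n in (
|         "3.0.0.0/15",      # AWS-style example
|         "34.64.0.0/10",    # GCP-style example
|     )]
|
|     def is_datacenter(ip: str) -> bool:
|         addr = ipaddress.ip_address(ip)
|         return any(addr in net for net in DATACENTER_NETS)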
| Zak wrote:
| Please don't make blocking VPNs plan A. Between snooping ISPs
| and public wifi networks that have indiscriminate content
| filters, I'm on a VPN about half the time. Many other
| legitimate users are as well.
| emptyparadise wrote:
| But then you end up forcing people to use Gmail.
| EdwardDiego wrote:
| Yup, in adtech, "IP is an AWS block" was a bot 99.999% of the
| time.
|
| The 0.001% was that person using EC2 as a proxy or VPN
| server.
| mobilio wrote:
| It's not only AWS. Also happens on Azure and GCP.
| EdwardDiego wrote:
| True, but at the time, 3 - 4 years ago, Azure and GCP IPs
| were minimal.
|
| Guess the fraudsters were vendor locked lol.
| SyneRyder wrote:
| > First, of course, you have cloudflare and recaptcha, which
| are free and very efficient, as the author say.
|
| Recaptcha has been almost useless, in my experience. If you
| read the spam logs, you'll quickly learn about the spam
| software they (claim to) use to bypass Recaptcha, because
| that's what they end up promoting. I started tagging in my
| logs whether Recaptcha had validated on messages, and sure
| enough these spam posts had all successfully passed it. Great
| opportunity to rip out more Google dependencies from my
| website.
|
| I've found my own custom written filters to be vastly more
| effective than Recaptcha.
|
| Lots of the bots are running full Chrome with JS, lots of
| HeadlessChrome being used lately. The fact that they're using
| HeadlessChrome is something that makes them easy to detect,
| ahem.
| BiteCode_dev wrote:
| Those are very specific bots; recaptcha will stop a lot of the
| casual ones. Most of them, in fact.
| SyneRyder wrote:
| That really hasn't been my experience, perhaps I'm just
| getting hit more by the sophisticated bots than the naive
| ones. I'm glad that it works for some people.
|
| Recaptcha was also filtering out some legit humans (I
| logged all posts regardless of captcha status to be
| reviewed later), so it just wasn't worth reducing the user
| experience when the captcha bot detection rate was so low.
| BasiliusCarver wrote:
| One of the things I've done before, among the other
| suggestions, is to put a hidden link like
| /followmeifyouscraping.html in the landing page to get a bit of
| info about scraping volume. Then, if you want, you can use
| fail2ban filters to block whoever visits it.
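|
| A sketch of one way to mine an access log for trap hits (the
| log path and common log format are assumptions; feed the result
| to your ban list or a fail2ban action):
|
|     TRAP = "/followmeifyouscraping.html"
|
|     def trap_hits(logfile="/var/log/nginx/access.log"):
|         """IPs that requested the trap URL (CLF puts the IP first)."""
|         ips = set()
|         with open(logfile) as f:
|             for line in f:
|                 if TRAP in line:
|                     ips.add(line.split()[0])
|         return ips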
| jiggawatts wrote:
| I would add that link to robots.txt as an exclusion.
|
| That way well-behaved search engines won't be affected, but
| naive scrapers get auto-banned.
| efitz wrote:
| No, then you hide behind CloudFlare, because only the CSPs and
| network operators have the infrastructure to deal with
| volumetric attacks.
| efitz wrote:
| Also, attackers are rarely going to try to guess your URLs -
| they're going to find them via Google or Shodan, or, if you're
| a good REST citizen, via "/<yourapp>/"
| bryanrasmussen wrote:
| >Also, attackers are rarely going to try to guess your URLs -
|
| because then the attack becomes DOS as they cycle through
| dictionaries of words?
| marginalia_nu wrote:
| I get quite a lot of guesses in my logs, probing in /solr/
| and so on (fairly pointlessly, I might add, as I run bespoke
| software).
| raverbashing wrote:
| Any website gets probed for wp-admin.php etc, even if you
| don't use WP
| BiteCode_dev wrote:
| In fact, if someone is probing for wp-admin, you should
| insta ban them, no matter the site.
| JimWestergren wrote:
| I am running a website builder with > 20K sites. I use open
| contact forms without captchas. What worked for me is a one-line
| piece of JavaScript that places the current timestamp in a
| hidden input field that defaults to 0. Then I check on the
| backend, and if the value is either 0, or the time taken to fill
| out and send the form is less than 4 seconds, I block the
| submission as spam. This blocks more than 99% of spam and also
| takes care of most human copy-paste spam as well.
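|
| Something along these lines (a minimal sketch, not my actual
| code; the Flask framing and field names are made up):
|
|     import time
|     from flask import Flask, request
|
|     app = Flask(__name__)
|
|     PAGE = """
|     <form method="post" action="/contact">
|       <input type="hidden" name="ts" value="0">
|       <textarea name="message"></textarea>
|       <button>Send</button>
|     </form>
|     <script>document.querySelector('[name=ts]').value = Date.now();</script>
|     """
|
|     @app.get("/contact")
|     def form():
|         return PAGE  # static HTML, so it caches fine
|
|     @app.post("/contact")
|     def submit():
|         try:
|             ts = int(request.form.get("ts", 0)) / 1000  # JS gives ms
|         except (TypeError, ValueError):
|             ts = 0
|         # 0 means JS never ran; < 4 s means nobody really typed.
|         # (Assumes a roughly synced client clock.)
|         if ts == 0 or time.time() - ts < 4:
|             return "Blocked as spam", 400
|         return "Thanks!"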
| naillo wrote:
| I like this solution because spammers are unlikely to try to
| get around it. A delay eats into their time budget and they
| can't introduce a human-like waiting time on every site they
| try to spam, better to just move on to find cheaper targets.
| walls wrote:
| You could just decrease the timestamp instead of actually
| waiting.
| naillo wrote:
| I meant general spammers who go after tons of sites
| mostly blind. I agree it would not help for a targeted
| attack.
| JZerf wrote:
| I'm already using this timestamp technique on my website
| and so far no bot operator has bothered trying to work
| around this. However even if some bot operator were to
| specifically target a website using this technique and
| try to decrease the timestamp, I believe you could still
| force a bot to wait by just changing the website to use
| something like a cryptographic nonce that includes a
| timestamp instead of just a simple timestamp that can be
| understood easily.
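|
| Something like this sketch, where the token can't be rewound
| without the server's secret (names are made up):
|
|     import hashlib, hmac, time
|
|     SECRET = b"server-side-secret"  # assumed to be kept private
|
|     def issue_token():
|         ts = str(int(time.time()))
|         sig = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
|         return ts + "." + sig
|
|     def token_age(token):
|         """Seconds since issue, or None if forged/mangled."""
|         ts, _, sig = token.partition(".")
|         good = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
|         if not ts.isdigit() or not hmac.compare_digest(sig, good):
|             return None
|         return time.time() - int(ts)
|
|     # reject the submission if token_age(...) is None or < 4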
| robalni wrote:
| If you don't want to require users to run javascript you should
| be able to make the server generate the timestamp.
| JimWestergren wrote:
| I used to do it with PHP, but the problem is that then you
| can't cache the HTML (with varnish or other solutions).
| JavaScript doesn't have that problem, and it has the added
| benefit of stopping bots that don't run JavaScript.
|
| In the error message I have a friendly text telling humans to
| turn on JavaScript if it is off, plus a <a href="javascript:
| history.go(-1);">Go back and try again</a> link, so they don't
| lose the text that they have typed.
| mariusor wrote:
| How do you do that, without bots being able to circumvent the
| feature?
| sschueller wrote:
| You could generate a CSRF token or something similar in a
| hidden field, based on a JWT (yes, I know), on the server
| side. The JWT can either contain some timestamp after which
| it's valid or the time it was created.
| robalni wrote:
| People will be able to write programs that circumvent the
| feature but that's also true for the javascript solution.
| The point of it was that it gets rid of most spam because
| most bots fill in the form faster than 4 seconds and are
| not made to circumvent this feature.
| Aachen wrote:
| Bots can also circumvent this JS thing, so it's the same either
| way.
|
|     <?php echo '<input type=hidden name=starttime value='.time().'>';
|
| On submit:
|
|     <?php if (time() - $_POST['starttime'] < 4) die('2fast4me');
|
| Revealing the error condition (that it was submitted too fast)
| is nice for users and bots alike, of course. Up to you.
|
| I've had websites where I was too fast in submitting a form
| before. Not any kind of anti-spam, just their server was so
| fricking slow that I had input the date (iirc it was a
| reservation system) and clicked next before the JS blobs
| had finished triggering each other and fully loaded. It
| would break the page somehow with no visual indication. I
| found out by looking in the dev console and noticing stuff
| was still loading in the background. How normal people are
| able to use the Internet with how often I need the dev
| console to do entirely ordinary things is a mystery to me.
| JZerf wrote:
| I also use essentially the same technique (although I have the
| server generate the timestamp instead of using JavaScript) on
| my website and concur that this is a highly effective technique
| for blocking bot submissions.
| js4ever wrote:
| "There has been upwards of 15 queries per second from bots. There
| is just no way to deal with that sort of traffic, barely even to
| reject it."
|
| What??? My phone can serve that easily, any modern server can
| handle 50-250 rps
| marginalia_nu wrote:
| This is queries per second (as in I run a search engine), not
| requests per second.
| Joel_Mckay wrote:
| For small sites, I would just use a simple firewall:
|
| 1. whitelist the finite IP ranges for the regional ISPs/country
| where you do business
|
| 2. blacklist the proxy and tor exit nodes
|
| 3. blacklist the list of published compromised servers
|
| 4. add spamhaus blacklists
|
| 5. add fail2ban rules to trip on common server security scans,
| and unused common service ports
|
| 6. publicly reply to those having access issues, and imply they
| have bad neighbors.
|
| This will often take care of 99% of the nuisance traffic, but I
| still recommend live monitoring traffic regularly. ;)
| philprx wrote:
| Tor users are often legitimate good internet citizens.
|
| A lot of (lucky) us have the luxury to live in real
| democracies.
|
| Some others live in countries that use every single aspect of
| their private lives (DPI, mass surveillance) to put pressure on
| them and bend them to the regime's will.
|
| In my opinion, Tor and anonymity should not be killed as a
| result of silly bots.
| Joel_Mckay wrote:
| Your opinion is duly noted, and I agree most knowledge should
| be equally accessible to give everyone a chance to grow.
|
| That being said, a commercial site owes nothing to
| financially irrelevant bandits, sociopaths, or shills.
|
| Try it for a week, and then weigh the liability again. ;)
| RL_Quine wrote:
| > Tor users are often legitimate good internet citizens.
|
| We have had exactly zero traffic from it at any point which
| was legitimate. Any user who ever showed up with an exit IP
| ended up being banned eventually, so we just proactively
| fraud banned anybody who uses one, and anybody that was
| related to them. There is zero value in allowing anonymizer
| traffic on your service, and a whole lot to lose.
| TekMol wrote:
| Crypto currency mining could be the solution.
|
| If one request to the site generates more revenue than it costs
| in resources, the bot problem is solved.
|
| The author says that he is getting 15 bot requests to his site
| per second. That is about 36 million requests per month. How much
| does it cost to serve those? $1000 would seem high.
|
| $1000/36M = $0.00003 per request.
|
| How long would a cryptocurrency that is suitable for mining in
| the browser need to be mined before $0.00003 is generated?
|
| If it turns out to be a few seconds or so, the solution would be
| nicely user-friendly. A few seconds of CPU time for access to
| the site. No ads needed to finance the site.
|
| It is kind of telling that Bitcoin's roots are in spam blocking.
| The original "hashcash" use case was to use proof of work to
| prevent email spam.
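|
| A hashcash-style sketch of the idea (plain proof of work, no
| actual coin; difficulty is in leading zero bits):
|
|     import hashlib, itertools
|
|     def solve(challenge: bytes, bits: int = 20) -> int:
|         # the client burns CPU until the hash clears the target
|         target = 1 << (256 - bits)
|         for nonce in itertools.count():
|             h = hashlib.sha256(challenge + str(nonce).encode()).digest()
|             if int.from_bytes(h, "big") < target:
|                 return nonce
|
|     def verify(challenge: bytes, nonce: int, bits: int = 20) -> bool:
|         # the server checks with a single hash
|         h = hashlib.sha256(challenge + str(nonce).encode()).digest()
|         return int.from_bytes(h, "big") < (1 << (256 - bits))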
| GeckoEidechse wrote:
| As much as I hate the whole cryptocurrency hype myself, I agree
| that a proof-of-work requirement on spam detection that pays in
| the host's favour could help solve spam to some degree.
| endgame wrote:
| Before bitcoin, there was hashcash, which aimed to do exactly
| this: http://www.hashcash.org/ . The original bitcoin paper
| cites it, in fact.
| kragen wrote:
| Satoshi Nakamoto almost certainly isn't Adam Back.
|
| It might be enough for the request to require more resources
| from the requestor than from the server, even if it doesn't
| actually give the server any money. I mean the requestor
| probably isn't going to be willing to dedicate more hardware
| "horsepower" to taking your search engine down than you are to
| keeping it up. That was the idea behind Hashcash.
|
| As for coins, the current Bitcoin hashrate is about 200
| exahashes per second, down from a high of over 250 a couple of
| months ago, and the block reward is 6.25 BTC until probably
| June 02024. At a price of US$24000/BTC that's US$150k per block
| (plus a much smaller amount in transaction fees) or about
| US$1.25e-18 per hash. So your suggestion of US$3e-5 would
| require about 2e13 hashes. https://en.bitcoin.it/wiki/Non-
| specialized_hardware_comparis... says an overclocked ATI Radeon
| HD 6990 can do about 800 megahashes per second (8e8) so you're
| looking at about 3e4 seconds of compute on that card, about 8
| hours.
|
| Maybe one of the altcoins that uses a hash function with a
| smaller ASIC speedup would be a better fit, although I don't
| know enough about mining to know if there are any where GPUs
| are still competitive. Still, it seems like it might be more
| than a few seconds?
| TekMol wrote:
| I have not seen any arguments yet for why Satoshi is not Adam.
|
| I did say "crypto currency, that is suitable for mining in
| the browser" for exactly this reason: That Bitcoin is not
| well suited for it.
|
| One would have to look at what typical consumer hardware is
| good at. Maybe an algorithm that saturates one CPU core with
| serial calculations that need fast access to exactly 1GB of
| RAM. I think consumer hardware is pretty good when it comes
| to single core performance and RAM access.
| swinglock wrote:
| You'd only need lightning integration and paying a small
| amount of sats, recycling the Bitcoin proof of work instead
| of making more. You could even have the server transfer the
| same sats back and forth as a token as long as the server
| side is happy.
| TekMol wrote:
| "only"
|
| You will not get your visitors to buy Bitcoin and set up a
| Lightning wallet just to visit your website.
|
| But having some JS on your site that crunches numbers for
| 2 seconds before the user can progress would work.
| swinglock wrote:
| Well it would "just" have to be integrated in browsers.
| :)
|
| You don't have to buy it, you could crunch numbers for an
| equivalent cost if that's preferable. The advantage is
| that the effort can be stored and used later, so you need
| not even add a 2 second latency. Similar to "Privacy
| Pass".
| jb1991 wrote:
| > They're a major part in killing off web forums,
|
| I've noticed that a lot of old popular forums disappeared in
| recent years, but I didn't realize it was possibly due to bots.
| Why is that? I assumed that the admins just got tired of running
| them and moderating them.
| marginalia_nu wrote:
| It's more complicated than just bots; competition from Reddit
| is another factor. But bot traffic was certainly a significant
| part of the problem: the constant drive-by exploits and
| ceaseless comment spam drove the work needed to operate a forum
| as a hobby up to basically a full-time job. With waning visitor
| numbers, it simply became untenable.
| SyneRyder wrote:
| Really glad to see someone finally talking about this.
|
| Does anyone know what's going on with that "Duke de Montosier"
| spam botnet? It accounts for more than half of the botspam
| attacks on my sites, and I can't find anyone talking about it
| online anywhere, except one tweet dating back to mid-2021. It's
| identifiable by several short phrases that it posts:
|
| _Duke de Montosier_
|
| _for Countess Louise of Savoy_
|
| _Testaru. Best known_
|
| And cryptic short posts that can assemble into creepy sequences:
|
| _Europe, and in Ancient Russia_
|
| _Century to a kind of destruction:_
|
| _Western Europe also formed_
|
| _and was erased, and on cleaned_
|
| _only a few survived_
|
| _number of surviving European_
|
| _55 thousand Greek, 30 thousand Armenian_
|
| Many of the IPs involved seemed to be in Russia, China and Hong
| Kong, though they're coming from all over (eg European & US VPNs,
| Tor exit nodes). From tracking the IPs on AbuseIPDB, the weird
| spam posts seem to be just one layer, while behind the scenes it
| also attempts SMTP Auth and IMAP attacks on the server.
|
| I'm eager to know more if anyone knows, and especially if anyone
| is trying to shut this thing down. But I can't find anyone even
| talking about it. (Maybe there's a reason for that?)
| marginalia_nu wrote:
| How very numbers station of them.
|
| I've seen it suggested that botnets use comment fields for
| command and control, maybe something like that?
| SyneRyder wrote:
| My theory for the phrases above is that they're a "unique
| seed" used to identify sites that are easily compromised. Do
| a web search, find a website filled with "Duke de Montosier"
| comments - bingo, you've identified an easy website to target
| with your backlink comment spam. Or, more maliciously, a
| website that is easy to thoroughly compromise with
| vulnerabilities. But that's just my current theory.
|
| Here's the one tweet I found in Swedish about the comment
| spam botnet, and it dates back to February 2021. She's the
| only person I could find who has mentioned it in public. Or
| maybe my search skills are failing me.
|
| https://twitter.com/aureliagu/status/1357368329573400578
| bombcar wrote:
| I suspect some of these are "bots sold for hire" where they
| make money selling the bot to people, many of whom don't know
| how to use it and run it with the default config.
|
| I've found spam email that certainly is the above, because it
| has things like PUT_LINK_TO_STORE_HERE and other variables that
| obviously weren't updated in the config file.
| prepend wrote:
| I assume it's time travelers trying to post enough so their
| message persists.
| Kiro wrote:
| Coin Hive was an interesting solution before it became synonymous
| with crypto jacking. In order to post a comment you had to lend
| your CPU to mine for X seconds. The only true anonymous and
| frictionless micropayment system I've seen.
| Nextgrid wrote:
| > The only ones that can survive the robot apocalypse is large
| web services. Your reddits, and facebooks, and twitters, and
| SaaS-comment fields, and discords. They have the economies of
| scale to develop viable countermeasures, to hire teams of people
| to work on the problem full time and maybe at least keep up with
| the ever evolving bots.
|
| I only agree when it comes to the system resources that can keep
| up with bots. When it comes to fighting spam, these services
| often do a terrible job because 1) their business model benefits
| from higher user & engagement numbers and 2) their monopoly
| status affords them the ability to retain users even if their
| experience is degraded by the spam, something a small site
| often won't be able to do.
| Avamander wrote:
| It's annoying for sure. I deal with abuse at a large scale.
|
| I'd recommend:
|
| - Rate-limit everything, absolutely everything. Set sane limits.
|
| - Rate-limit POST requests harder. Preferably dynamically based
| on geoip.
|
| - Rate-limit login and comment POST requests even harder. Ban IPs
| that exceed the amount.
|
| - Require TLS. Drop TLSv1.0 and TLSv1.1. Bots certainly break.
|
| - Require SNI. Do not reply without SNI (nginx has the 444
| return code for that). Ban IPs on the first hit if they connect
| without it. There's no legitimate use, and you'll also disappear
| from places like Shodan.
|
| - If you can, require HTTP/2.0. Bots break.
|
| - Ban IPs listed on StopForumSpam, and ban destination e-mail
| addresses listed there. If possible, also contribute back to SFS
| and AbuseIPDB.
|
| - Collect JA3 hashes, figure out the malicious ones, and ban
| IPs that use those hashes. This blocks a lot of shit trivially,
| because targeting tools instead of behaviour is accurate.
| [deleted]
| andai wrote:
| >Bots break.
|
| Wonder if you could respond in a way to get them to crash, or
| even better, to hang indefinitely.
| neurostimulant wrote:
| I'm sure these would work, but I'd probably get banned too,
| just because I often try to poke IP addresses directly. I also
| often use a VPN, especially when out and about, so I'd
| definitely get banned.
| Avamander wrote:
| VPNs tend to be smaller offenders in terms of clients-per-IP
| than say educational institutions or offices.
| 1vuio0pswjnm7 wrote:
| "This spam traffic is all from botnets with IPs all over the
| world. Tens, maybe hundreds of thousands of IPs, each with a
| relatively modest query rates, so rate limiting does all of
| bupkis."
| Avamander wrote:
| Yep, there isn't a silver bullet that curtails all abuse.
| shiftpgdn wrote:
| Blocking the entire aws/gcp/azure/digital ocean/linode IP
| ranges will stop 99.999% of malicious bot traffic full
| stop.
| CWuestefeld wrote:
| Yes, it would.
|
| It would also stop a not-insignificant number of my
| customers.
| Plasmoid wrote:
| Why do your customers pay Amazon for egress to the
| internet? Isn't that very expensive?
| CodeSgt wrote:
| > Rate-limit login and comment POST requests even harder. Ban
| IPs that exceed the amount
|
| Don't ban IPs. Or if you do, let the ban expire relatively
| quickly (days/weeks, not months/years).
| Avamander wrote:
| Ideally you'd keep track of repeat offenders and decide the
| length based on that.
| LinuxBender wrote:
| Or at least rate limit session cookies. If a person does not
| have a session cookie, rate limit by IP. If they are
| authenticated as a unique person have different rate limits
| and different levels of authentication. HAProxy can do
| different rate limits by ACL conditions.
|
| Or instead of strictly rate limiting, ask them a question
| that can't be "looked up" in a table and that requires human
| thought, philosophy, emotion, ethics. Maybe GPT could
| eventually adapt to this and in that case fall back to IP
| rate limiting and grow the set of questions.
| annoyingnoob wrote:
| I ban IPs from small data centers all the time. For my
| purposes there is no need to support traffic from small
| hosting providers that are everywhere all over the world. I
| do not tend to ban the IPs of commercial ISPs that provide
| service to end users.
| rndgermandude wrote:
| You will probably ban a lot of VPN users as collateral
| damage. VPN providers often use these small and relatively
| cheap providers for their endpoints.
|
| You may be fine with banning those VPN users, or even want
| that - lots of bots will try to hide behind "legitimate"
| VPNs - but one has to be aware of this consequence at
| least, especially considering that more and more people
| seem to use them - probably also thanks to the aggressive
| "sponsoring" that certain providers such as ExpressVPN do on
| e.g. a wide variety of YouTube videos.
| annoyingnoob wrote:
| It depends on what you are trying to protect I suppose.
| Banning OVH IPs (and others) cleared up a lot of issues
| for me. I don't miss them, but sure you might.
| superkuh wrote:
| > Require TLS. Drop TLSv1.0 and TLSv1.1. Bots certainly break.
|
| So will people who run older computers with older software. But
| I guess people who don't have money don't matter for commercial
| websites so screw 'em.
| Avamander wrote:
| I don't think there's much web you can visit with those
| browsers anyway. Windows XP with IE and no SNI support, maybe
| sites from that era without JavaScript would work?
| superkuh wrote:
| I think you'd be surprised at how recent a browser can be and
| still lack a client/server cipher overlap once you start
| whittling down which TLS versions you accept. Just at the start
| of the pandemic, many government sites had to re-enable early
| TLS because so many people couldn't access their recent-TLS-
| only sites.
|
| But yeah, corporate employees aren't going to care about
| those people. Governments have to. And human persons
| building personal websites should too.
| Kalium wrote:
| Anything that got a significant update in the past ten to
| twelve years will support TLS 1.2. The window of systems that
| would support 1.1 but not 1.2 is pretty small. You have to go
| all the way back to IE on Windows XP before a lack of 1.2
| support becomes an issue.
|
| So, yeah. You're absolutely right. In a lot of cases the loss
| of revenue from users with severely outdated software will be
| less than the cost decrease from cutting spam and abuse.
|
| This gets back to an old question - to what degree should
| legacy systems be supported and at what level of expense?
| There's no one easy answer that works for everyone.
| jmt_ wrote:
| I'm not very familiar with all the workings of HTTP/2.0 - why
| would it break bots? Assuming no CloudFlare type protection,
| does it somehow stop someone from using curl to get (non-JS
| generated) content? Does it thwart someone accessing the site
| from something like playwright/selenium?
| Avamander wrote:
| > I'm not very familiar with all the workings of HTTP/2.0 -
| why would it break bots?
|
| There's a lot of outdated garbage bots out there. Not using
| HTTP/2.0 is also often the default with various HTTP
| libraries.
| jmt_ wrote:
| So it just comes down to bot software not being compatible
| with HTTP 2.0 rather than any sort of HTTP 2.0 specific
| mechanism/feature?
| ruuda wrote:
| Yes
| troad wrote:
| In other words, make your website unusable for people who have
| to connect through VPNs or public networks, difficult for
| anyone without a stable Western broadband connection, and
| unpleasant for everyone else.
| hbn wrote:
| Google search seems to have this issue for most of the
| regions near me on the VPN I use (Private Internet Access)
|
| Sometimes I just turn it on if I have to fire off a few
| searches because it'll make me complete a long, tedious
| captcha for EVERY search
| Avamander wrote:
| With a bit of work, limits can usually be fine-tuned so as not
| to impact actual users behind NATs. Some collateral damage does
| happen, but that's an unfortunate reality. I'd like you to
| elaborate on the rest of your comment though.
| dalbasal wrote:
| I agree that there's a "that's life" aspect to
| collateral/tradeoff.
|
| That said, I sympathize somewhat with the parent. "Done
| right, negative side effects are minimal" is an
| uncomforting statement. First, because things are often not
| implemented correctly. There are a lot of details and
| tuning that will often fail to materialize in practice.
| Second, because long tail usability issues can often go
| overlooked. The abuse -> anti-abuse feedback loop is pretty
| tight: abuse gets identified and counteracted. The anti-abuse
| -> UX-problems loop tends to be noticeably looser; often it's
| just aggregates (revenue/AUD/etc).
| gkbrk wrote:
| > If you can, require HTTP/2.0. Bots break.
|
| Non-bots break as well. I have Firefox configured to use
| HTTP/1.1 only.
|
| No reason to chase Google's standard-of-the-day, HTTP/1.1 has
| worked for ages and it will continue to do so for the
| foreseeable future.
| Avamander wrote:
| Some old browsers break as well; whether it's worth it depends
| on the website. It's your prerogative to disable a useful
| feature, just as you can disable JavaScript. But there's little
| reason for a website operator to cater to that unnecessary edge
| case if it's mostly used for abuse.
| NullPrefix wrote:
| JavaScript is mostly used for abuse.
| Avamander wrote:
| Not in that direction though.
| bsuvc wrote:
| That seems like a strange reason to me. Isn't HTTP/2.0
| faster? Isn't it also basically transparent to the end user?
|
| I'm trying to figure out what I would gain by configuring my
| browser to use HTTP/1.1 only.
| dspillett wrote:
| If doing something fends off a lot of bots, but also
| inconveniences a very small number of people who have
| significantly non-standard or just out-of-date
| configurations, I'm likely to favour protecting myself from
| the former over worrying about the latter. To paraphrase Mr
| Spok: The inconveniences of the me outweigh the
| inconveniences of the you!
| bbarnett wrote:
| Bear in mind, inconveniencing 4.8% of users does not map
| identically.
|
| Instead, you are often dumping 4.8+4.8+4.8 as you add block
| methods, with some overlap.
| marginalia_nu wrote:
| To be fair, most of my visitors are not exactly lining up
| with the expectations of "standard". I get >90% of my
| [human] traffic from desktop clients, for example.
| bbarnett wrote:
| Sure, but the logic about mitigation does hold true, if
| you overlap methods.
|
| E.g., your method described in the prior post, along with
| things which may lock out VPN or NAT users.
|
| Just something to consider.
| sirshmooey wrote:
| Genuinely curious, why disable HTTP2? Your web browsing must
| be awfully slow sans multiplexing.
| gkbrk wrote:
| > why disable HTTP2
|
| Because it adds nothing to improve my browsing experience,
| and reducing the number of protocols supported by my
| browser from 3 to 1 also reduces the attack surface.
|
| > Your web browsing must be awfully slow sans multiplexing.
|
| And yet it's not slowed down at all. How many different
| resources must a web page use before it feels slow on a
| connection pool of keep-alive TCP sockets? Maybe people
| visit some wild experimental web pages with hundreds of
| blocking <src> tags that are not bundled/minified?
|
| Either way, my experience is it doesn't slow anything down
| when I use both websites (forums, resources, youtube,
| social media) and web apps (banking, maps, food delivery
| etc).
| dboreham wrote:
| Perhaps your experience is the same, but it may impose
| extra load on middleboxes that track TCP flows.
| stonemetal12 wrote:
| https://github.com/dalf/pyhttp-
| benchmark/blob/master/results...
|
| HTTP/2 is barely any better than HTTP/1; if you want it to
| make a tenth of a second of difference, you have to be making
| hundreds of requests.
| ajsnigrutin wrote:
| > - Rate-limit everything, absolutely everything. Set sane
| limits.
|
| This breaks when multiple users are behind the same IP. I've
| seen services fail even in a classroom, because the prof did
| something and a few tens of students followed (captchas
| everywhere).
| icelancer wrote:
| Yup. This happened to us when we had rate limiting turned on
| for our sites and ran off-site events at hotels, for example -
| the hotel's IP got temp-banned and our sales engineers would
| complain, rightfully so.
| winternett wrote:
| Black hats always find ways around rate limiting, and that's
| why they are more prevalent than actual users. People can
| literally run click farms with cheap 4g cell phones that
| artificially pump anything they want without consequences,
| while authentic posters that simply run 2 necessary accounts
| are penalized if they post regularly.
|
| The only real way to properly police Internet communities is
| to keep them smaller so that botting is more obvious, and to
| involve carefully managed moderation. Reddit tried this, but
| also lost track of the human factors involved and now
| moderators collect side money and promote their own posts
| artificially.
|
| The main problems facilitating the surge in bots are scammy
| creator funds and all the other measures sites take to boost
| their profit and market dominance. They have grown far too big
| and can no longer effectively manage their user bases. Things
| weren't meant to be this way at all; the excessive quest for
| market dominance and profit has thoroughly corrupted freedom of
| information online in business, and now many users are
| following the same road map.
| [deleted]
| CWuestefeld wrote:
| For sure! Our site is B2B ecommerce, and any sizeable
| customer has all their users coming to us from a single NAT
| or proxy. For major customers it's likely that there are
| several of their employees using our system at any given
| time.
|
| The answer needs a whole lot more finesse than this.
| marginalia_nu wrote:
| (Author)
|
| I do in fact rate-limit everything; it is good advice, but the
| way I implement rate-limiting allows for traffic bursts. It's
| basically a reverse leaky bucket, where you start out with a
| pool of N allowed requests, which gets depleted with each
| request and refilled slowly over time.
|
| Search traffic is fairly bursty: people do a few requests where
| they tweak the query, and then they go away.
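|
| In Python-ish pseudocode, something like this sketch (names and
| numbers are made up):
|
|     import time
|
|     class RequestBucket:
|         def __init__(self, capacity=10.0, refill_per_s=0.5):
|             self.capacity = capacity
|             self.refill = refill_per_s
|             self.tokens = capacity          # start full: bursts are fine
|             self.last = time.monotonic()
|
|         def allow(self):
|             now = time.monotonic()
|             self.tokens = min(self.capacity,
|                               self.tokens + (now - self.last) * self.refill)
|             self.last = now
|             if self.tokens >= 1:
|                 self.tokens -= 1            # each request costs one
|                 return True
|             return False                    # rate-limited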
| RamRodification wrote:
| Off-topic, but isn't that a normal (non-reverse) leaky
| bucket? When the bucket gets full the rate limiting
| engages. An empty bucket allows for a burst without getting
| full. It slowly leaks over time at a rate that allows a
| normal amount of traffic without filling up.
| megous wrote:
| To me, it's a bucket that's being filled at a constant rate
| from a tap until it's full, and the traffic requires taking
| some water from the bucket. If there's no water, the traffic
| has to be dropped or wait in a queue.
|
| Basically, you can look at it either way.
| RamRodification wrote:
| I like it. I guess it's one of those things where it
| depends on the example used when you learned about it?
| For me it was some Nginx guide on rate limiting and I
| think they described it in the way I see it.
| marginalia_nu wrote:
| Hmm, yeah, that's actually true now that I think about
| it.
| Avamander wrote:
| A "sane limit" wouldn't be "one person one IP", such a global
| limit should rather stop one IP (even if it's a nasty CGNAT)
| from having a negative impact on the entire service.
|
| If such a limit would hinder classroom usage, but that's your
| target audience then other solutions should be found, fairly
| logical.
| GeckoEidechse wrote:
| I wonder if a general solution could be to make the visit more
| computationally demanding to the visitor than to the host, e.g.
| some form of proof-of-work. I guess captchas already do that in
| some sense but they require the humans to do the work.
|
| Now, the author above has stated they dislike the crypto route,
| and I agree that the whole web3 idea is BS. But what if, in the
| case that spam of some form is detected by the server, it
| required the visitor to show some proof-of-work, combined with
| the "mining crypto in JS instead of ads" craze? That way the
| bot would need to put work in, which would slow it down, and at
| the same time it would pay for its own visit.
|
| Of course, no spam detection system is perfect and it would
| also hit human users, but in their case it would just mean
| waiting a few seconds longer for the page to load.
| AviationAtom wrote:
| It's funny the author mentions Facebook and Twitter, because the
| bot spam on both is quite apparent. The spam on the former has
| risen greatly, seemingly mostly from India and parts of Africa.
| Scam air duct cleaning posts, posts about hacked account
| recovery, and random other crap. It really degrades the
| experience of the Internet, IMHO.
| kazinator wrote:
| In the 1980's, we kept anklebyters off dial-up BBSes with a
| simple technique: voice validation. To join the forum, you had
| to fill out an application first, which included your real name
| and phone number. The sysop would give you a call for a quick
| chat, and then grant you access if you didn't seem like a twit.
|
| This would be entirely practical for some small-time operator
| trying to run a forum off residential broadband, while
| impractical for the reddits, facebooks and twitters.
| OliverJones wrote:
| "anklebyters"! I learned a useful new word today. Thanks.
| [deleted]
| bluedino wrote:
| I remember some BBS registration forms where you would have to
| give the names of a couple existing users that would vouch for
| you. Kind of like other sites where you need an invite or
| referral from an existing member.
| s1k3s wrote:
| I guess it's a different time and it also depends on who's your
| target audience. Some people go crazy if you ask for their
| email address. Phone numbers and calling is a big no-no.
| mjevans wrote:
| I'm one of those radical militants who refuses to give up any
| means of direct contact...
|
| However for a small scale thing I'd gladly go visit at a face
| to face meetup to fulfill this type of validation.
| shaburn wrote:
| What if they came to you. What is the imputed value of that
| network connection relative to cost...?
| nottorp wrote:
| > However for a small scale thing I'd gladly go visit at a
| face to face meetup to fulfill this type of validation.
|
| Even if it were 3 flights totalling 18 hours away? :)
|
| Or even just from one coast of the US to another...
| mjevans wrote:
| Someone that far away shouldn't want my direct contact
| information to join a group.
|
| However there is a medium / large organization case,
| where each area has local 'chapters' or some other term
| for a small fragment of the larger group. In that case
| the local leaders each operate as a small group for their
| areas.
| easrng wrote:
| You could schedule a voice-only jitsi or some other kind of
| call that doesn't need an email or phone number.
| mnd999 wrote:
| Phone number is an excellent tracking identifier across
| services. Even better than email, which is why the data
| hoarders want it.
| figmaheart255 wrote:
| While bot spam is clearly on the rise, we need to be _very
| careful_ about how we choose to deal with it. Cloudflare has
| already introduced "proof-of-Apple" [1], where proven Apple
| devices get special treatment, bypassing captchas. Later we
| might see websites that are _only_ accessible via Google,
| Microsoft, or Apple devices. If we continue down this path,
| we'll end up with a social credit system ruled by big tech.
|
| [1]: https://news.ycombinator.com/item?id=31751203
| kube-system wrote:
| We basically already have "social credit" systems, we just call
| them anti-fraud/anti-spam/reputation scores.
| boredumb wrote:
| I get a ton of spam from my contact-me pages even with a
| captcha in place. I've been experimenting with loading an
| initial dummy form and replacing it, within a few seconds of
| loading, with the real deal, which seems to have cut down on
| bots submitting stuff.
|
| Rate-limit everything you can and use a captcha where
| acceptable; there are also a load of public IP and email
| blacklists that you can use to run a quick check. Working in a
| field where there is a large number of bots and incentive to
| abuse, we invest quite a bit of time and money in fraudulent-
| traffic detection using a cornucopia of different services in
| tandem, and at the end of the day we still see a small
| percentage of traffic getting through that is fantastically
| human-like.
|
| With that out of the way, I've been engulfed in AI and GPT-3
| functionality lately, and I thought this post was going to be
| doomsaying the coming apocalypse of bot spam, because the level
| of human-like quality coming from the AI is going to make (and
| already has made) deciphering human vs. bot
| traffic/posts/emails/comments nearly impossible. It will be fun
| soon when we see forums entirely dedicated to bots conversing
| and arguing with each other outside of Reddit.
| ComputerCat wrote:
| Same! The captcha doesn't seem to be able to slow down the
| bots. Inbox is still getting flooded with spam.
| [deleted]
| golergka wrote:
| > If Marginalia Search didn't use Cloudflare, it couldn't serve
| traffic. There has been upwards of 15 queries per second from
| bots.
|
| 15 RPS is very far from an apocalypse.
| [deleted]
| marginalia_nu wrote:
| It is if you're hosting an internet search engine on a PC.
| [deleted]
| golergka wrote:
| Why would you do such a thing in the first place?
| marginalia_nu wrote:
| Because I want this search engine to exist, and I'm not a
| multimillionaire so I can't afford better hardware.
|
| See, when it comes to not being able to find stuff on
| Google, you can either complain about it on the internet,
| or you can build a search engine yourself that allows you
| to find what you are looking for.
|
| I chose the second option.
| Avamander wrote:
| It's bad if it's your dead-average Wordpress site that has 10
| PHP workers, each page load being >1s. Easy DoS.
| Aachen wrote:
| Yeah but WordPress is an extreme example. Every time a WP
| blog is posted to HN without a static-page-ifier (caching
| layer that basically turns the dynamic pages into static
| ones), it dies within minutes. Normal software doesn't seem
| to have that problem.
|
| I traced it once, and I got to admit there was not an obvious
| bottleneck (this was 2015 or so). Just millions upon millions
| of calls into deeper and deeper layers for things like
| translations or themes. Wrapping mysql_query in a function
| that caches the result (to avoid doing identical queries)
| helped a few % I think, but aside from major changes like
| patching out the entire translation system for single-
| language sites, I didn't spot an obvious way to fix it. You'd
| need to spend a lot of time to optimize away the complexity
| that grew from suiting a million different needs, contributed
| by thousands of people across many years.
| unixbane wrote:
| >spam
|
| captchas were designed to solve this (and only this, as opposed
| to requiring them to merely view content like modern ignorant web
| devs like to do [yes i know some web devs now require it to be
| able to make sure the people they're datamining are real, but
| this is a new practice from this year basically])
|
| public services should be implemented by decentralized p2p.
| static content is solved by ipfs, freenet, etc. dynamic content
| perhaps can only be solved with smart contracts, which would be
| less bad than cloudflare if they weren't expensive, as they still
| provide protocol conformance (unlike cloudflare that requires you
| to have your packets look like a big 4 browser), anonymity (yeah,
| pseudonyms, you can still make one per query), etc. without smart
| contracts many interactive applications are still possible
|
| > The other alternatives all suck to the extent of my knowledge,
| they're either prohibitively convoluted, or web3 cryptocurrency
| micro-transaction nonsense that while sure it would work, also
| monetizes every single interaction in a way that is more
| dystopian than the actual skull-crushing robot apocalypse.
|
| centralized web hosting is and always was unsustainable and this
| is the reason most web content is commercial garbage, and the
| problem will only get worse. my concern was always what kind of
| garbage boomer protocol will become the new standard. i sure as
| hell dont want something that looks like email, web, or UN*X.
| ltr_ wrote:
| Tangential: two weeks ago (and for a while before that), our
| country's twitter-sphere (Chile) was completely and obviously
| dominated by bots. They were starting and inflating trending
| topics with absurd lies, spreading fear and chaos in favour of
| "Rechazo" (the option against our new constitution in the next
| ballot), or acting as echo chambers for republican and
| extreme-right-associated politicians. What happened? A
| self-organized group[1] started to do data analysis of the
| trending topics and deliver the results to the people, showing
| who was behind the campaigns and the synthetic likes. After
| this, prominent public figures from that sector started to cut
| funding for the bot networks (because of the public shaming and
| media attention they were receiving), and it is so pathetic
| now: they can't even get more than 100 likes, and often the
| most popular response is a refutation, or the very same
| analysis showing the bot network at work, with substantially
| more organic likes. I think it is a very interesting phenomenon
| to watch. Note that this lies/fear/chaos campaign is
| transversal, from rural AM radio to TikTok, but it is not
| working at all. People are very aware of these campaigns and
| know how to defend against them. Truth is stronger than money.
|
| - [1] https://twitter.com/BotCheckerCL
| 12907835202 wrote:
| For my forum with 500k users a month I just added a registration
| captcha related to my niche. E.g. for a Dark Souls forum it would
| say "what game is this forum about?" And if you got it wrong the
| validation would include "tip: it's just two words, D_rk S*ls".
| This reduced spam by over 99% and didn't annoy people with
| recaptcha.
|
| If someone was unable to get past that captcha (it still happens,
| I have logs!) I figured they were probably not that valuable a
| contributor anyway.
|
| If someone wanted to target my site directly they could, but it
| hasn't happened so far.
| ridgered4 wrote:
| Reminds me of a guy who implemented a pre-screen on his phone
| calls to stop spammers. He said he wanted to use something
| simple at first and that he planned to tweak it depending upon
| how many spammers got through. So for phase one it asked "Dial 1
| to continue". But that was enough to stop all the spam calls, so
| he never had to improve it.
| imperialdrive wrote:
| I did the same for my parents home phone. Completely stopped
| all spam calls!
| Lex-2008 wrote:
| re: someone was unable to get past that captcha - this reminded
| me of a story I heard back in ICQ times about a human who
| couldn't pass the anti-bot question: "What planet do we live on?"
| Suzuran wrote:
| I remember a friend's con-group forums who had an issue along
| these lines - the anti-bot question was "What is the
| brightest thing in the sky at noon?" the expected answer was
| "the sun", but some guy got stuck because he was answering
| "Sol". Since they had an IRC channel the issue was relatively
| quickly resolved, but it was an in-joke for some time.
| bombcar wrote:
| The key takeaway is that if you have a _second_ line of
| communication, humans can use it but bots won't - "Issues
| registering? Contact someemail or see us on IRC/Discord"
| can do wonders.
| gilrain wrote:
| Fair enough... one can only speak for oneself, after all.
| d3nj4l wrote:
| A niche dark souls forum sounds interesting, any chance I could
| get a link?
| google234123 wrote:
| You misread the post. That was just an example. A niche dark
| souls forum wouldn't have 500k users lol.
| EGreg wrote:
| I will reiterate what I had been saying on HN for years:
|
| 1) The problem is centralization. Yes DNS is federated but there
| is a central registry. This means anyone can spam
| you@yourdomain.com or visit your web server listening for HTTP
| connections at www.domain.com
|
| 2) DNS is a glorified search engine. Human readable domain names
| are only needed for dictating a domain name aloud (and listeners
| often make mistakes anyway). They only map to a small fraction
| of URLs, namely the ones with the "/" path name. For most
| others, the human readability adds little benefit.
|
| 3) Start using URIs that are not human readable. The titles,
| favicons and other metadata of resources should simply be cached,
| and displayed to the user in their own bookmarks, search engines
| or whatever. For Javascript environments, variables can easily
| hold non human readable URIs. Also QR codes can resolve to non
| human readable URIs.
|
| 4) There may be some cookie policy for third party hostnames etc.
| but just make them non human readable also.
|
| 5) We should have DHT or other decentralized systems for routing,
| and here is the key... in this system, you need a capability
| issued by the website / mailbox owner in order for your message
| to be routed to them. If the capability is compromised and used
| to get a ton of SPAM, they simply revoke that specific capability
| (key).
|
| For HTTP websites you can already implement it on your side by
| signing the keys / capabilities, i.e. session cookie values, with
| an HMAC, and there is no need to even do network I/O to verify
| them; you can upload the whitelist to the edges and check them
| there easily.
|
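| (A minimal sketch of that HMAC-signed capability idea in Python;
| the token format, the revocation set, and the function names are
| illustrative assumptions, not a spec:)
|
|     import hashlib, hmac, secrets
|
|     SECRET = secrets.token_bytes(32)  # held by the site owner
|     REVOKED = set()                   # revocation list, pushable to edges
|
|     def issue_capability(subject):
|         # Mint a capability: a random id plus an HMAC tag over it.
|         cap_id = secrets.token_hex(8)
|         msg = f"{subject}:{cap_id}".encode()
|         tag = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
|         return f"{subject}:{cap_id}:{tag}"
|
|     def verify_capability(token):
|         # Stateless check: recompute the tag; no network I/O needed.
|         subject, cap_id, tag = token.rsplit(":", 2)
|         if cap_id in REVOKED:
|             return False
|         msg = f"{subject}:{cap_id}".encode()
|         expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
|         return hmac.compare_digest(tag, expected)
|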
| But going further, for new routing protocols, IP addresses should
| be removed after the first hop in the DHT, because the global
| routing system will send traffic there otherwise. See how SAFE
| network does it.
|
| 6) I don't need a "real names policy" or "blue checkmark". I can
| know who "The Real Bill Gates (TM)" is through some verified
| claims by Twitter or someone else. Just because I have the email
| billgates@microsoft.com doesn't mean I should be able to email
| him. There can be many Bill Gates. The names are just verified
| claims by some third party. Here on HN we don't have names or
| photos, and it works just fine.
|
| 7) Most of the celebrity culture, paparazzi, Elon Musk and
| Donald Trump moving markets and tweeting at 5am to 5 million
| people at once, are problems of centralization. Both a 1 to many
| megaphone and a many to 1 inbox. Citizens United is just a
| symptom of the problem. I have spoken about this (privately
| owning access to an audience) with Noam Chomsky in an interview I
| did a year ago:
|
| https://community.qbix.com/t/freedom-of-speech-and-capitalis...
|
| Fox News (Rupert Murdoch), CNN (Ted Turner), Twitter (Elon or
| Jack), Facebook (Zuck) are controlled by only a few people.
| Channels on youtube, telegram, podcasts etc are controlled by a
| few people. This leads to divisions in society, as outrage
| clickbait rises to the top. Nonprofit models based on
| collaboration like Wikipedia, Wikinews, Open Source and Science
| produce far more balanced and benign information for the public.
|
| In short we need alternatives to celebrity culture, DNS and other
| systems that centralize decision making in the hands of a few, or
| create firehoses and megaphones. Neither the celebrity nor the
| public actually enjoy the results.
| david_draco wrote:
| Have a "CAPTCHA" that gives the IP reputation for some time
| (cookie+IP=key), but instead of a CAPTCHA make the web page /
| browser solve and submit a BOINC task from a randomly picked
| science project. No user interaction needed, it has the benefits
| of "paying by computation" of cryptocurrencies without the
| tracing, and if bots solve the problem efficiently, it's good for
| science.
| GTP wrote:
| But solving a BOINC task requires too much time, while the
| average user rightfully expects a webpage to load within 5
| seconds or so.
| rapnie wrote:
| That is a nice idea. So bit similar to mCaptcha [0] that uses
| PoW algorithm, mentioned in other comment [1] in the thread.
|
| [0] https://mcaptcha.org/
|
| [1] https://news.ycombinator.com/item?id=32339902
| zakki wrote:
| Can we make a bot to mine a cryptocurrency?
| GTP wrote:
| It's called miner and you can already install it on your pc.
| julianlam wrote:
| I disagree with TFA's take on dealing with spam -- giving up!
|
| For our app, we don't deal with spam in any novel way. We use a
| honeypot, SFS, and Akismet.
|
| However, by far the easiest way to stop spammers is a post queue.
| Lots of spammers will just create a burner account, fire off
| their spam, and start over. Given no actual reputation, give them
| the trust they deserve -- none.
|
| The other factor is building out a _fast_ backend. Besides
| benefiting your own users, it also means Googlebot or Ahrefsbot
| won't absolutely cripple your site when they come knocking.
| Sometimes that is doable, sometimes not.
| marginalia_nu wrote:
| (Author) I'm running a search engine though. Do you propose I
| require users to register an account, and then not allow them
| to search?
|
| I think my backend is plenty fast given it's hosted on a PC off
| domestic broadband. Most searches complete sub-100ms.
| julianlam wrote:
| Hey, thanks for the reply!
|
| Specific scenarios require creative solutions. For a search
| engine, how do you differentiate between robots and
| legitimate users? It seems a rate limiting step is likely the
| best solution.
|
| If query rate from the same IP exceeds a threshold, throttle
| them creatively. +100ms the next time, +250ms the next, etc.
|
| The upside is these bots will adjust their strategies to hit
| your site slower, which is the whole point, isn't it?
|
| If they spread requests across IPs, perhaps try
| fingerprinting. I'm not sure how effective that is on the
| backend though.
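|
| (A sketch of that escalating throttle in Python; the window,
| threshold and penalty steps are made-up numbers to illustrate
| the idea, not tuned values:)
|
|     import time
|     from collections import defaultdict
|
|     WINDOW = 10.0                           # seconds of look-back
|     THRESHOLD = 5                           # free queries per window
|     PENALTIES = [0.1, 0.25, 0.5, 1.0, 2.0]  # extra delay per strike
|
|     hits = defaultdict(list)                # ip -> request timestamps
|
|     def throttle_delay(ip):
|         # How long to sleep before serving this request.
|         now = time.monotonic()
|         hits[ip] = [t for t in hits[ip] if now - t < WINDOW] + [now]
|         strikes = len(hits[ip]) - THRESHOLD
|         if strikes <= 0:
|             return 0.0
|         return PENALTIES[min(strikes, len(PENALTIES)) - 1]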
| Animats wrote:
| What's hard to do now is host a lightly used but broadly
| interesting service that doesn't require a login.
|
| Although, surprisingly, I host such a service, and while it gets
| a constant stream of random hits, they're a minor nuisance.
| Probably because it's just the back end for a web page, and
| nobody bothers to target it specifically. Random web browsing
| won't find it, and the API will just return an error if called
| incorrectly. Even if it is called correctly, it has fair queuing
| on the service, so hammering on it from a small number of IP
| addresses won't do much.
|
| That did happen once. Someone from a university was making
| requests at a high rate and not even reading the results. I
| noticed after a month, and wrote to their department chair, which
| stopped the problem.
| closedloop129 wrote:
| >What's hard to do now is host a lightly used but broadly
| interesting service that doesn't require a login.
|
| What other broadly interesting services exist? The owners
| of those services could come together and offer a VPN that gets
| preferred treatment for these services. This could be more
| precise than https://www.abuseipdb.com/.
| Aachen wrote:
| Same! I also got like 20 requests every second from a
| university IP. I tried a few things to make it error out, like
| returning 404, but no dice. In my case it was my own fault
| though: a page with a few lines of JS to periodically check for
| updates got into a crazy state (I never found out how), and they
| didn't notice because it was a remote desktop system where they
| left the page open. It went on for months but didn't impact my
| service (I just noticed it in access logs while looking for
| something else), so I left it, remembered it again a few months
| later, and by then it was gone.
| s1k3s wrote:
| Yes, this is why I plan to take down my hobby projects. And
| it's not only bots, real people do it as well. Apparently some
| people have a passion for screwing up other people's work. Some
| even email me afterwards asking for money to disclose a bug
| they found.
| [deleted]
| FrenchDevRemote wrote:
| Does anyone know how google/linkedin manage to block bots who are
| using SSO?
|
| Trying to log in to a linkedin account using a google account
| from an automated browser (like puppeteer + puppeteer-stealth
| or fakebrowser) will open a white empty window instead of the
| normal google login window. It could be a limitation of those
| libraries, but I doubt it; it smells like something they
| detect. Looking into it might yield some interesting insights
| on how to limit modern bots.
| goatcode wrote:
| >large resources causing bot spam
|
| >large resources are the solution
|
| To those who have been recently pondering the history of
| antivirus companies of the 90s and 00s, and suspiciously
| wondering how they were always able to so quickly come up with
| definitions for the newest infections, this all feels so
| familiar. What a sad world we live in, sometimes.
| djohnston wrote:
| I work in this space at a company you've heard of - even at our
| scale and with our resources the proportionally larger attack
| incentives mean we are constantly firefighting.
|
| > The other alternatives all suck to the extent of my knowledge,
| they're either prohibitively convoluted, or web3 cryptocurrency
| micro-transaction nonsense that while sure it would work, also
| monetizes every single interaction in a way that is more
| dystopian than the actual skull-crushing robot apocalypse.
|
| I understand the drawback here but I would like to see monetized
| transactions employed as a defense layer a little more before we
| make a final decision. It is undemocratic, to be sure, but maybe
| for those of us who can afford it, it's still better than the
| cesspool we currently sift through on every major platform.
| Anyone aware of any platforms taking this approach?
|
| Maybe the fediverse will help - by fragmenting networks attackers
| may have less incentive to attack a particular one.
| reaperducer wrote:
| _Anyone aware of any major platforms taking this approach?_
|
| The Postal Service?
|
| Sure, there's junk mail, but imagine how much junk mail there
| would be if it were delivered for free. It wasn't until phone
| calls became so cheap as to be "unlimited" that we ended up
| flooded with billions of junk calls.
|
| Microtransactions (non-crypto, thankyouverymuch) would solve a
| certain number of today's problems.
| marginalia_nu wrote:
| I do think it would help, like even if a transaction cost
| 0.05c, it would add up very quickly for a bot operator but
| stay cheap for everyone else. But I think the problem is it
| would inevitably introduce the need for a middle man, shaving
| 0.01c off that 0.05c, with a dubious incentive to increase
| the amount of money changing hands as much as possible. What
| you've invented at that point is basically Cloudflare with
| worse incentives.
|
| You get either that, or yucky defi web3 crap.
| djohnston wrote:
| Yes for sure, I have thought about making an email "stamp"
| web-3 service that would implement this. I even wanted to
| make some fun "pony express" animations whenever a letter was
| arriving to your inbox.
| kube-system wrote:
| Let's finally implement HTTP 402
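|
| (For fun, a minimal sketch of honoring 402 with Python's
| standard library; the X-Payment-Token header and its token
| store are hypothetical stand-ins for a real payment scheme:)
|
|     from http.server import BaseHTTPRequestHandler, HTTPServer
|
|     PAID_TOKENS = {"demo-token"}  # hypothetical set of paid-up tokens
|
|     class PaywalledHandler(BaseHTTPRequestHandler):
|         def do_GET(self):
|             if self.headers.get("X-Payment-Token") not in PAID_TOKENS:
|                 self.send_response(402)  # Payment Required, reserved in HTTP/1.1
|                 self.end_headers()
|                 self.wfile.write(b"402 Payment Required\n")
|                 return
|             self.send_response(200)
|             self.end_headers()
|             self.wfile.write(b"Hello, paying human!\n")
|
|     if __name__ == "__main__":
|         HTTPServer(("", 8000), PaywalledHandler).serve_forever()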
| SmileyJames wrote:
| I thought A Plan for Spam had solved this one?
| http://www.paulgraham.com/spam.html
|
| Has NLP progressed enough to render Paul's plan a failure?
|
| Am I a bot? How about you? Does it matter if I make valuable
| contributions?
| greazy wrote:
| Spam and bots eating traffic are two different things.
| timmaxw wrote:
| I wonder if proof-of-work would help. Suppose every form
| submission requires an expensive calculation, calibrated to take
| about 1 second on a typical modern computer/smartphone. For human
| users, this happens in the background, although it makes the
| website feel slower. But for bots, it dramatically limits how
| many submissions each botnet host can make to random websites.
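|
| (A sketch of the asymmetry in Python: the client grinds through
| nonces, while the server checks a single hash. The 20-bit
| difficulty is an illustrative guess at "about a second", not a
| calibrated value:)
|
|     import hashlib, os
|
|     DIFFICULTY = 20  # required leading zero bits
|
|     def pow_ok(challenge, nonce):
|         digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
|         return int.from_bytes(digest, "big") >> (256 - DIFFICULTY) == 0
|
|     def solve(challenge):
|         # Client side: brute force, ~2^20 hashes on average.
|         nonce = 0
|         while not pow_ok(challenge, nonce):
|             nonce += 1
|         return nonce
|
|     challenge = os.urandom(16)        # issued alongside the form
|     nonce = solve(challenge)          # expensive for the submitter...
|     assert pow_ok(challenge, nonce)   # ...one hash for the server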
| protoduction wrote:
| I'm the co-founder of Friendly Captcha [0]; we have offered a
| proof-of-work-based captcha for about two years now. Happy to
| answer any questions.
|
| A big part of what makes our captcha successful in fighting
| abuse is that we scale the difficulty of the proof-of-work
| puzzle based on the user's previous behavior and other signals
| (e.g. minus points if their IP address is a known datacenter
| IP).
|
| The nice thing about a scaling PoW setup is that it's not all-
| or-nothing, unlike other captchas. Most captchas can be solved
| by "most" humans, but that means that there is still some
| subset of all humans that you are excluding. In our case, if we
| do get it wrong and wrongly think the user is a bot, the user
| may have to solve a puzzle for a while, but after that they are
| accepted nonetheless.
|
| [0]: https://friendlycaptcha.com
| tmikaeld wrote:
| While your service is of high quality, the pricing is
| completely unreasonable for private use cases, many times
| higher than hosting the site in the first place.
| protoduction wrote:
| I'm sorry to hear that. We offer free and small plans for
| small use-cases, but I also understand that some projects
| don't have a budget at all.
|
| There is a blessed source-available version of the server
| that you can self-host [0]. It is more limited in its
| protection, but it is probably good enough for hobby
| projects.
|
| [0]: https://github.com/FriendlyCaptcha/friendly-lite-
| server
| dj_mc_merlin wrote:
| I think it depends on what counts as a "request" in terms of
| pricing. If it's only successful checks, the pricing would be
| fine. If it also includes failed checks then there is no point
| in the service, including the Advanced plan; it would eat
| through the entire credit in a day.
| tmikaeld wrote:
| If it were only on successful validations, they would have
| called it that. No, it's on every request, even failed ones.
| tmikaeld wrote:
| "mCaptcha uses SHA256 based proof-of-work(PoW) to rate limit
| users."
|
| https://github.com/mCaptcha/mCaptcha
| cmjs wrote:
| I'm curious whether this can actually be considered to be a
| "CAPTCHA" in the true sense of the term. It doesn't seem to
| be intended to "tell computers and humans apart", but rather
| to force the client _computer_ (not the human user) to do
| some work in order to slow down DOS attacks.
|
| Of course slowing down DOS attacks is a great goal in itself,
| and it's very often what captchas have been (ab)used for, but
| it doesn't seem to me to replace all or most use cases for a
| captcha. In particular, since it can be completed by an
| automated system _at least_ as easily as by a human, it doesn't
| seem like it would limit spambot signups or spambot comment or
| contact form submissions in any meaningful way.
|
| Or am I misunderstanding, @realaravinth?
| realaravinth wrote:
| Thanks for the ping!
|
| I used "captcha" to simplify mCaptcha's application,
| calling it a captcha is much simpler to say than calling it
| a PoW-powered rate limiter :D
|
| That said, yes, it doesn't do spambot form-abuse detection.
| Bypassing captchas like hCaptcha and reCAPTCHA with computer
| vision is difficult, but it is stupidly easy to do with the
| services offered by CAPTCHA farms (which employ humans to
| solve captchas, available via API calls), and those are
| sometimes cheaper than what reCAPTCHA charges.
|
| So IMHO, reCAPTCHA and hCaptcha are only making it difficult
| for visitors to access web services, without hurting
| bots/spammers in any reasonable way.
| cmjs wrote:
| Thanks for the reply! That's basically what I thought
| then - but as you say, traditional captchas are deeply
| flawed and ineffective anyway, and I totally agree that
| in many cases the cost to real users outweighs any
| benefit. So I'm excited to see alternatives such as
| mCaptcha popping up. It'll be interesting to see how it
| works out for people in real-world use.
| Aissen wrote:
| How does that work without becoming a SPOF for taking down
| the website? Can't a user/botnet with more CPU power than
| the server simply send more captchas than can be processed?
|
| In addition, using sha256 for this is IMHO a mistake, calling
| for ASIC abuse.
| realaravinth wrote:
| > How does that work without becoming a SPOF for taking
| down the website? Can't a user/botnet with more CPU power
| than the server simply send more captchas than can be
| processed?
|
| Glad you asked! This is theoretically possible, but the
| adversary will have to be highly motivated with
| considerable resources to choke mCaptcha.
|
| For instance, to generate the Proof of Work (PoW), the client
| will have to generate 50k hashes (can be configured for
| higher difficulty) whereas the mCaptcha server will only
| have to generate 1 hash to validate the PoW. So a really
| powerful adversary can overwhelm mCaptcha, but at that
| point there's very little any service can do :D
|
| > In addition, using sha256 for this is IMHO a mistake,
| calling for ASIC abuse.
|
| Good point! Codeberg raised the same issue before they
| decided to try mCaptcha. There are protections against ASIC
| abuse: each captcha challenge has a lifetime, and variable
| difficulty scaling is implemented, which increases difficulty
| when abuse is detected.
|
| That said, the project is in alpha, I'm willing to wait and
| see if ASIC abuse is prevalent before moving to more
| resource-intensive hashing algorithms like Scrypt. Any
| algorithm that we choose will also impact legitimate
| visitors so it'll have to be done with care. :)
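|
| (A guess at what such scaling could look like, sketched in
| Python; the rates and caps are invented for illustration and
| are not mCaptcha's actual policy:)
|
|     BASE_DIFFICULTY = 50_000      # expected hashes on a quiet site
|     MAX_DIFFICULTY = 5_000_000    # ceiling so humans stay viable
|
|     def scaled_difficulty(recent_requests, normal_rate=30):
|         # Raise the PoW cost as observed traffic climbs above normal.
|         factor = max(1, recent_requests // normal_rate)
|         return min(BASE_DIFFICULTY * factor, MAX_DIFFICULTY)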
| Aissen wrote:
| > the client will have to generate 50k hashes(can be
| configured for higher difficulty)
|
| I completely forgot how PoW worked, it's clearer now. You
| should probably add that this is a probabilistic average,
| so people will have to be ready for much longer (and
| faster) resolutions.
|
| With what you said, an adversary can probably just DoS
| mCaptcha without any computation, if verification is
| stateless (by sending garbage at line rate); if it is
| stateful (e.g. a CSRF token), you'll have to do a cache
| query, which is probably of the same order of magnitude
| as a single hash.
| realaravinth wrote:
| Hello!
|
| I'm the author of mCaptcha, I'd be happy to answer any
| questions that people might have :)
| titaniczero wrote:
| It looks great. As a suggestion: instead of an easy mode and
| an advanced one, I would use a single mode with a calculator;
| that way it is more transparent to the user and it would make
| the process of learning the advanced mode and concepts easier.
|
| Also, here: https://mcaptcha.org/, under the "Defend like
| Castles" section, I think you meant "expensive", not
| "experience".
|
| Keep up the good work!
| realaravinth wrote:
| Thank you for the kind words!
|
| > Instead of an easy mode and an advanced one, I would use
| a single mode with a calculator; that way it is more
| transparent to the user and it would make the process of
| learning the advanced mode and concepts easier.
|
| Makes sense, I'll definitely think about it. The
| dashboard UX needs polishing and this is certainly one
| area where it can be improved.
|
| > Also, here: https://mcaptcha.org/, under the "Defend
| like Castles" section, I think you meant "expensive", not
| "experience".
|
| Fixed! There are a bunch of other typos on the website
| too, I can't type even if my life depended on it :D
| luckylion wrote:
| The results of the PoW are just thrown away, right? I
| wonder if you could couple that with something useful, e.g.
| what SETI@home used to do, but the intentionally small size
| of the work probably makes it difficult to be useful.
| realaravinth wrote:
| I'd love to do something useful with the PoW result but
| like you say, the PoW should be able to work in browsers,
| so they are intentionally small.
|
| The maximum advisable delay is ~10s but even then it
| might not be enough for it to be useful.
| rapnie wrote:
| See also dedicated submission at:
| https://news.ycombinator.com/item?id=32340305
| timmaxw wrote:
| Nice! Yeah, mCaptcha looks like just what I had in mind.
|
| I wonder why this approach hasn't been widely adopted?
| tmikaeld wrote:
| Probably due to "PoW" being power-hungry, but that's
| largely false because you only apply PoW here on users that
| are abusing the system.
|
| Allowing abusers to freely abuse would cost even more power
| than just forcing them to do the work.
| rapnie wrote:
| mCaptcha is in the process of being adopted in Gitea and
| Codeberg. See recent Fediverse post from the project
| account: https://gts.batsense.net/@mcaptcha/statuses/01G9KR
| BRC8CRC9M3...
| realaravinth wrote:
| The project is very new, I haven't started promoting yet.
| The Codeberg development was purely from word of mouth :)
|
| disclosure: I'm the author of mCaptcha
| the8472 wrote:
| This is an old idea known as hashcash.
| https://en.wikipedia.org/wiki/Hashcash
|
| Newer variations (such as argon2) are tunable so you can
| include memory footprint and cpu-parallelism. There also are
| time-lock puzzles or verifiable delay functions that negate any
| parallelism because there's a single answer which can't be
| arrived at sooner by throwing more cores at the problem.
| zozbot234 wrote:
| For small scale self-hosted forums, bespoke CAPTCHA questions
| can work quite well in practice. Make it weird enough and it
| just isn't worth that much for malicious users to break, while
| most humans can pass easily. Spammers benefit from volume.
| rapnie wrote:
| > most humans can pass easily
|
| Beware when choosing a CAPTCHA that serving "most humans"
| _might_ exclude those with accessibility issues, like the
| visually impaired.
| rrwo wrote:
| I run a website for a small company. The site has been around
| since the mid-1990s, and bots are a minor annoyance, but not a
| problem.
|
| We also use some simple heuristics to reject obvious bot traffic.
|
| One of the simplest is to have a form field that is hidden via
| CSS. Humans don't see it and it stays blank. Bots fill it in.
|
| Bots tend to fill in every form field with random garbage, even
| checkboxes. Validating checkbox values, rather than just checking
| that they have a value, is another good way to detect bots.
|
| Many bots have a hard time with CSRF tokens in hidden fields.
|
| Many bots also don't handle session cookies properly. If someone
| submits a registration form without an existing session, we
| reject it. (So we don't get as far as checking the CSRF token.)
|
| After a certain number of failed attempts to register or login,
| we block the IP for a period of time.
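|
| (A sketch of the first few heuristics as a server-side check,
| in Python; the field names are hypothetical, with "website"
| playing the CSS-hidden honeypot:)
|
|     def looks_like_bot(form, has_session):
|         # Honeypot: humans never see the hidden field, so it stays blank.
|         if form.get("website"):
|             return True
|         # This checkbox submits "on" or nothing; random garbage means a bot.
|         if form.get("newsletter") not in (None, "", "on"):
|             return True
|         # No prior session cookie: reject before even checking CSRF.
|         if not has_session:
|             return True
|         return False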
| lifeisstillgood wrote:
| I would suggest that bots are actually not the underlying
| problem. For most things I would like a bot acting for me,
| telling me, as and when, that I need to visit the dentist, who
| has slots free next Weds and Friday; Friday is best because I
| am also WFH that day. The bot apocalypse is only an apocalypse
| because we are trying to make a "web for humans" when actually
| a "web for bots, and a bot for a human" is a much better idea :)
|
| We need to redesign a web based on APIs, certificates, rate
| limits etc. And stop having "engagement" as a goal, and have
| "getting things done" as a goal
|
| Edit: mucked up formatting
| IfOnlyYouKnew wrote:
| So there's one service keeping this search engine online, and
| it's probably doing it for free, and the author can't even think
| of a better way to do it.
|
| Yet Cloudflare still gets two paragraphs of complaints in the
| face? Because the author wants to "own" something instead of
| "renting"?
| marginalia_nu wrote:
| I'm doing it for free because I don't want this to be a
| commercial service. I get that HN is startup city, but I'm not
| running a startup, it's just a hobby.
| SahAssar wrote:
| > There has been upwards of 15 queries per second from bots.
| There is just no way to deal with that sort of traffic, barely
| even to reject it.
|
| I don't really understand, is that a lot? 15qps does not sound
| like a lot, especially for a blocking/rejection function.
| marginalia_nu wrote:
| It's 15 search queries per second, not requests per second. RPS
| is usually 10-20x higher.
| SahAssar wrote:
| But you said "barely even to reject it", rejecting 15 QPS
| should not be heavy on any resource, right? Or is the actual
| problem identifying the bot traffic?
| Xeoncross wrote:
| I have considered skipping the regular fingerprinting,
| geolocation, captcha, hashcash, email verification, payment
| required, etc... mitigations and instead requiring people to drop
| into a public chat room (or pm/chat the support team) to have
| their account activated.
|
| The number of languages supported would be small to match whoever
| helped moderate this, but it would at least require speaking to
| someone. A PM thread or live chat would be an instant way to find
| out if someone can string two sentences together and might be
| worth allowing into the site. You could even have them create an
| account and solve a single captcha prior to getting access to the
| chat.
|
| It's not perfect by any stretch, but might be worth exploring
| having humans-verify-humans.
| pphysch wrote:
| The only real solution to the abuse of anonymous protocols is to
| stop using anonymous protocols and use protocols where clients
| can be held accountable. But that's politically nonviable in the
| West.
| JZerf wrote:
| I don't see any good reason why people can't be allowed to
| remain anonymous while still allowing website operators to
| take measures to stop bot abuse. CAPTCHAs can already stop
| many bots. Other commenters have also mentioned that things
| like Proof of Work systems and micro-transactions could also
| stop bot abuse. These don't necessarily require giving up
| anonymity.
| pphysch wrote:
| It's not just bots, it's troll farms as well which are "real"
| people destroying public discourse in bad faith.
| JZerf wrote:
| Website operators could still take measures to stop abuse
| from troll farms as well while still allowing people to
| remain anonymous. A website operator like Twitter for
| instance could perhaps require users to make a small micro-
| transaction before allowing someone to make a post. Some
| equilibrium for the cost of a post could probably be found
| where most legitimate users would still be willing to pay
| that cost but most troll farms would not.
| pphysch wrote:
| Are you serious? The problematic troll farms are the ones
| backed by states and multinational corporations. Gating
| speech behind money only makes the problem worse.
|
| The correct approach is to deanonymize reasonably
| "public" online behavior. This is the only way to hold
| abusers accountable, and, ironically, democratize free
| speech.
|
| 1 person, 1 voice.
|
| Not 1 rich person, 100 troll accounts.
| JZerf wrote:
| Yeah, I'm serious. Even with Twitter, for example,
| currently allowing accounts to be created and posts to be
| made for essentially free, real accounts and posts still
| outnumber those of troll farms and bots from what I've
| seen. If those troll farms and bots actually had to pay,
| I imagine there would be far less. I also imagine that
| those troll farms and bots are less influential than real
| people. I believe that the endgame is that if a website
| operator takes enough measures to stop troll farms and
| bots, the operators of those troll farms and bots will
| eventually run out of resources and be forced to curtail
| their activity.
|
| You're right that gating speech behind money could
| potentially be bad and make problems worse but I only
| offered that as one suggestion. Instead of or in addition
| to using money, you could perhaps make a system that uses
| some type of karma/reputation for instance. Those could
| still be done anonymously.
| z3t4 wrote:
| My trick is to have one field that should always be blank and one
| field that should always have a value; this stops all automated
| bots. No "CAPTCHA" needed.
| paulmd wrote:
| > They're a major part in killing off web forums, and a
| significant wet blanket on any sort of fun internet creativity or
| experimentation.
|
| > The only ones that can survive the robot apocalypse is large
| web services. Your reddits, and facebooks, and twitters, and
| SaaS-comment fields, and discords. They have the economies of
| scale to develop viable countermeasures, to hire teams of people
| to work on the problem full time and maybe at least keep up with
| the ever evolving bots.
|
| This is not true at all. There are web forums that are not "web-
| scale" and don't spend all day fighting bot spam. The solution is
| real simple: it costs 10 bux to register an account, if you're a
| nuisance your account is banned and you pay 10bux to get back on.
|
| Even the sites that don't require payment for explicit
| registration - often succeed by gating functionality or content
| behind paywalls. Requiring a "premium membership" to post in the
| classifieds forum is an extremely extremely common thing on small
| interest-based web-boards (photrio, pentaxforums, homebrewtalk,
| etc). That income supports the site and supports the anti-bot
| efforts as a whole. The customer isn't advertisers - it's the
| community itself, and you're providing the _service_ of high-
| quality content and access to people with similar interests.
|
| You need to bootstrap a community first, of course, but it
| doesn't need to be a large community, just a high-value one.
|
| The twitters and facebooks of the world just don't like that
| solution because they value growth above all other
| considerations. They'd rather be kings of a billion user website
| with 200 million bots than a 1k-100k user forum with 100% organic
| membership and content. And they value engagement over content
| quality, which is the entire reason comment-tree/vote-based
| systems have been pushed heavily over web-1.0 threaded forum
| discussions as well.
|
| This botpocalypse is the inevitable outcome _of the systems that
| social-media giants have created_, not an inherent outcome of
| the internet as a whole.
| NickRandom wrote:
| > The solution is real simple
|
| Uhhmmm, I beg to differ and so do a lot of very smart people
| with many more servers and users than you or I are likely to
| see.
|
| As with most 'Oh, its' Simple - Just Do XYZ' solutions there
| are often very good reasons for not doing the 'Easy/Simple/One-
| Liner' and here are a few with yours -
|
| Firstly - the '10 bux' could exclude a vast swathe of the
| poorest. What is a couple of skipped Starbucks coffees to you
| might, in local currency, be the equivalent of a month's worth
| of food, or of being able to send at least one of your children
| to the local village school. I mean - it's your forum / site,
| so you can gate it any way you wish; I'm just pointing out that
| it could and would be exclusionary (perhaps unintentionally
| so).
|
| Next Problem: Accepting and Processing the 'Good Behavior'
| deposit. Congratulations, you now need to become a Payment
| Processor and as such have certain legal requirements regarding
| payment details and storage and also tax returns. 'Oh, just
| Off-Load it to Stripe' someone might suggest. Do-able I guess
| but anyone who has taken payments over the internet will tell
| you that it's a Royal Pain in The Ass. Also, now all a 'Griefer'
| needs to do is run a few dodgy cards through your registration
| system and 'Poof' there goes your payment processor and/or the
| fees go sky high.
|
| Most 'oh it's simple - why don't they just...' suggestions
| overlook (or are not aware of) the many, many good reasons why
| greater minds than yours or mine haven't already implemented
| them.
|
| Sure - sometimes people do come up with novel solutions to old
| problems, so there's no harm in spit-balling, and I'm certainly
| not directing any scorn or ill-intent in my reply.
| Kalium wrote:
| > Firstly - The '10 bux' could exclude a vast swathe of the
| poorest. Skipping a couple of Starbuck coffees vs. the local
| currency equivalent of whatever you are charging equating to
| a month's worth of food or being able to send at least one of
| your children to the local village school. I mean - your
| forum / site so you can gate it anyway you wish, I'm just
| pointing out that it could and would be exclusionary (perhaps
| unintentionally so).
|
| It _is_ intentionally exclusionary. Not necessarily of the
| poorest among us, but of those who expect free service.
| Botters and spammers are disproportionately likely to look
| for free service. Pretty much any level of required spending
| in any currency will have a similar effect. By cutting off
| the abuse-prone free tier that many bad actors depend on, you
| dramatically decrease your exposure to abuse.
|
| The point is not to keep out the poor people. The point is to
| make it far more work to get over the hurdle than it's worth
| for abusers. If you have a way to do the latter without the
| former that doesn't hinge on pushing a bunch of extra work
| onto administrators, I suspect quite a lot of people would be
| very curious to hear about it.
| [deleted]
| gnome_chomsky wrote:
| > Most 'oh its simple - why don't they just...' overlook (or
| are not aware) of the many, many good reasons why greater
| minds than yours or mine haven't already implemented it.
|
| The forums they are referring to have been operating with the
| "10 bux" model implemented on top of a highly customized
| version of vBulletin for over 20 years.
| tablespoon wrote:
| > This is not true at all. There are web forums that are not
| "web-scale" and don't spend all day fighting bot spam. The
| solution is real simple: it costs 10 bux to register an
| account, if you're a nuisance your account is banned and you
| pay 10bux to get back on.
|
| That doesn't work at all unless your service is already pretty
| popular. Who would pay $5 to access a new, empty forum?
|
| You mention "you need to bootstrap a community first," but
| that's basically an admission that this solution doesn't solve
| the problem at all, because you have to solve the problem in
| some other way to use this solution. 10bux was a solution
| limited to a _very specific time_.
| ineptech wrote:
| Step 1: Plant a tree twenty years ago...
| sircastor wrote:
| >The solution is real simple: it costs 10 bux to register an
| account, if you're a nuisance your account is banned and you
| pay 10bux to get back on.
|
| Many years ago there was a public server called SDF (Super
| Dimensional Fortress). It was a BSD system and anyone could get
| a user account for $1. The theory was even the least of us, a
| kid scrounging for money on the street, could come up with a
| dollar (and presumably the postage to mail it). To a certain
| person, access to this kind of server was invaluable - the only
| situation you could hope to get close to this kind of system.
| As time went on, the number of people interested in this was
| dwindling.
|
| Jumping through hoops is a useful gateway, but if your hoops
| are too complex or arduous, you miss out on people who you
| genuinely want to include in your community.
| bayindirh wrote:
| SDF is still alive and kicking, though. I have an account
| with them.
| [deleted]
| [deleted]
| RalfWausE wrote:
| >As time went on, the number of people interested in this was
| dwindling.
|
| I don't think so...
|
| First and foremost, SDF is well alive and there is a constant
| stream of people registering on it...
| Nextgrid wrote:
| > As time went on, the number of people interested in this
| was dwindling.
|
| That's mostly because access to computers has become easier -
| you can either get your own Linux box or get a proper VPS for
| extremely cheap (if not free - see cloud provider free tiers)
| nowadays so why bother with a non-root account on a _BSD_
| system?
|
| IMO it doesn't have anything to do with the barrier to entry.
| paulmd wrote:
| > Many years ago there was a public server called SDF (Super
| Dimensional Fortress). It was a BSD system and anyone could
| get a user account for $1. The theory was even the least of
| us, a kid scrounging for money on the street, could come up
| with a dollar (and presumably the postage to mail it). To a
| certain person, access to this kind of server was invaluable
| - the only situation you could hope to get close to this kind
| of system. As time went on, the number of people interested
| in this was dwindling.
|
| SDF is still around though, and still operates on the same
| model - pay once to get in, and you can stay as long as you
| want unless you become a nuisance.
|
| > Jumping through hoops is a useful gateway, but if your
| hoops are too complex or arduous, you miss out on people who
| you genuinely want to include in your community.
|
| It is certainly not impossible for communities with this
| model to die - that's not what I'm saying at all. Small
| social media sites die all the time, including with Reddit-
| style gamification bullshit. Or they turn into cesspits like
| Digg or Voat.
|
| But yes, increasing the friction of engagement is literally
| the point, you are losing some users but increasing the
| quality of the ones who remain. It's the old "fire your bad
| customers" routine, but for social media.
|
| "Oh no, we are all losing out on your valuable shitposting,
| how will this community ever go on?"
| Kye wrote:
| They even have a Mastodon instance.
|
| https://mastodon.sdf.org/about
| ticviking wrote:
| SDF is still around. I recently recovered my account and
| spent a lovely afternoon hanging out on their chat.
| varispeed wrote:
| > There are web forums that are not "web-scale" and don't spend
| all day fighting bot spam. The solution is real simple: it
| costs 10 bux to register an account, if you're a nuisance your
| account is banned and you pay 10bux to get back on.
|
| There is a big problem with all sorts of activists though.
| Their modus operandi is finding forums they don't like, then
| posting illegal material and reporting it to the hosting
| provider. Many forums stopped accepting new users because of
| that; for some, the only way to sign up is to find the owner
| and speak to them directly.
| paulmd wrote:
| A few edits I wanted to make but couldn't while HN was down:
| this comes down to a question of intentions, right? Like are
| you trying to _build a high-value community_, or are you
| trying to make a
| billion-dollar company? Photrio or Pentaxforums is never going
| to sell for a billion dollars like Reddit, and that's not the
| kind of community that Reddit is trying to build.
|
| The highly-chaotic multithreaded model of Reddit/HN/etc is
| directly _designed_ to be impenetrable and chaotic, where
| everyone is just responding to everyone rather than having a
| "flow of conversation" in which everyone is involved. The
| "everyone responding to everyone" is literally engagement, and
| that's what those sites/companies want to drive, not community-
| building. It's _designed_ to suck as a medium for serious
| discourse, because making 27 slight variations on the same
| response to 27 different comments keeps you on the site.
|
| As long as we persist in having engagement be the primary
| metric, that's what you will get, and as long as we persist in
| the idea that the objective of social media needs to be making
| a couple people into billionaires, engagement is going to be
| the focus.
|
| And again, it's a fundamental shift in "who the customers are".
| Are the customers the people using the site, who want a great
| place to discuss the nuances of parrot taxonomy, or are your
| customers the advertisers? Those lead to different ways you
| build the community.
|
| And you can still make a six-figure income being a webmaster of
| a smaller community too. You're just not going to make Reddit
| money off Pentaxforums.
| closewith wrote:
| > The highly-chaotic multithreaded model of Reddit/HN/etc is
| directly designed to be impenetrable and chaotic, where
| everyone is just responding to everyone rather than having a
| "flow of conversation" in which everyone is involved.
|
| I couldn't disagree more with this characterisation. Part of
| the reason that sites like Reddit and HN are preferred to
| traditional fora (which have their own engagement mechanisms)
| is because it's possible to have a different set of
| discussions on a topic.
|
| Single-threaded fora result in many conversations on various
| sub-topics and of varying quality being multiplexed in a
| single chaotic and incomprehensible comment chain. Comment
| trees allow sub-topics and sub-conversations to be grouped in a
| reasonable fashion, and voting allows junk contributions to
| be pushed to the bottom.
|
| It's not just webscale entrepreneurs - users love comment
| trees. It's part of why sites like Reddit and HN succeeded in
| gaining traction, and why subreddits are now the de facto
| replacement for fora (unfortunately, as it centralises
| editorial power).
| ori_b wrote:
| It's something borrowed directly from email, at least in
| traditional clients, and it works.
| Macha wrote:
| This wasn't always true - the first web forums were
| threaded more often than not, and even vbulletin supported
| a threaded mode as a user preference well into the 00s
| (though as it was no longer the default, others were not
| using linked replies, which hurt its usefulness at that
| point). This is probably related to imitating the way
| mailing lists worked, as some of these early UIs were like
| the older style of mailing list presentation with a few
| more forms attached.
|
| The flat forums were considered an innovation for a while
| because "normal users will never understand this nerdy
| threading model". Arguably to some extent they were right,
| as mass market products like youtube, facebook, etc. still
| limit to one level of replies.
|
| The real innovation of the social news sites was the voting
| and scoring algorithms, which made it manageable by
| presenting users with the most popular subthreads first,
| rather than the chronological order the forums had used.
| And gaining those points had a kind of skinner box effect
| on keeping users hooked on the sites, which helped their
| growth too - especially when points used to have a much
| more prominent display.
| mjr00 wrote:
| > I couldn't disagree more with this characterisation. Part
| of the reason that sites like Reddit and HN are preferred
| to traditional fora (which have their own engagement
| mechanisms) is because it's possible to have a different
| set of discussions on a topic.
|
| Preferred by whom, and in what contexts?
|
| The Reddit/HN threaded comment styles work well in
| scenarios that are relatively high-traffic, ephemeral, and
| mostly anonymous, in the sense that you generally don't
| notice or care about the username of people with whom
| you're having a conversation. You're right that it makes it
| easier to comprehend because you can just read from the
| parent downwards to get the full context, and that's rarely
| going to get to double digits.
|
| But this has a lot of drawbacks. It's a method of
| communication that's built for a burst of posts on a topic
| for a day or so, then effectively archived as people move
| onto the next topic.
|
| This doesn't mean that users prefer this method of
| communication universally, though. Even though forums are
| dying, Discord continues to grow and offer smaller
| communities that actually _have_ a "community" aspect to
| them. Depends on the server, but I pretty much never see
| extensive use of threads in Discord, if at all; most people
| are happy to have a constant flow of conversation in one of
| the channels. It's closer to how people have conversations
| in person.
| GTP wrote:
| You're right that requiring a small fee to register can solve
| the problem for small communities (and so OP is wrong in the
| paragraphs that you quoted). Still, this solution doesn't work
| for OP's case, as it requires a login and he says he doesn't
| want to know who is using his search engine (I guess for
| privacy reasons); in that case the bots can indeed harm the
| proliferation of this kind of small-scale service.
| Tehdasi wrote:
| > And they value engagement over content quality, which is the
| entire reason comment-tree/vote-based systems have been pushed
| heavily over web-1.0 threaded forum discussions as well.
|
| While they certainly do value engagement over quality, I
| suspect that the systems are put in place because moderation
| doesn't scale in terms of manpower, and they don't trust their
| users to formalize the structure of the site.
| Nextgrid wrote:
| There are (at least) 2 kinds of spam - "technical" spam such as
| bots hammering the web service with requests and consuming
| resources, and the commonly-accepted definition of spam where
| bots post promotional or other obnoxious content.
|
| I feel like the article here talks more about the first kind. I
| do agree with your solution for the second kind of spam though.
| remram wrote:
| > bots hammering the web service with requests and consuming
| resources
|
| I've never seen this referred as "spam". Denial of service,
| botting, scraping, sure, but does anyone call that spam?
| Nextgrid wrote:
| The author seems to be referring to this as "spam" so I've
| reused their definition. In general I agree, what the
| author is experiencing is more akin to a DoS than a spam
| attack.
| tpoacher wrote:
| It's spam from a server owner's point of view in the
| broader sense, in that it is "junk requests" instead of
| legitimate requests, they can be sent as a flood at no cost
| or consequence to the senders, and it's up to you as the
| recipient to find a way to filter it all to separate the
| wheat from the chaff.
|
| It's certainly not denial of service, that means something
| far more specific.
|
| One _could_ call it "scraping", but I'd argue the meaning
| / emphasis is different (it'd be like describing trolling
| as 'typing').
|
| And "botting" is not a word. :)
| dtgriscom wrote:
| > And "botting" is not a word. :)
|
| It is now! OED, here we come.
| bambax wrote:
| But supposing it's not purely malicious, what's the
| benefit to the spammer?
| seydor wrote:
| Alternatively, allow $0 signups but approve every new account.
| It's rather easy to spot spam signups.
| gostsamo wrote:
| A bit below, the author talks about how creating any form of
| account runs counter to their goal of free and anonymous
| access, so they are looking only for a behavioral sift of bots.
| tlholaday wrote:
| Anonymous proof of stake?
| jart wrote:
| > it costs 10 bux to register an account, if you're a nuisance
| your account is banned and you pay 10bux to get back on.
|
| You've highlighted its biggest tradeoff which is that it
| creates an economic incentive to ban people. The only way to
| make more money, is to have more rules and culture for
| ostracizing people. It would have been smarter of Something
| Awful (since that's the site we're talking about) to charge
| $4/month or something.
| bluedino wrote:
| You used to have to really, really try to get banned or even
| probated on SA, now it doesn't take much at all.
| eropple wrote:
| I can't remember the last time I saw a ban on SA that
| wasn't after a string of "stop trashing discussion" probes
| --sometimes in the dozens--or wasn't some flavor of FYAD-
| escapee bigot or death-threat-spewing weirdo. Even
| _extremely_ tedious, thread-killing arguers will often be
| left alone unless an IK is ignored when they say "drop it"
| or whatever, and that's usually just a sixer.
|
| There _are_ people who get really mad that they eat probes
| for, say, misgendering trans people, and I for one would
| like those people to be madder still. And preferably no
| longer on the site.
| coldpie wrote:
| Nah, there are other ways to raise money. The forums have you
| pay for avatar changes (or to change others' avatars! which
| is a fun chunk of the forums culture), there are various
| upgrades like no-ads and unlocking private messaging and
| stuff. And the forums now have a Patreon, too.
| paulmd wrote:
| > It would have been smarter of Something Awful (since that's
| the site we're talking about) to charge $4/month or
| something.
|
| The innovative thing about that model is driving the revenue
| off the misbehavers instead of good citizens. You don't want
| to have the shitters around _even if they are paying $4
| /month_, and you don't want to drive off good-faith users
| even if they're mediocre/hapless/etc. So run the site off the
| backs of the people you don't want to have around.
|
| People don't like paying monthly (this is even true of, say,
| app store revenue today) and if you apply recurring charges
| then when people don't think they're getting enough value
| they'll leave. You have a hard enough time on the user-
| acquisition side, why make it worse on the retention side by
| driving away the users who you're actually trying to keep?
|
| Billing good citizens works in some situations where you have
| some specific value that you provide to them - providing
| sales listings on classifieds boards inside interest-specific
| forums is a good example, since you are providing access to
| interested buyers, which is a value-add, same as ebay taking
| their fee - but just in terms of operating a forum, you
| aren't a big enough value-add that people are going to pay
| Netflix-level subscriptions to the Parrot-Ass Discussion
| Club. You need the users more than they need you at that
| point. But a one-time fee is viewed much differently by
| people. People will pay $5 for an app, they aren't going to
| pay you $5 a month for it though, or at least far fewer.
| echelon wrote:
| I thought this article was referring to the upcoming deluge of
| GPT-3/DALL-E bots that will eventually flood all of online
| discourse. And whatever future models that will be even more
| indistinguishable from people - perhaps even ones that are good
| at "signup flow".
|
| That's going to be way worse for humanity than spiders and
| automated scripts sending too much traffic. This article isn't
| imagining _apocalypse_ creatively enough.
| api wrote:
| We're coming up on the end of open forums and open social
| media. Everything will require intrusive verification.
| Anonymous forums could exist but they'll require something else
| like an anonymous payment, a ton of proof of work on your local
| machine, etc. to filter out crap.
| Avamander wrote:
| We're certainly heading towards a scenario where internet abuse
| (due to poor regulation against it, IMHO, it's digital
| pollution) becomes enough of a nuisance to require increasingly
| intrusive verification.
|
| Though we can all work against that by securing our own systems
| and preventing them from being abused. Used or unused domains
| should have a strict SPF policy, website registration (or
| newsletter signup forms) should have captchas, comments should
| have captchas. Wordpress or other CMS's plugins should be up-
| to-date, and so on. Work on requiring 3DS everywhere; defense
| in depth for everything.
|
| That way malicious actors would be limited to the services they
| pay for and that makes their life significantly harder.
| viraptor wrote:
| > If Marginalia Search didn't use Cloudflare, it couldn't serve
| traffic.
|
| Cloudflare is not the only CDN/protection. It's the most popular
| and the most evil one. You have a choice.
| RL_Quine wrote:
| Why do you consider them to be "the most evil"? Their services
| seem to be completely fine in almost every regard, and their
| communication doesn't at all suggest that they might be evil.
| megous wrote:
| Not for the website visitors.
| viraptor wrote:
| They actively (including legally) protect groups which
| coordinate targeted abuse and swatting (basically murder
| attempts)
| https://twitter.com/stealthygeek/status/1485731083534667779
| matkoniecz wrote:
| What alternatives do you recommend?
| viraptor wrote:
| It depends on your audience and regions you're most
| interested in. But if you're aiming for the EU, gcore labs
| may be interesting. Akamai is not bad, but a bit enterprisey
| - I don't think they even had an official api the last time I
| used them?
| randunel wrote:
| None of those are free, though.
| viraptor wrote:
| No, but they also don't actively help protect pages
| organising SWATing. It's your choice who to do business
| with.
| Aachen wrote:
| If you're not paying, what's the product they're selling?
| RL_Quine wrote:
| Their paid one when you go over the limits. Not
| everything has to be black and white and reduced down to
| a single, oft repeated catch phrase like that.
| nixcraft wrote:
| I run a popular blog and confirm that spam is a massive issue. I
| am trying to keep the independent web alive with an old-school
| commenting system because it helps readers and myself improve
| outdated posts. My domain is 20+ years old and attracts all
| sorts of threats, including monthly DDoS and daily spam. Using
| Cloudflare solved all of these problems. Next, you need to add
| firewall rules inside Cloudflare WAF to trigger a captcha for
| /path/to/blog/wp-comments-post.php. That will not get rid of
| human spam tho. For that, you need to use another filtering
| service called Akismet.
| nicbou wrote:
| Same job, same problem. I simply don't allow comments anymore.
|
| This is unfortunate, because they're amazing feedback if you
| write about bureaucracy. People won't take the time to write to
| you about their experience, but they'll leave a comment.
| coldpie wrote:
| I came to the same solution, just disabling comments. There's
| a prompt to email me in the footer, but no one ever has.
| Shame, but that's the world we live in.
| baisq wrote:
| People here like to say that BigCo has ruined the independent
| web and that everything is now siloed and blablabla, but the
| truth is that running an independent website fucking sucks in
| many regards.
| jimmywetnips wrote:
| Exactly. Sorry the current solutions don't live up to the
| ideals, but if there's nothing better, then we're already doing
| the ideal thing.
| fariszr wrote:
| Hmmm, maybe offload your comments to something else?
|
| I'm thinking of using GitHub issues/discussions as a comment
| system. The website and everything else will function normally
| without Cloudflare, but the comments are based on GitHub, which
| will deal with spam and hosting for me.
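|
| As a rough sketch of the idea, a static-site build step could
| pull a post's comments from the GitHub REST API; the repo and
| issue number here are hypothetical:
|
|       import requests
|
|       # hypothetical mapping: this blog post <-> issue #42
|       OWNER, REPO, ISSUE = "alice", "blog-comments", 42
|
|       url = (f"https://api.github.com/repos/"
|              f"{OWNER}/{REPO}/issues/{ISSUE}/comments")
|       resp = requests.get(
|           url, headers={"Accept": "application/vnd.github+json"})
|       resp.raise_for_status()
|
|       # render these into the static page at build time
|       for c in resp.json():
|           print(c["user"]["login"], c["created_at"])
|           print(c["body"])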
|
| And I personally think using it is better from an internet-
| centralization point of view, as you don't add to the
| absolutely crazy 20% of the web that CF already controls.
|
| But that obviously doesn't solve the DDoS problem, which should
| be solved with big cloud providers that include DDoS
| protection.
|
| Another workaround is using IPFS, but a normal user will need a
| gateway, and guess who operates one of the biggest IPFS
| gateways? Yes, CF. And that's without considering the tradeoffs
| of using IPFS.
|
| I think a static site + an external comment provider like
| GitHub might help with the attacks and spam without using
| Cloudflare, but I don't have any website close to your blog's
| size, so it's all just a prediction.
| seydor wrote:
| Unfortunately, well-known platforms with known URIs are
| targeted way more than any custom website. I think if WordPress
| just allowed rewriting all the URLs, it would reduce spam
| attacks by a lot.
| toastal wrote:
| Putting everything 'behind Cloudflare' isn't a panacea. By
| merely living outside the West, I'm getting geo-blocked from
| 'normal' news sites and constantly having to solve hCaptcha
| riddles for some AI algo without compensation. It's such a
| burden that I find myself giving up pretty often. GeoIP
| blocking is what prevented me from getting my voter information
| out of my last domicile. Running everything through Cloudflare
| or similar also contributes to centralizing the internet around
| a few choke points that can hurt free speech (both the good and
| the bad kind), and when they go down (as happened recently) a
| large swath of the internet goes with them.
| jks wrote:
| Does Cloudflare's "Privacy Pass" browser plugin help at all?
| It's advertised as reducing the number of hCaptchas you need
| to solve by a factor of 30, but I rarely see hCaptchas
| anywhere on my connection, so I can't really evaluate it
| myself.
| nixcraft wrote:
| I agree with you. But what solution do you propose for
| independent solo developers, or people who wish to run a blog
| instead of using FB, Twitter and co to create content?
| Cloudflare may not be perfect, but it saved me from shutting
| down my solo operation, without putting a massive cost burden
| on me. When the first DDoS hit, I had to beg one of those large
| cloud companies to reduce the bandwidth costs. It took them
| forever to forgive the bill for abuse that was not my fault,
| and I was given a strong warning not to let such an issue
| happen again. There is no easy solution to this problem. At
| least with Cloudflare, people like me can stay online, though
| it does cause problems for visitors with a bad IP reputation.
|
| TL;DR: I won't expose any of my projects or APIs directly these
| days, due to spam, DDoS and other abuse.
| lizardactivist wrote:
| End game: everything runs on US-owned services, and all users
| need to identify themselves to be allowed to even raise a
| finger, so that "bad actors" can be kept out.
|
| All while we blame Russia and China, and say that their spambots
| and evil actions forced us to do this.
| 16amxn16 wrote:
| This actually makes sense. Or, at the very least, it wouldn't
| surprise me.
|
| Another end game: some sort of ID is required to use anything
| (that ID being a local phone number, which can be traced back
| to you).
| djohnston wrote:
| Are you suggesting that the U.S. produces a proportionally
| similar volume of spam traffic as Russia, China, India, or
| Vietnam?
| minimalist wrote:
| It is interesting to watch comments about this dance around the
| topic of barriers to entry. It wasn't exactly easy for the
| uninitiated to access various internet fora in the early days,
| and with popularity come the bots, born of the desire to profit
| with little work at the expense of the community garden. The
| recent story about VRChat embracing anticheat DRM is another
| example of this, as its ascending popularity led to more
| scammers [0].
|
| Does this extend to societies as well? One can think of a
| membrane that has selective permeability to ideas but resists
| antisocial actors and concepts. Alexander Bard has talked a lot
| about social membranics (it's a bit hard to search for).
|
| As odious as the web3 charlatanry is, I'm starting to yearn for
| anything that raises the transaction costs for the dumbest
| bots. I remember reading something about new ideas for
| distributed moderation at some point - maybe someone can
| refresh my memory.
|
| [0]: https://news.ycombinator.com/item?id=32232974
| JohnJamesRambo wrote:
| What is the reason behind bots spamming marginalia? What's the
| motivation? What do they gain? I always wonder about these
| things.
| onefuncman wrote:
| I want to run a honeypot to do more research on bots and the
| economics behind them, but I get bogged down quickly in the
| planning stages. I should just start with a vulnerable
| WordPress site or something.
| SyneRyder wrote:
| Just make a site with a Contact page, with a comment form
| that logs the details of every request (IP address,
| timestamp, message content, email provided). You'll get
| plenty of data for research, once the page has been indexed
| into the database the comment form spammers use. For bonus
| points, put the contact form at the bottom of every page of
| your website.
|
| A couple of my toy/project websites accidentally became
| honeypots. Rather than shut down the comment forms, I now
| have those sites generate summary logfiles that I can upload
| daily to AbuseIPDB.
|
| EDIT: Forgot to mention, also log the Referer field and User-
| Agent on each request. Very, very useful information for
| research and detection.
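|
| A minimal version of such a logging endpoint, sketched with
| Flask (the field and file names are illustrative):
|
|       from datetime import datetime, timezone
|       from flask import Flask, request
|
|       app = Flask(__name__)
|
|       @app.route("/contact", methods=["POST"])
|       def contact():
|           # append one tab-separated line per submission
|           with open("honeypot.log", "a") as log:
|               log.write("\t".join([
|                   datetime.now(timezone.utc).isoformat(),
|                   request.remote_addr or "-",
|                   request.headers.get("Referer", "-"),
|                   request.headers.get("User-Agent", "-"),
|                   request.form.get("email", "-"),
|                   request.form.get("message", "-")
|                          .replace("\n", " "),
|               ]) + "\n")
|           return "Thanks for your message!"  # pretend success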
| prox wrote:
| WordPress is perfect for this. The number of bots trying to get
| in is insane - up to 80 login attempts on some days for a
| small-potatoes website.
|
| There are also some vulnerable plugins still out there, if you
| actually want them to hack it.
| marginalia_nu wrote:
| The simple answer is that I don't know, but it appears to be
| happening to other search engines as well.
|
| My best guess is they're assuming it's backed by Google, and
| are attempting to poison its search term suggestions. The
| queries I've been getting are fairly long and highly specific,
| often within e-pharma, online casinos, or similarly sketchy
| areas.
| Avamander wrote:
| There are multiple reasons - negative SEO, positive SEO,
| malware distribution, paid clicks, advertising and probably
| others I've forgotten at the moment.
| x-complexity wrote:
| > The other alternatives all suck to the extent of my knowledge,
| they're either prohibitively convoluted, or web3 cryptocurrency
| micro-transaction nonsense that while sure it would work, also
| monetizes every single interaction in a way that is more
| dystopian than the actual skull-crushing robot apocalypse.
|
| In the interest of practicality: There's a way to go the web3
| route without being laden with transactions:
|
| - Mint a fixed-cost non-transferrable NFT to an address, with
| an ownership limit of 1 per address.
|
| - Use SIWE (sign-in with Ethereum) to verify ownership of the
| address & therefore the NFT.
|
| - If malicious behaviour is detected, mark the NFT as belonging
| to a malicious actor at the server's end & block the account.
|
| - Require non-malicious-marked NFTs in order to use the
| site/app. (A sketch of the verification step follows below.)
|
| At most, the user only has to perform 1 transaction (minting
| the non-transferrable NFT) on whatever blockchain network the
| contract resides on, & the costs of doing so can be made cheap
| with Layer 2 networks (Polygon PoS, Arbitrum, Optimism, zkSync
| 2.0, etc).
|
| Can this be done entirely without web3? Yes, but the added
| friction imposed onto malicious actors to generate new addresses
| & mint new non-transferrable NFTs increases the costs for them
| considerably.
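|
| A sketch of that verification step in Python; the RPC URL,
| contract address, and the minimal ERC-721 ABI fragment are all
| assumptions:
|
|       from eth_account import Account
|       from eth_account.messages import encode_defunct
|       from web3 import Web3
|
|       w3 = Web3(Web3.HTTPProvider("https://polygon-rpc.com"))
|       nft = w3.eth.contract(
|           address="0x" + "00" * 20,  # your NFT contract here
|           abi=[{"name": "balanceOf", "type": "function",
|                 "stateMutability": "view",
|                 "inputs": [{"name": "o", "type": "address"}],
|                 "outputs": [{"name": "", "type": "uint256"}]}])
|
|       def may_use_site(msg: str, sig: str,
|                        banned: set[str]) -> bool:
|           # recover the signing address from the SIWE message
|           addr = Account.recover_message(
|               encode_defunct(text=msg), signature=sig)
|           if addr in banned:  # server-side list of marked actors
|               return False
|           return nft.functions.balanceOf(addr).call() >= 1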
|
| > If anyone could go ahead and find a solution to this mess, that
| would be great, because it's absolutely suffocating the internet,
| and it's painful to think about all the wonderful little projects
| that get cancelled or abandoned when faced with the reality of
| having to deal with such an egregiously hostile digital
| ecosystem.
|
| In all honesty, there's no perfect solution, just hard-to-make
| tradeoffs: preventing botspam inherently requires tracking in
| some form, as there's no immediately-recognizable stateless
| solution for botspam tracking. Someone has to do the tracking
| to prevent botspam, which inherently involves state being
| changed in order to mark an actor as malicious.
| me_again wrote:
| That seems "prohibitively convoluted" to me, if nothing else.
| birracerveza wrote:
| Because you don't have experience with it. There's nothing
| complicated about SIWE, minting an NFT and checking its
| validity - certainly nothing that justifies describing it as
| "prohibitively convoluted", aside from being scared of web3
| keywords. Come on now.
|
| Not commenting on op's solution's validity or effectiveness,
| just replying to your comment.
| G3rn0ti wrote:
| This whole minting and transaction issuing is done with a
| single mouse click (or two) if you use a browser plug-in like
| the MetaMask wallet.
| nneonneo wrote:
| Do you mint one new NFT per site? Then you impose an excessive
| burden on users per new site they visit.
|
| Do you mint one NFT per address? If blocking only applies to
| the one site, a malicious operator can just spam the next site
| using that address - after all, they own tons of addresses and
| have many sites to spam, and they can surely spam for at least
| a bit before getting caught (per site and per address).
|
| If blocking is a public operation that gets you banned
| everywhere, well, now one callous server owner can disable your
| address's ability to access anything.
|
| I fail to see how Web3 NFTs solve any of these problems...
| x-complexity wrote:
| In all honesty, there's limited tooling to resolve the botspam
| issue: there's nothing special about a human that a curated
| algorithm/botfarm can't sufficiently mimic. No technology is a
| panacea for this problem.
|
| The only way out is to raise the costs for bots to the point
| where they become uneconomical to operate. The methodology for
| achieving this is still up for debate, but tracking of some
| form will have to exist (be it publicly-collated, privately-
| monitored, or some blend of both).
| riscy wrote:
| > - Mint a fixed-cost non-transferrable NFT to an address, with
| > an ownership limit of 1 per address.
| > - Use SIWE (sign-in with Ethereum) to verify ownership of the
| > address & therefore the NFT.
| > - If malicious behaviour is detected, mark the NFT as
| > belonging to a malicious actor at the server's end & block
| > the account.
| > - Require non-malicious-marked NFTs in order to use the
| > site/app.
|
| In the real world, that's called a paywall. It just requires
| users to have an email address (wallet address) and proof of
| payment (NFT). Web3 contributes nothing but a different lingo
| for the existing concepts, yet has all of the same problems:
| nobody likes paywalls.
| Aachen wrote:
| > There has been upwards of 15 queries per second from bots.
| There is just no way to deal with that sort of traffic, barely
| even to reject it.
|
| If the queries are not a megabit each, you're doing _way_ too
| much processing before applying rate limiting. Rejecting
| traffic ought not to take more than 1-2 milliseconds, even if
| you need to look up an API key or IP address in the database.
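|
| For a sense of scale, a per-IP token bucket that rejects
| before any expensive work is only a few lines; the limits here
| are illustrative:
|
|       import time
|
|       RATE, BURST = 1.0, 5.0  # 1 request/second, bursts of 5
|       buckets = {}            # ip -> (tokens, last timestamp)
|
|       def allow(ip: str) -> bool:
|           now = time.monotonic()
|           tokens, last = buckets.get(ip, (BURST, now))
|           tokens = min(BURST, tokens + (now - last) * RATE)
|           if tokens < 1.0:
|               buckets[ip] = (tokens, now)
|               return False    # reject cheaply, e.g. a plain 429
|           buckets[ip] = (tokens - 1.0, now)
|           return True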
|
| I, too, host services on a residential connection: 50 Mbps
| shared with other users. My domains must host hundreds of
| separate scripts, a few of which have a database attached (I
| can think of six off the top of my head, but there are over a
| hundred databases in MariaDB, so I'm sure there are more that
| I've forgotten about). This is a ten-year-old laptop with a
| regular "apt install mariadb", no special configs.
|
| Yes, most traffic is bots, and yes, sometimes they submit more
| than 1 q/s. But it comes nowhere near exhausting resources to a
| noticeable extent. Load average is about 0.15; the main peaks
| come from my own cronjobs. If you're having this much trouble
| rejecting traffic, you might want to spend some time looking at
| bottlenecks. You'll also notice the bots knock it off if
| they're unsuccessful.
| marginalia_nu wrote:
| That's queries per second (as in searches), not requests per
| second.
|
| You know how blogs sometimes can't handle being on the hacker
| news front page from all the traffic? Well my search engine has
| survived that. That was 1-2 QPS.
|
| Unmitigated bot-traffic is roughly 10x the traffic of a hacker
| news death hug, as a sustained load.
| danrl wrote:
| Off topic: Great to see more Gemini/Web dual hosted sites.
| hamilyon2 wrote:
| I experienced this firsthand with government immigration
| websites. The thing is, there are only so many time slots, and
| people are forced to use a certain website to apply, so
| everyone is hunting for available times and generally none are
| available.
|
| So some creative people set up bots which check periodically
| for them. There are paid services which will do that for you.
| Now we have bots hammering the gatekeeper's website. Perhaps
| hundreds of bots.
|
| Which results in the website being unavailable, serving serious
| QPS to bots. I think it is only a matter of time before someone
| writes bots that apply through the application bots, hoping
| that more entries with their information will give them a
| better probability of success.
|
| This is so dystopian and cruel to the average person, and I
| don't think there is a good solution besides deep anti-bot
| expertise within the primary website's development team.
| BoxOfRain wrote:
| This is pretty much the only way you can book a driving test in
| the UK at the moment unless you want to take your test in a
| random place in the Highlands.
| TomK32 wrote:
| I faced a website like this recently when booking a slot at my
| own German embassy, and went a different route around the
| embassy instead. What I don't like about the slot system: you
| won't get a convenient time slot anyway, so why do they bother
| setting it up like this in the first place? Why not just
| register with your contact details and receive an email with a
| guaranteed spot on one of a selection of three days instead?
| No more need to reload, and no need for bots. The Upper
| Austrian government did this for covid vaccinations in the
| early phase and it worked very well (early, high vaccination
| rates amongst the elderly and certain professions).
| luckylion wrote:
| That reminds me of the chaos that ensued in my state in the
| early days of Covid-19 vaccination, when they still had
| centralized systems where the elderly could book an
| appointment. Of course, they had way more demand than supply,
| but still insisted on first come, first served, so you ended up
| with every member of the extended family being asked to try and
| book a slot, quickly overwhelming the booking systems.
|
| At least you didn't need to worry about it all day: there
| wasn't a chance in hell to get something 3 minutes after the
| booking system opened each day.
|
| Friends described how stressful it was for them, their
| parents being completely helpless and essentially fearing
| that their health depended on getting one of those elusive
| appointments, and being devastated each day they didn't
| succeed.
| rightbyte wrote:
| In some miracle of competence, my district allotted shots by
| decreasing age limit, so that there was no "shortage" or
| lagfest for those eligible in the booking system.
| luckylion wrote:
| We have all the data about everyone in the registers, but
| god forbid we use them to lower friction in case of
| emergencies! Good to hear that some got it right.
| probably_wrong wrote:
| > _I don 't think there is a good solution besides a deep anti-
| bot expertise whithin the primary website development team_
|
| But there is a solution: the website team should get their act
| together and remove the "first come first served" aspect
| altogether.
|
| Do you, citizen, want to register? Cool - leave your e-mail and
| we'll call you. Is the service optional? Then we'll pick at
| random from the pool of applicants and e-mail them. Is the
| service mandatory? Then sign up and we'll call you once you
| reach the top of the queue. Add a quick ID/credit card/whatever
| check on top (like good concert venues do), regular e-mail
| updates to let people know they haven't been forgotten, and
| you're done.
|
| Any second-year CS student could write such a system. The
| difficult part is accepting that the current approach doesn't
| work and looking for alternatives.
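|
| The core of it really is tiny. A toy sketch, assuming
| deduplication by a national ID and a random draw for optional
| services:
|
|       import random
|
|       applicants = {}  # national ID -> email
|
|       def register(national_id: str, email: str) -> None:
|           # one entry per person, however often they sign up
|           applicants.setdefault(national_id, email)
|
|       def draw(slots: int) -> list[str]:
|           # optional service: pick winners at random
|           ids = random.sample(sorted(applicants),
|                               k=min(slots, len(applicants)))
|           return [applicants[i] for i in ids]  # emails to notify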
| jaclaz wrote:
| >Do you, citizen, want to register? Cool - leave your e-mail
| and we'll call you. Is the service optional? Then we'll pick
| at random from the pool of applicants and e-mail them. Is the
| service mandatory? Then sign up and we'll call you once you
| reach the top of the queue.
|
| Wait a minute, wouldn't that be more work for "us"?
|
| Let me think ....
| jaredsohn wrote:
| >Then we'll pick at random from the pool of applicants and
| e-mail them
|
| This is where having your own email domain with unlimited
| accounts is useful.
| probably_wrong wrote:
| The thought did cross my mind, which is why I'd also ask
| for an ID or equivalent. If you don't show up with that
| specific ID on the day of your appointment, you lose the
| appointment. And you can use that ID number to deduplicate
| requests.
|
| If you don't do that, I 100% agree with you - scalpers
| could then register with hundreds of accounts for
| reselling, and we would be back where we started.
| MauranKilom wrote:
| "Thank you for waiting three weeks for your appointment
| selection. We are happy to offer you a time slot next Friday,
| from 1 pm to 1:15 pm. Click here to accept: [button]. If this
| does not suit you, click here to get sent back to the queue:
| [button]."
|
| Half the point of these services tends to be giving users
| some choice in when they have to show up somewhere. Because
| not everyone can make time in the middle of business hours of
| an arbitrary day. Not to mention that you might simply be out
| of town.
| mobiclick wrote:
| Applicants specify their preferred time slots in descending
| order and the government agency chooses the time. This is how
| my country does it.
| nicbou wrote:
| Berlin?
| TheCapeGreek wrote:
| It gets more fun when there are arbitrary restrictions on the
| process. Fun anecdote: South Africa's Home Affairs website,
| where you make bookings for passports, normally only lets you
| book appointment dates 2 weeks ahead. It's effectively
| permanently booked this way, even without bots.
|
| Luckily, if you're technically inclined, editing the value of
| the input element via dev tools is accepted by the form.
| lordgrenville wrote:
| It's not filtered out on the back end?
| rightbyte wrote:
| Heh, I've done the same thing to calculate my tax.
|
| The tax agency had grayed out the "calc tax" button until the
| declaration period started, but you could just enable it with
| the enable flag.
|
| Did nothing but read-query their servers, though.
___________________________________________________________________
(page generated 2022-08-04 23:02 UTC)