[HN Gopher] Using Cloudflare on your website could be blocking R...
___________________________________________________________________
Using Cloudflare on your website could be blocking RSS users
Author : campuscodi
Score : 480 points
Date : 2024-10-16 22:46 UTC (1 day ago)
(HTM) web link (openrss.org)
(TXT) w3m dump (openrss.org)
| kevincox wrote:
| I dislike the advice of whitelisting specific readers by
| user-agent. Not only is this endless manual work that will only
| solve the problem for a subset of users, it is also easy for
| malicious actors to bypass. My recommendation would be to create
| a page rule that disables bot blocking for your feeds. This will
| fix the problem for all readers with no ongoing maintenance.
|
| If you are worried about DoS attacks that may hammer on your
| feeds then you can use the same configuration rule to ignore the
| query string for cache keys (if your feed doesn't use query
| strings) and to override the caching settings if your server
| doesn't set the proper headers. This way Cloudflare will cache
| your feed and you can serve any number of visitors without
| putting load onto your origin.
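|
| As a quick sanity check, something like the following (a rough
| sketch; the feed URL and reader User-Agent are placeholders,
| and requests is a third-party library) shows whether the feed
| is coming back as a 200 served from Cloudflare's cache
| (cf-cache-status: HIT) rather than as a 403 challenge:
|
|     import requests  # third-party HTTP client
|
|     FEED_URL = "https://example.com/feed.xml"  # placeholder
|
|     # Fetch twice: with caching configured, the second response
|     # should be a 200 with "cf-cache-status: HIT" instead of a
|     # 403 challenge page.
|     for attempt in (1, 2):
|         r = requests.get(FEED_URL, timeout=10,
|                          headers={"User-Agent": "ExampleReader/0.1"})
|         print(attempt, r.status_code,
|               r.headers.get("cf-cache-status", "<none>"),
|               r.headers.get("content-type"))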
|
| As for Cloudflare fixing the defaults, it seems unlikely to
| happen. It has been broken for years, Cloudflare's own blog is
| affected. They have been "actively working" on fixing it for at
| least 2 years according to their VP of product:
| https://news.ycombinator.com/item?id=33675847
| vaylian wrote:
| I don't know if cloudflare offers it, but whitelisting the URL
| of the RSS feed would be much more effective than filtering
| user agents.
| derkades wrote:
| Yes it supports it, and I think that's what the parent
| comment was all about
| BiteCode_dev wrote:
| Specifically, whitelisting the URL for the bot protection,
| but not the cache, so that you are still somewhat protected
| against adversarial use.
| londons_explore wrote:
| An adversary can easily send no-cache headers to bust the
| cache.
| acdha wrote:
| The CDN can choose whether to honor those. That hasn't
| been an effective adversarial technique since the turn of
| the century.
| londons_explore wrote:
| does cloudflare give such an option? Even for non-paid
| accounts?
| jks wrote:
| Yes, you can do it with a "page rule", which the parent
| comment mentioned. The CloudFlare free tier has a budget of
| three page rules, which might mean that you have to bundle
| all your rss feeds in one folder so they share a path prefix.
| a-french-anon wrote:
| And for those of us using sfeed, the default UA is Curl's.
| benregenspan wrote:
| AI crawlers have changed the picture significantly and in my
| opinion are a much bigger threat to the open web than
| Cloudflare. The training arms race has drastically increased
| bot traffic, and the value proposition behind that bot traffic
| has inverted. Previously many site operators could rely on the
| average automated request being net-beneficial to the site and
| its users (outside of scattered, time-limited DDoS attacks) but
| now most of these requests represent value extraction. Combine
| this with a seemingly related increase in high-volume bots that
| don't respect robots.txt and don't set a useful User-Agent, and
| using a heavy-handed firewall becomes a much easier business
| decision, even if it may target some desirable traffic (like
| valid RSS requests).
| veeti wrote:
| I believe that disabling "Bot Fight Mode" is not enough, you may
| also need to create a rule to disable "Browser Integrity Check".
| belkinpower wrote:
| I maintain an RSS reader for work and Cloudflare is the bane of
| my existence. Tons of feeds will stop working at random and
| there's nothing we can do about it except for individually
| contacting website owners and asking them to add an exception for
| their feed URL.
| sammy2255 wrote:
| Unfortunately it's not really Cloudflare but webadmins who have
| configured it to block everything that's not a browser, whether
| knowingly or not
| afandian wrote:
| If Cloudflare offers a product, for a particular purpose, that
| breaks existing conventions of that purpose, then it's
| Cloudflare.
| sammy2255 wrote:
| Not really. You wouldn't complain to a fence company for
| blocking a path if they were hired to do exactly that
| gsich wrote:
| They are enablers. They get part of the blame.
| shakna wrote:
| Yes, I would. Experts are expected to relay back to their
| client with their thoughts on a matter, not just blindly
| do as they're told. Your builder is meant to do their due
| diligence, which includes making recommendations.
| echoangle wrote:
| Well it doesn't break the conventions of the purpose they
| offer it for. Cloudflare attempts to block non-human users,
| and this is supposed to be used for human-readable
| websites. If someone puts cloudflare in front of a RSS
| feed, that's user error. It's like someone putting a
| captcha in front of an API and then complaining that the
| Captcha provider is breaking conventions.
| nirvdrum wrote:
| I contend this wasn't an issue prior to Cloudflare making
| that an option. Sure, some IDS would block some users and geo
| blocks have been around forever. But Cloudflare is so
| prolific and makes it so easy to block things inadvertently
| that I don't think they get a pass by blaming the downstream
| user.
|
| It's particularly frustrating that they give their own WARP
| service a pass. I've run into many sites that will block VPN
| traffic, including iCloud Privacy Relay, but WARP traffic
| goes through just fine.
| stanislavb wrote:
| I was recently contacted by one of my website users as their
| RSS reader was blocked by Cloudflare.
| amatecha wrote:
| I get blocked from websites with some regularity, running Firefox
| with strict privacy settings, "resist fingerprinting" etc. on
| OpenBSD. They just give a 403 Forbidden with no explanation, but
| it's only ever on sites fronted by CloudFlare. Good times. Seems
| legit.
| BiteCode_dev wrote:
| Cloudflare is a fantastic service with an unmatched value
| proposition, but it's unfortunately slowly killing web privacy,
| with thousands of paper cuts.
|
| Another problem is that "resist fingerprinting" prevents some
| canvas processing, and many websites like Bluesky, LinkedIn or
| Substack use canvas to handle image uploads, so your images
| appear as stripes of pixels.
|
| Then you have mobile apps that just don't run if you don't have
| a google account, like chatgpt's native app.
|
| I understand why people give up, trying to fight for your
| privacy is an uphill battle with no end in sight.
| madeofpalk wrote:
| > Then you have mobile apps that just don't run if you don't
| have a google account, like chatgpt's native app.
|
| Is that true? At least on iOS you can log into the ChatGPT app
| with the same email/password as the website.
|
| I never use Google login for stuff and ChatGPT works fine for
| me.
| BiteCode_dev wrote:
| See other comment.
| KomoD wrote:
| > Then you have mobile apps that just don't run if you don't
| have a google account, like chatgpt's native app.
|
| That's not true, I use ChatGPT's app on my phone without
| logging into a Google account.
|
| You don't even need any kind of account at all to use it.
| BiteCode_dev wrote:
| On Android at least, even if you don't need to log in to
| your google account when connecting to chatgpt, the app
| won't work if your phone isn't signed in to Google Play,
| which doesn't work if your phone isn't linked to a google
| account.
|
| An android phone asks you to link a google account when you
| use it for the first time. It takes a very dedicated user
| to refuse that, then to avoid logging in to the Gmail,
| youtube or app store apps which will all also link your
| phone to your google account when you sign in.
|
| But I do actively avoid this, I use Aurora, F-droid, K9 and
| NewPipeX, so no link to google.
|
| But then no ChatGPT app. When I start it, I get hit with a
| login page for the app store and it's game over.
| acdha wrote:
| So the requirement is to pass the phone's system
| validation process rather than having a Google account. I
| don't love that but I can understand why they don't want
| to pay the bill for the otherwise ubiquitous bots, and
| it's why it's an Android-specific issue.
| BiteCode_dev wrote:
| You can make a very rational case for each privacy
| invasive technical decision ever made.
|
| In the end, the fact remains: no ChatGPT app without
| giving up your privacy, to Google no less.
| acdha wrote:
| "Giving up your privacy" is a pretty sweeping claim - it
| sounds like you're saying that Android inherently leaks
| private data to Google, which is broader than even Apple
| fans tend to say.
| michaelt wrote:
| A person who was maximally distrustful of Google would
| assume they link your phone and your IP through the
| connection used to receive push notifications, and the
| wifi-network-visibility-to-location API, and the software
| update checker, and the DNS over HTTPS, and suchlike. As
| a US company, they could even be forced to do this in
| secret against their will, and lie about it.
|
| Of course as Google doesn't _claim_ they do this, many
| people would consider it unreasonably fearful/cynical.
| acdha wrote:
| Sure, but that says you shouldn't have a phone, not that
| ChatGPT is forcing you to give up your privacy.
| BiteCode_dev wrote:
| Google and Apple were both part of the PRISM program, of
| course I'm making this claim.
|
| It's the opposite stance that would be bonkers.
| acdha wrote:
| PRISM covered communications through U.S. companies'
| servers. It was not a magic back door giving them access
| to your device's local data, and even if you did believe
| that it was, the answer would be not using a phone. A
| major intelligence agency does not need you to have a
| Google account so they can spy on you.
| ForHackernews wrote:
| > it sounds like you're saying that Android inherently
| leaks private data to Google, which is broader than even
| Apple fans tend to say.
|
| Yes? I mean, not "leaks" - it's designed to upload your
| private data to Google and others.
|
| https://www.tcd.ie/news_events/articles/study-reveals-
| scale-...
|
| > Even when minimally configured and the handset is idle,
| with the notable exception of e/OS, these vendor-
| customised Android variants transmit substantial amounts
| of information to the OS developer and to third parties
| such as Google, Microsoft, LinkedIn, and Facebook that
| have pre-installed system apps. There is no opt-out from
| this data collection.
| ForHackernews wrote:
| You might like: https://e.foundation/e-os/
| BiteCode_dev wrote:
| That won't make chatgpt's app work though.
| ForHackernews wrote:
| It might well do, depending on what ChatGPT's app is
| asking the OS for. /e/OS is an Android fork that removes
| Google services and replaces them with open source
| stubs/re-implementations from https://microg.org/
|
| I haven't tried the ChatGPT app, but I know that, for
| example my bank and other financial services apps work
| with on-device fingerprint authentication and no Google
| account on /e/OS.
| __MatrixMan__ wrote:
| I have a similar experience with the pager duty app. It
| loads up and then exits with "security problem detected
| by app" because I've made it more secure by isolating it
| from Google (a competitor). Workaround is to just control
| it via slack instead.
| BiteCode_dev wrote:
| Well you can use the web-based chatgpt so there is a
| workaround. Except it's a worse experience.
| pjc50 wrote:
| The privacy battle _has_ to be at the legal layer. GDPR is
| far from perfect (bureaucratic and unclear with weak
| enforcement), but it's a step in the right direction.
|
| In an adversarial environment, especially with both AI
| scrapers and AI posters, websites have to be able to identify
| and ban persistent abusers. Which unfortunately implies
| having some kind of identification of _everybody_.
| BiteCode_dev wrote:
| That's another problem, we want cheap easy solutions like
| tracking people, instead of more targeted or systemic
| ones.
| nonameiguess wrote:
| No, it's more than that. Cloudflare's bot protection has
| blocked me from sites where I have a paid account, paid for
| by my real checking account with my real name attached.
| Even when I am perfectly willing to give out my identity
| and be tracked, I still can't because I can't even get to
| the login page.
| HappMacDonald wrote:
| They block such visits because their program suspects that
| your visit is from the account of a real human that was
| hacked by a bot.
| wbl wrote:
| You notice that Analog Devices puts their (incredibly
| useful) information up for free. That's because they make
| money in other ways. The ad-supported content-farm Internet
| had a nice run but we will get on without it.
| Gormo wrote:
| > The privacy battle has to be at the legal layer.
|
| I couldn't disagree more. The way to protect privacy is to
| make privacy the standard at the implementation layer, and
| to make it costly and difficult to breach it.
|
| Trying to rely on political institutions without the
| practical and technical incentives favoring privacy will
| inevitably result in the political institutions themselves
| becoming the main instrument that erodes privacy.
| HappMacDonald wrote:
| Yet without regulation nothing stops large companies from
| simply changing the implementation layer for one that
| pads their bottom line better, or just rebuilding it from
| scratch.
|
| If people who valued privacy really controlled the
| implementation layer we wouldn't have gotten to this
| point in the first place.
| Gormo wrote:
| The point we're at is one in which privacy is still
| attainable via implementation-layer measures, even if it
| requires investing some effort and making some trade-offs
| to sustain. The alternative -- placing trust in
| regulation, which _never_ works in the long run -- will
| inevitably result in regulatory capture that eliminates
| those remaining practical measures and replaces them
| with, at best, a performative illusion.
| viraptor wrote:
| I know it's not a solution for you specifically here, but if
| anyone has access to the CF enterprise plan, they can report
| specific traffic as non-bot and hopefully improve the
| situation. They need to have access to the "Bot Management"
| feature though. It's a shitty situation, but some of us here
| _can_ push back a little bit - so do it if you can.
|
| And yes, it's sad that the "make the internet work again" switch
| is behind an expensive paywall.
| meeby wrote:
| The issue here is that RSS readers _are_ bots. Obviously
| perfectly sensible and useful bots, but they're not "real
| people using a browser". I doubt you could get RSS readers
| listed on Cloudflare's "good bots" list either, which would
| let them through the default bot protection feature, given
| they'll all run off random residential IPs.
| j16sdiz wrote:
| They can't whitelist user agents, otherwise bots would get
| through just by spoofing the agent.
|
| If you have an enterprise plan, you can have custom rules,
| including allowing by URL.
| sam345 wrote:
| Not sure if I get this. It seems to me an RSS reader is as
| much of a bot as a browser is for HTML. It just reads RSS
| rather than HTML.
| kccqzy wrote:
| The difference is that RSS readers usually do background
| fetches on their own rather than waiting for a human to
| navigate to a page. So in theory, you could just set up a
| crontab (or systemd timer) that simply xdg-opens various
| pages on a schedule and not be treated as a bot.
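|
| A feed poller is really just a loop like this (a toy sketch;
| the feed list and User-Agent are made up), which is why there
| is never a human around to answer a challenge page:
|
|     import time
|     import urllib.request
|
|     FEEDS = ["https://example.com/feed.xml"]  # placeholder list
|
|     while True:
|         for url in FEEDS:
|             req = urllib.request.Request(
|                 url, headers={"User-Agent": "ExampleReader/0.1"})
|             with urllib.request.urlopen(req, timeout=10) as resp:
|                 print(url, resp.status, len(resp.read()), "bytes")
|         time.sleep(15 * 60)  # wake up every 15 minutes, no human involved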
| viraptor wrote:
| I was responding to a person with Firefox issues, not RSS.
|
| I'm not sure either if RSS bots could be added to good
| bots, but if anyone has traffic from them, we can
| definitely try. (No high hopes though, given the responses
| I got from support so far)
| Jazgot wrote:
| My rss reader was blocked on kvraudio.com by cloudflare. This
| issue wasn't solved for months. I simply stopped reading
| anything on kvraudio. Thank you cloudflare!
| wakeupcall wrote:
| Also running FF with strict privacy settings and several
| blockers. The annoyances are constantly increasing. Cloudflare,
| captchas, "we think you're a bot", constantly recurring cookie
| popups and absurd requirements are making me hate most of the
| websites and services I hit nowadays.
|
| I tried for a long time to get around it, but now when I hit a
| website like this I just close the tab and don't bother anymore.
| afh1 wrote:
| Same, but for VPN (either corporate or personal). Reddit
| blocks it completely, requires you to sign-in but even the
| sign-in page is "network restricted"; LinkedIn shows you a
| captcha but gives an error when submitting the result
| (several reports online); and overall a lot of 403's. All go
| magically away when turning off the VPN. Companies, especially
| adtechs like Reddit and LinkedIn, do NOT want you to browse
| privately, to the point that they'd rather you not use their
| website at all than use it with a condom.
| appendix-rock wrote:
| I don't follow the logic here. There seems to be an
| implication of ulterior motive but I'm not seeing what it
| is. What aspect of 'privacy' offered by a VPN do you think
| that Reddit / LinkedIn are incentivised to bypass? From a
| privacy POV, your VPN is doing nothing to them, because
| your IP address means very little to them from a tracking
| POV. This is just FUD perpetuated by VPN advertising.
|
| However, the undeniable reality is that accessing the
| website with a non-residential IP is a very, very strong
| indicator of sinister behaviour. Anyone that's been in a
| position to operate one of these services will tell you
| that. For every...let's call them 'privacy-conscious' user,
| there are 10 (or more) nefarious actors that present
| largely the same way. It's easy to forget this as a user.
|
| I'm all but certain that if Reddit or LinkedIn could
| differentiate, they would. But they can't. That's kinda the
| whole point.
| bo1024 wrote:
| Not following what could be sinister about a GET request
| to a public website.
|
| > From a privacy POV, your VPN is doing nothing to them,
| because your IP address means very little to them from a
| tracking POV.
|
| I disagree. (1) Since I have javascript disabled, IP
| address is generally their next best thing to go on. (2)
| I don't want to give them IP address to correlate with
| the other data they have on me, because if they sell that
| data, now someone else who only has my IP address
| suddenly can get a bunch of other stuff with it too.
| zahllos wrote:
| SQL injection?
|
| GET parameters can be abused like any parameter. This
| could be SQL injection, could be directory traversal
| attempts, brute-force username attempts, you name it.
| kam wrote:
| If your site is vulnerable to SQL injection, you need to
| fix that, not pretend Cloudflare will save you.
| hombre_fatal wrote:
| At the very least, they're wasting bandwidth to a
| (likely) low quality connection.
|
| But anyone making malicious POST requests, like spamming
| chatGPT comments, first makes GET requests to load the
| submission and find comments to reply to. If they think
| you're a low quality user, I don't see why they'd bother
| locking down just POSTs.
| afh1 wrote:
| IP address is a fingerprint to be shared with third
| parties, of course it's relevant. It's not ulterior
| motive, it's explicit, it's not caring about your traffic
| because you're not a good product. They can and do
| differentiate by requiring a sign-in. They just don't
| care enough to make it actually work. Because they are
| adtechs and not interested in you as a user.
| homebrewer wrote:
| It's equally easy to forget about users from countries
| with way less freedom of speech and information sharing
| than in Western rich societies. These anti-abuse measures
| have made it much more difficult to access information
| blocked by my internet provider during the last few
| years. I'm relatively competent and can find ways around
| it, but my friends and relatives who pursue other career
| choices simply don't bother anymore.
|
| Telegram channels have been a good alternative, but even
| that is going downhill thanks to French authorities.
|
| Cloudflare and Google also often treat us like bots
| (endless captchas, etc) which makes it even more
| difficult.
| miki123211 wrote:
| > For every...let's call them 'privacy-conscious' user,
| there are 10 (or more) nefarious actors that present
| largely the same way.
|
| And each one of these could potentially create thousands
| of accounts, and do 100x as many requests as a normal
| user would.
|
| Even if only 1% of the people using your service are
| fraudsters, a normal user has at most a few accounts,
| while fraudsters may try to create thousands per day.
| This means that e.g. 90% of your signups are fraudulent,
| despite the population of fraudsters being extremely
| small.
| ruszki wrote:
| Has anybody ever been stopped from doing nefarious things
| by these annoyances?
|
| It's like at my current and previous companies. They impose
| a lot of security restrictions. The problem is, if somebody
| wants to get data out (or in), they can do it anytime. The
| security department says it's against "accidental" leaks.
| I'm still waiting for a single instance where they actually
| caught an "accidental" leak, rather than just introducing
| extra steps after which I achieve the exact same thing.
| Even when I caused a real potential leak, nobody stopped me
| from doing it. The only reason they have these security
| services/apps is to push responsibility onto other
| companies.
| acdha wrote:
| > Companies, specially adtechs like Reddit and LinkedIn, do
| NOT want you to browse privately, to the point they rather
| you don't use their website at all unless without a condom.
|
| That's true in some cases, I'm sure, but also remember that
| most site owners deal with lots of tedious abuse. For
| example, some people get really annoyed about Tor being
| blocked but for most sites Tor is a tiny fraction of total
| traffic but a fairly large percentage of the abuse probing
| for vulnerabilities, guessing passwords, spamming contact
| forms, etc., so while I sympathize with the legitimate users
| I also completely understand why a busy site operator is
| going to flip a switch making their log noise go down by a
| double-digit percentage.
| rolph wrote:
| Funny thing, when FF is blocked I can get through with
| Tor.
| anthk wrote:
| For Reddit I just use it r/o under gopher://gopherddit.com
|
| A good client is either Lagrange (multiplatform), the old
| Lynx, or Dillo with the Gopher plugin.
| Adachi91 wrote:
| > Reddit blocks it completely, requires you to sign-in but
| even the sign-in page is "network restricted";
|
| I've been creating accounts every time I need to visit
| Reddit now to read a thread about [insert subject]. They do
| not validate E-Mail, so I just use `example@example.com`,
| whatever random username it suggests, and `example` as a
| password. I've created at least a thousand accounts at this
| point.
|
| Malicious Compliance, until they disable this last effort
| at accessing their content.
| hombre_fatal wrote:
| Most subreddits worth posting on usually have a minimum
| account age + minimum account karma. I've found it
| annoying to register new accounts too often.
| zargon wrote:
| They verify signup emails now. At least for me.
| immibis wrote:
| I've created a few thousand accounts through a VPN
| (random node per account). After doing that, I found out
| Reddit accounts created through VPNs are automatically
| shadow banned the second time they comment (I think the
| first is also shadow deleted in some way). But they allow
| you to browse from a shadow banned account just fine.
| lioeters wrote:
| Same here. I occasionally encounter websites that won't work
| with ad blockers, sometimes with Cloudflare involved, and I
| don't even bother with those sites anymore. Same with sites
| that display a cookie "consent" form without an option to not
| accept. I reject the entire site.
|
| Site owners probably don't even see these bounced visits, and
| it's such a tiny percentage of visitors who do this that it
| won't make a difference. Meh, it's just another obstacle to
| using the web on our own terms.
| capitainenemo wrote:
| It's a tiny percentage of visitors, but a tech savvy one,
| and depending on your website, they could be a higher than
| average percentage of useful users or product purchasers.
| The impact could be disproportionate. What's frustrating is
| many websites don't even realise it is happening because
| the reporting from the intermediary (Cloudflare, say) is
| inaccurate or incorrectly represents how it works.
| Fingerprinting has become integral to bot "protection".
| It's also frustrating when people think this can be a
| drop-in solution, and put it in front of APIs whose clients
| are completely incapable of handling the challenge, with no
| special casing (encountered on FedEx, GoFundMe), much like
| the RSS reader problem.
| orbisvicis wrote:
| I have to solve captchas for Amazon while logged into my
| Amazon account.
| tenken wrote:
| Why?! ... I've had 404 pages on Amazon, but never a
| captcha...
| m463 wrote:
| at one point I couldn't access amazon at night.
|
| I would get different captchas, one so convoluted that it
| wouldn't even load the required images.
|
| And I would get the oops sorry dog page for _everything_.
|
| I finally contacted amazon, gave them my (static) ip
| address and it was good.
|
| In other locations, I have to solve a 6-distorted-letter
| captcha to log in, but that's the extent of it.
| anilakar wrote:
| Heck, I cannot even pass ReCAPTCHA nowadays. No amount of
| clicking buses, bicycles, motorcycles, traffic lights,
| stairs, crosswalks, bridges and fire hydrants will suffice.
| The audio transcript feature is the only way to get past a
| prompt.
| josteink wrote:
| Just a heads up that this is how Google treats connections
| it suspects to originate from bots. Silently keeping you in
| an endless loop promising reward if you can complete it
| correctly.
|
| I discovered this when I set up IPv6 using hurricane
| electric as a tunnel broker for IPv6 connectivity.
|
| Seemingly Google has all HE.net IPv6 tunnel subnets listed
| for such behaviour without it being documented anywhere. It
| was extremely annoying until I figured out what was going
| on.
| n4r9 wrote:
| > Silently keeping you in an endless loop promising
| reward if you can complete it correctly.
|
| Sounds suspiciously like how product managers talk to
| developers as well.
| anilakar wrote:
| Sadly my biggest crime is running Firefox with default
| privacy settings and uBlock Origin installed. No VPNs or
| IPv6 tunnels, no Tor traffic whatsoever, no Google search
| history poisoning plugins.
|
| If only there was a law that allowed one to be excluded
| from automatic behavior profiling...
| marssaxman wrote:
| There's a pho restaurant near where I work which wants you
| to scan a QR code at the table, then order and pay through
| their website instead of talking to a person. In three
| visits, I have not once managed to get past their captcha!
|
| (The _actual_ process at this restaurant is to sit down,
| fuss with your phone a bit, then get up like you're about
| to leave; someone will arrive promptly to take your order.)
| eddythompson80 wrote:
| I've only seen that at Asian restaurants near a
| university in my city. When I asked I was told that this
| is a common way in China and they get a lot of
| international students who prefer/expect it that way.
| amanda99 wrote:
| Yes and the most infuriating thing is the "we need to verify
| the security of your connection" text.
| JohnFen wrote:
| > when I hit a website like this just close the tab and don't
| bother anymore.
|
| Yeah, that's my solution as well. I take those annoyances as
| the website telling me that they don't want me there, so I
| grant them their wish.
| immibis wrote:
| That's fine. You were an obstacle to their revenue
| gathering anyway.
| SoftTalker wrote:
| Same. If a site doesn't want me there, fine. There's no
| website that's so crucial to my life that I will go through
| those kinds of contortions to access it.
| doctor_radium wrote:
| Hey, same here! For better or worse, I use Opera Mini for
| much of my mobile browsing, and it fares far worse than
| Firefox with uBlock Origin and ResistFingerprinting. I
| complained about this roughly a year ago on a similar HN
| thread, on which a Cloudflare rep also participated. Since
| then something changed, but both sides being black boxes, I
| can't tell if Cloudflare is wising up or Mini has stepped up.
| I still get the same challenge pages, but Mini gets through
| them automatically now, more often than not.
|
| But not always. My most recent stumbling block is
| https://www.napaonline.com. Guess I'm buying oxygen sensors
| somewhere else.
| anal_reactor wrote:
| On my phone Opera Mobile won't be allowed into some websites
| behind CloudFlare, most importantly 4chan
| dialup_sounds wrote:
| 4chan's CF config is so janky at this point it's the only
| site I have to use a VPN for.
| mzajc wrote:
| I randomize my User-Agent header and many websites outright
| block me, most often with no captcha and not even a useless
| error message.
|
| The most egregious is Microsoft (just about every Microsoft
| service/page, really), where all you get is a "The request is
| blocked." and a few pointless identifiers listed at the bottom,
| purely because it thinks your browser is too old.
|
| CF's captcha page isn't any better either, usually putting me
| in an endless loop if it doesn't like my User-Agent.
| charrondev wrote:
| Are you sending an actual random string as your UA or sending
| one of a set of actual user agents?
|
| You're best off just picking real ones. We got hit by a
| botnet sending 10k+ requests from 40 different ASNs with
| 1000s of different IPs. The only way we were able to
| identify/block the traffic was by excluding user agents
| matching some regex (for whatever reason they weren't
| spoofing real user agents, but weren't sending actual ones
| either).
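|
| For illustration, the filter amounted to something like this
| (the patterns below are hypothetical, not the ones we actually
| used):
|
|     import re
|
|     # Flag UAs that are empty, a bare token with no
|     # product/version structure, or a stock HTTP-library default.
|     SUSPICIOUS_UA = re.compile(
|         r"^$|^[a-z0-9]{8,}$|^python-requests|^go-http-client")
|
|     def looks_like_junk_ua(user_agent: str) -> bool:
|         return bool(SUSPICIOUS_UA.search(user_agent.strip().lower()))
|
|     print(looks_like_junk_ua("Mozilla/5.0 (X11; Linux x86_64)"))  # False
|     print(looks_like_junk_ua("asdf1234qwer"))                     # True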
| RALaBarge wrote:
| I worked at an anti-spam email security company in the
| aughts, and we had a perl engine that would rip apart the
| MIME boundaries and measure everything - UA, SMTP client
| fingerprint headers, even the number of anchor or paragraph
| tags. A large combination of IF/OR evaluations with a regex
| engine did a pretty good job, since the botnets usually
| don't bother to fully randomize or really opsec the
| payloads they are sending - it's a cannon instead of a
| flyswatter.
| kccqzy wrote:
| Similar techniques are known in the HTTP world too. There
| were things like detecting the order of HTTP request
| headers and matching them to known software, or even just
| comparing the actual content of the Accept header.
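|
| A toy version of the Accept-header trick (purely a sketch, not
| what any real WAF does): curl's default Accept is */* while a
| browser navigation sends a list that includes text/html, so a
| request claiming to be a browser while sending only */* stands
| out.
|
|     def accept_consistent_with_ua(user_agent: str, accept: str) -> bool:
|         """Naive consistency check between User-Agent and Accept."""
|         claims_browser = "Mozilla/" in user_agent
|         browser_like_accept = "text/html" in accept
|         # A "browser" sending curl's bare */* Accept is suspicious.
|         return (not claims_browser) or browser_like_accept
|
|     ff_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Firefox/115.0"
|     print(accept_consistent_with_ua(ff_ua, "text/html,*/*;q=0.8"))  # True
|     print(accept_consistent_with_ua(ff_ua, "*/*"))                  # False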
| miki123211 wrote:
| And then there's also TLS fingerprinting.
|
| Different browsers use TLS in slightly different ways,
| send data in a slightly different order, have a different
| set of supported extensions / algorithms etc.
|
| If your user agent says Safari 18, but your TLS
| fingerprint looks like Curl and not Safari, sophisticated
| services will immediately detect that something isn't
| right.
| mzajc wrote:
| I use the Random User-Agent Switcher[1] extension on
| Firefox. It does pick real agents, but some of them might
| show a really outdated browser (eg. Firefox 5X), which I
| assume is the reason I'm getting blocked.
|
| [1]: https://addons.mozilla.org/en-
| US/firefox/addon/random_user_a...
| pushcx wrote:
| Rails is going to make this much worse for you. All new apps
| include naive agent sniffing and block anything "old"
| https://github.com/rails/rails/pull/50505
| GoblinSlayer wrote:
| def blocked?
|   user_agent_version_reported? && unsupported_browser?
| end
|
| well, you know what to do here :)
| mzajc wrote:
| This is horrifying. What happened to simply displaying a
| "Your browser is outdated, consider upgrading" banner on
| the website?
| shbooms wrote:
| idk, even that seems too much to me, but maybe I'm just
| being too sensitive.
|
| but like, why is it a website's job to tell me what
| browser version to use? unless my outdated browser is
| lacking legitmate functionality which is required by your
| website, just serve the page and be done with it.
| michaelt wrote:
| Back when the sun was setting on IE6, sites deployed
| banners that basically meant "We don't test on this,
| there's a good chance it's broken, but we don't know the
| specifics because we don't test with it"
| freedomben wrote:
| Wow. And this is now happening right as I've blacklisted
| google-chrome due to manifest v3 removal :facepalm:
| whoopdedo wrote:
| The irony being you can get around the block by
| pretending to be a bot.
|
| https://github.com/rails/rails/pull/52531
| lovethevoid wrote:
| Not sure a random UA extension is giving you much privacy.
| Try your results on coveryourtracks.eff.org, and see. A random UA
| would provide a lot of identifying information despite being
| randomized.
|
| From experience, a lot of the things people do in hopes of
| protecting their privacy only make them far easier to
| profile.
| mzajc wrote:
| coveryourtracks.eff.org is a great service, but it has a
| few limitations that apply here:
|
| - The website judges your fingerprint based on how unique
| it is, but assumes that it's otherwise persistent.
| Randomizing my User-Agent serves the exact opposite - a
| given User-Agent might be more unique than using the
| default, but I randomize it to throw trackers off.
|
| - To my knowledge, its "One in x browsers" metric (and by
| extension the "Bits of identifying information" and the
| final result) are based off of visitor statistics, which
| would likely be skewed as most of its visitors are privacy-
| conscious. They only say they have a "database of many
| other Internet users' configurations," so I can't verify
| this.
|
| - Most of the measurements it makes rely on javascript
| support. For what it's worth, it claims my fingerprint is
| not unique when javascript is disabled, which is how I
| browse the web by default.
|
| The other extreme would be fixing my User-Agent to the most
| common value, but I don't think that'd offer me much
| privacy unless I also used a proxy/NAT shared by many
| users.
| HappMacDonald wrote:
| I would just fingerprint you as "the only person on the
| internet who is scrambling their UA string" :)
| neilv wrote:
| Similar here. It's not unusual to be blocked from a site by
| CloudFlare when I'm running Firefox (either ESR or current
| release) on Linux.
|
| I suspect that people operating Web sites have no idea how many
| legitimate users are blocked by CloudFlare.
|
| And, based on the responses I got when I contacted two of the
| companies whose sites were chronically blocked by CloudFlare
| for months, it seemed like it wasn't worth any employee's time
| to try to diagnose.
|
| Also, I'm frequently blocked by CloudFlare when running Tor
| Browser. Blocking by Tor exit node IP address (if that's what's
| happening) is much more understandable than blocking Firefox
| from a residential IP address, but still makes CloudFlare not a
| friend of people who want or need to use Tor.
| pjc50 wrote:
| > CloudFlare not a friend of people who want or need to use
| Tor
|
| The adversarial aspect of all this is a problem:
| P(malicious|Tor) is much higher than P(malicious|!Tor)
| jorams wrote:
| > I suspect that people operating Web sites have no idea how
| many legitimate users are blocked by CloudFlare.
|
| I sometimes wonder if all Cloudflare employees are on some
| kind of whitelist that makes them not realize the ridiculous
| false positive rate of their bot detection.
| amatecha wrote:
| Yeah, I've contacted numerous owners of personal/small sites
| and they are usually surprised, and never have any idea why I
| was blocked (not sure if it's an aspect of CF not revealing
| the reason, or the owner not knowing how to find that
| information). One or two allowlisted my IP but that doesn't
| strike me as a solution.
|
| I've contacted companies about this and they usually just
| tell me to use a different browser or computer, which is like
| "duh, really?" , but also doesn't solve the problem for me or
| anyone else.
| lovethevoid wrote:
| What are some examples? I've been running ff on linux for
| quite some time now and am rarely blocked. I just run it with
| ublock origin.
| capitainenemo wrote:
| Odds are they have Resist Fingerprinting turned on. When I
| use it in a Firefox profile I encounter this all over the
| place. Drupal, FedEx.. some sites handle it better than
| others. Some it's a hard block with a single terse error.
| Some it is a challenge which gets blocked due to using
| remote javascript. Some it's a local challenge you can get
| past. But it has definitely been getting worse.
| Fingerprinting is being normalised, and the excuse of "bot
| protection" (bots can make unique fingerprints too, though)
| means that it can now be used maliciously (or by ad
| networks like google, same diff) as a standard feature.
| johnklos wrote:
| I've had several discussions that were literally along the
| lines of, "we don't see what you're talking about in our
| logs". Yes, you don't - traffic is blocked _before_ it gets
| to your servers!
| pessimizer wrote:
| Also, Cloudflare won't let you in if you forge your referer
| (it's nobody's business what site I'm coming from.) For years,
| you could just send the root of the site you were visiting,
| then last year somebody at Cloudflare flipped a switch and took
| a bite out of everyone's privacy. Now it's just endless
| reloading captchas.
| zamadatix wrote:
| Why go through that hassle instead of just removing the
| referer?
| bityard wrote:
| Lots of sites see an empty referrer and send you to their
| main page or marketing page. Which means you can't get
| anywhere else on their site without a valid referrer. They
| consider it a form of "hotlink" protection.
|
| (I'm not saying I agree with it, just that it exists.)
| zamadatix wrote:
| Fair and valid answer to my wording. Rewritten for what I
| meant to ask: "Why set referrer to the base of the
| destination origin instead of something like Referrer-
| Policy: strict-origin?". I.e. remove it completely for
| cross-origin instead of always making up that you came
| from the destination.
|
| Though what you mention does beg the question "is there
| really much privacy gain in that over using Referrer-
| Policy: same-origin and having referrer based pages work
| right?" I suppose so if you're randomizing your identity
| in an untrackable way for each connection it could be
| attractive... though I think that'd trigger being
| suspected as a bot far before the lack of proper same
| origin info :p.
| philsnow wrote:
| Ah, maybe this is what's happening to me.. I use Firefox with
| uBlock origin, privacy badger, multi-account containers, and
| temporary containers.
|
| Whenever I click a link to another site, I get a new tab in
| either a pre-assigned container or else in a "tmpNNNN"
| container, and I think either by default or I have it
| configured to omit Referer headers on those new tab
| navigations.
| anthk wrote:
| Or any Dillo user, with a PSP User Agent which is legit for
| small displays.
| jasonlotito wrote:
| Cloudflare has always been a dumpster fire in usability. The
| number of times it would block me in that way was enough to
| make me seriously question the technical knowledge of anyone
| who used it. It's a dumpster fire. Friends don't let friends
| use Cloudflare. To me, it's like the Spirit Airlines of CDNs.
|
| Sure, tech wise it might work great, but from your users
| perspective: it's trash.
| immibis wrote:
| It's got the best vendor lock-in enshittification story -
| it's free - and that's all that matters.
| DrillShopper wrote:
| Maybe after the courts break up Amazon the FTC can turn its eye
| to Cloudflare.
| gjsman-1000 wrote:
| A. Do you think courts give a darn about the 0.1% of users
| that are still using RSS? We might as well care about the
| 0.1% of users who want the ability to set every website's
| background color to purple with neon green anchor tags. RSS
| never caught on as a standard to begin with, peaking at 6%
| adoption by 2005.
|
| B. Cloudflare has healthy competition with AWS, Akamai,
| Fastly, Bunny.net, Mux, Google Cloud, Azure, you name it,
| there's a competitor. This isn't even an Apple vs Google
| situation.
| HappMacDonald wrote:
| Cloudflare doesn't offer the same product suite as the
| other companies you mention, though. Cloudflare is
| primarily DDoS prevention while the others are primarily
| cloud hosting.
|
| And it is the DDoS prevention measures at issue here.
| KPGv2 wrote:
| Reddit seems to do this to me (sometimes) when I use Zen
| browser. Switch over to Safari or Chrome and the site always
| works great.
| kjkjadksj wrote:
| Reddit has been bad about it as of late too
| rcarmo wrote:
| Ironically, the site seems to currently be hugged to death, so
| maybe they should consider using Cloudflare to deal with HN
| traffic?
| timeon wrote:
| If it is unintentional DDoS, we can wait. Not everything needs
| to be on demand.
| dewey wrote:
| The website is built to get attention, the attention is here
| right now. Nobody will remember to go back tomorrow and read
| the site again when it's available.
| BlueTemplar wrote:
| I'm not sure an open web can exist under this kind of
| assumption...
|
| Once you start chasing views, it's going to come at the
| detriment of everything else.
| dewey wrote:
| This happened at least 15 years ago and we are doing
| okay.
| sofixa wrote:
| Doesn't have to be using CloudFlare, just a static web host
| that will be able to scale to infinity (of which CloudFlare is
| one with Pages, but there's also Google with Firebase Hosting,
| AWS with Amplify, Microsoft with something in Azure with a
| verbose name, Netlify, Vercel, GitHub Pages, etc etc etc).
| kawsper wrote:
| Or just add Varnish or Nginx configured with a cache in
| front.
| sofixa wrote:
| That can still exhaust system resources on the box it's
| running on (file descriptors, inodes, ports,
| CPU/memory/bandwidth, etc) if you hit it too big.
|
| For something like entirely static content, it's so much
| easier (and cheaper, all of the static hosting providers
| have an extremely generous free tier) to use static
| hosting.
|
| And I say this as an SRE by heart who runs Kubernetes and
| Nomad for fun across a number of nodes at home and in
| various providers - my blog is on a static host. Use the
| appropriate solution for each task.
| vundercind wrote:
| I used to serve low-tens-of-MB .zip files--worse than a web
| page and a few images or what have you--statically from
| Apache2 on a boring Linux server that'd qualify as potato-
| tier today, with traffic spikes into the hundreds of
| thousands per minute. Tens of thousands per minute against
| other endpoints gated by PHP setting a header to tell
| Apache2 to serve the file directly if the client
| authenticated correctly, and I think that one could have
| gone a lot higher, never really gave it a workout. Wasn't
| even really taxing the hardware that much for either
| workload.
|
| Before that, it was on a mediocre-even-at-the-time
| dedicated-cores VM. That caused performance problems...
| because its Internet "pipe" was straw-sized, it turned out.
| The server itself was fine.
|
| Web server performance has regressed amazingly badly in the
| world of the Cloud. Even "serious" sites have decided the
| performance equivalent of shitty shared-host Web hosting is
| a great idea and that introducing all the problems of
| distributed computing at the architecture level will help
| their moderate-traffic site work better (LOL; LMFAO), so
| now they need Cloudflare and such just so their "scalable"
| solution doesn't fall over in a light breeze.
| erikrothoff wrote:
| As the owner of an RSS reader I love that they are making this
| more public. 30% of our support requests are "my feed doesn't
| work". It sucks that the only thing we can say is "contact the
| site owner, it's their firewall". And to be fair it's not only
| Cloudflare, so many different firewall setups cause issues. It's
| ironic that a public API endpoint meant for bots is blocked for
| being a bot.
| ricardo81 wrote:
| iirc even if you're listed as a "good bot" with Cloudflare, high
| security settings by the CF user can still result in 403s.
|
| No idea if CF already does this, but allowing users to generate
| access tokens for 3rd party services would be another way of
| easing access alongside their apparent URL and IP whitelisting.
| mbo wrote:
| This is an active issue with Rate Your Music right now:
| https://rateyourmusic.com/rymzilla/view?id=6108
|
| Unfixed for 4 months.
| jgrahamc wrote:
| My email is jgc@cloudflare.com. I'd like to hear from the owners
| of RSS readers directly on what they are experiencing. Going to
| ask the team to take a closer look.
| viraptor wrote:
| It's cool and all that you're making an exception here, but how
| about including a "no, really, I'm actually a human" link on
| the block page rather than giving the visitor a puzzle: how to
| report the issue to the page owner (hard on its own for
| normies) if you can't even load the page. This is just
| externalising issues that belong to the Cloudflare service.
| methou wrote:
| Some clients are more like a bot/service, imagine google
| reader that fetches and caches content for you. The client
| I'm currently using is miniflux, it also works in this way.
|
| I understand that there are some more interactive rss
| readers, but from personal experience it's more like "hey I'm
| a good bot, let me in"
| _Algernon_ wrote:
| An rss reader is a user agent (ie. a software acting on
| behalf of its users). If you define rss readers as a bot
| (even if it is a good bot), you may as well call Firefox a
| bot (it also sends off web requests without explicit
| approval of each request by the user).
| sofixa wrote:
| Their point was that the RSS reader does the scraping on
| its own in the background, without user input. If it
| can't read the page, it can't; it's not initiated by the
| user where the user can click on a "I'm not a bot, I
| promise" button.
| viraptor wrote:
| It was a mental skip, but the same idea. It would be awesome
| if CF just allowed reporting issues at the point something
| gets blocked - regardless if it's a human or a bot. They're
| missing an "I'm misclassified" button for people actually
| affected without the third-party runaround.
| fluidcruft wrote:
| Unfortunately, I would expect that queue of reports to
| get flooded by bad faith actors.
| viraptor wrote:
| Sure, but now they say that queue should go to the
| website owner instead, who has less global visibility on
| the traffic. So that's just ignoring something they don't
| want to deal with.
| jgrahamc wrote:
| I am not trying to "make an exception", I'm asking for
| information external to Cloudflare so I can look at what
| people are experiencing and compare with what our systems are
| doing and figure out what needs to improve.
| robertlagrant wrote:
| This is useful info:
| https://news.ycombinator.com/item?id=33675847
| PaulRobinson wrote:
| Some "bots" are legitimate. RSS is intended for machine
| consumption. You should not be blocking content intended
| for machine consumption because a machine is attempting to
| consume it. You should not expect a machine, consuming
| content intended for a machine, to do some sort of step to
| show they aren't a machine, because they are in fact a
| machine. There is a lot of content on the internet that is
| not used by humans, and so checking that humans are using
| it is an aggressive anti-pattern that ruins experiences for
| millions of people.
|
| It's not that hard. If the content being requested is RSS
| (or Atom, or some other syndication format intended for
| consumption by software), just don't do bot checks, use
| other mechanisms like rate limiting if you must stop abuse.
|
| As an example: would you put a captcha on robots.txt as
| well?
|
| As other stories here can attest to, Cloudflare is slowly
| killing off independent publishing on the web through poor
| product management decisions and technology
| implementations, and the fix seems pretty simple.
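|
| Rate limiting is not hard either; even a toy per-client token
| bucket (a sketch only, the numbers are arbitrary) is enough to
| blunt abuse of a feed URL without ever showing a challenge
| page:
|
|     import time
|     from collections import defaultdict
|
|     RATE = 1 / 60.0   # refill one request per minute
|     BURST = 5         # allow a small burst
|
|     _buckets = defaultdict(lambda: {"tokens": BURST, "ts": time.monotonic()})
|
|     def allow(client_ip: str) -> bool:
|         b = _buckets[client_ip]
|         now = time.monotonic()
|         b["tokens"] = min(BURST, b["tokens"] + (now - b["ts"]) * RATE)
|         b["ts"] = now
|         if b["tokens"] >= 1:
|             b["tokens"] -= 1
|             return True
|         return False
|
|     print([allow("203.0.113.7") for _ in range(7)])  # last two are False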
| jamespo wrote:
| From another post, if the content-type is correct it gets
| through. If this is the case I don't see the problem.
| Scramblejams wrote:
| It's a very common misconfiguration, though, because it
| happens by default when setting up CF. If your customers
| are, by default, configuring things incorrectly, then
| it's reasonable to ask if the service should surface the
| issue more proactively in an attempt to help customers
| get it right.
|
| As another commenter noted, not even CF's own RSS feed
| seems to get the content type right. This issue could
| clearly use some work.
| doctor_radium wrote:
| I had a conversation with a web site owner about this once.
| There apparently is such a feature, a way for sites to
| configure a "Please contact us here if you're having trouble
| reaching our site" page...usage of which I assume Cloudflare
| could track and then gain better insight into these issues.
| The problem? It requires a Premium Plan.
| kalib_tweli wrote:
| There are email obfuscation and managed challenge script tags
| being injected into the RSS feed.
|
| You simply shouldn't have any challenges whatsoever on an RSS
| feed. They're literally meant to be read by a machine.
| kalib_tweli wrote:
| I confirmed that if you explicitly set the Content-Type
| response header to application/rss+xml it seems to work with
| Cloudflare Proxy enabled.
|
| The issue here is that Cloudflare's content type check is
| naive. And the fact that CF is checking the content-type
| header directly needs to be made more explicit OR they need
| to do a file type check.
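|
| For anyone whose origin doesn't already do this, it is a
| one-liner in most frameworks. A minimal sketch with Flask (the
| route and feed body are placeholders):
|
|     from flask import Flask, Response
|
|     app = Flask(__name__)
|
|     FEED_XML = """<?xml version="1.0" encoding="UTF-8"?>
|     <rss version="2.0"><channel><title>Example</title>
|     <link>https://example.com/</link><description>demo</description>
|     </channel></rss>"""
|
|     @app.route("/rss")
|     def rss():
|         # Explicit media type so proxies can tell this is a feed,
|         # not an HTML page meant for humans.
|         return Response(FEED_XML, mimetype="application/rss+xml")
|
|     if __name__ == "__main__":
|         app.run()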
| londons_explore wrote:
| I wonder if popular software for _generating_ RSS feeds
| might not be setting the correct content-type header? Maybe
| this whole issue could be mostly-fixed by a few github PR
| 's...
| kalib_tweli wrote:
| It wouldn't. It's the role of the HTTP server to set the
| correct content type header.
| djbusby wrote:
| The number of feeds with crap headers and other non-spec
| stuff going on; and loads of clients missing useful
| headers. Ugh. It seems like it should be simple; maybe
| that's why there are loads of naive implementations.
| onli wrote:
| Correct might be debatable here as well. My blog for
| example sets Content-Type to text/xml, which is not
| exactly wrong for an RSS feed (after all, it is text and
| XML) and IIRC was the default back then.
|
| There were compatibility issues with other type headers,
| at least in the past.
| johneth wrote:
| I think the current correct content types are:
|
| 'application/rss+xml' (for RSS)
|
| 'application/atom+xml' (for Atom)
| londons_explore wrote:
| Sounds like a kind samaritan could write a scanner to
| find as many RSS feeds as possible which look like
| RSS/Atom and _don't_ have these content types, then go
| and patch the hosting software those feeds use to have
| the correct content types, or ask the webmasters to fix
| it if they're home-made sites.
|
| As soon as a majority of sites use the correct types,
| clients can start requiring it for newly added feeds,
| which in turn will make webmasters make it right if they
| want their feed to work.
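|
| The scanning part is only a few lines (a rough sketch; the feed
| list is a placeholder and requests is a third-party library):
|
|     import requests
|
|     EXPECTED = ("application/rss+xml", "application/atom+xml")
|     FEEDS = ["https://example.com/feed.xml",
|              "https://blog.example.org/atom.xml"]  # placeholder list
|
|     for url in FEEDS:
|         try:
|             resp = requests.get(url, timeout=10)
|         except requests.RequestException as exc:
|             print(f"{url}: request failed ({exc})")
|             continue
|         ctype = resp.headers.get("content-type", "").split(";")[0].strip()
|         if ctype not in EXPECTED:
|             print(f"{url}: served as {ctype or '<missing>'}")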
| onli wrote:
| Not even Cloudflare's own blog uses those,
| https://blog.cloudflare.com/rss/, or am I getting a wrong
| content-type shown in my dev tools? For me it is
| `application/xml`. So even if `application/rss+xml` were
| the correct type by an official spec, it's not something
| to rely on if it's not used commonly.
| johneth wrote:
| I just checked Wikipedia and it says Atom's is
| 'application/atom+xml' (also confirmed in the IANA
| registry), and RSS's is 'application/rss+xml' (but it's
| not registered yet, and 'text/xml' is also used widely).
|
| 'application/rss+xml' seems to be the best option though
| in my opinion. The '+xml' in the media type tells (good)
| parsers to fall back to using an XML parser if they don't
| understand the 'rss' part, but the 'rss' part provides
| more accurate information on the content's type for
| parsers that do understand RSS.
|
| All that said, it's a mess.
| o11c wrote:
| Even outside of RSS, the injected scripts often make internet
| security significantly _worse_.
|
| Since the user-agent has no way to distinguish scripts
| injected by cloudflare from scripts originating from the
| actual website, in order to pass the challenge they are
| forced to execute arbitrary code from an untrusted party. And
| malicious Javascript is practically ubiquitous on the general
| internet.
| prmoustache wrote:
| It is not only RSS reader users that are affected. Any user
| with some extension to block trackers gets regularly forbidden
| access to websites or has to deal with tons of captchas.
| kevincox wrote:
| I'll mail you as well but I think public discussion is helpful.
| Especially since I have seen similar responses to this over the
| years and it feels very disingenuous. The problem is very clear
| (Cloudflare serves 403 blocks to feed readers for no reason) and
| you have all of the logs. The solution is maybe not trivial but
| I fail to see how the perspective of someone seeing a 403 block
| is going to help much. This just starts to sound like a way to
| seem responsive without actually doing anything.
|
| From the feed reader perspective it is a 403 response. For
| example my reader has been trying to read
| https://blog.cloudflare.com/rss/ and the last successful
| response it got was on 2021-11-17. It has been backing off due
| to "errors" but it still is checking every 1-2 weeks and gets a
| 403 every time.
|
| This obviously isn't limited to the Cloudflare blog, I see it
| on many site "protected by" (or in this case broken by)
| Cloudflare. I could tell you what public cloud IPs my reader
| comes from or which user-agent it uses but that is besides the
| point. This is a URL which is clearly intended for bots so it
| shouldn't be bot-blocked by default.
|
| When people reach out to customer support we tell them that
| this is a bug for the site and there isn't much we can do. They
| can try contacting the site owner but this is most likely the
| default configuration of Cloudflare causing problems that the
| owner isn't aware of. I often recommend using a service like
| FeedBurner to proxy the request as these services seem to be on
| the whitelist of Cloudflare and other scraping prevention
| firewalls.
|
| I think the main solution would be to detect intended-for-
| robots content and exclude it from scraping prevention by
| default (at least to a huge degree).
|
| Another useful mechanism would be to allow these to be accessed
| when the target page is cachable, as the cache will protect the
| origin from overload-type DoS attacks anyways. Some care needs
| to be taken to ensure that adding a ?bust={random} query
| parameter can't break through to the origin but this would be a
| powerful tool for endpoints that need protection from overload
| but not against scraping (like RSS feeds). Unfortunately cache
| headers for feeds are far from universal, so this wouldn't fix
| all feeds on its own. (For example the Cloudflare blog's feed
| doesn't set any caching headers and is labeled as `cf-cache-
| status: DYNAMIC`.)
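|
| For what it's worth, making a feed cache-friendly at the origin
| is simple. A bare-bones sketch using Python's standard library
| (the path, body, and max-age are arbitrary):
|
|     from http.server import BaseHTTPRequestHandler, HTTPServer
|
|     FEED = b'<?xml version="1.0"?><rss version="2.0"><channel/></rss>'
|
|     class FeedHandler(BaseHTTPRequestHandler):
|         def do_GET(self):
|             if self.path != "/rss":
|                 self.send_error(404)
|                 return
|             self.send_response(200)
|             self.send_header("Content-Type", "application/rss+xml")
|             # An explicit caching policy lets a CDN absorb polling
|             # traffic instead of passing every request to the origin.
|             self.send_header("Cache-Control", "public, max-age=900")
|             self.send_header("Content-Length", str(len(FEED)))
|             self.end_headers()
|             self.wfile.write(FEED)
|
|     if __name__ == "__main__":
|         HTTPServer(("", 8000), FeedHandler).serve_forever()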
| is_true wrote:
| Maybe when you detect URLs that return the RSS mimetype, notify
| the owner of the site/CF account that it might be a good idea
| to allow bots on those URLs.
|
| Ideally you could make it a simple switch in the config,
| something like: "Allow automated access on RSS endpoints".
| badlibrarian wrote:
| Thank you for showing up here and being open to feedback. But I
| have to ask: shouldn't Cloudflare be running and reviewing
| reports to catch this before it became such a problem? It's
| three clicks in Tableau for anyone who cares, and clearly
| nobody does. And this isn't the first time something like this
| has slipped through the cracks.
|
| I tried reaching out to Cloudflare with issues like this in the
| past. The response is dozens of employees hitting my LinkedIn
| page yet no responses to basic, reproducible technical issues.
|
| You need to fix this internally as it's a reputational problem
| now. Less screwing around using Salesforce as your private
| Twitter, more leadership in triage. Your devs obviously aren't
| motivated to fix this stuff independently and for whatever
| reason they keep breaking the web.
| 015a wrote:
| The reality that HackerNews denizens need to accept, in this
| case and in a more general form, is: RSS feeds are not
| popular. They aren't just unpopular in the way that, say,
| Peacock is unpopular relative to Netflix; they're _truly_
| unpopular, used regularly by a number of people who could
| fit in an American football stadium. There are younger
| software engineers at Cloudflare that have never heard the
| term "RSS" before, and have no notion of what it is. It will
| probably be dead technology in ten years.
|
| I'm not saying this to say it's a good thing; it isn't.
|
| Here's something to consider though: Why are we going after
| Cloudflare for this? Isn't the website operator far, far more
| at-fault? They chose Cloudflare. They configure Cloudflare.
| They, in theory, publish an RSS feed, which is broken because
| of infrastructure decisions _they_ made. You're going after
| Ryobi because you've got a leaky pipe. But beyond that: isn't
| this tool Cloudflare publishes doing exactly what the website
| operators intended it to do? It blocks non-human traffic. RSS
| clients are non-human traffic. Maybe the reason you don't
| want to go after the website operators is because you know
| you're in the wrong? Why can't these RSS clients detect when
| they encounter this situation, and prompt the user with a
| captive portal to get past it?
| badlibrarian wrote:
| I'm old enough to remember Dave Winer taking Feedburner to
| task for inserting crap into RSS feeds that broke his code.
|
| There will always be niche technologies and nascent
| standards and we're taking Cloudflare to task today because
| if they continue to stomp on them, we get nowhere.
|
| "Don't use Cloudflare" is an option, but we can demand
| both.
| gjsman-1000 wrote:
| "Old man yells at cloud about how the young'ns don't
| appreciate RSS."
|
| I mean that somewhat sarcastically; but there does come a
| point where the demands are unreasonable, the technology
| is dead. There are probably more people browsing with
| JavaScript disabled than using RSS feeds. There are
| probably more people browsing on Windows XP than using
| RSS feeds. Do I yell at you because your personal blog
| doesn't support IE6 anymore?
| badlibrarian wrote:
| Spotify and Apple Podcasts use RSS feeds to update what
| they show in their apps. And even if millions of people
| weren't dependent on it, suggesting that an
| infrastructure provider not fix a bug only makes the web
| worse.
| 015a wrote:
| I'm not backing down on this one: This is straight up an
| "old man yelling at the kids to get off his lawn"
| situation, and the fact that JGC from Cloudflare is in
| here saying "we'll take a look at this" is so far and
| beyond what anyone reasonable would expect of them that
| they deserve praise and nothing else.
|
| This is a matter between You and the Website Operators,
| period. Cloudflare has nothing to do with this. This
| article puts "Cloudflare" in the title because it's fun to
| hate on Cloudflare and it gets upvotes. Cloudflare is a
| tool. These website operators are using Cloudflare The
| Tool to block inhuman access to their websites. RSS
| CLIENTS ARE NOT HUMAN. Let me repeat that: Cloudflare's
| bot detection is working fully appropriately here,
| because RSS Clients are Bots. Everything here is working
| as expected. The part where change should be asked is:
| Website operators should allow inhuman actors past the
| Cloudflare bot detection firewall specifically for RSS
| feeds. They can FULLY DO THIS. Cloudflare has many, many
| knobs and buttons that Website Operators can tweak; one
| of those is e.g. a page rule to turn off bot detection
| for specific routes, such as `/feed.xml`.
|
| If your favorite website is not doing this, it's NOT
| CLOUDFLARE'S FAULT.
|
| Take it up with the Website Operators, Not Cloudflare.
| Or, build an RSS Client which supports a captive portal
| to do human authorization. God this is so boring, y'all
| just love shaking your fist and yelling at big tech for
| LITERALLY no reason. I suspect it's actually because half
| of y'all are concerningly uneducated on what we're
| talking about.
| badlibrarian wrote:
| As part of proxying what may be as much as 20% of the
| web, Cloudflare injects code and modifies content that
| passes between clients and servers. It is in their core
| business interests to receive and act upon feedback
| regarding this functionality.
| 015a wrote:
| Sure: Let's begin by not starting the conversation with
| "Don't use Cloudflare", as you did. That's obviously not
| only unhelpful, but it clearly points the finger at the
| wrong party.
| 627467 wrote:
| What does Cloudflare do to search crawlers by default? Does it
| block them too?
| soraminazuki wrote:
| This is an issue with techdirt.com. I contacted them about this
| through their feedback form a long time ago, but the issue still
| remains unfortunately.
| dewey wrote:
| I'm using Miniflux and I always run into this on a few blogs,
| which I've now just stopped reading.
| MarvinYork wrote:
| In any case, it blocks German Telekom users. There is an ongoing
| dispute between Cloudflare and Telekom as to who pays for the
| traffic costs. Telekom is therefore throttling connections to
| Cloudflare. This is the reason why we can no longer use
| Cloudflare.
| SSLy wrote:
| as much as I am not a fan of cloudflare's practices, in this
| particular case DTAG seems to be the party at fault.
| hwj wrote:
| I had problems accessing Cloudflare-hosted websites via the Tor
| browser as well. I don't know if that is still true.
| whs wrote:
| My company runs a tech news website. We offer an RSS feed, as
| any Drupal website would, and a content farm scrapes it to
| rehost our content in full. This is usually fine for us - the
| content is CC-licensed and they do post the correct source.
| But they run thousands of different WordPress instances on the
| same IP, and each instance fetches the feed individually.
|
| In the end we had to use Cloudflare to rate limit the RSS
| endpoint.
| kevincox wrote:
| > In the end we had to use Cloudflare to rate limit the RSS
| endpoint.
|
| I think this is fine. You are solving a specific problem and
| still allowing some traffic. The problem with the Cloudflare
| default settings is that they block _all_ requests leading to
| users failing to get any updates even when fetching the feed at
| a reasonable rate.
|
| BTW in this case another solution may just be to configure
| proper caching headers. Even if you only cache for 5 minutes
| at a time, that will be at most 1 request every 5 minutes per
| Cloudflare caching location. (I don't know the exact
| configuration, but it typically uses ~5 locations per origin,
| so that would be only 1 req/min, which is trivial load and
| will handle both these inconsiderate scrapers and regular
| users. You can also configure all fetches to come from a
| single location, and then you would only need to actually
| serve the feed once per 5 minutes.)
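|
| For illustration, a minimal sketch of the origin side of that
| (Flask is just an example framework here, and FEED_XML stands
| in for however the feed is really generated):
|
|     from flask import Flask, Response
|
|     app = Flask(__name__)
|
|     FEED_XML = "<rss>...</rss>"  # placeholder feed body
|
|     @app.route("/feed.xml")
|     def feed():
|         resp = Response(FEED_XML,
|                         mimetype="application/rss+xml")
|         # Let shared caches (like a CDN) reuse this
|         # response for 5 minutes.
|         resp.headers["Cache-Control"] = "public, max-age=300"
|         return resp
|
| Combined with a page rule (or cache rule) that caches the feed
| path, a CDN can then answer most feed requests from its cache
| instead of hitting the origin.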
| yjftsjthsd-h wrote:
| > In the end we had to use Cloudflare to rate limit the RSS
| endpoint.
|
| Isn't the correct solution to use CF to _cache_ RSS endpoints
| aggressively?
| prmoustache wrote:
| I believe this also poses issues for people running adblockers.
| I get tons of repetitive captchas on some websites.
|
| Also, other companies offering similar services, like Imperva,
| seem to outright ban my IP after one visit to a website with
| uBlock Origin: I first get a captcha, then a page saying I am
| not allowed, and whatever I do, even using an extensionless
| Chrome browser with a new profile, I can't visit it anymore
| because my IP is banned.
| acdha wrote:
| One thing to keep in mind is that the modern web sees a lot of
| spam and scraping, and ad revenue has been sliding for years.
| If you make your activity look like a bot, most operators will
| assume you're not generating revenue and block you. It sucks
| but thank a spammer for the situation.
| est wrote:
| Hmmm, that's why "feedburner" is^H^Hwas a thing, right?
|
| We have come full circle.
| kevincox wrote:
| Yeah, this is the recommendation that I usually give people who
| reach out to support. Feedburner tends to be on the whitelists,
| which avoids this problem.
| pointlessone wrote:
| I see this on a regular basis. My self-hosted RSS reader is
| blocked by Cloudflare even after my IP address was explicitly
| allowlisted by a few feed owners.
| account42 wrote:
| Or just normal human users with a niche browser like Firefox.
| wraptile wrote:
| Cloudflare has been the bane of my web existence on a Thai IP
| and a Linux Firefox fingerprint. I wonder how much traffic is
| lost because of Cloudflare, and of course none of that is
| reported to the web admins, so everyone continues in their
| jolly ignorance.
|
| I wrote my own RSS bridge that scrapes websites using the
| Scrapfly web scraping API, which bypasses all of that. It's so
| annoying that I can't even scrape some company's /blog that
| they are literally buying ads for, but which somehow has an
| anti-bot enabled that blocks all RSS readers.
|
| The modern web is so anti-social that the web 2.0 guys should
| be rolling in their "everything will be connected with APIs"
| graves by now.
| vundercind wrote:
| The late '90s-'00s solution was to blackhole address blocks
| associated with entire countries or continents. It was easily
| worth it for many US sites that weren't super-huge to lose the
| 0.1% of legitimate requests they'd get from, say, China or
| Thailand or Russia, to cut the speed their logs scrolled at by
| 99%.
|
| The state of the art isn't much better today, it seems. Similar
| outcome with more steps.
| hkt wrote:
| It also manages to break IRC bots that do things like show the
| contents of the title tag when someone posts a link. Another
| cloudy annoyance, albeit a minor one.
| shaunpud wrote:
| Namesilo are the same; their CSV/RSS is behind Cloudflare, so I
| don't even bother with their auctions anymore, and their own
| interface is meh.
| anilakar wrote:
| ...and there is a good number of people who see this as a
| feature, not a bug.
| timnetworks wrote:
| RSS is the future that has been kept from us for twenty years
| already; fusion can kick bricks.
| nfriedly wrote:
| Liliputing.com had this problem a couple of years ago. I emailed
| the author and he got it sorted out after a bit of back and
| forth.
| 015a wrote:
| Suggesting that website operators should allowlist RSS clients
| through the Cloudflare bot detection system via their user-agent
| is a rather concerning recommendation.
| artooro wrote:
| This is a truly problematic issue that I've experienced as well.
| The best solution is probably for Cloudflare to figure out what
| normal RSS usage looks like and have a provision for that in
| their bot detection.
| idunnoman1222 wrote:
| Yes, the way to retain your privacy is to not use the Internet
|
| if you don't like it, make your own Internet: assumedly one not
| funded by ads
| hugoromano wrote:
| "could be blocking RSS users" it says it all "could". I use RSS
| on my websites, which are serviced by Cloudflare, and my users
| are not blocked. For that, fine-tuning and setting Configuration
| Rules at Cloudflare Dashboard are required. Anyone on a free has
| access to 10 Configuration Rules. I prefer using Cloudflare
| Workers to tune better, but there is a cost. My suggestion for
| RSS these days is to reduce the info on RSS feed to teasers, AI
| bots are using RSS to circumvent bans, and continue to scrape.
| srmarm wrote:
| I'd have thought the website owner whitelisting their RSS feed
| URI (or pattern matching *.xml/*.rss) might be better than
| doing it based on the user agent string. For one, you'd expect
| bot traffic on these endpoints, and you're also not leaving a
| door open to anyone who fakes their user agent.
|
| Looks like it should be possible under the WAF.
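|
| For illustration, a quick way to verify such a rule actually
| works is to fetch the feed with a plain non-browser client and
| check what comes back (Python with requests; the URL and
| user-agent string are placeholders):
|
|     import requests
|
|     resp = requests.get(
|         "https://example.com/feed.xml", timeout=30,
|         headers={"User-Agent": "feed-check/1.0"})
|
|     ctype = resp.headers.get("content-type", "")
|     if resp.status_code == 403 or "text/html" in ctype:
|         print("still challenged/blocked:", resp.status_code)
|     else:
|         print("ok:", resp.status_code, ctype)
|
| A working feed URL should return 200 with an XML content type
| rather than a 403 or an HTML challenge page.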
| wenbin wrote:
| At Listen Notes, we rely heavily on Cloudflare to manage and
| protect our services, which cater to both human users and
| scripts/bots.
|
| One particularly effective strategy we've implemented is using
| separate subdomains for services designed for different types of
| traffic, allowing us to apply customized firewall and page rules
| to each subdomain.
|
| For example:
|
| - www. listennotes.com is dedicated to human users. E.g.,
| https://www.listennotes.com/podcast-realtime/
|
| - feeds. listennotes.com is tailored for bots, providing access
| to RSS feeds. E.g., https://feeds.listennotes.com/listen/wenbin-
| fangs-podcast-pl...
|
| - audio. listennotes.com serves both humans and bots, handling
| audio URL proxies. E.g.,
| https://audio.listennotes.com/e/p/1a0b2d081cae4d6d9889c49651...
|
| This subdomain-based approach enables us to fine-tune security
| and performance settings for each type of traffic, ensuring
| optimal service delivery.
| kevindamm wrote:
| Where do you put your sitemap (or its equivalent)? Looking at
| the site, I don't notice one in the metadata but I do see a
| "site index" on the www subdomain, though possibly that's
| intended for humans not bots? I think the usual recommendation
| is to have a sitemap per subdomain and not mix them, but
| clearly they're meant for bots not humans...
| wenbin wrote:
| Great question.
|
| We only need to provide the sitemap (with custom paths, not
| publicly available) in a few specific places, like Google
| Search Console. This means the rules for managing sitemaps
| are quite manageable. It's not a perfect setup, but once we
| configure it, we can usually leave it untouched for a long
| time.
| butz wrote:
| Not "could" but it is actually blocking. Very annoying when
| government website does that, as usually it is next to impossible
| to explain the issue and ask for a fix. And even if the fix is
| made, it is reverted several weeks later. Other websites does
| that too, it was funny when one website was asking RSS reader to
| resolve captcha and prove they are human.
| elwebmaster wrote:
| Using Cloudflare on your website could be blocking Safari users,
| Chrome users, or just any users. It's totally broken. They have
| no way of measuring the false positives. Website owners are
| paying for it in lost revenue. And poor users lose access
| through no fault of their own. Until some C-level exec at a
| BigTech company
| randomly gets blocked and makes noise. But even then, Cloudflare
| will probably just whitelist that specific domain/IP. It is very
| interesting how I have never been blocked when trying to access
| Cloudflare itself, only blocked on their customer's sites.
| pentagrama wrote:
| Can you whitelist URLs to be read by bots on Cloudflare? Maybe
| this is a good solution, where you as a site maintainer can
| include your RSS feeds, sitemaps, and other content for bots.
|
| Also, Cloudflare could ship a dedicated section in the admin
| panel to let users add and whitelist RSS feeds and sitemaps,
| making it easier to discover that they may not want to block
| those bots, which aren't a threat to the site - while of course
| still applying rules to prevent DDoS on these URLs, such as
| massive request volumes or other behavior that common RSS
| reader bots don't exhibit.
| ectospheno wrote:
| I love that I get a cloudflare human check on almost every page
| they serve for customers, except when I log in to my Cloudflare
| account. Good times.
| conesus wrote:
| I run NewsBlur[0] and I've been battling this issue of NewsBlur
| fetching 403s across the web for months now. My users are
| revolting and asking for refunds. I've tried emailing dozens of
| site owners and publishers and only two of them have done the
| work of whitelisting their RSS feed. It's maddening and is having
| a real negative effect on NewsBlur.
|
| NewsBlur is an open-source RSS news reader (full source available
| at [1]), something we should all agree is necessary to support
| the open web! But Cloudflare blocking all of my feed fetchers is
| bizarre behavior. And we've been on the verified bots list for
| years, but it hasn't made a difference.
|
| Let me know what I can do. NewsBlur publishes a list of IPs that
| it uses for feed fetching that I've shared with Cloudflare but it
| hasn't made a difference.
|
| I'm hoping Cloudflare uses the IP address list that I publish and
| adds them to their allowlist so NewsBlur can keep fetching (and
| archiving) millions of feeds.
|
| [0]: https://newsblur.com
|
| [1]: https://github.com/samuelclay/NewsBlur
| AyyEye wrote:
| Three consenting parties trying to use their internet blocked
| by a single intermediary that's too big to care is just gross.
| It's the web we deserve.
| eddythompson80 wrote:
| > Three consenting parties
|
| Clearly they are not 100% consenting, or at best one of them
| (the content publisher) is misconfiguring/misunderstanding
| their setup. They enabled RSS on their service, then set up a
| rule to require human verification for accessing that RSS
| feed.
|
| It's like a business advertising a singles only area, then
| hiring a security company and telling them to only allow
| couples in the building.
| AyyEye wrote:
| If Cloudflare was honest and upfront about the tradeoffs
| being made and the fact that it's still going to require
| configuration and maintenance work, they'd have
| significantly fewer customers.
| srik wrote:
| RSS is an essential component of modern web publishing, and it
| feels scary to see how one company's inconsideration might harm
| its already fragile future. One day cloudflare will get big
| enough to be subject to antitrust regulation and this instance
| will be a strong data point working against them.
| p4bl0 wrote:
| I've been a paying NewsBlur user since the downfall of Google
| Reader and I'm very happy with it. Thank you for NewsBlur!
| wooque wrote:
| You can just bypass it with a library like cloudscraper or
| hrequests.
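|
| For what it's worth, a minimal cloudscraper sketch (Python; the
| URL is a placeholder, and whether this gets through depends on
| the kind of challenge the site has enabled):
|
|     import cloudscraper  # pip install cloudscraper
|
|     # Behaves like a requests.Session, but tries to solve
|     # Cloudflare's JavaScript challenges along the way.
|     scraper = cloudscraper.create_scraper()
|     resp = scraper.get("https://example.com/feed.xml",
|                        timeout=30)
|     print(resp.status_code, len(resp.text))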
| brightball wrote:
| I use Cloudflare and have home built RSS feeds on my site. If
| you've run into any issues on mine, I'll be happy to look into
| them.
|
| https://www.brightball.com/
| miohtama wrote:
| Thank you for the hard work.
|
| Newsblur was the first SaaS I could afford as a student. I have
| been a subscriber for something like 20 years now. And I will
| keep doing it to the grave. Best money ever spent.
| tandav wrote:
| As an admin of my personal website, I completely disable all
| Cloudflare features and use it only for DNS and domain
| registration. I also stop following websites that use Cloudflare
| checks or cookie popups (cookies are fine, but the popups are
| annoying).
| renewiltord wrote:
| Ah, the Cloudflare free plan does not automatically turn these
| on. I know since I use it for some small things and don't have
| these on. I wouldn't use User-Agent filtering because those are
| spoofable. But putting feeds on a separate URL is probably a good
| idea. Right now the feed is actually generated on request for
| these sites, so caching it is probably a good idea anyway. I can
| just rudimentarily do that by periodically generating and copying
| it over.
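|
| For illustration, that "generate and copy" step can be a tiny
| cron-driven script (the paths and the generator function here
| are hypothetical):
|
|     import shutil
|     import tempfile
|
|     WEB_ROOT = "/var/www/site"        # hypothetical path
|     OUTPUT = WEB_ROOT + "/feed.xml"
|
|     def build_feed_xml() -> str:
|         # Stand-in for however the feed is really built.
|         return "<rss>...</rss>"
|
|     def main() -> None:
|         # Write to a temp file in the same directory, then
|         # rename it into place so readers never see a
|         # half-written feed.
|         with tempfile.NamedTemporaryFile(
|                 "w", dir=WEB_ROOT, delete=False) as tmp:
|             tmp.write(build_feed_xml())
|         shutil.move(tmp.name, OUTPUT)
|
|     if __name__ == "__main__":
|         main()
|
| Run it from cron every few minutes and the feed URL becomes a
| plain static file that any cache (or none) can serve cheaply.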
| drudru wrote:
| I noticed this a while back when I was trying to read
| cloudflare's own blog. Periodically they would block my
| newsreader. I ended up just dropping their feed.
|
| I am glad to see other people calling out the problem. Hopefully,
| a solution will emerge.
___________________________________________________________________
(page generated 2024-10-17 23:01 UTC)