[HN Gopher] 13ft - A site similar to 12ft.io but self-hosted
___________________________________________________________________
13ft - A site similar to 12ft.io but self-hosted
Author : darknavi
Score : 191 points
Date : 2024-08-19 19:49 UTC (3 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| darknavi wrote:
| I found this when looking for fun self hosted apps. It's pretty
| bare bones but does seem to work well with articles I've found so
| far.
| j_maffe wrote:
| Has 12ft.io even been working lately? I feel like the only
| reliable way now is archive.is.
| 91bananas wrote:
| I just had anecdotal success with it last week on The
| Atlantic, but before that it had been very hit and miss.
| mvonballmo wrote:
| I'm using the [Bypass
| Paywalls](https://github.com/iamadamdev/bypass-paywalls-
| chrome/blob/ma...) extension but it looks like that's been
| DMCA-ed in the interim.
| Kikawala wrote:
| https://github.com/bpc-clone/bypass-paywalls-firefox-clean
| or https://github.com/bpc-clone/bypass-paywalls-chrome-
| clean
| thecal wrote:
| Which directs you to
| https://gitflic.ru/project/magnolia1234/bpc_uploads which
| is not ideal...
| compuguy wrote:
| I agree. Though there is a counterpoint that a Russian
| host isn't going to respect a DMCA request. On the flip
| side, it's a Russian replacement for GitHub, possibly
| based on Gogs, Gitea, or even Forgejo. So yeah, YMMV.
| Gurathnaka wrote:
| Very rarely works.
| XCSme wrote:
| I am not familiar with 12ft.io. I wanted to try it out, but I
| get "Internal Server Error" when trying to visit a website.
| refibrillator wrote:
| Running a server just to set the user agent header to the
| googlebot one for some requests feels a bit heavyweight.
|
| But perhaps it's necessary, as it seems Firefox no longer has an
| about:config option to override the user agent...am I missing it
| somewhere?
|
| Edit: The about:config option _general.useragent.override_ can be
| created and will be used for all requests (I just tested). I was
| confused because that config key doesn't exist in a fresh install
| of Firefox. The user agent header string from this repo is: _"
| Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P)
| AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile
| Safari/537.36 (compatible; Googlebot/2.1;
| +http://www.google.com/bot.html)"_
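| For illustration, what the repo does boils down to fetching
| the page with that Googlebot user agent. A minimal sketch
| using only the Python standard library (the helper names are
| mine, not from the repo):

```python
import urllib.request

# Googlebot user-agent string as quoted from the 13ft repo.
GOOGLEBOT_UA = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile "
    "Safari/537.36 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)

def googlebot_request(url: str) -> urllib.request.Request:
    """Build a request that claims to be Googlebot."""
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})

def fetch_as_googlebot(url: str) -> bytes:
    """Fetch a page while spoofing the Googlebot user agent."""
    with urllib.request.urlopen(googlebot_request(url)) as resp:
        return resp.read()
```

| Whether a given site honors this depends entirely on whether
| it checks the claimed identity against the requesting IP, as
| discussed below in the thread.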
| Beijinger wrote:
| Does this not work anymore?
|
| https://addons.mozilla.org/en-US/firefox/addon/random_user_a...
| darknavi wrote:
| Personally I find it nice for sending articles to friends.
| samstave wrote:
| You can make a search keyword for it in FF by right-clicking
| on that box and choosing 'Add a keyword for this search'.
| https://i.imgur.com/AkMxqIj.png
|
| and then in your browser just type the letter you assigned
| it: for example, I have 'i' == the search box for IMDB, so I
| type 'i [movie]' in my URL bar and it brings up the IMDB
| search for that movie. https://i.imgur.com/dXdwsbA.png
|
| So you can just assign 'a' to that search box and type 'a
| [URL]' in your address bar and it will submit it to your
| little thing.
| cortesoft wrote:
| That would mean that your self-hosted install is exposed to
| the internet. I don't think I want to run a publicly
| accessible global relay.
| KennyBlanken wrote:
| Eh, pretty minimal risk unless you use a guessable hostname
| and/or the URL gets published somewhere.
|
| If the install is under
| "eixk3.somedomain.com/ladderforfriends" and it sits behind
| a reverse proxy, it might as well be invisible to the
| internet, unless your DNS provider is an idiot and allows
| zone transfers, or you are on a network where someone is
| snarfing up DNS requests and then distributing that info to
| third parties. Note that even with TLS 1.3 the requested
| hostname is still sent in plaintext in the SNI field unless
| Encrypted Client Hello (ECH) is enabled; the URL path,
| though, is always encrypted, so a sniffer never sees the
| secret path.
|
| Rotate the hostname if/when it becomes necessary...
| WayToDoor wrote:
| Your certificate will however show up in public
| certificate transparency lists.
|
| You could mitigate that with a wildcard cert, but still..
| Zaheer wrote:
| If this is all it's doing then you could also just use this
| extension: https://requestly.com/
|
| Create a rule to replace user agent with "Mozilla/5.0 (Linux;
| Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36
| (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36
| (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
|
| I just tried it and it seems to work.
| unethical_ban wrote:
| I tried "User Agent Switcher" since it doesn't require a
| login. Washingtonpost.com blocked me, and NYT won't load
| article contents.
| NoboruWataya wrote:
| I use this extension which has a decent UI:
| https://webextension.org/listing/useragent-switcher.html
| codetrotter wrote:
| > set the user agent header to the googlebot one
|
| Also, how effective is this really? Don't the big news sites
| check the IP address of the user agents that claim to be
| GoogleBot?
| mdotk wrote:
| This. 12ft has never ever worked for me.
| dutchmartin wrote:
| If you hosted that server on Google Cloud, you would already
| make it a lot harder to detect.
| jsheard wrote:
| https://developers.google.com/search/docs/crawling-
| indexing/...
|
| They provide ways to verify Googlebot IPs specifically;
| anyone who cares to check wouldn't be fooled by running a
| fake Googlebot on Google's cloud.
|
| Likewise with Bingbot:
| https://www.bing.com/webmasters/help/how-to-verify-
| bingbot-3...
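| The check Google documents is a reverse-DNS lookup followed
| by a forward-confirm. A rough sketch of that logic (my own
| code, not from any WAF; Google also publishes JSON lists of
| crawler IP ranges that could be checked instead):

```python
import socket

# Domains under which genuine Googlebot reverse-DNS names fall,
# per Google's crawler-verification docs.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_google_hostname(hostname: str) -> bool:
    """True if a reverse-DNS name falls under Google's crawler domains."""
    return hostname.rstrip(".").endswith(GOOGLE_SUFFIXES)

def verify_googlebot(ip: str) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-confirm
    that the name resolves back to the same IP (a spoofed PTR record
    alone won't pass the forward check)."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return False
    if not is_google_hostname(hostname):
        return False
    try:
        _, _, addrs = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    return ip in addrs
```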
| KennyBlanken wrote:
| yes, where "cares" means "the lost revenue is greater
| than the cost of development, QA, and
| computational/network/storage overhead, and the impact of
| increased complexity, of a function that figures out
| whether people are faking their user agent."
|
| It's probably orders of magnitude _greater_ than the
| revenue loss from the tiny minority of people doing such
| things, especially given not everyone who uses tools like
| these will become a subscriber if blocked, so that cuts
| the "lost" revenue down even further.
| jsheard wrote:
| Even if it's not worth an actual site operator's time to
| implement such a system themselves, WAFs like Cloudflare
| could easily check the IP address of clients claiming to
| be Googlebot/Bingbot and send them to CAPTCHA Hell on the
| site's behalf if they're lying. That's pretty low-hanging
| fruit for a WAF; I would be surprised if they don't do
| that.
|
| edit: Indeed, I just tried curling cloudflare.com with
| Googlebot's user agent and they immediately gave me the
| finger (403) on the very first request.
| ZoomerCretin wrote:
| I sincerely hope the antitrust suit ends this practice
| soon. This is so obviously anticompetitive.
| cogman10 wrote:
| How?
|
| I also think the antitrust suit (and many more) need to
| happen for more obvious things like buying out
| competitors. However, how does publishing a list of valid
| IPs for their web crawlers constitute anticompetitive
| behavior? Anyone can publish a similar list, and any
| company can choose to reference those lists.
| arrosenberg wrote:
| It allows Google to access data that is denied to
| competitors. It's a clear example of Google using its
| market power to suppress competition.
| 8organicbits wrote:
| Hmm, the robots.txt, IP blocking, and user agent blocking
| are all policies chosen by the web server hosting the
| data. If web admins choose to block Google competitors,
| I'm not sure that's on Google. Can you clarify?
| jsheard wrote:
| In general I would agree, but Google recently struck a
| data licensing deal with Reddit which resulted in Reddit
| blocking _all_ scrapers and crawlers with the sole
| exception of Googlebot (even Bing and by extension
| DuckDuckGo are blocked). That is a case of Google
| actively locking down the web so their competitors can't
| see it.
|
| https://searchengineland.com/microsoft-confirms-reddit-
| block...
| judge2020 wrote:
| I always do DevTools -> Network Conditions to set UA, at least
| in Chrome.
| xyst wrote:
| I'm more inclined to use archive(.org|.ph). But this is a decent
| workaround when archive is unavailable.
|
| Side note: paywalls are annoying but most publications are often
| available for free via public library.
|
| For example, NYT is free via my public library. PL offers 3-day
| subs. A few other decent publications are available as well.
| Availability of publications is YMMV as well.
| pmdr wrote:
| NYT's onion site is also free.
| latexr wrote:
| > most publications are often available for free via public
| library.
|
| Via public library _in the USA_. Other countries exist and as
| far as I've gathered aren't typically given this level of free
| access.
| xyst wrote:
| Hence "ymmv" (your mileage may vary) ;)
| elashri wrote:
| Ironically, this phrase originated in the US, and the
| abbreviation is mostly used in American English slang [1].
|
| Please don't take this as an attack or even criticism; I
| just found it a funny observation, and I might be wrong.
|
| [1] https://en.wiktionary.org/wiki/your_mileage_may_vary
| mattbillenstein wrote:
| Counterpoint - if you like the content enough to go through this
| - just pay for it. Monetary support of journalism or content you
| like is a great way to encourage more of it.
| ed wrote:
| I fully agree with the sentiment! I support and do pay for
| sources I read frequently.
|
| Sadly payment models are incompatible with how most people
| consume content - which is to read a small number of articles
| from a large number of sources.
| naltroc wrote:
| As much as I circumvent paywalls myself, it does feel like
| overkill to set up software to do it always. Sites spend
| money to produce quality content.
|
| Somewhat related comparison, Is a human choosing to do this
| theft really better than a neural network scraping content for
| its own purposes?
| pjot wrote:
| > Is a human choosing to do this theft really better than a
| neural network scraping content
|
| Probably so. I think the differentiation is in the scale at
| which scraping is done for those kinds of systems.
| fwip wrote:
| The neural network is not scraping content for its own
| purposes, it is for the purpose of the people who are
| running/training it.
|
| And yes, one person reading a piece of content without paying
| money for it is far, far better than one person/corporation
| scraping all of the world's content in order to profit off of
| it.
| latexr wrote:
| > Somewhat related comparison, Is a human choosing to do this
| theft really better than a neural network scraping content
| for its own purposes?
|
| Here's a similar comparison: "Is a human recording a movie at
| the theatre to rewatch at home really better than the one who
| shares the recording online?"
|
| Seeing as you're calling it "theft", presumably what you mean
| by "better" is "causes less harm / loss of revenue to the
| work's author / publisher".
|
| I'd say the answer is pretty clear. How many people go
| through the trouble of bypassing paywalls VS how many use
| LLMs?
|
| Saying "a neural network scraping content for its own
| purposes" doesn't even begin to tell the whole story. Setting
| aside that the neural network is unlikely to be doing the
| scraping (it is being trained on it), it's not "for its own
| purpose": it didn't choose to willy-nilly scrape the
| internet; it was ordered to by a human (typically) intending
| to profit from it.
| setr wrote:
| Paying for it doesn't make the site less miserable to use. One
| of the stupid things about piracy is that it tends to also be
| the best available version of the thing. You're actively worse
| off having paid for it. (Ads, denial, DRM in general, MBs of
| irrelevant JS, etc don't go away with money, but do with
| piracy)
| a1o wrote:
| This right here. It would be nice to have some perk like you
| can read the articles through Gopher.
| janalsncm wrote:
| Case in point: paying for the New York Times doesn't block
| ads in their app.
| cortesoft wrote:
| I know of very few sites that let you pay to get zero ads or
| affiliate links. The ones that let you pay still show you
| affiliate links.
| elondaits wrote:
| I agree, but would like a way to pay for an article, or a
| single day, week, or month of access. Just like I could buy a
| single one-off issue of a publication a couple of times before
| starting a long term relationship with it. Not all publications
| support this, and some like the NY Times require chatting with
| a representative to cancel the subscription. I see a lot of
| talk about physical media around film and music, but not being
| able to buy single issues of any magazine or newspaper
| anonymously when the circumstances call for it, is a great loss
| for public discourse.
| cflewis wrote:
| I feel like there were companies in the past that did try
| this, where you would chuck $5 or whatever in an account, and
| then each page you went to that supported the service would
| extract a micropayment from the account.
|
| Never took off. Should have done. E.g. in Santa Cruz there is
| https://lookout.co , which is pretty good, but extremely
| pricey for what it is. There has to be a middle ground
| between "pay and get everything" and "ignore / go straight
| to 12ft.io".
| jszymborski wrote:
| Countercounterpoint - Maybe I have news subscriptions for
| periodicals I regularly read, but don't feel like paying for a
| monthly subscription to read one random article from some news
| outlet I don't regularly read that someone linked on social
| media or HN.
| spondylosaurus wrote:
| ^ This describes my experience as well. And there are certain
| outlets where I'll read an interesting article if someone
| links it, but don't want to give them money due to my
| objection with <xyz> editorial practices.
| shanecleveland wrote:
| So back out of the webpage and don't read it. That is a
| constructive way of letting a content producer know their
| user experience is not worth the "expense" of consuming their
| product. But if the content is worth your time and energy to
| consume, pay the "price" of admission.
| cooper_ganglia wrote:
| I back out of the webpage and go to 12ft.io, which lets me
| both read the article and, at the same time, use that
| constructive way of letting the publisher know that their
| product is not worth its price.
| shanecleveland wrote:
| And then 12ft-dot-io throws an error, but still shows its
| own ad in the bottom right corner! But you probably knew
| that since you constructively use them.
| gmiller123456 wrote:
| This assumes their presence has no effect on me. It
| takes time to click a page and let it load, and more time
| to dig through all of the results when all of them are
| unreadable. Maybe if there were a tag like
| [ungodlyamountofads] on each, it would help. But even then
| I'd still have to scroll through them.
| shanecleveland wrote:
| I guess I fail to see how visiting a webpage is anything
| but fully voluntary. It is how the web works! And how all
| kinds of "free" media have worked for eons.
|
| I don't mean to excuse incredibly poor user experience
| design, and certainly not abusive tactics. But sorry if I
| have zero empathy for your clicking, loading and
| scrolling pain. Leave the website! It is amazing how many
| people are defending a site that claims to "Remove
| popups, banners, and ads" while 1) failing to even
| work and 2) showing its own ad on the resulting page!
| tomrod wrote:
| Paywalls don't get folks there, however noble the sellers
| of information try to brand it.
| Teever wrote:
| Is there a way to pay for journalistic content that doesn't
| involve participating in the extensive tracking that those
| websites perform on their visitors?
|
| I love to read the news but I don't love that the news reads
| me.
| adamomada wrote:
| I actually loled when I went to CNN with Little Snitch on,
| there were over one hundred different third-party domains it
| wanted to connect to
| Marsymars wrote:
| > Is there a way to pay for journalistic content that doesn't
| involve participating in the extensive tracking that those
| websites perform on their visitors?
|
| Well you could buy physical newspapers/magazines. (Or access
| content via OTA TV / the library.)
| ricardobeat wrote:
| It would cost me about $500/month if I subscribe to every
| paywall that appears in front of me.
| shanecleveland wrote:
| 100%. And sometimes that form of payment is putting up with
| ads, etc. I routinely back out of sites that suddenly take over
| the screen with a popup or take up large chunks with video or
| animations. Same as opting not to go in a particular store. But
| I also stick around and occasionally use products advertised to
| me. Shocking, I know.
| Timpy wrote:
| I wasn't even thinking about paywalls, the first thing I did
| was check to see if cookie banners and "Sign in with Google"
| popups went away. There's so many user-unfriendly things that
| you constantly deal with, any amount of browsing is just a bad
| experience without putting up defenses like this.
| cooper_ganglia wrote:
| I will never willingly give a journalist my money.
| linsomniac wrote:
| I'm curious why. IMHO, they are true heroes.
| 627467 wrote:
| Why pay a monthly subscription if we're going to be
| bombarded by legally required popups and other internal
| promotional stuff that hooks you to the site anyway?
| notatoad wrote:
| I pay for the sites I visit regularly.
|
| But when somebody shares an article with me and I want to see
| what I've been sent, I'm not going to buy a $15 monthly
| subscription to some small-town newspaper in Ohio just because
| they've decided to paywall their content in that way.
| bubblethink wrote:
| No. Paywalled content should not be indexed by search engines.
| The implicit contract I have with the search engine is that it
| is showing me things that I can see. The publishers and search
| engines pulled a bait and switch here by whitelisting
| googlebot. So it's fair game to view the publisher's website
| with googlebot. That's what the engineers spent their time
| working on. It would be unfair to let that go to waste.
| Marsymars wrote:
| I dunno, it seems more like there should be a user-
| configurable setting to hide/show paywalled content.
|
| If you're looking for something, and it's only available
| behind a paywall (even a paywall you pay for!), how are you
| going to find it if it's not indexed?
| mgiampapa wrote:
| Bypass Paywalls Clean has moved here btw, https://github.com/bpc-
| clone?tab=repositories
| pogue wrote:
| BPC also has the option to spoof the User-agent as well when
| using the "custom sites" option:
|
| * set useragent to Googlebot, Bingbot, Facebookbot or custom
|
| * set referer (to Facebook, Google, Twitter or custom; ignored
| when Googlebot is set)
| compuguy wrote:
| Yes, but it may be taken down via DMCA soon. See this DMCA
| request:
| https://github.com/github/dmca/blob/master/2024/08/2024-08-0...
|
| It mentions bpc_updates in the takedown request....
| karmakaze wrote:
| Missed opportunity to call it 2ft, as in standing on one's own.
| trackofalljades wrote:
| ...or 11ft8, which can open _anything_
| spoonfeeder006 wrote:
| 666ft....
| sam_goody wrote:
| It seems to me that google should not allow a site to serve
| different content to their bot than they serve to their users. If
| the content is unavailable to me, it should not be in the search
| results.
|
| It obviously doesn't seem that way to Google, or to the sites
| providing the content.
|
| They are doing what works for them without ethical constraints
| (Google definitely, many content providers, eg NYT). Is it fair
| game to do what works for you (eg. 13ft)?!
| rurp wrote:
| > It seems to me that google should not allow a site to serve
| different content to their bot than they serve to their users.
|
| That would be the fair thing to do and was Google's policy for
| many years, and still is for all I know. But modern Google
| stopped caring about fairness and similar concerns many years
| ago.
| efilife wrote:
| This is called _cloaking_ [0] and has been against Google's
| policies for many years. But they don't care.
|
| [0] https://en.wikipedia.org/wiki/Cloaking
| justinl33 wrote:
| "organizing the world's information"
| spoonfeeder006 wrote:
| Why not just use uBlock Origin for the aspect of cleaning up the
| popups / ads and such?
| declan_roberts wrote:
| 12ft.io doesn't really work anymore.
|
| If you're on iOS + Safari I recommend the "open in internet
| archive" shortcut, which is actually able to bypass most
| paywalls.
|
| https://www.reddit.com/r/shortcuts/comments/12fbk8m/ive_crea...
| Zambyte wrote:
| This is awesome! People who use Kagi can also set up a regex
| redirect to automatically use this for problematic sites.
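| Such a redirect rule boils down to a small rewrite function.
| A sketch (the site list is made up for illustration; the
| Wayback "/web/<url>" form without a timestamp redirects to
| the newest capture):

```python
import re

# Hypothetical list of paywalled sites to reroute; extend as needed.
PROBLEM_SITES = re.compile(r"^https?://(www\.)?(nytimes|washingtonpost)\.com/")

def redirect(url: str) -> str:
    """Rewrite matching article URLs to the Internet Archive's
    newest-capture endpoint; pass everything else through."""
    if PROBLEM_SITES.match(url):
        return "https://web.archive.org/web/" + url
    return url
```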
| linsomniac wrote:
| I'll gladly pay for journalist content, but not when reading
| a single article means a $15/mo subscription that's hard to
| cancel.
|
| Is there some way to support journalism across publications?
| warkdarrior wrote:
| Articles should come with license agreements, just like open
| source software nowadays. Free for personal entertainment, but
| if you try to make money from the information in the article or
| otherwise commercialize it, you can fuck right off.
| Marsymars wrote:
| > Is there some way to support journalism across publications?
|
| Apple News+, Inkl, PressReader, (maybe more). Others if you
| want more magazine-focused subscriptions.
| deskr wrote:
| It once was Google's requirement that you'd serve the same
| content to the Google crawler as to any other user. No surprise
| that Google is full of shit these days.
| justinl33 wrote:
| "Organizing the world's information" or "maximizing revenue"?
| I don't know - somehow either argument justifies this.
| bansheeps wrote:
| I continue my search for a pay wall remover that will work with
| The Information. I'm honestly impressed that I've never been able
| to read an Information article in full.
| BLKNSLVR wrote:
| This could be used as a proxy to web interfaces on the same
| local network, couldn't it?
|
| There are probably much better and more secure options, but this
| might be an interesting temporary kludge.
| ThinkBeat wrote:
| Does it help, when pretending to be Googlebot, to be running
| on an IP from inside Google Cloud?
| hammock wrote:
| Now if someone could just package this into a browser extension
| it would be great!
___________________________________________________________________
(page generated 2024-08-19 23:00 UTC)