[HN Gopher] 13ft - A site similar to 12ft.io but self-hosted
___________________________________________________________________
13ft - A site similar to 12ft.io but self-hosted
Author : darknavi
Score : 191 points
Date : 2024-08-19 19:49 UTC (3 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| darknavi wrote:
| I found this when looking for fun self hosted apps. It's pretty
| bare bones but does seem to work well with articles I've found so
| far.
| j_maffe wrote:
| Has 12ft.io even been working lately? I feel like the only
| reliable way now is archive.is.
| 91bananas wrote:
| I just had anecdotal success with it last week on The
| Atlantic, but before that it had been very hit and miss.
| mvonballmo wrote:
| I'm using the [Bypass
| Paywalls](https://github.com/iamadamdev/bypass-paywalls-
| chrome/blob/ma...) extension but it looks like that's been
| DMCA-ed in the interim.
| Kikawala wrote:
| https://github.com/bpc-clone/bypass-paywalls-firefox-clean
| or https://github.com/bpc-clone/bypass-paywalls-chrome-
| clean
| thecal wrote:
| Which directs you to
| https://gitflic.ru/project/magnolia1234/bpc_uploads which
| is not ideal...
| compuguy wrote:
| I agree. Though there is a counterpoint that a Russian
| host isn't going to respect a DMCA request. On the flip
| side, it's a Russian replacement for GitHub, possibly
| based on Gogs, Gitea, or even Forgejo. So yeah, YMMV.
| Gurathnaka wrote:
| Very rarely works.
| XCSme wrote:
| I am not familiar with 12ft.io. I wanted to try it out, but I
| get "Internal Server Error" when trying to visit a website.
| refibrillator wrote:
| Running a server just to set the user agent header to the
| googlebot one for some requests feels a bit heavyweight.
|
| But perhaps it's necessary, as it seems Firefox no longer has an
| about:config option to override the user agent...am I missing it
| somewhere?
|
| Edit: The about:config option _general.useragent.override_ can be
| created and will be used for all requests (I just tested). I was
| confused because that config key doesn't exist in a fresh install
| of Firefox. The user agent header string from this repo is: _"
| Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P)
| AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile
| Safari/537.36 (compatible; Googlebot/2.1;
| +http://www.google.com/bot.html)"_
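| For illustration, what the repo does boils down to fetching
| the page with that Googlebot user agent. A minimal sketch
| using only the Python standard library (the helper names are
| mine, not from the repo):

```python
import urllib.request

# Googlebot user-agent string as quoted from the 13ft repo.
GOOGLEBOT_UA = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile "
    "Safari/537.36 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)

def googlebot_request(url: str) -> urllib.request.Request:
    """Build a request that claims to be Googlebot."""
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})

def fetch_as_googlebot(url: str) -> bytes:
    """Fetch a page while spoofing the Googlebot user agent."""
    with urllib.request.urlopen(googlebot_request(url)) as resp:
        return resp.read()
```

| Whether a given site honors this depends entirely on whether
| it checks the claimed identity against the requesting IP, as
| discussed below in the thread.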
| Beijinger wrote:
| Does this not work anymore?
|
| https://addons.mozilla.org/en-US/firefox/addon/random_user_a...
| darknavi wrote:
| Personally I find it nice for sending articles to friends.
| samstave wrote:
| You can make a search keyword for it in FF by right-clicking
| on that box and choosing 'Add a keyword for this search'.
| https://i.imgur.com/AkMxqIj.png
|
| and then in your browser just type the letter you assigned
| it: for example, I have 'i' == the search box for IMDB, so I
| type 'i [movie]' in my URL bar and it brings up the IMDB
| search for that movie. https://i.imgur.com/dXdwsbA.png
|
| So you can just assign 'a' to that search box and type 'a
| [URL]' in your address bar and it will submit it to your
| little thing.
| cortesoft wrote:
| That would mean that your self-hosted install is exposed to
| the internet. I don't think I want to run a publicly
| accessible global relay.
| KennyBlanken wrote:
| Eh, pretty minimal risk unless you use a guessable hostname
| and/or the URL gets published somewhere.
|
| If the install is under
| "eixk3.somedomain.com/ladderforfriends" and it sits behind
| a reverse proxy, it might as well be invisible to the
| internet, unless your DNS provider is an idiot and allows
| zone transfers, or you are on a network where someone is
| snarfing up DNS requests and then distributing that info to
| third parties. Note that even with TLS 1.3 the requested
| hostname is still sent in plaintext in the SNI field unless
| Encrypted Client Hello (ECH) is enabled; the URL path,
| though, is always encrypted, so a sniffer never sees the
| secret path.
|
| Rotate the hostname if/when it becomes necessary...
| WayToDoor wrote:
| Your certificate will however show up in public
| certificate transparency lists.
|
| You could mitigate that with a wildcard cert, but still..
| Zaheer wrote:
| If this is all it's doing then you could also just use this
| extension: https://requestly.com/
|
| Create a rule to replace user agent with "Mozilla/5.0 (Linux;
| Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36
| (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36
| (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
|
| I just tried it and it seems to work.
| unethical_ban wrote:
| I tried "User Agent Switcher" since it doesn't require a
| login. Washingtonpost.com blocked me, and NYT won't load
| article contents.
| NoboruWataya wrote:
| I use this extension which has a decent UI:
| https://webextension.org/listing/useragent-switcher.html
| codetrotter wrote:
| > set the user agent header to the googlebot one
|
| Also, how effective is this really? Don't the big news sites
| check the IP address of the user agents that claim to be
| GoogleBot?
| mdotk wrote:
| This. 12ft has never ever worked for me.
| dutchmartin wrote:
| If you hosted that server on Google Cloud, you would already
| make it a lot harder to detect.
| jsheard wrote:
| https://developers.google.com/search/docs/crawling-
| indexing/...
|
| They provide ways to verify Googlebot IPs specifically;
| anyone who cares to check wouldn't be fooled by running a
| fake Googlebot on Google's cloud.
|
| Likewise with Bingbot:
| https://www.bing.com/webmasters/help/how-to-verify-
| bingbot-3...
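| The check Google documents is a reverse-DNS lookup followed
| by a forward-confirm. A rough sketch of that logic (my own
| code, not from any WAF; Google also publishes JSON lists of
| crawler IP ranges that could be checked instead):

```python
import socket

# Domains under which genuine Googlebot reverse-DNS names fall,
# per Google's crawler-verification docs.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_google_hostname(hostname: str) -> bool:
    """True if a reverse-DNS name falls under Google's crawler domains."""
    return hostname.rstrip(".").endswith(GOOGLE_SUFFIXES)

def verify_googlebot(ip: str) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-confirm
    that the name resolves back to the same IP (a spoofed PTR record
    alone won't pass the forward check)."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return False
    if not is_google_hostname(hostname):
        return False
    try:
        _, _, addrs = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    return ip in addrs
```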
| KennyBlanken wrote:
| yes, where "cares" means "the lost revenue is greater
| than the cost of development, QA, and
| computational/network/storage overhead, and the impact of
| increased complexity, of a function that figures out
| whether people are faking their user agent."
|
| It's probably orders of magnitude _greater_ than the
| revenue loss from the tiny minority of people doing such
| things, especially given not everyone who uses tools like
| these will become a subscriber if blocked, so that cuts
| the "lost" revenue down even further.
| jsheard wrote:
| Even if it's not worth an actual site operator's time to
| implement such a system themselves, WAFs like Cloudflare
| could easily check the IP address of clients claiming to
| be Googlebot/Bingbot and send them to CAPTCHA Hell on the
| site's behalf if they're lying. That's pretty low-hanging
| fruit for a WAF; I would be surprised if they don't do
| that.
|
| edit: Indeed, I just tried curling cloudflare.com with
| Googlebot's user agent and they immediately gave me the
| finger (403) on the very first request.
| ZoomerCretin wrote:
| I sincerely hope the antitrust suit ends this practice
| soon. This is so obviously anticompetitive.
| cogman10 wrote:
| How?
|
| I also think the antitrust suit (and many more) need to
| happen for more obvious things like buying out
| competitors. However, how does publishing a list of valid
| IPs for their web crawlers constitute anticompetitive
| behavior? Anyone can publish a similar list, and any
| company can choose to reference those lists.
| arrosenberg wrote:
| It allows Google to access data that is denied to
| competitors. It's a clear example of Google using its
| market power to suppress competition.
| 8organicbits wrote:
| Hmm, the robots.txt, IP blocking, and user agent blocking
| are all policies chosen by the web server hosting the
| data. If web admins choose to block Google competitors,
| I'm not sure that's on Google. Can you clarify?
| jsheard wrote:
| In general I would agree, but Google recently struck a
| data licensing deal with Reddit which resulted in Reddit
| blocking _all_ scrapers and crawlers with the sole
| exception of Googlebot (even Bing and by extension
| DuckDuckGo are blocked). That is a case of Google
| actively locking down the web so their competitors can't
| see it.
|
| https://searchengineland.com/microsoft-confirms-reddit-
| block...
| judge2020 wrote:
| I always do DevTools -> Network Conditions to set UA, at least
| in Chrome.
| xyst wrote:
| I'm more inclined to use archive(.org|.ph). But this is a decent
| workaround when archive is unavailable.
|
| Side note: paywalls are annoying but most publications are often
| available for free via public library.
|
| For example, NYT is free via my public library. PL offers 3-day
| subs. A few other decent publications are available as well.
| Availability of publications is YMMV as well.
| pmdr wrote:
| NYT's onion site is also free.
| latexr wrote:
| > most publications are often available for free via public
| library.
|
| Via public library _in the USA_. Other countries exist and as
| far as I've gathered aren't typically given this level of free
| access.
| xyst wrote:
| Hence "ymmv" (your mileage may vary) ;)
| elashri wrote:
| Ironically, this phrase originated in the US, and the
| abbreviation is mostly used in American English slang [1].
|
| Please don't take this as an attack or even criticism; I
| just found it a funny observation, and I might be wrong.
|
| [1] https://en.wiktionary.org/wiki/your_mileage_may_vary
| mattbillenstein wrote:
| Counterpoint - if you like the content enough to go through this
| - just pay for it. Monetary support of journalism or content you
| like is a great way to encourage more of it.
| ed wrote:
| I fully agree with the sentiment! I support and do pay for
| sources I read frequently.
|
| Sadly payment models are incompatible with how most people
| consume content - which is to read a small number of articles
| from a large number of sources.
| naltroc wrote:
| As much as I circumvent paywalls myself, it does feel like
| overkill to set up software to do it always. Sites spend
| money to produce quality content.
|
| Somewhat related comparison, Is a human choosing to do this
| theft really better than a neural network scraping content for
| its own purposes?
| pjot wrote:
| > Is a human choosing to do this theft really better than a
| neural network scraping content
|
| Probably so. I think the differentiation is in the scale at
| which scraping is done for those kinds of systems.
| fwip wrote:
| The neural network is not scraping content for its own
| purposes, it is for the purpose of the people who are
| running/training it.
|
| And yes, one person reading a piece of content without paying
| money for it is far, far better than one person/corporation
| scraping all of the world's content in order to profit off of
| it.
| latexr wrote:
| > Somewhat related comparison, Is a human choosing to do this
| theft really better than a neural network scraping content
| for its own purposes?
|
| Here's a similar comparison: "Is a human recording a movie at
| the theatre to rewatch at home really better than the one who
| shares the recording online?"
|
| Seeing as you're calling it "theft", presumably what you mean
| by "better" is "causes less harm / loss of revenue to the
| work's author / publisher".
|
| I'd say the answer is pretty clear. How many people go
| through the trouble of bypassing paywalls VS how many use
| LLMs?
|
| Saying "a neural network scraping content for its own
| purposes" doesn't even begin to tell the whole story. Setting
| aside that the neural network is unlikely to be doing the
| scraping (it is being trained on it), it's not "for its own
| purpose": it didn't choose to willy-nilly scrape the
| internet; it was ordered to by a human (typically) intending
| to profit from it.
| setr wrote:
| Paying for it doesn't make the site less miserable to use. One
| of the stupid things about piracy is that it tends to also be
| the best available version of the thing. You're actively worse
| off having paid for it. (Ads, denial, DRM in general, MBs of
| irrelevant JS, etc don't go away with money, but do with
| piracy)
| a1o wrote:
| This right here. It would be nice to have some perk like you
| can read the articles through Gopher.
| janalsncm wrote:
| Case in point: paying for the New York Times doesn't block
| ads in their app.
| cortesoft wrote:
| I know of very few sites that let you pay to get zero ads or
| affiliate links. The ones that let you pay still show you
| affiliate links.
| elondaits wrote:
| I agree, but would like a way to pay for an article, or a
| single day, week, or month of access. Just like I could buy a
| single one-off issue of a publication a couple of times before
| starting a long term relationship with it. Not all publications
| support this, and some like the NY Times require chatting with
| a representative to cancel the subscription. I see a lot of
| talk about physical media around film and music, but not being
| able to buy single issues of any magazine or newspaper
| anonymously when the circumstances call for it, is a great loss
| for public discourse.
| cflewis wrote:
| I feel like there were companies in the past that did try
| this, where you would chuck $5 or whatever in an account, and
| then each page you went to that supported the service would
| extract a micropayment from the account.
|
| Never took off. Should have done. E.g. in Santa Cruz there is
| https://lookout.co , which is pretty good, but extremely
| pricey for what it is. There has to be a middle ground
| between "pay and get everything" and "ignore / go straight
| to 12ft.io".
| jszymborski wrote:
| Countercounterpoint - Maybe I have news subscriptions for
| periodicals I regularly read, but don't feel like paying for a
| monthly subscription to read one random article from some news
| outlet I don't regularly read that someone linked on social
| media or HN.
| spondylosaurus wrote:
| ^ This describes my experience as well. And there are certain
| outlets where I'll read an interesting article if someone
| links it, but don't want to give them money due to my
| objection with <xyz> editorial practices.
| shanecleveland wrote:
| So back out of the webpage and don't read it. That is a
| constructive way of letting a content producer know their
| user experience is not worth the "expense" of consuming their
| product. But if the content is worth your time and energy to
| consume, pay the "price" of admission.
| cooper_ganglia wrote:
| I back out of the webpage and go to 12ft.io, which lets me
| both read the article and, at the same time, use that
| constructive way of letting the publisher know that their
| product is not worth its price.
| shanecleveland wrote:
| And then 12ft-dot-io throws an error, but still shows its
| own ad in the bottom right corner! But you probably knew
| that since you constructively use them.
| gmiller123456 wrote:
| This assumes their presence has no effect on me. It
| takes time to click a page and let it load, and more time
| to dig through all of the results when all of them are
| unreadable. Maybe if there were a tag like
| [ungodlyamountofads] on each, it would help. But even then
| I'd still have to scroll through them.
| shanecleveland wrote:
| I guess I fail to see how visiting a webpage is anything
| but fully voluntary. It is how the web works! And how all
| kinds of "free" media have worked for eons.
|
| I don't mean to excuse incredibly poor user experience
| design, and certainly not abusive tactics. But sorry if I
| have zero empathy for your clicking, loading and
| scrolling pain. Leave the website! It is amazing how many
| people are defending a site that claims to "Remove
| popups, banners, and ads" while 1) failing to even
| work and 2) showing its own ad on the resulting page!
| tomrod wrote:
| Paywalls don't get folks there, however noble the sellers
| of information try to brand it.
| Teever wrote:
| Is there a way to pay for journalistic content that doesn't
| involve participating in the extensive tracking that those
| websites perform on their visitors?
|
| I love to read the news but I don't love that the news reads
| me.
| adamomada wrote:
| I actually loled when I went to CNN with Little Snitch on,
| there were over one hundred different third-party domains it
| wanted to connect to
| Marsymars wrote:
| > Is there a way to pay for journalistic content that doesn't
| involve participating in the extensive tracking that those
| websites perform on their visitors?
|
| Well you could buy physical newspapers/magazines. (Or access
| content via OTA TV / the library.)
| ricardobeat wrote:
| It would cost me about $500/month if I subscribe to every
| paywall that appears in front of me.
| shanecleveland wrote:
| 100%. And sometimes that form of payment is putting up with
| ads, etc. I routinely back out of sites that suddenly take over
| the screen with a popup or take up large chunks with video or
| animations. Same as opting not to go in a particular store. But
| I also stick around and occasionally use products advertised to
| me. Shocking, I know.
| Timpy wrote:
| I wasn't even thinking about paywalls, the first thing I did
| was check to see if cookie banners and "Sign in with Google"
| popups went away. There's so many user-unfriendly things that
| you constantly deal with, any amount of browsing is just a bad
| experience without putting up defenses like this.
| cooper_ganglia wrote:
| I will never willingly give a journalist my money.
| linsomniac wrote:
| I'm curious why. IMHO, they are true heroes.
| 627467 wrote:
| Why pay a monthly subscription if we're going to be
| bombarded by legally required popups and other internal
| promotional stuff that hooks you to the site anyway?
| notatoad wrote:
| I pay for the sites I visit regularly.
|
| But when somebody shares an article with me and I want to see
| what I've been sent, I'm not going to buy a $15 monthly
| subscription to some small-town newspaper in Ohio just because
| they've decided to paywall their content in that way.
| bubblethink wrote:
| No. Paywalled content should not be indexed by search engines.
| The implicit contract I have with the search engine is that it
| is showing me things that I can see. The publishers and search
| engines pulled a bait and switch here by whitelisting
| googlebot. So it's fair game to view the publisher's website
| with googlebot. That's what the engineers spent their time
| working on. It would be unfair to let that go to waste.
| Marsymars wrote:
| I dunno, it seems more like there should be a user-
| configurable setting to hide/show paywalled content.
|
| If you're looking for something, and it's only available
| behind a paywall (even a paywall you pay for!), how are you
| going to find it if it's not indexed?
| mgiampapa wrote:
| Bypass Paywalls Clean has moved here btw, https://github.com/bpc-
| clone?tab=repositories
| pogue wrote:
| BPC also has the option to spoof the User-agent as well when
| using the "custom sites" option:
|
| * set useragent to Googlebot, Bingbot, Facebookbot or custom
|
| * set referer (to Facebook, Google, Twitter or custom; ignored
| when Googlebot is set)
| compuguy wrote:
| Yes, but it may be taken down via DMCA soon. See this DMCA
| request:
| https://github.com/github/dmca/blob/master/2024/08/2024-08-0...
|
| It mentions bpc_updates in the takedown request....
| karmakaze wrote:
| Missed opportunity to call it 2ft, as in standing on one's own.
| trackofalljades wrote:
| ...or 11ft8, which can open _anything_
| spoonfeeder006 wrote:
| 666ft....
| sam_goody wrote:
| It seems to me that google should not allow a site to serve
| different content to their bot than they serve to their users. If
| the content is unavailable to me, it should not be in the search
| results.
|
| It obviously doesn't seem that way to Google, or to the sites
| providing the content.
|
| They are doing what works for them without ethical constraints
| (Google definitely, many content providers, eg NYT). Is it fair
| game to do what works for you (eg. 13ft)?!
| rurp wrote:
| > It seems to me that google should not allow a site to serve
| different content to their bot than they serve to their users.
|
| That would be the fair thing to do and was Google's policy for
| many years, and still is for all I know. But modern Google
| stopped caring about fairness and similar concerns many years
| ago.
| efilife wrote:
| This is called _cloaking_ [0] and has been against Google's
| policies for many years. But they don't care.
|
| [0] https://en.wikipedia.org/wiki/Cloaking
| justinl33 wrote:
| "organizing the world's information"
| spoonfeeder006 wrote:
| Why not just use uBlock Origin for the aspect of cleaning up the
| popups / ads and such?
| declan_roberts wrote:
| 12ft.io doesn't really work anymore.
|
| If you're on iOS + Safari I recommend the "open in internet
| archive" shortcut, which is actually able to bypass most
| paywalls.
|
| https://www.reddit.com/r/shortcuts/comments/12fbk8m/ive_crea...
| Zambyte wrote:
| This is awesome! People who use Kagi can also set up a regex
| redirect to automatically use this for problematic sites.
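| Such a redirect rule boils down to a small rewrite function.
| A sketch (the site list is made up for illustration; the
| Wayback "/web/<url>" form without a timestamp redirects to
| the newest capture):

```python
import re

# Hypothetical list of paywalled sites to reroute; extend as needed.
PROBLEM_SITES = re.compile(r"^https?://(www\.)?(nytimes|washingtonpost)\.com/")

def redirect(url: str) -> str:
    """Rewrite matching article URLs to the Internet Archive's
    newest-capture endpoint; pass everything else through."""
    if PROBLEM_SITES.match(url):
        return "https://web.archive.org/web/" + url
    return url
```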
| linsomniac wrote:
| I'll gladly pay for journalist content, but not when reading
| a single article means a $15/mo subscription that's hard to
| cancel.
|
| Is there some way to support journalism across publications?
| warkdarrior wrote:
| Articles should come with license agreements, just like open
| source software nowadays. Free for personal entertainment, but
| if you try to make money from the information in the article or
| otherwise commercialize it, you can fuck right off.
| Marsymars wrote:
| > Is there some way to support journalism across publications?
|
| Apple News+, Inkl, PressReader, (maybe more). Others if you
| want more magazine-focused subscriptions.
| deskr wrote:
| It once was Google's requirement that you'd serve the same
| content to the Google crawler as to any other user. No surprise
| that Google is full of shit these days.
| justinl33 wrote:
| "Organizing the world's information" or "maximizing revenue"?
| I don't know - somehow either argument justifies this.
| bansheeps wrote:
| I continue my search for a pay wall remover that will work with
| The Information. I'm honestly impressed that I've never been able
| to read an Information article in full.
| BLKNSLVR wrote:
| This could be used as a proxy to web interfaces on the same
| local network, couldn't it?
|
| There are probably much better and more secure options, but this
| might be an interesting temporary kludge.
| ThinkBeat wrote:
| Does it help, when pretending to be Googlebot, to be running
| on an IP from inside Google Cloud?
| hammock wrote:
| Now if someone could just package this into a browser extension
| it would be great!
___________________________________________________________________
(page generated 2024-08-19 23:00 UTC)