[HN Gopher] Compiler Explorer and the promise of URLs that last ...
___________________________________________________________________
Compiler Explorer and the promise of URLs that last forever
Author : anarazel
Score : 338 points
Date : 2025-05-28 16:28 UTC (1 days ago)
(HTM) web link (xania.org)
(TXT) w3m dump (xania.org)
| jimmyl02 wrote:
| This is a great perspective on how assumptions play out over a
| longer period of time. I think that this risk is much greater for
| _free_ third party services for critical infrastructure.
|
| Someone has to foot the bill somewhere and if there isn't a
| source of income then the project is bound to be unsupported
| eventually.
| tekacs wrote:
| I think I would struggle to say that free services die at a
| higher rate consistently...
|
| So many paid offerings, whether from startups or even from
| large companies, have been sunset over time, often with
| frustratingly short migration periods.
|
| If anything, I feel like I can think of more paid services that
| have given their users short migration periods than free ones.
| lqstuart wrote:
| Counterexample: the Linux kernel
| charcircuit wrote:
| How? Big tech foots the bill.
| shlomo_z wrote:
| But goo.gl is also big tech...
| 0x1ceb00da wrote:
| Google wasn't making money off of goo.gl
| johannes1234321 wrote:
| But what did it cost them? Especially in read-only mode.
|
| Sure, one more service to monitor, though for the most part
| "fix by restart" is a good enough approach. And then once
| in a while have an intern switch it to the latest backend
| choice.
| iainmerrick wrote:
| Linux isn't a service (in the SaaS sense).
| cortesoft wrote:
| Nah, businesses go under all the time, whether their services
| are paid or not.
| amiga386 wrote:
| https://killedbygoogle.com/
|
| > Google Go Links (2010-2021)
|
| > Killed about 4 years ago, (also known as Google Short Links)
| was a URL shortening service. It also supported custom domain for
| customers of Google Workspace (formerly G Suite (formerly Google
| Apps)). It was about 11 years old.
| zerocrates wrote:
| "Killing" the service in the sense of minting new ones is no
| big deal and hardly merits mention.
|
| Killing the existing ones is much more of a jerk move.
| Particularly so since Google is still keeping it around in some
| form for internal use by their own apps.
| ruune wrote:
| Don't they use https://g.co now? Or are there still new
| internal goo.gl links created?
|
| Edit: Google is using a g.co link on the "Your device is
| booting another OS" screen that appears when booting up my
| Pixel running GrapheneOS. Will be awkward when they kill that
| service and the hard-coded link in the phone's bios is just
| dead.
| zerocrates wrote:
| Google Maps creates "maps.app.goo.gl" links; I don't know
| if there are others, they called Maps out specifically in
| their message.
|
| Possibly those other ones are just using the domain name
| and the underlying service is totally different, not sure.
| mananaysiempre wrote:
| May be worth cooperating with ArchiveTeam's project[1] on Goo.gl?
|
| > url shortening was a fucking awful idea[2]
|
| [1] https://wiki.archiveteam.org/index.php/Goo.gl
|
| [2] https://wiki.archiveteam.org/index.php/URLTeam
| MallocVoidstar wrote:
| IIRC ArchiveTeam were bruteforcing Goo.gl short URLs, not going
| through 'known' links, so I'd assume they have many/all of
| Compiler Explorer's URLs. (So, good idea to contact them)
| tech234a wrote:
| Real-time status for that project indicates 7.5 billion goo.gl
| URLs found out of 42 billion goo.gl URLs scanned:
| https://tracker.archiveteam.org:1338/status
| shepmaster wrote:
| As we all know, Cool URIs don't change [1]. I greatly appreciate
| the care taken to keep these Compiler Explorer links working as
| long as possible.
|
| The Rust playground uses GitHub Gists as the primary storage
| location for shared data. I'm dreading the day that I need to
| migrate everything away from there to something self-maintained.
|
| [1]: https://www.w3.org/Provider/Style/URI
| kccqzy wrote:
| Before 2010 I had this unquestioned assumption that links are
| supposed to last forever. I used the bookmark feature of my
| browser extensively. Some time afterwards, I discovered that a
| large fraction of my bookmarks were essentially unusable due to
| linkrot. My modus operandi after that was to print the webpage as
| a PDF. A bit afterwards, when reader views became popular and
| reliable, I just copy-pasted the content from the reader view
| into an RTF file.
| flexagoon wrote:
| By the way, if you install the official Web Archive browser
| extension, you can configure it to automatically archive every
| page you visit
| petethomas wrote:
| This is a good suggestion with the caveat that entire domains
| can and do disappear: https://help.archive.org/help/how-do-i-
| request-to-remove-som...
| Akronymus wrote:
| That's especially annoying when a formerly useful site gets
| abandoned, a new owner picks up the domain, then gets IA to
| delete the old archives as well.
|
| Or even worse, when a domain parking company does that:
| https://archive.org/post/423432/domainsponsorcom-erasing-
| pri...
| vitorsr wrote:
| > you can configure it to automatically archive every page
| you visit
|
| What?? I am a heavy user of the Internet Archive services,
| not just the Wayback Machine, including official and
| "unofficial" clients and endpoints, and I had absolutely no
| idea the extension could do this.
|
| To bulk archive I would manually do it via the web interface
| or batch automate it. The limitations of manually doing it
| one by one are obvious, and the limitation of doing it in
| batches is, well, keeping batches (lists).
| internetter wrote:
| recently I've come to believe even IA and especially
| archive.is are ephemeral. I've watched sites I've saved
| disappear without a trace, except in my selfhosted archives.
|
| A technological conundrum, however, is the fact that I have
| no way to prove that _my_ archive is an accurate
| representation of a site at a point in time. Hmmm, or maybe I
| do? Maybe something funky with cert chains could be done.
| akoboldfrying wrote:
| There are timestamping services out there, some of which
| may be free. It should (I think) be possible to basically
| submit the target site's URL to the timestamping service,
| and get back a certificate saying "I, Timestamps-R-US,
| assert that the contents of https://targetsite.com/foo/bar
| downloaded at 12:34pm on 29/5/2025 hashes to abc12345 with
| SHA-1", signed with their private key and verifiable (by
| anyone) with their public key. Then you download the same
| URL, and check that the hashes match.
|
| IIUC the timestamping service needs to independently
| download the contents itself in order to hash it, so if you
| need to be logged in to see the content there might be
| complications, and if there's a lot of content they'll
| probably want to charge you.
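|
| A minimal sketch of the verification half of that idea,
| assuming SHA-256 rather than SHA-1; the URL and the attested
| record below are purely illustrative:
|
|     import hashlib
|     import urllib.request
|
|     # What a hypothetical timestamping service signed earlier.
|     attested = {
|         "url": "https://targetsite.com/foo/bar",
|         "timestamp": "2025-05-29T12:34:00Z",
|         "sha256": "abc123...",  # hex digest it attested to
|     }
|
|     # Re-download the same URL and hash it ourselves.
|     body = urllib.request.urlopen(attested["url"]).read()
|     digest = hashlib.sha256(body).hexdigest()
|
|     if digest == attested["sha256"]:
|         print("content matches the timestamped hash")
|     else:
|         print("content changed, or the page is not byte-stable")
|
| Checking the service's signature over that record would need its
| public key and a signature library, which is omitted here.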
| XorNot wrote:
| Websites don't really produce consistent content even
| from identical requests though.
|
| But you also don't need to do this: all you need is a
| service which will attest that it saw a particular
| hashsum at a particular time. It's up to other mechanisms
| to prove what that means.
| akoboldfrying wrote:
| > Websites don't really produce consistent content even
| from identical requests though.
|
| Often true in practice unfortunately, but to the extent
| that it is true, any approach that tries to use hashes to
| prove things to a third party is sunk. (We could imagine
| a timestamping service that allows some kind of post-
| download "normalisation" step to strip out content that
| varies between queries and then hash the results of that,
| but that doesn't seem practical to offer as a free
| service.)
|
| > all you need is a service which will attest that it saw
| a particular hashsum at a particular time
|
| Isn't that what I'm proposing?
| shwouchk wrote:
| sign it with gpg and upload the sig to bitcoin
|
| edit: sorry, that would only prove when it was taken, not
| that it wasn't fabricated.
| fragmede wrote:
| hash the contents
| shwouchk wrote:
| signing it is effectively the same thing. question is how
| to prove that what you hashed is what was there?
| chii wrote:
| you can't: unless someone else also has a copy, your hash
| cannot be verified (since both the hash and the claim come
| from you).
|
| One way to make this work is to have a mechanism like
| bitcoin (proof of work), where the proof of work is put
| into the webpage itself as a hash (made by the original
| author of that page). Then anyone can verify that the
| contents wasn't changed, and if someone wants to make
| changes to it and claim otherwise, they'd have to put in
| even more proof of work to do it (so not impossible, but
| costly).
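|
| A toy sketch of that scheme, assuming "work" means finding a
| nonce such that SHA-256(content + nonce) has N leading zero
| bits (the difficulty value is arbitrary):
|
|     import hashlib
|
|     def proof_of_work(content: bytes, bits: int = 20) -> int:
|         """Find a nonce whose hash together with the content
|         has `bits` leading zero bits. Verifying is cheap;
|         redoing it for altered content costs the same work."""
|         target = 1 << (256 - bits)
|         nonce = 0
|         while True:
|             h = hashlib.sha256(
|                 content + nonce.to_bytes(8, "big")).digest()
|             if int.from_bytes(h, "big") < target:
|                 return nonce
|             nonce += 1
|
|     def verify(content: bytes, nonce: int, bits: int = 20) -> bool:
|         h = hashlib.sha256(
|             content + nonce.to_bytes(8, "big")).digest()
|         return int.from_bytes(h, "big") < (1 << (256 - bits))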
| notpushkin wrote:
| I think there was a way to preserve TLS handshake
| information in a way that something something you can
| verify you got the exact response from the particular
| server? I can't look it up now though, but I think there
| was a Firefox add-on, even.
| fragmede wrote:
| what if instead of the proof of work being in the page as
| a hash, that the distributed proof of work is that some
| subset of nodes download a particular bit of html or json
| from a particular URI, and then each node hashes that,
| saves the contents and the hash to a blockchain-esque
| distributed database. Subject to 51% attack, same as any
| other chain, but still.
| 90s_dev wrote:
| My solution has been to just remember the important stuff, or
| at least where to find it. I'm not dead yet so I guess it
| works.
| TeMPOraL wrote:
| It was my solution too, and I liked it, but over the past
| decade or so, I noticed that even when I remember where to
| find some stuff, hell, even if I just remember _how_ to find
| it, when I actually try and find it, it often isn't there
| anymore. "Search rot" is just as big a problem as link rot.
|
| As for being still alive, by that measure hardly anything
| anyone does is important in the modern world. It's pretty
| hard to fail at thinking or remembering so badly that it
| becomes a life-or-death thing.
| 90s_dev wrote:
| > hardly anything anyone does is important
|
| Agreed.
| mock-possum wrote:
| I've found that whenever I think "why don't other people just
| do X" it's because I'm misunderstanding what's involved in X
| for them, and that generally if they _could_ 'just' do X then
| they _would_.
|
| "Why don't you just" is a red flag now for me.
| chii wrote:
| this applies to basically any suggested solution to any
| problem.
|
| "Why don't you just ..." is just lazy idea suggestion from
| armchair internet warriors.
| 90s_dev wrote:
| Not always. I love it when people offer me a much simpler
| solution to a problem I overengineered, so I can throw away
| my solution and use the simpler one.
|
| Half the time people are offered a better way, it's
| because they're actually doing it wrong: they've gotten the
| solution's requirements all wrong in the first place, and
| this perspective helps.
| nonethewiser wrote:
| A reference is a bet on continuity.
|
| At a fundamental level, broken website links and dangling
| pointers in C are the same.
| lappa wrote:
| I use the SingleFile extension to archive every page I visit.
|
| It's easy to set up, but be warned, it takes up a lot of disk
| space.
|
|     $ du -h ~/archive/webpages
|     1.1T    /home/andrew/archive/webpages
|
| https://github.com/gildas-lormeau/SingleFile
| davidcollantes wrote:
| How do you manage those? Do you have a way to search them, or
| a specific way to catalogue them, which will make it easy to
| find exactly what you need from them?
| nirav72 wrote:
| KaraKeep is a decent self-hostable app that supports
| receiving SingleFile pages via the SingleFile browser
| extension pointed at the KaraKeep API. This allows me to
| search for archived pages (plus auto summarization and
| tagging via LLM).
| dotancohen wrote:
| Very naive question, surely. What does KaraKeep provide
| that grep doesn't?
| nirav72 wrote:
| jokes aside. It has a mobile app
| 90s_dev wrote:
| You must have several TB of the internet on disk by now...
| internetter wrote:
| storage is cheap, but if you wanted to improve this:
|
| 1. find a way to dedup media (see the sketch after this list)
|
| 2. ensure content blockers are doing well
|
| 3. for news articles, put it through readability and store
| the markdown instead. if you wanted to be really fancy,
| instead you could attempt to programmatically create a
| "template" of sites you've visited with multiple endpoints so
| the style is retained but you're not storing the content.
| alternatively a good compression algo could do this, if you
| had your directory like /home/andrew/archive/boehs.org.tar.gz
| and inside of the tar all the boehs.org pages you visited are
| saved
|
| 4. add fts and embeddings over the pages
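|
| For point 1, a minimal sketch of dedup by content hash over an
| archive directory, hard-linking byte-identical files (a crude
| stand-in for the filesystem-level dedup mentioned below; the
| path is illustrative):
|
|     import hashlib
|     import os
|     import sys
|     from pathlib import Path
|
|     def dedup(root: str) -> None:
|         """Replace byte-identical files under `root` with hard
|         links to the first copy seen."""
|         seen = {}
|         for path in Path(root).rglob("*"):
|             if not path.is_file() or path.stat().st_nlink > 1:
|                 continue
|             digest = hashlib.sha256(path.read_bytes()).hexdigest()
|             if digest in seen:
|                 path.unlink()
|                 os.link(seen[digest], path)
|             else:
|                 seen[digest] = path
|
|     if __name__ == "__main__":
|         dedup(sys.argv[1])  # e.g. ~/archive/webpages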
| ashirviskas wrote:
| 1 and partly 3 - I use btrfs with compression and deduping
| for games and other stuff. Works really well and is
| "invisible" to you.
| bombela wrote:
| Dedup on btrfs requires setting up a cron job. And you need
| to pick one of the dedup tools too. It's not completely
| invisible in my mind because of this ;)
| windward wrote:
| >storage is cheap
|
| It is. 1.1TB is both:
|
| - objectively an incredibly huge amount of information
|
| - something that can be stored for the cost of less than a
| day of this industry's work
|
| Half my reluctance to store big files is just an irrational
| fear of the effort of managing it.
| IanCal wrote:
| > - something that can be stored for the cost of less
| than a day of this industry's work
|
| Far, far less even. You can grab a 1TB external SSD from
| a good name for less than a day's work at minimum wage in
| the UK.
|
| I keep getting surprised at just how cheap large storage
| is every time I need to update stuff.
| shwouchk wrote:
| i was considering a similar setup, but i don't really trust
| extensions. I'm curious:
|
| - Do you also archive logged in pages, infinite scrollers,
| banking sites, fb etc?
|
| - How many entries is that?
|
| - How often do you go back to the archive? Is stuff easy to
| find?
|
| - Do you have any organization or additional process (eg
| bookmarks)?
|
| did you try integrating it with llms/rag etc yet?
| eddd-ddde wrote:
| You can just fork it, audit the code, add your own changes,
| and self host / publish.
| snthpy wrote:
| Thanks. I didn't know about this and it looks great.
|
| A couple of questions:
|
| - do you store them compressed or plain?
|
| - what about private info like bank accounts or health
| insurance?
|
| I guess for privacy one could train oneself to use private
| browsing mode.
|
| Regarding compression, for thousands of files don't all those
| self-extraction headers add up? Wouldn't there be space
| savings by having a global compression dictionary and only
| storing the encoded data?
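|
| On the global-dictionary question: stock zlib already supports a
| preset dictionary shared by compressor and decompressor, which
| avoids re-learning common boilerplate per file. A minimal sketch
| (the dictionary bytes here are purely illustrative):
|
|     import zlib
|
|     # Boilerplate common to many saved pages, shared by both sides.
|     SHARED_DICT = b'<!DOCTYPE html><html><head><meta charset="utf-8">'
|
|     def compress(page: bytes) -> bytes:
|         c = zlib.compressobj(level=9, zdict=SHARED_DICT)
|         return c.compress(page) + c.flush()
|
|     def decompress(blob: bytes) -> bytes:
|         d = zlib.decompressobj(zdict=SHARED_DICT)
|         return d.decompress(blob) + d.flush()
|
|     page = b'<!DOCTYPE html><html><head><meta charset="utf-8">...'
|     assert decompress(compress(page)) == page
|
| The catch is that the dictionary has to be kept alongside the
| archive forever, or the compressed blobs become unreadable.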
| genewitch wrote:
| By default, singlefile only saves when you tell it to, so
| there's no worry about leaking personal information.
|
| I haven't put the effort in to make a "bookmark server"
| that will accomplish what singlefile does but on the
| internet because of how well singlefile works.
| d4mi3n wrote:
| > do you store them compressed or plain?
|
| Can't speak to your other issues but I would think the
| right file system will save you here. Hopefully someone
| with more insight can provide color here, but my
| understanding is that file systems like ZFS were
| specifically built for use cases like this where you have a
| large set of data you want to store in a space efficient
| manner. Rather than a compression dictionary, I believe
| tech like ZFS simply looks at bytes on disk and compresses
| those.
| nyarlathotep_ wrote:
| Are you automating this in some fashion? Is there another
| extension you've authored or similar to invoke SingleFile
| functionality on a new page load or similar?
| dataflow wrote:
| Have you tried MHTML?
| RiverCrochet wrote:
| SingleFile is way more convenient as it saves to a standard
| HTML file. The only thing I know that easily reads
| MHTML/.mht files is Internet Explorer.
| dataflow wrote:
| Chrome and Edge read them just fine? The format is
| actually the same as .eml AFAIK.
| RiverCrochet wrote:
| I remember having issues but it could be because the
| .mht's I had were so old I think I used Internet
| Explorer's Save As... function to generate them.
| dataflow wrote:
| I've had such issues with them in the past too, yeah. I
| never figured out the root cause. But in recent times I
| haven't had issues, for whatever that's worth. (I also
| haven't really tried to open many of the old files
| either.)
| macawfish wrote:
| Use WARC: https://en.wikipedia.org/wiki/WARC_(file_format) with
| WebRecorder: https://webrecorder.net/
| shwouchk wrote:
| warc is not a panacea; for example, gemini makes it super
| annoying to get a transcript of your conversation, so i
| started saving those as pdf and warc.
|
| turns out that unlike most webpages, the pdf version is only
| a single page of what is visible on screen.
|
| turns out also that opening the warc immediately triggers a
| js redirect that is planted in the page. i can still extract
| the text manually - it's embedded there - but i cannot "just
| open" the warc in my browser and expect an offline "archive"
| version - im interacting with a live webpage! this sucks from
| all sides - usability, privacy, security.
|
| Admittedly, i don't use webrecorder - does it solve this
| problem? did you verify?
| weinzierl wrote:
| Not sure if you tried that. Chrome has a take full page
| screenshot command. Just open the command bar in dev tools
| and search for "full" and you will find it. Firefox has it
| right in the context menu, no need for dev tools.
|
| Unfortunately there are sites where it does not work.
| eMPee584 wrote:
| Apart from small UX nits, FF's screenshot feature is
| great - it's just that storing a 2-15MiB bitmap copy of a
| text medium still feels dirty to me every time.. would
| much prefer a PDF export, page size matching the scroll
| port, with embedded fonts and vectors and without print
| CSS..
| andai wrote:
| Is there some kind of thing that turns a web page into a text
| file? I know you can do it with beautiful soup (or like 4 lines
| of python stdlib), but I usually need it on my phone, where I
| don't know a good option.
|
| My phone browser has a "reader view" popup but it only appears
| sometimes, and usually not on pages that need it!
|
| Edit: Just installed w3m in Termux... the things we can do
| nowadays!
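|
| A bare-bones page-to-text pass using only the Python standard
| library (so it also runs under Termux); unlike reader view it
| has no readability heuristics, so nav and boilerplate text come
| along too:
|
|     import sys
|     import urllib.request
|     from html.parser import HTMLParser
|
|     class TextExtractor(HTMLParser):
|         SKIP = {"script", "style", "noscript"}
|
|         def __init__(self):
|             super().__init__()
|             self.skipping = 0
|             self.chunks = []
|
|         def handle_starttag(self, tag, attrs):
|             if tag in self.SKIP:
|                 self.skipping += 1
|
|         def handle_endtag(self, tag):
|             if tag in self.SKIP and self.skipping:
|                 self.skipping -= 1
|
|         def handle_data(self, data):
|             if not self.skipping and data.strip():
|                 self.chunks.append(data.strip())
|
|     html = urllib.request.urlopen(sys.argv[1]).read()
|     parser = TextExtractor()
|     parser.feed(html.decode("utf-8", "replace"))
|     print("\n".join(parser.chunks))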
| XorNot wrote:
| You want Zotero.
|
| It's for bibliographies, but it also archives and stores web
| pages locally with a browser integration.
| _huayra_ wrote:
| I frankly don't know how I'd collect any useful info
| without it.
|
| I'm sure there are bookmark services that also allow notes,
| but the tagging, linking related things, etc, all in the
| app is awesome, plus the ability to export BibTeX for
| writing a paper!
| taeric wrote:
| That assumption isn't true of any sources? Things flat out
| change. Some literally, others more in meaning. Some because
| they are corrected, but there are other reasons.
|
| Not that I don't think there is some benefit in what you are
| attempting, of course. A similar thing I still wish I could do
| is to "archive" someone's phone number from my contact list. Be
| it a number that used to be ours, or family/friends that have
| passed.
| rubit_xxx16 wrote:
| > Before 2010 I had this unquestioned assumption that links are
| supposed to last forever
|
| Any site/company whatsoever of this world (and most) that
| promises that anything will last forever is seriously deluded
| or intentionally lying, unless their theory of time is
| different than that of the majority.
| mycall wrote:
| Is there some browser extension that automatically goes to
| web.archive.org if the link times out?
| theblazehen wrote:
| I use the Resurrect Pages addon
| diggan wrote:
| URLs (uniform resource locators) cannot ever last forever, as
| it's a _location_ and locations can't last forever :)
|
| URIs however, can be made to last forever! Also comes with the
| added benefit that if you somehow integrate content-addressing
| into the identifier, you'll also be able to safely fetch it from
| any computer, hostile or not.
| 90s_dev wrote:
| I've been making websites for almost 30 years now.
|
| I still don't know the difference between URI and URL.
|
| I'm starting to think it doesn't matter.
| diggan wrote:
| > I still don't know the difference between URI and URL.
|
| One is a location, the other one is a ID. Which is which is
| referenced in the name :)
|
| And sure, it doesn't matter as long as you're fine with
| referencing _locations_ rather than the actual data, and
| aware of the tradeoffs.
| Sesse__ wrote:
| It doesn't matter.
|
| URI is basically a format and nothing else. (foo://bar123
| would be a URI but not a URL because nothing defines what
| foo: is.)
|
| URLs and URNs are thingies using the URI format;
| https://news.ycombinator.com is a URL (in addition to being a
| URI) because there's an RFC that specifies that https: means
| and how to go out and fetch them.
|
| urn:isbn:0451450523 (example cribbed from Wikipedia) is a
| URN (in addition to being a URI) that uniquely identifies a
| book, but doesn't tell you how to go find that book.
|
| Mostly, the difference is pedantic, given that URNs never
| took off.
| 90s_dev wrote:
| It's almost like URNs were born in an _urn_! [1]
|
| [1]: _ba dum tss_
| marcosdumay wrote:
| A URI is a standard way to write names of documents.
|
| A URL is a URI that also tells you how to find the
| document.
| layer8 wrote:
| URLs in the strict sense are a subset of URIs. They specify a
| mechanism (like HTTP or FTP) for how to access the referenced
| resource. The other type of URIs are opaque IDs, like
| doi:10.1000/182 or urn:isbn:9780141036144. These technically
| can't expire, though that doesn't mean you'll be able to
| access what they reference.
|
| However, "URL" in the broader sense is used as an umbrella
| term for URIs and IRIs (internationalized resource
| identifiers), in particular by WHATWG.
|
| In practice, what matters is the specific URI scheme ("http",
| "doi", etc.).
| immibis wrote:
| A URL tells you where to get some data, like
| https://example.com/index.html
|
| A URN tells you which data to get (usually by hash or by some
| big centralized registry), but not how to get it. DOIs in
| academia, for example, or RFC numbers. Magnet links are
| borderline.
|
| URIs are either URLs or URNs. URNs are rarely used because
| they're less practical, since browsers can't open them - but
| note that in any case each URL scheme (https) or URN scheme
| (doi) is unique - there's no universal way to fetch one
| without specific handling for each supported scheme. So it's
| not actually that unusual for a browser not to be able to
| open a certain scheme.
| postoplust wrote:
| For example: IPFS URIs are content addresses.
|
| https://docs.ipfs.tech/
| bowsamic wrote:
| Does this have any actual grounding in reality or does your
| lack of suggestion for action confirm my suspicion that this is
| just a theoretical wish?
| diggan wrote:
| > Does this have any actual grounding in reality
|
| Depends on your use case I suppose. For things I want to
| ensure I can reference _forever_ (theoretical forever), then
| using _location_ for that reference feels less than ideal, I
| cannot even count the number of dead bookmarks on both hands
| and feet, so "link rot" is a real issue.
|
| If those bookmarks instead referenced the actual content (via
| content-addressing for example), rather than the location,
| then those would still work today.
|
| But again, not everyone cares about things sticking around,
| not all use cases require the reference to continue being
| alive, and so on, so if it's applicable to you or not is
| something only you can decide.
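|
| A minimal sketch of what a content-addressed "bookmark" could
| look like: record the hash at save time, and later accept any
| copy, from any mirror, that still hashes to the same value (the
| mirror list is illustrative):
|
|     import hashlib
|     import urllib.request
|
|     def save_bookmark(url: str) -> dict:
|         body = urllib.request.urlopen(url).read()
|         return {"url": url,
|                 "sha256": hashlib.sha256(body).hexdigest()}
|
|     def resolve(bookmark: dict, mirrors: list) -> bytes | None:
|         """Try the original location, then any mirror; accept
|         whichever copy matches the recorded hash."""
|         for url in [bookmark["url"], *mirrors]:
|             try:
|                 body = urllib.request.urlopen(url).read()
|             except OSError:
|                 continue
|             if hashlib.sha256(body).hexdigest() == bookmark["sha256"]:
|                 return body
|         return None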
| olalonde wrote:
| > This article was written by a human, but links were suggested
| by and grammar checked by an LLM.
|
| This is the second time today I've seen a disclaimer like this.
| Looks like we're witnessing the start of a new trend.
| tester756 wrote:
| It's crazy that people feel that they need to put such
| disclaimers
| layer8 wrote:
| It's more a claimer than a disclaimer. ;)
| danadam wrote:
| I'd probably call it "disclosure".
| psychoslave wrote:
| This comment was written by a human with no check by any
| automaton, but how will you check that?
| acquisitionsilk wrote:
| Business emails, other comments here and there of a more
| throwaway or ephemeral nature - who cares if LLMs helped?
|
| Personal blogs, essays, articles, creative writing,
| "serious work" - please tell us if LLMs were used, if they
| were, and to what extent. If I read a blog and it seems
| human and there's no mention of LLMs, I'd like to be able
| to safely assume it's a human who wrote it. Is that so much
| to ask?
| qingcharles wrote:
| That's exactly what a bot would say!
| actuallyalys wrote:
| It makes sense to me. After seeing a bunch of AI slop, people
| started putting no AI buttons and disclaimers. Then some
| people using AI for little things wanted to clarify it wasn't
| AI generated wholesale without falsely claiming AI wasn't
| involved at all.
| chii wrote:
| i dont find the need to have such a disclaimer at all.
|
| If the content can stand on its own, then it is sufficient. If
| the content is slop, then why does it matter that it is an ai
| generated slop vs human generated slop?
|
| The only reason anyone wants to know/have the disclaimer is if
| they cannot themselves discern the quality of the contents, and
| is using ai generation as a proxy for (bad) quality.
| johannes1234321 wrote:
| For the author it matters. To which degree do they want to be
| associated with the resulting text.
|
| And I differentiate between "Matt Godbolt" who is an expert
| in some areas and in my experience careful about avoiding
| wrong information and an LLM which may produce additional
| depth, but may also make up things.
|
| And well, "discern the quality of the contents" - I often
| read texts to learn new things. On new things I don't have
| enough knowledge to qualify the statements, but I may have
| experience with regards to the author or publisher.
| chii wrote:
| and what do you do to make this differentiation if what
| you're reading is a scientific paper?
| johannes1234321 wrote:
| Same?
|
| (Some researcher's names I know, some institutions
| published good reports in the past and that I take into
| consideration on how much I trust it ... and since I'm
| human I trust it more if it confirms my view and less if
| it challenges it or put in different words: there are
| many factors going into subjective trust)
| 90s_dev wrote:
| Some famous programmer once wrote about how links should last
| forever.
|
| He advocated for /foo/bar with no extension. He was right about
| not using /foo/bar.php because the _implementation_ might change.
|
| But he was wrong, it should be /foo/bar.html because the _end-
| result_ will always be HTML when it's served by a browser,
| whether it's generated by PHP, Node.js or by hand.
|
| It's pointless to prepare for some hypothetical new browser
| that uses an _alternate_ language, one that _doesn't_ use
| HTML.
|
| Just use .html for your pages and stop worrying about how to
| correctly convert foo.md to foo/index.html and configure nginx
| accordingly.
| Dwedit wrote:
| mod_rewrite means you can redirect the .php page to something
| else if you stop using php.
| shakna wrote:
| Unless mod_rewrite is disabled, because it has had a few
| security bugs over the years. Like last year. [0]
|
| [0] https://nvd.nist.gov/vuln/detail/CVE-2024-38475
| 90s_dev wrote:
| Found it: https://www.w3.org/Provider/Style/URI
|
| Why did I think Joel Spolsky or Jeff Atwood wrote it?
| Sesse__ wrote:
| > Some famous programmer once wrote about how links should last
| forever.
|
| You're probably thinking of W3C's guidance:
| https://www.w3.org/Provider/Style/URI
|
| > But he was wrong, it should be /foo/bar.html because the end-
| result will always be HTML
|
| 20 years ago, it wasn't obvious at all that the end-result
| would always be HTML (in particular, various styled forms of
| XML were thought to eventually take over). And in any case,
| there's no reason to have the content-type in the URL; why
| would the user care about that?
| 90s_dev wrote:
| There's strong precedent for associating file extensions
| with content types. And it allows static files to map 1:1 to
| URLs.
|
| I agree though that I was too harsh, I didn't realize it was
| written in 1998 when HTML was still new. I probably first
| read it around 2010.
|
| But now that we have hindsight, I think it's safe to say
| .html files will continue to be supported for the next 50
| years.
| crackalamoo wrote:
| I use /foo/bar/ with the trailing slash because it works better
| with relative URLs for resources like images. I could also use
| /foo/bar/index.html but I find the former to be cleaner
| 90s_dev wrote:
| It's always bothered me in a small way that github doesn't
| honor this:
|
| https://github.com/sdegutis/bubbles
|
| https://github.com/sdegutis/bubbles/
|
| No redirect, just two renders!
|
| It bothers me first because it's semantically different.
|
| Second and more importantly, because it's always such a pain
| to configure that redirect in nginx or whatever. I eventually
| figure it out each time, after many hours wasted looking it
| up all over again and trial/error.
| esafak wrote:
| If it's always .html, it's cruft; get rid of it. And what if
| it's not HTML but JSON? Besides, does the user care? Berners-
| Lee was right.
|
| https://www.w3.org/Provider/Style/URI
| 90s_dev wrote:
| If it's JSON then name it /foo/bar.json, and as a bonus you
| can _also_ have /foo/bar.html!
|
| You say the extension is cruft. That's your opinion. I don't
| share it.
| marcosdumay wrote:
| The alternative is to declare what you want in the Accept
| header, which is way less transparent but more flexible.
|
| I never saw any site where the extra flexibility added any
| value. So, right now I do favor the extension.
| kelnos wrote:
| At the risk of committing the appeal-to-authority fallacy,
| it's also the opinion of Tim Berners-Lee, which I would
| hope carries at least some weight.
|
| The way I look at it is that yes, the extension can be
| useful for requesting a particular file format (IMO the
| Accept header is not particularly accessible, especially if
| you are just a regular web browser user). But if you have a
| default/canonical representation, then you should give that
| representation in response to a URL that has no extension.
| And when you link to that document in a representation-
| neutral way, you should link without the extension.
|
| That doesn't stop you from _also_ serving that same content
| from a URL that includes the extension that describes the
| default /canonical representation. And people who want to
| link to you and ensure they get a particular representation
| can use the extension in their links. But someone who
| doesn't care, and just wants the document in whatever
| format the website owner recommends, should be able to get
| it without needing to know the extension. For those
| situations, the extension is an implementation detail that
| is irrelevant to most visitors.
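|
| A sketch of that arrangement as a tiny WSGI app: /foo/bar serves
| a default representation negotiated via Accept, while
| /foo/bar.html and /foo/bar.json pin a format (paths and bodies
| are illustrative):
|
|     from wsgiref.simple_server import make_server
|
|     REPRS = {
|         "text/html": b"<h1>bar</h1>",
|         "application/json": b'{"name": "bar"}',
|     }
|
|     def app(environ, start_response):
|         path = environ["PATH_INFO"]
|         accept = environ.get("HTTP_ACCEPT", "")
|         if path == "/foo/bar.html":
|             ctype = "text/html"
|         elif path == "/foo/bar.json":
|             ctype = "application/json"
|         elif path == "/foo/bar":
|             # Crude negotiation: JSON only if asked for first.
|             first = accept.split(",")[0].strip()
|             ctype = ("application/json"
|                      if first == "application/json" else "text/html")
|         else:
|             start_response("404 Not Found",
|                            [("Content-Type", "text/plain")])
|             return [b"not found"]
|         start_response("200 OK", [("Content-Type", ctype)])
|         return [REPRS[ctype]]
|
|     if __name__ == "__main__":
|         make_server("127.0.0.1", 8000, app).serve_forever()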
| 90s_dev wrote:
| > it's also the opinion of Tim Berners-Lee, which I would
| hope carries at least some weight
|
| Not at all. He's famous for helping create the initial
| version of JavaScript, which was a fairly even mixture of
| great and terrible. Which means his initial contributions
| to software were not extremely noteworthy, and he just
| happened to be in the right time and right place, since
| something like JavaScript was apparently inevitable.
| Plus, I can't think of any of his major contributions to
| software in the decades since. So no, I don't even think
| that's really an appeal to authority.
| wolfgang42 wrote:
| _> [Tim Berners-Lee is] famous for helping create the
| initial version of JavaScript_
|
| You may be thinking of Brendan Eich? Berners-Lee is
| famous for HTML, HTTP, the first web browser, and the
| World Wide Web in general; as far as I know he had
| nothing to do with JS.
| swyx wrote:
| idk man how can URLs last forever if it costs money to keep a
| domain name alive?
|
| i also wonder if url death could be a good thing. humanity makes
| special effort to keep around the good stuff. the rest goes into
| the garbage collection of history.
| johannes1234321 wrote:
| Historians however would love to have more garbage from
| history, to get more insights on "real" life rather than just
| the parts one considered worth keeping.
|
| If I could time jump, it would be interesting to see how
| historians in a thousand years will look back at our period,
| where a lot of information will just disappear without a
| trace as digital media rots.
| swyx wrote:
| we'd keep the curiosities around, like so much Ea Nasir Sells
| Shit Copper. we have room for like 5-10 of those per century.
| not like 8 billion. much of life is mundane.
| rightbyte wrote:
| Imagine being judged 1000s of years later by some Yelp
| reviews, like poor Nasir.
| woodruffw wrote:
| > much of life is mundane.
|
| The things that make (or fail to make) life mundane at some
| point in history are themselves subjects of significant
| academic interest.
|
| (And of course we have no way to tell what things are
| "curiosities" or not. Preservation can be seen as a way to
| minimize survivorship bias.)
| cortesoft wrote:
| Today's mundane is tomorrow's fascination
| shakna wrote:
| We also have rooms full of footprints. In a thousand years,
| your mundane is the fascination of the world.
| johannes1234321 wrote:
| Yes, at the same time we'd be excited about more mundane
| sources from history. The legends about the mighty are
| interesting, but what do we actually know about everyday
| love from people a thousand years ago? Very little. Most
| things are speculation based on objects (tools etc.), on
| structure of buildings and so on. If we go back just few
| hundred years there is (using European perspective) a
| somewhat interesting source in court cases from legal
| conflicts between "average" people, but in older times more
| or less all written material is on the powerful, be it
| worldly or religious power, which often describes the
| rulers in an extra positive way (from their perspective)
| and their opponents extra weak.
|
| Having more average sources certainly helps and we now
| aren't good judges on what will be relevant in future. We
| can only try to keep some of everything.
| mrguyorama wrote:
| I regularly wonder whether modern educated people journal
| less than the educated people of previous centuries, who
| were kind of rare.
|
| Maybe we should get a journaling boom going.
|
| But it has to be written, because pen and paper is literally
| ten times more durable than even good digital storage.
| swyx wrote:
| > pen and paper is literally ten times more durable than
| even good digital storage.
|
| citation needed lol. data replication >>>> paper's single
| point of failure.
| johannes1234321 wrote:
| The question is: What is more likely in 1000 years to
| still exist and being readable. The papers caught in some
| lost ruins or some form of storage media?
|
| Sure, as long as the media is copied there is a chance of
| survival, but will this then be "average" material or
| things we now consider interesting, only? Will the chain
| hold or will it become as uninteresting as many other
| things were over time? Will the Organisation doing it be
| funded? Will the location where this happens be spared
| from war?
|
| For today's historians the random finds are important
| artifacts for understanding "average" people's lives, whereas
| the well-preserved documents are legends about the mighty.
|
| Having lots of material all over gives a chance for some
| to survive, and until 40 years or so ago we were in a good
| spot: lots of paper all over, about everything. Analog
| vinyl records, which might be readable in the future to
| learn about our music. But now it's all on storage media,
| where many forms see data loss, where formats go out of
| date and (when looking from a thousand years away) data
| formats change fast.
| KPGv2 wrote:
| > What is more likely in 1000 years to still exist and
| being readable. The papers caught in some lost ruins or
| some form of storage media?
|
| The storage media. We have evidence to support this:
|
| * original paper works from 1000 years ago are _insanely
| rare_
|
| * more recent storage media provide much more content
|
| How many digital copies of Beowulf do we have? Millions?
|
| How many paper copies from 1000 years ago? _one_
|
| how many other works from 1000 years ago do we have zero
| copies of thanks to paper's fragility and thus don't even
| know existed? probably a _lot_
| johannes1234321 wrote:
| However that one paper, stating a random fact, might tell
| more about the people than an epic poem.
|
| You can't have a full history without either.
| tredre3 wrote:
| > The question is: What is more likely in 1000 years to
| still exist and being readable. The papers caught in some
| lost ruins or some form of storage media?
|
| But that's just survivorship bias. The vast vast vast
| majority of all written sheets of paper have been lost to
| history. Those deemed worthy were carefully preserved,
| some of the rest was preserved by a fluke. The same is
| happening with digital media.
| internetter wrote:
| > i also wonder if url death could be a good thing. humanity
| makes special effort to keep around the good stuff. the rest
| goes into the garbage collection of history.
|
| agreed. formerly wrote some thoughts here:
| https://boehs.org/node/internet-evanescence
| s17n wrote:
| URLs lasting forever was a beautiful dream but in reality, it
| seems that 99% of URLs don't in fact last forever. Rather than
| endlessly fighting a losing battle, maybe we should build the
| technology around the assumption that infrastructure isn't
| permanent?
| nonethewiser wrote:
| >maybe we should build the technology around the assumption
| that infrastructure isn't permanent?
|
| Yes. Also not using a url shortener as infrastructure.
| hoppp wrote:
| Yes.
|
| Domain names often change hands, and a URL that is supposed to
| last forever can turn into a malicious phishing link over time.
| emaro wrote:
| In theory a content-addressed system like IPFS would be the
| best: if someone online still has a copy, you can get it too.
| mananaysiempre wrote:
| It feels as though, much like cryptography in general
| reduces almost all confidentiality-adjacent problems to key
| distribution (which is damn near unsolvable in large
| uncoordinated deployments like Web PKI or PGP), content-
| addressable storage reduces almost all data-persistence-
| adjacent problems to maintenance of mutable name-to-hash
| mappings (which is damn near unsolvable in large
| uncoordinated deployments like BitTorrent, Git, or
| IP[FN]S).
| dreamcompiler wrote:
| DNS seems to solve the problem of a decentralized
| loosely-coordinated mapping service pretty well.
| emaro wrote:
| True, but then you're back on square one. Because it's
| not guaranteed that using a (DNS) name will point to the
| same content forever.
| hoppp wrote:
| But then all content should be static and never update?
|
| If you serve an SPA via IPFS, the SPA still needs to
| fetch the data from an endpoint which could go down or
| change
|
| Even if you put everything on a blockchain, an RPC
| endpoint to read the data must have a URL
| mananaysiempre wrote:
| > But then all content should be static and never update?
|
| And thus we arrive at the root of the conflict. Many
| users (that care about this kind of thing) want the
| publications that they've seen to stay where they've seen
| them; many publishers have become accustomed to being
| able to memory-hole things (sometimes for very real
| safety reasons; often for marketing ones). That on top of
| all the usual problems of maintaining a space of human-
| readable names.
| immibis wrote:
| Note that IPFS is now on the EU Piracy Watchlist which may
| be a precursor to making it illegal.
| jjmarr wrote:
| URLs identify the location of a resource on a network, not the
| resource itself, and so are not required to be permanent or
| unique. That's why they're called "uniform resource locators".
|
| This problem was recognized in 1997 and is why the Digital
| Object Identifier was invented.
| dreamcompiler wrote:
| URNs were supposed to solve that problem by separating the
| identity of the thing from the location of the thing.
|
| But they never became popular and then link shorteners
| reimplemented the idea, badly.
|
| https://en.m.wikipedia.org/wiki/Uniform_Resource_Name
| devnullbrain wrote:
| >despite Google solemnly promising that "all existing links will
| continue to redirect to the intended destination," it went read-
| only a few years back, and now they're finally sunsetting it in
| August 2025
|
| It's become so trite to mention that I'm rolling my eyes at
| myself just for bringing it up again but... come on! How bad can
| it be before Google do something about the reputation this
| behaviour has created?
|
| Was Stadia not an expensive enough failure?
| iainmerrick wrote:
| I'm very surprised, even though I shouldn't be, that they're
| actually shutting the read-only goo.gl service down.
|
| For other obsolete apps and services, you can argue that they
| require some continual maintenance and upkeep, so keeping them
| around is expensive and not cost-effective if very few people
| are using them.
|
| But a URL shortener is super simple! It's just a database, and
| in this case we don't even need to write to it. It's literally
| one of the example programs for AWS Lambda, intentionally
| chosen because it's really simple.
|
| I guess the goo.gl link database is probably really big, but
| even so, this is Google! Storage is cheap! Shutting it down is
| such a short-sighted mean-spirited bean-counter decision, I
| just don't get it.
| creatonez wrote:
| There's something poetic about abusing a link shortener as a
| database and then later having to retrieve all your precious
| links from random corners of the internet because you've lost the
| original reference.
| nonethewiser wrote:
| Didn't they just use the link shortener to compress the url?
| They used their url as the "database" (ie holding the compiler
| state).
| Arcuru wrote:
| They didn't store anything themselves since they encoded the
| full state in the urls that were given out. So the link
| shortener was the only place where the "database", the urls,
| were being stored.
| nonethewiser wrote:
| Yeah but the purpose of the url shortener was not to store
| the data, it was to shorten the url. The fact that the data
| was persisted on google's sever somewhere is incidental.
|
| In other words, every shortened url is "using the url
| shortener as a database" in that sense. Taking a url with a
| long query parameter and using a url shortener to shorten
| it does not constitute "abusing a link shortener as a
| database."
| cortesoft wrote:
| Except in this case the url IS the data, so storing the
| url is the same as storing the data.
| nonethewiser wrote:
| It's incidental. The state is in the url, which is only
| shortened because its so long. Google's url shortener is
| not needed to store the data.
|
| It's simply a normal use-case for a url shortener. A long
| url, usually because of some very large query parameter,
| which gets mapped to a short one.
| rs186 wrote:
| Shortening long URLs is the intended use case for a ... URL
| shortener.
|
| The real abusers are the people who use a shortener to hide
| scam/spam/illegal websites behind a common domain and post it
| everywhere.
| creatonez wrote:
| These are not just "long URLs". These are URLs where the
| _entire_ content is stored in the fragment suffix of the URL.
| They are blobs, and always have been.
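|
| A sketch of that style of link, assuming deflate + base64url of
| a JSON state blob in the fragment (illustrative, not Compiler
| Explorer's exact encoding):
|
|     import base64
|     import json
|     import zlib
|
|     def encode_state(state: dict) -> str:
|         raw = json.dumps(state, separators=(",", ":")).encode()
|         packed = base64.urlsafe_b64encode(zlib.compress(raw, 9))
|         # Nothing after '#' is sent to the server; the whole
|         # "document" travels inside the link itself.
|         return "https://example.org/#" + packed.decode()
|
|     def decode_state(url: str) -> dict:
|         packed = url.split("#", 1)[1]
|         raw = zlib.decompress(base64.urlsafe_b64decode(packed))
|         return json.loads(raw)
|
|     link = encode_state({"source": "int main() { return 42; }",
|                          "compiler": "g++", "options": "-O2"})
|     assert decode_state(link)["compiler"] == "g++"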
| wrs wrote:
| I hate to say it, but unless there's a really well-funded
| foundation involved, Compiler Explorer and godbolt.org won't last
| forever either. (Maybe by then all the info will have been
| distilled into the 487 quadrillion parameter model of
| everything...)
| layer8 wrote:
| Thanks to the no-hiding theorem, the information will live
| forever. ;)
| mattgodbolt wrote:
| We've done alright so far: 13 years this week. I have funding
| for another year and change even assuming growth and all our
| current sponsors pull out.
|
| I /am/ thinking about a foundation or similar though: the
| single point of failure is not funding but "me".
| badmintonbaseba wrote:
| Well, that's true, but at least now compiler explorer links
| will stop working when compiler explorer vanishes, but not
| before that.
|
| I think the most valuable long-living compiler explorer links
| are in bug reports. I like to link to compiler explorer in bug
| reports for convenience, but I also include the code in the
| report itself, and specify what compiler I used with what
| version to reproduce the bug. I don't expect compiler explorer
| to vanish anytime soon, but making bug reports self-contained
| like this protects against that.
| layer8 wrote:
| I find it somewhat surprising that it's worth the effort for
| Google to shut down the read-only version. Unless they fear some
| legal risks of leaving redirects to private links online.
| actuallyalys wrote:
| Hard to say from the outside, but it's possible the service
| relies on some outdated or insecure library, runtime, service,
| etc. they want to stop running. Although frankly it seems just
| as possible it's a trivial expense and they're cutting it
| because it's still a net expense, goodwill and past promises be
| damned.
| Scaevolus wrote:
| Typically services like these are side projects of just a few
| Google employees, and when the last one leaves they are shut
| down.
| mmooss wrote:
| Another possibility is that it's a distraction - whatever the
| marginal costs, there's a fixed cost to each system in terms
| of cognitive overhead, if not documentation, legal issues
| (which can change as laws and regulations change), etc.
| Removing distractions is basic management.
| mbac32768 wrote:
| yeah but nobody wants to put "spent two months migrating
| goo.gl url shortener to work with Sisyphus release manager
| and Dante 7 SRE monitoring" in their perf packet
|
| that's a negative credit activity
| sdf4j wrote:
| > One of my founding principles is that Compiler Explorer links
| should last forever.
|
| And yet... that was a very self-destructive decision.
| mattgodbolt wrote:
| I'm not sure why so?
| MyPasswordSucks wrote:
| Because URL shortening is relatively trivial to implement,
| and instead of just doing so on their own end, they decided
| to rely on a third-party service.
|
| Considering link permanence was a "founding principle",
| that's just unbelievably stupid. If I decide one of my
| "founding principles" is that I'm never going to show up at
| work with a dirty windshield, then I shouldn't rely on the
| corner gas station's squeegee and cleaning fluid.
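|
| A sketch of how small "on their own end" could be, assuming a
| SQLite table that maps a short code (here a truncated hash of
| the target) to the full URL; names and the example link are
| illustrative:
|
|     import hashlib
|     import sqlite3
|
|     db = sqlite3.connect("links.db")
|     db.execute("CREATE TABLE IF NOT EXISTS links "
|                "(code TEXT PRIMARY KEY, url TEXT)")
|
|     def shorten(url: str) -> str:
|         # Collision handling omitted in this sketch.
|         code = hashlib.sha256(url.encode()).hexdigest()[:10]
|         db.execute("INSERT OR IGNORE INTO links VALUES (?, ?)",
|                    (code, url))
|         db.commit()
|         return code  # served as https://your.domain/<code>
|
|     def resolve(code: str):
|         row = db.execute("SELECT url FROM links WHERE code = ?",
|                          (code,)).fetchone()
|         return row[0] if row else None
|
|     code = shorten("https://example.org/#eJxLy...")
|     assert resolve(code) == "https://example.org/#eJxLy..."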
| gwd wrote:
| First of all, _how the links are made permanent_ has
| nothing to do with the principle that _they should be made
| permanent_.
|
| There seemed to be two principles at play here:
|
| 1. Links should always work
|
| 2. We don't want to store any user data
|
| #2 is a bit complicated, because although it sounds nice,
| it has two potential justifications:
|
| 2a: For privacy reasons, don't store any user data
|
| 2b: To avoid having to think through the implications of
| storing all those things ourselves
|
| I'm not sure how much each played into their thinking;
| possibly because of a lack of clarity, 2a sounded nice and
| 2b was the real motivation.
|
| I'd say 2a is a reasonable aspiration; but using a link
| shortener changed it from "don't store any user data" to
| "store the user data somewhere we can't easily get at it",
| which isn't the same thing.
|
| 2b, when stated more clearly, is obviously just taking on
| technical debt and adding dependencies which may come back
| to bite you -- as it did.
| sedatk wrote:
| Surprisingly, purl.org URLs still work after a quarter century,
| thanks to Internet Archive.
| 2YwaZHXV wrote:
| Presumably there's no way to get someone at Google to query their
| database and find all the shortened links that go to godbolt.org?
| devrandoom wrote:
| > despite Google solemnly promising ...
|
| I'm pretty sure the lore says that a solemn promise from Google
| carries the exact same value as a prostitute saying she likes
| you.
| nssnsjsjsjs wrote:
| The corollary of URLs that last forever is that we need both
| forever storage (costs money forever) and forever institutional
| care and memory.
|
| Where URLs may last longer is where they are not used for the
| "RL" bit, but more like a UUID for namespacing, e.g. in XML,
| Java or Go.
| mbac32768 wrote:
| it seems a bit crazy to try to avoid storing a relatively small
| amount of data when a link is shared, given that storage costs
| and bandwidth costs are rapidly dropping with time
|
| but perhaps I don't appreciate how much traffic godbolt gets
| mattgodbolt wrote:
| It was a simpler time and I didn't want the responsibility of
| storing other people's data. We do now though!
| mattgodbolt wrote:
| Oh and traffic: https://stats.compiler-explorer.com/
| Ericson2314 wrote:
| The only type of reference that lasts forever is a content
| address.
|
| We should be using more of them.
| rurban wrote:
| He missed the archive.org crawl for those links in the blog post.
| They have them stored now too: https://github.com/compiler-
| explorer/compiler-explorer/discu...
| sebstefan wrote:
| >Over the last few days, I've been scraping everywhere I can
| think of, collating the links I can find out in the wild, and
| compiling my own database of links1 - and importantly, the URLs
| they redirect to. So far, I've found 12,000 links from scraping:
|
| >Google (using their web search API)
|
| >GitHub (using their API)
|
| >Our own (somewhat limited) web logs
|
| >The archive.org Stack Overflow data dumps
|
| >Archive.org's own list of archived webpages
|
| You're an angel Matt
| 3cats-in-a-coat wrote:
| Nothing lasts forever.
|
| I've pondered that a lot in my system design which bears some
| resemblance to the principles of REST.
|
| I have split resources into ephemeral (and mutable) ones, and
| immutable, reference-counted (or otherwise GC-ed) ones, which
| are persistent while referred to but collected when no one
| refers to them anymore.
|
| In a distributed system the former is the default, the latter can
| exist in little islands of isolated context.
|
| You can't track references throughout the entire world. The only
| thing that works is timeouts. But those are not reliable. Nor can
| you exist forever, years after no one needs you. A system needs
| its parts to be useful, or it dies full of useless parts.
___________________________________________________________________
(page generated 2025-05-29 23:01 UTC)