[HN Gopher] Compiler Explorer and the promise of URLs that last ...
       ___________________________________________________________________
        
       Compiler Explorer and the promise of URLs that last forever
        
       Author : anarazel
       Score  : 338 points
       Date   : 2025-05-28 16:28 UTC (1 days ago)
        
 (HTM) web link (xania.org)
 (TXT) w3m dump (xania.org)
        
       | jimmyl02 wrote:
       | This is a great perspective on how assumptions play out over a
       | longer period of time. I think this risk is much greater when
       | _free_ third-party services back critical infrastructure.
       | 
       | Someone has to foot the bill somewhere and if there isn't a
       | source of income then the project is bound to be unsupported
       | eventually.
        
         | tekacs wrote:
         | I think I would struggle to say that free services die at a
         | higher rate consistently...
         | 
         | So many paid offerings, whether from startups or even from
         | large companies, have been sunset over time, often with
         | frustratingly short migration periods.
         | 
         | If anything, I feel like I can think of more paid services that
         | have given their users short migration periods than free ones.
        
         | lqstuart wrote:
         | Counterexample: the Linux kernel
        
           | charcircuit wrote:
           | How? Big tech foots the bill.
        
             | shlomo_z wrote:
             | But goo.gl is also big tech...
        
               | 0x1ceb00da wrote:
               | Google wasn't making money off of goo.gl
        
               | johannes1234321 wrote:
               | But what did it cost them? Especially in read-only mode.
               | 
               | Sure, it's one more service to monitor, but for the most
               | part "fix by restart" is a good enough approach. And then
               | once in a while have an intern switch it to the latest
               | backend choice.
        
           | iainmerrick wrote:
           | Linux isn't a service (in the SaaS sense).
        
         | cortesoft wrote:
         | Nah, businesses go under all the time, whether their services
         | are paid or not.
        
       | amiga386 wrote:
       | https://killedbygoogle.com/
       | 
       | > Google Go Links (2010-2021)
       | 
       | > Killed about 4 years ago, (also known as Google Short Links)
       | was a URL shortening service. It also supported custom domain for
       | customers of Google Workspace (formerly G Suite (formerly Google
       | Apps)). It was about 11 years old.
        
         | zerocrates wrote:
         | "Killing" the service in the sense of minting new ones is no
         | big deal and hardly merits mention.
         | 
         | Killing the existing ones is much more of a jerk move.
         | Particularly so since Google is still keeping it around in some
         | form for internal use by their own apps.
        
           | ruune wrote:
           | Don't they use https://g.co now? Or are there still new
           | internal goo.gl links created?
           | 
           | Edit: Google is using a g.co link on the "Your device is
           | booting another OS" screen that appears when booting up my
           | Pixel running GrapheneOS. It will be awkward when they kill
           | that service and the hard-coded link in the phone's bios is
           | just dead.
        
             | zerocrates wrote:
             | Google Maps creates "maps.app.goo.gl" links; I don't know
             | if there are others, they called Maps out specifically in
             | their message.
             | 
             | Possibly those other ones are just using the domain name
             | and the underlying service is totally different, not sure.
        
       | mananaysiempre wrote:
       | May be worth cooperating with ArchiveTeam's project[1] on Goo.gl?
       | 
       | > url shortening was a fucking awful idea[2]
       | 
       | [1] https://wiki.archiveteam.org/index.php/Goo.gl
       | 
       | [2] https://wiki.archiveteam.org/index.php/URLTeam
        
         | MallocVoidstar wrote:
         | IIRC ArchiveTeam were bruteforcing Goo.gl short URLs, not going
         | through 'known' links, so I'd assume they have many/all of
         | Compiler Explorer's URLs. (So, good idea to contact them)
        
         | tech234a wrote:
         | Real-time status for that project indicates 7.5 billion goo.gl
         | URLs found out of 42 billion goo.gl URLs scanned:
         | https://tracker.archiveteam.org:1338/status
        
       | shepmaster wrote:
       | As we all know, Cool URIs don't change [1]. I greatly appreciate
       | the care taken to keep these Compiler Explorer links working as
       | long as possible.
       | 
       | The Rust playground uses GitHub Gists as the primary storage
       | location for shared data. I'm dreading the day that I need to
       | migrate everything away from there to something self-maintained.
       | 
       | [1]: https://www.w3.org/Provider/Style/URI
        
       | kccqzy wrote:
       | Before 2010 I had this unquestioned assumption that links are
       | supposed to last forever. I used the bookmark feature of my
       | browser extensively. Some time afterwards, I discovered that a
       | large fraction of my bookmarks were essentially unusable due to
       | linkrot. My modus operandi after that was to print the webpage as
       | a PDF. A bit afterwards, when reader views became popular and
       | reliable, I just copy-pasted the content from the reader view
       | into an RTF file.
        
         | flexagoon wrote:
         | By the way, if you install the official Web Archive browser
         | extension, you can configure it to automatically archive every
         | page you visit
        
           | petethomas wrote:
           | This is a good suggestion, with the caveat that entire domains
           | can and do disappear: https://help.archive.org/help/how-do-i-
           | request-to-remove-som...
        
             | Akronymus wrote:
             | That's especially annoying when a formerly useful site gets
             | abandoned, a new owner picks up the domain, then gets IA to
             | delete the old archives as well.
             | 
             | Or even worse, when a domain parking company does that:
             | https://archive.org/post/423432/domainsponsorcom-erasing-
             | pri...
        
           | vitorsr wrote:
           | > you can configure it to automatically archive every page
           | you visit
           | 
           | What?? I am a heavy user of the Internet Archive services,
           | not just the Wayback Machine, including official and
           | "unofficial" clients and endpoints, and I had absolutely no
           | idea the extension could do this.
           | 
           | To bulk archive I would manually do it via the web interface
           | or batch automate it. The limitations of doing it manually,
           | one by one, are obvious, and doing it in batches requires,
           | well, keeping batches (lists).
        
           | internetter wrote:
           | recently I've come to believe even IA and especially
           | archive.is are ephemeral. I've watched sites I've saved
           | disappear without a trace, except in my self-hosted archives.
           | 
           | A technological conundrum, however, is the fact that I have
           | no way to prove that _my_ archive is an accurate
           | representation of a site at a point in time. Hmmm, or maybe I
           | do? Maybe something funky with cert chains could be done.
        
             | akoboldfrying wrote:
             | There are timestamping services out there, some of which
             | may be free. It should (I think) be possible to basically
             | submit the target site's URL to the timestamping service,
             | and get back a certificate saying "I, Timestamps-R-US,
             | assert that the contents of https://targetsite.com/foo/bar
             | downloaded at 12:34pm on 29/5/2025 hashes to abc12345 with
             | SHA-1", signed with their private key and verifiable (by
             | anyone) with their public key. Then you download the same
             | URL, and check that the hashes match.
             | 
             | IIUC the timestamping service needs to independently
             | download the contents itself in order to hash it, so if you
             | need to be logged in to see the content there might be
             | complications, and if there's a lot of content they'll
             | probably want to charge you.
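             | 
             | A minimal sketch of the verification half of that flow,
             | assuming the service hands back a plain hex digest (the URL,
             | digest value and choice of SHA-256 below are illustrative,
             | not any real service's API):
             | 
             |     import hashlib
             |     import urllib.request
             | 
             |     def fetch_and_hash(url: str) -> str:
             |         # Download the raw response body; any change in the
             |         # served bytes changes the digest.
             |         with urllib.request.urlopen(url) as resp:
             |             return hashlib.sha256(resp.read()).hexdigest()
             | 
             |     # Whatever digest the timestamping service attested to
             |     # (hypothetical value); the hash algorithm must match
             |     # whatever they used.
             |     attested = "abc12345..."
             | 
             |     url = "https://targetsite.com/foo/bar"
             |     if fetch_and_hash(url) == attested:
             |         print("content matches the attested digest")
             |     else:
             |         print("content differs from what was attested")
             | 
             | If the page injects per-request content (ads, tokens), the
             | digests will differ, which is exactly the caveat raised in
             | the replies below.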
        
               | XorNot wrote:
               | Websites don't really produce consistent content even
               | from identical requests though.
               | 
               | But you also don't need to do this: all you need is a
               | service which will attest that it saw a particular
               | hashsum at a particular time. It's up to other mechanisms
               | to prove what that means.
        
               | akoboldfrying wrote:
               | > Websites don't really produce consistent content even
               | from identical requests though.
               | 
               | Often true in practice unfortunately, but to the extent
               | that it is true, any approach that tries to use hashes to
               | prove things to a third party is sunk. (We could imagine
               | a timestamping service that allows some kind of post-
               | download "normalisation" step to strip out content that
               | varies between queries and then hash the results of that,
               | but that doesn't seem practical to offer as a free
               | service.)
               | 
               | > all you need is a service which will attest that it saw
               | a particular hashsum at a particular time
               | 
               | Isn't that what I'm proposing?
        
             | shwouchk wrote:
             | sign it with gpg and upload the sig to bitcoin
             | 
             | edit: sorry, that would only prove when it was taken, not
             | that it wasn't fabricated.
        
               | fragmede wrote:
               | hash the contents
        
               | shwouchk wrote:
               | signing it is effectively the same thing. question is how
               | to prove that what you hashed is what was there?
        
               | chii wrote:
               | you can't: if you're the only one with a copy, your hash
               | cannot be verified (since both the hash and the claim come
               | from you).
               | 
               | One way to make this work is to have a mechanism like
               | bitcoin (proof of work), where the proof of work is put
               | into the webpage itself as a hash (made by the original
               | author of that page). Then anyone can verify that the
               | contents weren't changed, and if someone wants to make
               | changes and claim otherwise, they'd have to put in even
               | more proof of work to do it (so not impossible, but
               | costly).
        
               | notpushkin wrote:
               | I think there was a way to preserve TLS handshake
               | information in a way that something something you can
               | verify you got the exact response from the particular
               | server? I can't look it up now though, but I think there
               | was a Firefox add-on, even.
        
               | fragmede wrote:
               | what if instead of the proof of work being in the page as
               | a hash, that the distributed proof of work is that some
               | subset of nodes download a particular bit of html or json
               | from a particular URI, and then each node hashes that,
               | saves the contents and the hash to a blockchain-esque
               | distributed database. Subject to 51% attack, same as any
               | other chain, but still.
        
         | 90s_dev wrote:
         | My solution has been to just remember the important stuff, or
         | at least where to find it. I'm not dead yet so I guess it
         | works.
        
           | TeMPOraL wrote:
           | It was my solution too, and I liked it, but over the past
           | decade or so, I noticed that even when I remember where to
           | find some stuff, hell, even if I just remember _how_ to find
           | it, when I actually try and find it, it often isn't there
           | anymore. "Search rot" is just as big a problem as link rot.
           | 
           | As for being still alive, by that measure hardly anything
           | anyone does is important in the modern world. It's pretty
           | hard to fail at thinking or remembering so badly that it
           | becomes a life-or-death thing.
        
             | 90s_dev wrote:
             | > hardly anything anyone does is important
             | 
             | Agreed.
        
           | mock-possum wrote:
           | I've found that whenever I think "why don't other people just
           | do X" it's because I'm misunderstanding what's involved in X
           | for them, and that generally if they _could_ 'just' do X then
           | they _would_.
           | 
           | "Why don't you just" is a red flag now for me.
        
             | chii wrote:
             | this applies to basically any suggested solution to any
             | problem.
             | 
             | "Why don't you just ..." is just lazy idea suggestion from
             | armchair internet warriors.
        
             | 90s_dev wrote:
             | Not always. I love it when people offer me a much simpler
             | solution to a problem I overengineered, so I can throw away
             | my solution and use the simpler one.
             | 
             | Half the time people are offered a better way, it's because
             | they're actually doing it wrong: they've gotten the
             | solution's requirements all wrong in the first place, and
             | this perspective helps.
        
         | nonethewiser wrote:
         | A reference is a bet on continuity.
         | 
         | At a fundamental level, broken website links and dangling
         | pointers in C are the same.
        
         | lappa wrote:
         | I use the SingleFile extension to archive every page I visit.
         | 
         | It's easy to set up, but be warned, it takes up a lot of disk
         | space.
         | 
         |     $ du -h ~/archive/webpages
         |     1.1T    /home/andrew/archive/webpages
         | 
         | https://github.com/gildas-lormeau/SingleFile
        
           | davidcollantes wrote:
           | How do you manage those? Do you have a way to search them, or
           | a specific way to catalogue them, which will make it easy to
           | find exactly what you need from them?
        
             | nirav72 wrote:
             | KaraKeep is a decent self-hostable app that supports
             | receiving SingleFile pages via the SingleFile browser
             | extension pointed at the KaraKeep API. This allows me to
             | search archived pages. (Plus auto-summarization and tagging
             | via LLM).
        
               | dotancohen wrote:
               | Very naive question, surely. What does KaraKeep provide
               | that grep doesn't?
        
               | nirav72 wrote:
               | jokes aside. It has a mobile app
        
           | 90s_dev wrote:
           | You must have several TB of the internet on disk by now...
        
           | internetter wrote:
           | storage is cheap, but if you wanted to improve this:
           | 
           | 1. find a way to dedup media (a rough sketch follows this
           | list)
           | 
           | 2. ensure content blockers are doing well
           | 
           | 3. for news articles, put it through readability and store
           | the markdown instead. if you wanted to be really fancy,
           | instead you could attempt to programmatically create a
           | "template" of sites you've visited with multiple endpoints so
           | the style is retained but you're not storing the content.
           | alternatively a good compression algo could do this, if you
           | had your directory like /home/andrew/archive/boehs.org.tar.gz
           | and inside of the tar all the boehs.org pages you visited are
           | saved
           | 
           | 4. add fts and embeddings over the pages
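           | 
           | For point 1, a rough sketch of hash-based dedup, assuming a
           | single-filesystem archive directory and that exact byte-for-
           | byte duplicates are good enough (the path is made up):
           | 
           |     import hashlib
           |     import os
           | 
           |     def dedup(root: str) -> None:
           |         seen = {}  # content hash -> first path that had it
           |         for dirpath, _dirs, files in os.walk(root):
           |             for name in files:
           |                 path = os.path.join(dirpath, name)
           |                 with open(path, "rb") as f:
           |                     digest = hashlib.sha256(f.read()).hexdigest()
           |                 if digest in seen:
           |                     # Same bytes already stored: keep one copy
           |                     # and hard-link the duplicate to it.
           |                     os.remove(path)
           |                     os.link(seen[digest], path)
           |                 else:
           |                     seen[digest] = path
           | 
           |     dedup(os.path.expanduser("~/archive/webpages"))
           | 
           | This only catches identical files; near-duplicates would need
           | perceptual hashing or a deduplicating filesystem, as mentioned
           | in the replies.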
        
             | ashirviskas wrote:
             | 1 and partly 3 - I use btrfs with compression and deduping
             | for games and other stuff. Works really well and is
             | "invisible" to you.
        
               | bombela wrote:
               | dedup on btrfs requires setting up a cron job. And you
               | need to pick one of the dedup tools too. It's not
               | completely invisible in my mind because of this ;)
        
             | windward wrote:
             | >storage is cheap
             | 
             | It is. 1.1TB is both:
             | 
             | - objectively an incredibly huge amount of information
             | 
             | - something that can be stored for the cost of less than a
             | day of this industry's work
             | 
             | Half my reluctance to store big files is just an irrational
             | fear of the effort of managing it.
        
               | IanCal wrote:
               | > - something that can be stored for the cost of less
               | than a day of this industry's work
               | 
               | Far, far less even. You can grab a 1TB external SSD from
               | a good brand for less than a day's work at minimum wage
               | in the UK.
               | 
               | I keep getting surprised at just how cheap large storage
               | is every time I need to update stuff.
        
           | shwouchk wrote:
           | i was considering a similar setup, but i don't really trust
           | extensions. I'm curious:
           | 
           | - Do you also archive logged in pages, infinite scrollers,
           | banking sites, fb etc?
           | 
           | - How many entries is that?
           | 
           | - How often do you go back to the archive? is stuff easy to
           | find?
           | 
           | - do you have any organization or additional process (eg
           | bookmarks)?
           | 
           | did you try integrating it with llms/rag etc yet?
        
             | eddd-ddde wrote:
             | You can just fork it, audit the code, add your own changes,
             | and self host / publish.
        
           | snthpy wrote:
           | Thanks. I didn't know about this and it looks great.
           | 
           | A couple of questions:
           | 
           | - do you store them compressed or plain?
           | 
           | - what about private info like bank accounts or health
           | insurance?
           | 
           | I guess for privacy one could train oneself to use private
           | browsing mode.
           | 
           | Regarding compression, for thousands of files don't all those
           | self-extraction headers add up? Wouldn't there be space
           | savings by having a global compression dictionary and only
           | storing the encoded data?
        
             | genewitch wrote:
             | By default, singlefile only saves when you tell it to, so
             | there's no worry about leaking personal information.
             | 
             | I haven't put the effort in to make a "bookmark server"
             | that will accomplish what singlefile does but on the
             | internet because of how well singlefile works.
        
             | d4mi3n wrote:
             | > do you store them compressed or plain?
             | 
             | Can't speak to your other issues but I would think the
             | right file system will save you here. Hopefully someone
             | with more insight can provide color here, but my
             | understanding is that file systems like ZFS were
             | specifically built for use cases like this where you have a
             | large set of data you want to store in a space efficient
             | manner. Rather than a compression dictionary, I believe
             | tech like ZFS simply looks at bytes on disk and compresses
             | those.
        
           | nyarlathotep_ wrote:
           | Are you automating this in some fashion? Is there another
           | extension you've authored or similar to invoke SingleFile
           | functionality on a new page load or similar?
        
           | dataflow wrote:
           | Have you tried MHTML?
        
             | RiverCrochet wrote:
             | SingleFile is way more convenient as it saves to a standard
             | HTML file. The only thing I know that easily reads
             | MHTML/.mht files is Internet Explorer.
        
               | dataflow wrote:
               | Chrome and Edge read them just fine? The format is
               | actually the same as .eml AFAIK.
        
               | RiverCrochet wrote:
               | I remember having issues but it could be because the
               | .mht's I had were so old I think I used Internet
               | Explorer's Save As... function to generate them.
        
               | dataflow wrote:
               | I've had such issues with them in the past too, yeah. I
               | never figured out the root cause. But in recent times I
               | haven't had issues, for whatever that's worth. (I also
               | haven't really tried to open many of the old files
               | either.)
        
         | macawfish wrote:
         | Use WARC: https://en.wikipedia.org/wiki/WARC_(file_format) with
         | WebRecorder: https://webrecorder.net/
        
           | shwouchk wrote:
           | warc is not a panacea; for example, gemini makes it super
           | annoying to get a transcript of your conversation, so i
           | started saving those as pdf and warc.
           | 
           | turns out that unlike most webpages, the pdf version is only
           | a single page of what is visible on screen.
           | 
           | turns out also that opening the warc immediately triggers a
           | js redirect that is planted in the page. i can still extract
           | the text manually - it's embedded there - but i cannot "just
           | open" the warc in my browser and expect an offline "archive"
           | version - im interacting with a live webpage! this sucks from
           | all sides - usability, privacy, security.
           | 
           | Admittedly, i don't use webrecorder - does it solve this
           | problem? did you verify?
        
             | weinzierl wrote:
             | Not sure if you tried that. Chrome has a "take full page
             | screenshot" command. Just open the command bar in dev tools
             | and search for "full" and you will find it. Firefox has it
             | right in the context menu, no need for dev tools.
             | 
             | Unfortunately there are sites where it does not work.
        
               | eMPee584 wrote:
               | Apart from small UX nits, FF's screenshot feature is
               | great - it's just that storing a 2-15MiB bitmap copy of a
               | text medium still feels dirty to me every time.. would
               | much prefer a PDF export, page size matching the scroll
               | port, with embedded fonts and vectors and without print
               | CSS..
        
         | andai wrote:
         | Is there some kind of thing that turns a web page into a text
         | file? I know you can do it with beautiful soup (or like 4 lines
         | of python stdlib), but I usually need it on my phone, where I
         | don't know a good option.
         | 
         | My phone browser has a "reader view" popup but it only appears
         | sometimes, and usually not on pages that need it!
         | 
         | Edit: Just installed w3m in Termux... the things we can do
         | nowadays!
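         | 
         | The stdlib version is only a little longer than that once you
         | skip script/style tags; a sketch with html.parser (no
         | readability heuristics, so nav and boilerplate text come along
         | too, and the URL is just an example):
         | 
         |     from html.parser import HTMLParser
         |     import urllib.request
         | 
         |     class TextExtractor(HTMLParser):
         |         SKIP = {"script", "style"}  # tags whose text we drop
         | 
         |         def __init__(self):
         |             super().__init__()
         |             self.parts, self._skipping = [], 0
         | 
         |         def handle_starttag(self, tag, attrs):
         |             if tag in self.SKIP:
         |                 self._skipping += 1
         | 
         |         def handle_endtag(self, tag):
         |             if tag in self.SKIP and self._skipping:
         |                 self._skipping -= 1
         | 
         |         def handle_data(self, data):
         |             if not self._skipping and data.strip():
         |                 self.parts.append(data.strip())
         | 
         |     html = urllib.request.urlopen("https://example.com").read()
         |     extractor = TextExtractor()
         |     extractor.feed(html.decode("utf-8", "replace"))
         |     print("\n".join(extractor.parts))
         | 
         | This should also run in Termux, since it needs nothing outside
         | the standard library.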
        
           | XorNot wrote:
           | You want Zotero.
           | 
           | It's for bibliographies, but it also archives and stores web
           | pages locally with a browser integration.
        
             | _huayra_ wrote:
             | I frankly don't know how I'd collect any useful info
             | without it.
             | 
             | I'm sure there are bookmark services that also allow notes,
             | but the tagging, linking related things, etc, all in the
             | app is awesome, plus the ability to export BibTeX for
             | writing a paper!
        
         | taeric wrote:
         | That assumption isn't true of any sources? Things flat out
         | change. Some literally, others more in meaning. Some because
         | they are corrected, but there are other reasons.
         | 
         | Not that I don't think there is some benefit in what you are
         | attempting, of course. A similar thing I still wish I could do
         | is to "archive" someone's phone number from my contact list. Be
         | it a number that used to be ours, or family/friends that have
         | passed.
        
         | rubit_xxx16 wrote:
         | > Before 2010 I had this unquestioned assumption that links are
         | supposed to last forever
         | 
         | Any site/company whatsoever of this world (and most) that
         | promises that anything will last forever is seriously deluded
         | or intentionally lying, unless their theory of time is
         | different than that of the majority.
        
         | mycall wrote:
         | Is there some browser extension that automatically goes to
         | web.archive.org if the link times out?
        
           | theblazehen wrote:
           | I use the Resurrect Pages addon
        
       | diggan wrote:
       | URLs (uniform resource locators) cannot ever last forever, as
       | they are _locations_, and locations can't last forever :)
       | 
       | URIs however, can be made to last forever! Also comes with the
       | added benefit that if you somehow integrate content-addressing
       | into the identifier, you'll also be able to safely fetch it from
       | any computer, hostile or not.
        
         | 90s_dev wrote:
         | I've been making websites for almost 30 years now.
         | 
         | I still don't know the difference between URI and URL.
         | 
         | I'm starting to think it doesn't matter.
        
           | diggan wrote:
           | > I still don't know the difference between URI and URL.
           | 
           | One is a location, the other one is an ID. Which is which is
           | referenced in the name :)
           | 
           | And sure, it doesn't matter as long as you're fine with
           | referencing _locations_ rather than the actual data, and
           | aware of the tradeoffs.
        
           | Sesse__ wrote:
           | It doesn't matter.
           | 
           | URI is basically a format and nothing else. (foo://bar123
           | would be a URI but not a URL because nothing defines what
           | foo: is.)
           | 
           | URLs and URNs are thingies using the URI format;
           | https://news.ycombinator.com is a URL (in addition to being a
           | URI) because there's an RFC that specifies what https: means
           | and how to go out and fetch such resources.
           | 
           | urn:isbn:0451450523 (example cribbed from Wikipedia) is a URN
           | (in addition to being a URI) that uniquely identifies a book,
           | but doesn't tell you how to go find that book.
           | 
           | Mostly, the difference is pedantic, given that URNs never
           | took off.
        
             | 90s_dev wrote:
             | It's almost like URNs were born in an _urn_! [1]
             | 
             | [1]: _ba dum tss_
        
           | marcosdumay wrote:
           | A URI is a standard way to write names of documents.
           | 
           | A URL is a URI that also tells you how to find the
           | document.
        
           | layer8 wrote:
           | URLs in the strict sense are a subset of URIs. They specify a
           | mechanism (like HTTP or FTP) for how to access the referenced
           | resource. The other type of URIs are opaque IDs, like
           | doi:10.1000/182 or urn:isbn:9780141036144. These technically
           | can't expire, though that doesn't mean you'll be able to
           | access what they reference.
           | 
           | However, "URL" in the broader sense is used as an umbrella
           | term for URIs and IRIs (internationalized resource
           | identifiers), in particular by WHATWG.
           | 
           | In practice, what matters is the specific URI scheme ("http",
           | "doi", etc.).
        
           | immibis wrote:
           | A URL tells you where to get some data, like
           | https://example.com/index.html
           | 
           | A URN tells you which data to get (usually by hash or by some
           | big centralized registry), but not how to get it. DOIs in
           | academia, for example, or RFC numbers. Magnet links are
           | borderline.
           | 
           | URIs are either URLs or URNs. URNs are rarely used since
           | they're less practical, because browsers can't open them -
           | but note that in any case each URL scheme (https) or URN
           | scheme (doi) is unique - there's no universal way to fetch
           | one without specific handling for each supported scheme. So
           | it's not actually that unusual for a browser not to be able
           | to open a certain scheme.
        
         | postoplust wrote:
         | For example: IPFS URIs are content addresses
         | 
         | https://docs.ipfs.tech/
        
         | bowsamic wrote:
         | Does this have any actual grounding in reality or does your
         | lack of suggestion for action confirm my suspicion that this is
         | just a theoretical wish?
        
           | diggan wrote:
           | > Does this have any actual grounding in reality
           | 
           | Depends on your use case, I suppose. For things I want to
           | ensure I can reference _forever_ (a theoretical forever),
           | using a _location_ for that reference feels less than ideal;
           | I cannot even count the number of dead bookmarks on both
           | hands and feet, so "link rot" is a real issue.
           | 
           | If those bookmarks instead referenced the actual content (via
           | content-addressing for example), rather than the location,
           | then those would still work today.
           | 
           | But again, not everyone cares about things sticking around,
           | not all use cases require the reference to continue being
           | alive, and so on, so if it's applicable to you or not is
           | something only you can decide.
        
       | olalonde wrote:
       | > This article was written by a human, but links were suggested
       | by and grammar checked by an LLM.
       | 
       | This is the second time today I've seen a disclaimer like this.
       | Looks like we're witnessing the start of a new trend.
        
         | tester756 wrote:
         | It's crazy that people feel that they need to put such
         | disclaimers
        
           | layer8 wrote:
           | It's more a claimer than a disclaimer. ;)
        
             | danadam wrote:
             | I'd probably call it "disclosure".
        
           | psychoslave wrote:
           | This comment was written by a human with no check by any
           | automaton, but how will you check that?
        
             | acquisitionsilk wrote:
             | Business emails, other comments here and there of a more
             | throwaway or ephemeral nature - who cares if LLMs helped?
             | 
             | Personal blogs, essays, articles, creative writing,
             | "serious work" - please tell us if LLMs were used, if they
             | were, and to what extent. If I read a blog and it seems
             | human and there's no mention of LLMs, I'd like to be able
             | to safely assume it's a human who wrote it. Is that so much
             | to ask?
        
             | qingcharles wrote:
             | That's exactly what a bot would say!
        
           | actuallyalys wrote:
           | It makes sense to me. After seeing a bunch of AI slop, people
           | started putting no AI buttons and disclaimers. Then some
           | people using AI for little things wanted to clarify it wasn't
           | AI generated wholesale without falsely claiming AI wasn't
           | involved at all.
        
         | chii wrote:
         | i dont find the need to have such a disclaimer at all.
         | 
         | If the content can stand on its own, then it is sufficient. If
         | the content is slop, then why does it matter that it is an ai
         | generated slop vs human generated slop?
         | 
         | The only reason anyone wants to know/have the disclaimer is if
         | they cannot themselves discern the quality of the contents, and
         | are using ai generation as a proxy for (bad) quality.
        
           | johannes1234321 wrote:
           | For the author it matters. To what degree do they want to be
           | associated with the resulting text?
           | 
           | And I differentiate between "Matt Godbolt" who is an expert
           | in some areas and in my experience careful about avoiding
           | wrong information and an LLM which may produce additional
           | depth, but may also make up things.
           | 
           | And well, "discern the quality of the contents" - I often
           | read texts to learn new things. On new things I don't have
           | enough knowledge to qualify the statements, but I may have
           | experience with regards to the author or publisher.
        
             | chii wrote:
             | and what do you do to make this differentiation if what
             | you're reading is a scientific paper?
        
               | johannes1234321 wrote:
               | Same?
               | 
               | (Some researchers' names I know, some institutions have
               | published good reports in the past, and I take that into
               | consideration in how much I trust it ... and since I'm
               | human I trust it more if it confirms my view and less if
               | it challenges it. Or, put in different words: there are
               | many factors going into subjective trust.)
        
       | 90s_dev wrote:
       | Some famous programmer once wrote about how links should last
       | forever.
       | 
       | He advocated for /foo/bar with no extension. He was right about
       | not using /foo/bar.php because the _implementation_ might change.
       | 
       | But he was wrong, it should be /foo/bar.html because the _end-
       | result_ will always be HTML when it's served to a browser,
       | whether it's generated by PHP, Node.js or by hand.
       | 
       | It's pointless to prepare for some hypothetical new browser that
       | uses an _alternate_ language _other than HTML_.
       | 
       | Just use .html for your pages and stop worrying about how to
       | correctly convert foo.md to foo/index.html and configure nginx
       | accordingly.
        
         | Dwedit wrote:
         | mod_rewrite means you can redirect the .php page to something
         | else if you stop using php.
        
           | shakna wrote:
           | Unless mod_rewrite is disabled, because it has had a few
           | security bugs over the years. Like last year. [0]
           | 
           | [0] https://nvd.nist.gov/vuln/detail/CVE-2024-38475
        
         | 90s_dev wrote:
         | Found it: https://www.w3.org/Provider/Style/URI
         | 
         | Why did I think Joel Spolsky or Jeff Atwood wrote it?
        
         | Sesse__ wrote:
         | > Some famous programmer once wrote about how links should last
         | forever.
         | 
         | You're probably thinking of W3C's guidance:
         | https://www.w3.org/Provider/Style/URI
         | 
         | > But he was wrong, it should be /foo/bar.html because the end-
         | result will always be HTML
         | 
         | 20 years ago, it wasn't obvious at all that the end-result
         | would always be HTML (in particular, various styled forms of
         | XML were thought likely to eventually take over). And in any
         | case, there's no reason to have the content-type in the URL;
         | why would the user care about that?
        
           | 90s_dev wrote:
           | There's strong precedent for associating file extensions
           | with content types. And it allows static files to map 1:1 to
           | URLs.
           | 
           | I agree though that I was too harsh, I didn't realize it was
           | written in 1998 when HTML was still new. I probably first
           | read it around 2010.
           | 
           | But now that we have hindsight, I think it's safe to say
           | .html files will continue to be supported for the next 50
           | years.
        
         | crackalamoo wrote:
         | I use /foo/bar/ with the trailing slash because it works better
         | with relative URLs for resources like images. I could also use
         | /foo/bar/index.html but I find the former to be cleaner
        
           | 90s_dev wrote:
           | It's always bothered me in a small way that github doesn't
           | honor this:
           | 
           | https://github.com/sdegutis/bubbles
           | 
           | https://github.com/sdegutis/bubbles/
           | 
           | No redirect, just two renders!
           | 
           | It bothers me first because it's semantically different.
           | 
           | Second, and more importantly, because it's always such a pain
           | to configure that redirect in nginx or whatever. I eventually
           | figure it out each time, after many hours wasted looking it
           | up all over again and trial/error.
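           | 
           | For the record, the usual nginx incantation is a one-liner,
           | though treat this as a sketch: the no-dot pattern is only a
           | heuristic to leave file-like URLs alone, and it belongs inside
           | the relevant server block.
           | 
           |     # Permanently redirect /foo/bar to /foo/bar/ when the
           |     # path contains no dot (i.e. no file extension).
           |     rewrite ^([^.]*[^/])$ $1/ permanent;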
        
         | esafak wrote:
         | If it's always .html, it's cruft; get rid of it. And what if
         | it's not HTML but JSON? Besides, does the user care? Berners-
         | Lee was right.
         | 
         | https://www.w3.org/Provider/Style/URI
        
           | 90s_dev wrote:
           | If it's JSON then name it /foo/bar.json, and as a bonus you
           | can _also_ have  /foo/bar.html!
           | 
           | You say the extension is cruft. That's your opinion. I don't
           | share it.
        
             | marcosdumay wrote:
             | The alternative is to declare what you want in the Accept
             | header, which is way less transparent but more flexible.
             | 
             | I never saw any site where the extra flexibility added any
             | value. So, right now I do favor the extension.
        
             | kelnos wrote:
             | At the risk of committing the appeal-to-authority fallacy,
             | it's also the opinion of Tim Berners-Lee, which I would
             | hope carries at least some weight.
             | 
             | The way I look at it is that yes, the extension can be
             | useful for requesting a particular file format (IMO the
             | Accept header is not particularly accessible, especially if
             | you are just a regular web browser user). But if you have a
             | default/canonical representation, then you should give that
             | representation in response to a URL that has no extension.
             | And when you link to that document in a representation-
             | neutral way, you should link without the extension.
             | 
             | That doesn't stop you from _also_ serving that same content
             | from a URL that includes the extension that describes the
             | default /canonical representation. And people who want to
             | link to you and ensure they get a particular representation
             | can use the extension in their links. But someone who
             | doesn't care, and just wants the document in whatever
             | format the website owner recommends, should be able to get
             | it without needing to know the extension. For those
             | situations, the extension is an implementation detail that
             | is irrelevant to most visitors.
        
               | 90s_dev wrote:
               | > it's also the opinion of Tim Berners-Lee, which I would
               | hope carries at least some weight
               | 
               | Not at all. He's famous for helping create the initial
               | version of JavaScript, which was a fairly even mixture of
               | great and terrible. Which means his initial contributions
               | to software were not extremely noteworthy, and he just
               | happened to be in the right time and right place, since
               | something like JavaScript was apparently inevitable.
               | Plus, I can't think of any of his major contributions to
               | software in the decades since. So no, I don't even think
               | that's really an appeal to authority.
        
               | wolfgang42 wrote:
               | _> [Tim Berners-Lee is] famous for helping create the
               | initial version of JavaScript_
               | 
               | You may be thinking of Brendan Eich? Berners-Lee is
               | famous for HTML, HTTP, the first web browser, and the
               | World Wide Web in general; as far as I know he had
               | nothing to do with JS.
        
       | swyx wrote:
       | idk man how can URLs last forever if it costs money to keep a
       | domain name alive?
       | 
       | i also wonder if url death could be a good thing. humanity makes
       | special effort to keep around the good stuff. the rest goes into
       | the garbage collection of history.
        
         | johannes1234321 wrote:
         | Historians however would love to have more garbage from
         | history, to get more insights on "real" life rather than just
         | the parts one considered worth keeping.
         | 
         | If I could time jump, it would be interesting to see how
         | historians in a thousand years will look back at our period,
         | where a lot of information will just disappear without a trace
         | as digital media rots.
        
           | swyx wrote:
           | we'd keep the curiosities around, like so much Ea Nasir Sells
           | Shit Copper. we have room for like 5-10 of those per century.
           | not like 8 billion. much of life is mundane.
        
             | rightbyte wrote:
             | Imagine being judged 1000s of years later by some Yelp
             | reviews like poor Nasir.
        
             | woodruffw wrote:
             | > much of life is mundane.
             | 
             | The things that make (or fail to make) life mundane at some
             | point in history are themselves subjects of significant
             | academic interest.
             | 
             | (And of course we have no way to tell what things are
             | "curiosities" or not. Preservation can be seen as a way to
             | minimize survivorship bias.)
        
             | cortesoft wrote:
             | Today's mundane is tomorrow's fascination
        
             | shakna wrote:
             | We also have rooms full of footprints. In a thousand years,
             | your mundane is the fascination of the world.
        
             | johannes1234321 wrote:
             | Yes, and at the same time we'd be excited about more
             | mundane sources from history. The legends about the mighty
             | are interesting, but what do we actually know about
             | everyday life among people a thousand years ago? Very
             | little. Most things are speculation based on objects
             | (tools etc.), on the structure of buildings and so on. If
             | we go back just a few hundred years there is (using a
             | European perspective) a somewhat interesting source in
             | court cases from legal conflicts between "average" people,
             | but in older times more or less all written material is
             | about the powerful, be it worldly or religious power, and
             | it often describes the rulers in an extra positive way
             | (from their perspective) and their opponents as extra
             | weak.
             | 
             | Having more average sources certainly helps, and we aren't
             | good judges now of what will be relevant in the future. We
             | can only try to keep some of everything.
        
           | mrguyorama wrote:
           | I regularly wonder whether modern educated people journal
           | less than the educated people of previous centuries, who
           | were kind of rare.
           | 
           | Maybe we should get a journaling boom going.
           | 
           | But it has to be written on paper, because pen and paper is
           | literally ten times more durable than even good digital
           | storage.
        
             | swyx wrote:
             | > pen and paper is literally ten times more durable than
             | even good digital storage.
             | 
             | citation needed lol. data replication >>>> paper's single
             | point of failure.
        
               | johannes1234321 wrote:
               | The question is: What is more likely in 1000 years to
               | still exist and being readable. The papers caught in some
               | lost ruins or some form of storage media?
               | 
               | Sure, as long as the media is copied there is a chance
               | of survival, but will what's copied be "average"
               | material, or only things we now consider interesting?
               | Will the chain hold, or will it become as uninteresting
               | as many other things did over time? Will the
               | organisation doing it be funded? Will the location where
               | this happens be spared from war?
               | 
               | For today's historians the random finds are important
               | artifacts for understanding "average" people's lives,
               | while the well-preserved documents are legends about the
               | mighty.
               | 
               | Having lots of material all over gives a chance for some
               | of it to survive, and until 40 years or so ago we were
               | in a good spot: lots of paper all over, about
               | everything. Analog vinyl records, which might be
               | readable in the future to learn about our music. But
               | now everything is on storage media, where many formats
               | see data loss, where formats become outdated and (when
               | looking from a thousand years away) data formats change
               | fast, etc.
        
               | KPGv2 wrote:
               | > What is more likely in 1000 years to still exist and
               | being readable. The papers caught in some lost ruins or
               | some form of storage media?
               | 
               | The storage media. We have evidence to support this:
               | 
               | * original paper works from 1000 years ago are _insanely
               | rare_
               | 
               | * more recent storage media provide much more content
               | 
               | How many digital copies of Beowulf do we have? Millions?
               | 
               | How many paper copies from 1000 years ago? _one_
               | 
               | how many other works from 1000 years ago do we have zero
               | copies of thanks to paper's fragility and thus don't even
               | know existed? probably a _lot_
        
               | johannes1234321 wrote:
               | However, that one paper, stating a random fact, might
               | tell more about the people than an epic poem.
               | 
               | You can't have a full history without both.
        
               | tredre3 wrote:
               | > The question is: What is more likely in 1000 years to
               | still exist and being readable. The papers caught in some
               | lost ruins or some form of storage media?
               | 
               | But that's just survivorship bias. The vast vast vast
               | majority of all written sheets of paper have been lost to
               | history. Those deemed worthy were carefully preserved,
               | some of the rest was preserved by a fluke. The same is
               | happening with digital media.
        
         | internetter wrote:
         | > i also wonder if url death could be a good thing. humanity
         | makes special effort to keep around the good stuff. the rest
         | goes into the garbage collection of history.
         | 
         | agreed. formerly wrote some thoughts here:
         | https://boehs.org/node/internet-evanescence
        
       | s17n wrote:
       | URLs lasting forever was a beautiful dream but in reality, it
       | seems that 99% of URLs don't in fact last forever. Rather than
       | endlessly fighting a losing battle, maybe we should build the
       | technology around the assumption that infrastructure isn't
       | permanent?
        
         | nonethewiser wrote:
         | >maybe we should build the technology around the assumption
         | that infrastructure isn't permanent?
         | 
         | Yes. Also not using a url shortener as infrastructure.
        
         | hoppp wrote:
         | Yes.
         | 
         | Domain names often change hands, and a URL that is supposed to
         | last forever can turn into a malicious phishing link over time.
        
           | emaro wrote:
           | In theory a content-addressed system like IPFS would be the
           | best: if someone online still has a copy, you can get it too.
        
             | mananaysiempre wrote:
             | It feels as though, much like cryptography in general
             | reduces almost all confidentiality-adjacent problems to key
             | distribution (which is damn near unsolvable in large
             | uncoordinated deployments like Web PKI or PGP), content-
             | addressable storage reduces almost all data-persistence-
             | adjacent problems to maintenance of mutable name-to-hash
             | mappings (which is damn near unsolvable in large
             | uncoordinated deployments like BitTorrent, Git, or
             | IP[FN]S).
        
               | dreamcompiler wrote:
               | DNS seems to solve the problem of a decentralized
               | loosely-coordinated mapping service pretty well.
        
               | emaro wrote:
               | True, but then you're back on square one. Because it's
               | not guaranteed that using a (DNS) name will point to the
               | same content forever.
        
               | hoppp wrote:
               | But then all content should be static and never update?
               | 
               | If you serve an SPA via IPFS, the SPA still needs to
               | fetch the data from an endpoint which could go down or
               | change
               | 
               | Even if you put everything on a blockchain, an RPC
               | endpoint to read the data must have a URL
        
               | mananaysiempre wrote:
               | > But then all content should be static and never update?
               | 
               | And thus we arrive at the root of the conflict. Many
               | users (that care about this kind of thing) want
               | publications that they've seen to stay where they've
               | seen them; many publishers have become accustomed to
               | being able to memory-hole things (sometimes for very
               | real safety reasons; often for marketing ones). That's
               | on top of all the usual problems of maintaining a space
               | of human-readable names.
        
             | immibis wrote:
             | Note that IPFS is now on the EU Piracy Watchlist which may
             | be a precursor to making it illegal.
        
         | jjmarr wrote:
         | URLs identify the location of a resource on a network, not the
         | resource itself, and so are not required to be permanent or
         | unique. That's why they're called "uniform resource locators".
         | 
         | This problem was recognized in 1997 and is why the Digital
         | Object Identifier was invented.
        
         | dreamcompiler wrote:
         | URNs were supposed to solve that problem by separating the
         | identity of the thing from the location of the thing.
         | 
         | But they never became popular and then link shorteners
         | reimplemented the idea, badly.
         | 
         | https://en.m.wikipedia.org/wiki/Uniform_Resource_Name
        
       | devnullbrain wrote:
       | >despite Google solemnly promising that "all existing links will
       | continue to redirect to the intended destination," it went read-
       | only a few years back, and now they're finally sunsetting it in
       | August 2025
       | 
       | It's become so trite to mention that I'm rolling my eyes at
       | myself just for bringing it up again but... come on! How bad can
       | it be before Google do something about the reputation this
       | behaviour has created?
       | 
       | Was Stadia not an expensive enough failure?
        
         | iainmerrick wrote:
         | I'm very surprised, even though I shouldn't be, that they're
         | actually shutting the read-only goo.gl service down.
         | 
         | For other obsolete apps and services, you can argue that they
         | require some continual maintenance and upkeep, so keeping them
         | around is expensive and not cost-effective if very few people
         | are using them.
         | 
         | But a URL shortener is super simple! It's just a database, and
         | in this case we don't even need to write to it. It's literally
         | one of the example programs for AWS Lambda, intentionally
         | chosen because it's really simple.
         | 
         | I guess the goo.gl link database is probably really big, but
         | even so, this is Google! Storage is cheap! Shutting it down is
         | such a short-sighted mean-spirited bean-counter decision, I
         | just don't get it.
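         | 
         | To illustrate how little is involved, a read-only redirector
         | fits in a dozen lines (a sketch with a hard-coded, made-up
         | mapping; a real one would read from a frozen snapshot of the
         | goo.gl database):
         | 
         |     from http.server import BaseHTTPRequestHandler, HTTPServer
         | 
         |     # Frozen short-code -> destination mapping (made-up data).
         |     LINKS = {
         |         "/abc123": "https://example.com/some/very/long/url",
         |     }
         | 
         |     class Redirector(BaseHTTPRequestHandler):
         |         def do_GET(self):
         |             target = LINKS.get(self.path)
         |             if target:
         |                 self.send_response(301)  # permanent redirect
         |                 self.send_header("Location", target)
         |                 self.end_headers()
         |             else:
         |                 self.send_error(404, "unknown short link")
         | 
         |     HTTPServer(("", 8080), Redirector).serve_forever()
         | 
         | The hard part at Google's scale is presumably the serving
         | infrastructure and the database behind it, not the logic.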
        
       | creatonez wrote:
       | There's something poetic about abusing a link shortener as a
       | database and then later having to retrieve all your precious
       | links from random corners of the internet because you've lost the
       | original reference.
        
         | nonethewiser wrote:
         | Didn't they just use the link shortener to compress the URL?
         | They used their URL as the "database" (ie holding the compiler
         | state).
        
           | Arcuru wrote:
           | They didn't store anything themselves since they encoded the
           | full state in the urls that were given out. So the link
           | shortener was the only place where the "database", the urls,
           | were being stored.
        
             | nonethewiser wrote:
             | Yeah, but the purpose of the url shortener was not to store
             | the data, it was to shorten the url. The fact that the data
             | was persisted on google's server somewhere is incidental.
             | 
             | In other words, every shortened url is "using the url
             | shortener as a database" in that sense. Taking a url with a
             | long query parameter and using a url shortener to shorten
             | it does not constitute "abusing a link shortener as a
             | database."
        
               | cortesoft wrote:
               | Except in this case the url IS the data, so storing the
               | url is the same as storing the data.
        
               | nonethewiser wrote:
               | It's incidental. The state is in the URL, which is only
               | shortened because it's so long. Google's URL shortener is
               | not needed to store the data.
               | 
               | It's simply a normal use case for a URL shortener: a long
               | URL, usually long because of some very large query
               | parameter, gets mapped to a short one.
        
         | rs186 wrote:
         | Shortening long URLs is the intended use case for a ... URL
         | shortener.
         | 
         | The real abusers are the people who use a shortener to hide
         | scam/spam/illegal websites behind a common domain and post it
         | everywhere.
        
           | creatonez wrote:
           | These are not just "long URLs". These are URLs where the
           | _entire_ content is stored in the fragment suffix of the URL.
           | They are blobs, and always have been.
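           | 
           | Purely as an illustration (this is not Compiler Explorer's
           | actual encoding scheme), the pattern looks roughly like this:
           | the whole editor state is serialized, compressed, and packed
           | into the URL fragment, so the server never stores anything.
           | 
           |     import base64, json, zlib
           | 
           |     def state_to_fragment(state: dict) -> str:
           |         raw = json.dumps(state, separators=(",", ":")).encode()
           |         return base64.urlsafe_b64encode(
           |             zlib.compress(raw)).decode()
           | 
           |     def fragment_to_state(fragment: str) -> dict:
           |         raw = zlib.decompress(base64.urlsafe_b64decode(fragment))
           |         return json.loads(raw)
           | 
           |     state = {"source": "int main(){}", "compiler": "g++",
           |              "opts": "-O2"}
           |     url = "https://example.org/#" + state_to_fragment(state)
           |     # The URL *is* the data; shortening it hands that data to
           |     # the shortener.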
        
       | wrs wrote:
       | I hate to say it, but unless there's a really well-funded
       | foundation involved, Compiler Explorer and godbolt.org won't last
       | forever either. (Maybe by then all the info will have been
       | distilled into the 487 quadrillion parameter model of
       | everything...)
        
         | layer8 wrote:
         | Thanks to the no-hiding theorem, the information will live
         | forever. ;)
        
         | mattgodbolt wrote:
         | We've done alright so far: 13 years this week. I have funding
         | for another year and change, even assuming growth and that all
         | our current sponsors pull out.
         | 
         | I /am/ thinking about a foundation or similar though: the
         | single point of failure is not funding but "me".
        
         | badmintonbaseba wrote:
         | Well, that's true, but at least now Compiler Explorer links
         | will only stop working when Compiler Explorer itself vanishes,
         | not before.
         | 
         | I think the most valuable long-lived Compiler Explorer links
         | are in bug reports. I like to link to Compiler Explorer in bug
         | reports for convenience, but I also include the code in the
         | report itself and specify which compiler and version I used to
         | reproduce the bug. I don't expect Compiler Explorer to vanish
         | anytime soon, but making bug reports self-contained like this
         | protects against that.
        
       | layer8 wrote:
       | I find it somewhat surprising that it's worth the effort for
       | Google to shut down the read-only version. Unless they fear some
       | legal risks of leaving redirects to private links online.
        
         | actuallyalys wrote:
         | Hard to say from the outside, but it's possible the service
         | relies on some outdated or insecure library, runtime, service,
         | etc. they want to stop running. Although frankly it seems just
         | as possible it's a trivial expense and they're cutting it
         | because it's still a net expense, goodwill and past promises be
         | damned.
        
           | Scaevolus wrote:
           | Typically services like these are side projects of just a few
           | Google employees, and when the last one leaves they are shut
           | down.
        
           | mmooss wrote:
           | Another possibility is that it's a distraction - whatever the
           | marginal costs, there's a fixed cost to each system in terms
           | of cognitive overhead, if not documentation, legal issues
           | (which can change as laws and regulations change), etc.
           | Removing distractions is basic management.
        
           | mbac32768 wrote:
           | yeah but nobody wants to put "spent two months migrating
           | goo.gl url shortener to work with Sisyphus release manager
           | and Dante 7 SRE monitoring" in their perf packet
           | 
           | that's a negative credit activity
        
       | sdf4j wrote:
       | > One of my founding principles is that Compiler Explorer links
       | should last forever.
       | 
       | And yet... that was a very self-destructive decision.
        
         | mattgodbolt wrote:
         | I'm not sure why so?
        
           | MyPasswordSucks wrote:
           | Because URL shortening is relatively trivial to implement,
           | and instead of just doing so on their own end, they decided
           | to rely on a third-party service.
           | 
           | Considering link permanence was a "founding principle",
           | that's just unbelievably stupid. If I decide one of my
           | "founding principles" is that I'm never going to show up at
           | work with a dirty windshield, then I shouldn't rely on the
           | corner gas station's squeegee and cleaning fluid.
        
             | gwd wrote:
             | First of all, _how the links are made permanent_ has
             | nothing to do with the principle that _they should be made
             | permanent_.
             | 
             | There seemed to be two principles at play here:
             | 
             | 1. Links should always work
             | 
             | 2. We don't want to store any user data
             | 
             | #2 is a bit complicated, because although it sounds nice,
             | it has two potential justifications:
             | 
             | 2a: For privacy reasons, don't store any user data
             | 
             | 2b: To avoid having to think through the implications of
             | storing all those things ourselves
             | 
             | I'm not sure how much each played into their thinking;
             | possibly because of a lack of clarity, 2a sounded nice and
             | 2b was the real motivation.
             | 
             | I'd say 2a is a reasonable aspiration; but using a link
             | shortener changed it from "don't store any user data" to
             | "store the user data somewhere we can't easily get at it",
             | which isn't the same thing.
             | 
             | 2b, when stated more clearly, is obviously just taking on
             | technical debt and adding dependencies which may come back
             | to bite you -- as it did.
        
       | sedatk wrote:
       | Surprisingly, purl.org URLs still work after a quarter century,
       | thanks to Internet Archive.
        
       | 2YwaZHXV wrote:
       | Presumably there's no way to get someone at Google to query their
       | database and find all the shortened links that go to godbolt.org?
        
       | devrandoom wrote:
       | > despite Google solemnly promising ...
       | 
       | I'm pretty sure the lore says that a solemn promise from Google
       | carries the exact same value as a prostitute saying she likes
       | you.
        
       | nssnsjsjsjs wrote:
       | The corollary of URLs that last forever is that we need both
       | forever storage (which costs money forever) and forever
       | institutional care and memory.
       | 
       | Where URLs may last longer is where they are not used for the
       | "RL" (locator) bit, but more like a UUID for namespacing, e.g.
       | in XML, Java or Go.
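       | 
       | For example (illustrative namespace URI, never dereferenced): an
       | XML namespace uses a URL purely as a name, so nothing breaks if
       | the address stops resolving.
       | 
       |     # The namespace URI below only disambiguates the tag; it is
       |     # never fetched from the network.
       |     import xml.etree.ElementTree as ET
       | 
       |     doc = ET.fromstring(
       |         '<p:note xmlns:p="http://example.com/ns/notes">hi</p:note>'
       |     )
       |     print(doc.tag)  # -> {http://example.com/ns/notes}note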
        
       | mbac32768 wrote:
       | it seems a bit crazy to try to avoid storing a relatively small
       | amount of data each time a link is shared, given that storage
       | and bandwidth costs are rapidly dropping over time
       | 
       | but perhaps I don't appreciate how much traffic godbolt gets
        
         | mattgodbolt wrote:
         | It was a simpler time and I didn't want the responsibility of
         | storing other people's data. We do now though!
        
           | mattgodbolt wrote:
           | Oh and traffic: https://stats.compiler-explorer.com/
        
       | Ericson2314 wrote:
       | The only type of reference that lasts forever is a content
       | address.
       | 
       | We should be using more of them.
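       | 
       | A minimal sketch of the idea: the reference is derived from the
       | bytes themselves, so any copy of the content, on any host,
       | satisfies it and can be verified against it.
       | 
       |     import hashlib
       | 
       |     def content_address(data: bytes) -> str:
       |         # The name is a hash of the content, not a location.
       |         return "sha256-" + hashlib.sha256(data).hexdigest()
       | 
       |     blob = b"int main() { return 0; }"
       |     ref = content_address(blob)
       |     assert content_address(blob) == ref  # any copy re-derives it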
        
       | rurban wrote:
       | He missed the archive.org crawl for those links in the blog post.
       | They have them stored now as well. https://github.com/compiler-
       | explorer/compiler-explorer/discu...
        
       | sebstefan wrote:
       | >Over the last few days, I've been scraping everywhere I can
       | think of, collating the links I can find out in the wild, and
       | compiling my own database of links - and importantly, the URLs
       | they redirect to. So far, I've found 12,000 links from scraping:
       | 
       | >Google (using their web search API)
       | 
       | >GitHub (using their API)
       | 
       | >Our own (somewhat limited) web logs
       | 
       | >The archive.org Stack Overflow data dumps
       | 
       | >Archive.org's own list of archived webpages
       | 
       | You're an angel Matt
        
       | 3cats-in-a-coat wrote:
       | Nothing lasts forever.
       | 
       | I've pondered that a lot in my system design which bears some
       | resemblance to the principles of REST.
       | 
       | I have split resources into ephemeral (and mutable) ones, and
       | immutable, reference-counted (or otherwise GC-ed) ones, which
       | persist while referred to but are collected when no one refers
       | to them any more.
       | 
       | In a distributed system the former is the default, the latter can
       | exist in little islands of isolated context.
       | 
       | You can't track references throughout the entire world. The only
       | thing that works is timeouts, but those are not reliable. Nor can
       | you exist forever, years after no one needs you. A system needs
       | its parts to be useful, or it dies full of useless parts.
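       | 
       | A toy illustration of the reference-counted half of that split
       | (hypothetical code, not a real system): immutable blobs persist
       | only while something still holds a reference, and are collected
       | when the last reference is released.
       | 
       |     class RefCountedStore:
       |         def __init__(self):
       |             self._blobs = {}  # key -> [data, refcount]
       | 
       |         def put(self, key, data):
       |             entry = self._blobs.setdefault(key, [data, 0])
       |             entry[1] += 1  # data is immutable once stored
       | 
       |         def get(self, key):
       |             return self._blobs[key][0]
       | 
       |         def release(self, key):
       |             entry = self._blobs[key]
       |             entry[1] -= 1
       |             if entry[1] == 0:
       |                 del self._blobs[key]  # last reference gone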
        
       ___________________________________________________________________
       (page generated 2025-05-29 23:01 UTC)