[HN Gopher] Hyperlink Maximalism (2022)
       ___________________________________________________________________
        
       Hyperlink Maximalism (2022)
        
       Author : Tomte
       Score  : 75 points
       Date   : 2023-07-25 17:40 UTC (5 hours ago)
        
 (HTM) web link (thesephist.com)
 (TXT) w3m dump (thesephist.com)
        
       | thesephist wrote:
       | Hey HN! Author here.
       | 
        | For the many Obsidian users here, I wanted to share an
        | Obsidian demo/plugin by Justin Smith[0] that I saw recently,
        | which I think faithfully carries over a lot of what I liked
        | about this idea into Obsidian land, complete with a semantic
        | index w/ language models.
       | 
       | If you're an Obsidian user, do check out the demo. I can't take
       | credit for any part of building it, but it's really cool to see
       | the idea in action :)
       | 
       | [0] https://twitter.com/justindsmith/status/1679978286955532296
        
         | Slow_Hand wrote:
          | Excellent! This was my first question upon reading: whether
          | it can be integrated into my Obsidian database.
        
       | Santosh83 wrote:
        | One HN regular contributor's personal site is similar:
        | extensively cross-linked, and hovering over any link opens it
        | in its own popup window. A unique site.
        
       | j2kun wrote:
       | A well written document should focus the reader. When everything
       | is a hyperlink and a "thought map" then you'll just end up being
       | distracted and either not reading the original doc, or clicking
        | on nothing and ignoring the deluge of hyperlinks. The demos in
        | this article are heavily cluttered by all the visual noise.
       | 
       | If something is important enough, the author will discuss it
       | directly in the document. Hyperlinks can provide context when the
       | reader may not have the context assumed of the intended audience,
       | or they're a poor man's bibliographic citation (because the links
       | always break eventually), but otherwise it's a signal that it's
       | _not_ important.
        
       | carlosjobim wrote:
       | Two problems:
       | 
       | 1. Most people who publish their writings online are too lazy to
       | hyperlink, even where it is extremely appropriate. Even intra-
       | site links.
       | 
       | 2. Link rot is real. Maintainers of web sites are too arrogant to
       | make redirects, and happily destroy all their incoming links when
       | playing with a new framework. So now links from your site lead to
       | 404s, because of somebody on the other end, and you have to scan
       | your outgoing links every few months and change them.
       | 
        | In a way, macOS and iOS have already implemented the
        | "everything is a link" approach with their Look Up feature.
        | It's much more powerful than it used to be, and you can select
        | any word or words anywhere and find all kinds of information.
        
       | nologic01 wrote:
        | Love the fact that people still think about and work on
        | human-centric computing.
       | 
       | There should clearly be some low hanging fruit in revisiting the
       | hypertext concept with the benefit of a few decades of further
       | tech development.
       | 
        | It's not clear, though, which aspect(s) would be the most
        | beneficial. E.g., when scanning text for information and
        | connections, our context is frequently an important factor, but
        | that context is not available to the computer.
        
       | jkestner wrote:
       | Should this be a system-level service in many circumstances? I
       | love that with a three-finger tap, I get an instant dictionary
       | definition of whatever word I'm hovering over (not even
       | selected). I want that reliable availability, but more expansive.
        
       | mananaysiempre wrote:
       | > my computer should search across everything I've read and some
       | small high-quality subset of the Web
       | 
        | Even without the contextual search--which is interesting,
        | mind you!--I want this part, and I've wanted it for a long,
        | long time.
       | 
        | I have it for journal articles simply by virtue of
       | obsessively saving and filing every PDF I've read for the last
       | decade, but for the Web I don't know how I could just plain index
       | everything I've read (and not just everything in
       | Pocket/Wallabag/etc). For example, is there anything that can
       | pull links from Firefox Sync and save/index extracted text from
       | them in a halfway reliable fashion? (I don't expect it to be able
       | to bypass yucky stuff like CAPTCHAs on archive.is.)
       | 
       | I know there are a handful of WARC-ecosystem tools, but AFAIU
       | these are rarely that automatic, are tied to a browser plugin as
       | opposed to just pulling the URLs from history sync (yes, I'm
       | unreasonably fond of Epiphany/GNOME Web), and kind of heavy. Am I
       | wrong? Any alternatives?
        
         | jazzyjackson wrote:
          | There's a different strategy; it's a VC-funded thing that's
          | Mac-only but pretty fun to play with:
          | 
          | http://rewind.ai takes screenshots continuously and uses
          | macOS's system OCR to index everything you've ever looked
          | at, whether that's web, code, or messages. When you search,
          | it pulls up the screenshots that match.
        
         | rhn_mk1 wrote:
         | > plain index everything I've read
         | 
         | Recoll together with the extension can be configured to index
         | every single web page you browse.
         | 
         | https://addons.mozilla.org/en/firefox/addon/recoll-we/
        
         | groby_b wrote:
         | https://chromewebstore.google.com/detail/habonpimjphpdnmcfka...
         | (or https://github.com/tjhorner/archivebox-exporter for source)
         | 
         | Pushes your history to ArchiveBox, which does the heavy lifting
         | storing/processing the content.
         | 
         | Alas, might not work with Epiphany because there's no complete
         | extension support.
         | 
         | But IIRC, it stores its urls in $XDG_DATA_HOME/epiphany/ephy-
         | history.db - so a bit of sqlite and ArchiveBox might do the
         | trick for you.
         | 
         | Note: I'm running something similar, but find that I'd rather
         | not rely on my history, I tend to click on a lot of garbage ;)
         | You might want to curate a bit.
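          | 
          | Something like this, for instance (the `urls` table and
          | `url` column names in ephy-history.db are my guesses, so
          | inspect the schema first; `archivebox add` does accept URLs
          | on stdin):
          | 
          |     # hypothetical glue: Epiphany history -> ArchiveBox
          |     import os, sqlite3, subprocess
          | 
          |     data_home = os.environ.get(
          |         "XDG_DATA_HOME",
          |         os.path.expanduser("~/.local/share"))
          |     db = os.path.join(data_home, "epiphany",
          |                       "ephy-history.db")
          |     # Table/column names are assumptions; check with
          |     # `sqlite3 ephy-history.db .schema` before relying on it.
          |     rows = sqlite3.connect(db).execute("SELECT url FROM urls")
          |     urls = "\n".join(r[0] for r in rows)
          |     # Feed the collected URLs to ArchiveBox on stdin.
          |     subprocess.run(["archivebox", "add"], input=urls,
          |                    text=True, check=True)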
        
         | janalsncm wrote:
          | I built something like this over a weekend. It saves every
          | page's title, HTML body, and timestamp, and lets you search
          | them from the Chrome new-tab page. It also indexes on the
          | fly.
         | 
         | I have it installed on my browser and I'll probably release it
         | when I can figure out how to make it more useful.
        
       | zetalyrae wrote:
       | The next best thing: the Common Lisp HyperSpec:
       | http://www.lispworks.com/documentation/lw50/CLHS/Body/05_aa....
        
       | HotGarbage wrote:
       | Sounds familiar:
       | https://en.wikipedia.org/wiki/Adequacy.org#Adequacy_style
        
       | andai wrote:
       | >everything should be hyperlinked
       | 
       | >almost nothing in the article is hyperlinked
       | 
       | What did he mean by this?
        
         | mananaysiempre wrote:
         | The first part is more the hook than the thesis, as the rest of
         | TFA explains. (The first paragraphs outright admit that it's
         | not actually that defensible if taken literally.)
         | 
         | The actual point is that every potential term and phrase in a
         | text you're reading should be preemptively searched for (across
         | e.g. your web browsing history, reading list, etc.), marked up
         | inline according to the rough potential usefulness of the
         | results, and in any case those results should be made quickly
         | accessible through user action, whatever the machine's opinion
         | on their usefulness.
         | 
         | So less manual hyperlinking and more web searches. (Manual
         | hyperlinks have their place as well, but that place is things
         | the author thinks especially relevant, not literally every
         | point they want to make or refer to.)
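          | 
          | As a toy sketch of that loop (the phrase index here is a
          | stand-in for your browsing history and notes, and a real
          | version would score multi-word phrases, not single words):
          | 
          |     # toy "preemptive search" markup, word-level for brevity
          |     import re
          | 
          |     # stand-in index: phrase -> documents mentioning it
          |     index = {"hyperlink": ["notes/web.md", "xanadu.md"],
          |              "memex": ["memex.md"]}
          | 
          |     def annotate(text, index):
          |         # wrap each indexed term with a rough hit count
          |         def mark(m):
          |             hits = index.get(m.group(0).lower(), [])
          |             return (f"[{m.group(0)}|{len(hits)} hits]"
          |                     if hits else m.group(0))
          |         return re.sub(r"\w+", mark, text)
          | 
          |     print(annotate("The hyperlink descends from the Memex.",
          |                    index))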
        
       | qez2 wrote:
       | [dead]
        
       | 1shooner wrote:
        | I don't understand why, with all the UI/UX experience we have
        | with hypertext, we have not moved the semantics of a hyperlink
        | beyond a dumb one-way pointer.
       | 
       | Assuming that a maximal hypertextual interface needs to
       | dynamically display relationships (rather than just a huge tag
       | cloud or making all the body text hyperlink blue), a machine-
       | readable characterization of the nature of those relationships
       | would be invaluable. Does my text refute the linked resource? Or
       | support it, or provide related factual statements, or is it
       | chronologically or conceptually derivative? Some sort of
       | ontological palette to write with could produce something that
       | could be traversed in all sorts of novel ways. It would give
       | writers more agency over external systems that will otherwise
       | apply their own semantic meaning to the text, and its links.
       | 
        | Also, hypertext (HTML, anyway) is oddly asymmetrical in its
        | normal usage. We link a small string of text within a document,
        | one-way, to the entirety of a separate document. I think
        | convention is the primary obstacle, but why can't we describe a
        | relationship from our entire document to some other resource?
        
         | _jal wrote:
         | If you haven't run across it, you should read up on Ted
         | Nelson's Xanadu project.
         | 
         | I've heard different arguments for why that failed. I think the
         | likeliest is just that simple and dumb is often more successful
         | than complex and nuanced.
         | 
         | > I think convention is the primary obstacle
         | 
         | At this point, I think the existence of the web as it is is the
         | primary obstacle. Business models built on laws and case
          | rulings depend on links as they currently exist. So changes
          | here would need to get buy-in from those folks, browser makers,
         | standards body folks (but I repeat myself), and then would have
         | to somehow be reconciled with laws that make assumptions about
         | how they work.
         | 
         | Easy peasy.
        
       | galaxyLogic wrote:
       | I can already select a word or phrase in my browser, right-click
       | and choose "Search the web (or Google etc.) for this".
       | 
        | So I think we already have this feature. Now we just need
        | better search engines. Maybe AI-supported ones? Wait, we
        | already have those too.
       | 
       | Making every word look like a hyperlink is a bad idea because
       | often the user will want to search for a whole phrase, not just
       | an individual word.
        
       | [deleted]
        
       | codeGreene wrote:
        | Any thoughts on representing topics across different notes or
        | pages visually? I have always wanted something similar to
        | https://twitchatlas.com/ but for topics of interest to me. For
        | example, Quantum Physics would be a very large bubble. Within that
       | bubble could be links to several sub-topics and branching out
       | from there would be related topics.
       | 
       | I know I am not articulating this the best. I am a visual learner
       | and going through pages and pages in OneNote and cherrytree has
       | been ineffective.
        
       | deafpolygon wrote:
        | This is a terrible idea. You'll end up overwhelming the reader.
        
         | Zambyte wrote:
         | True. Like on the Hacker News website, where almost all of the
          | posts are hyperlinks. People clearly get overwhelmed and
          | just comment instead of opening the articles ;)
        
       | pie_flavor wrote:
       | Perhaps this conjures different sentiments from other HN readers,
       | but what instantly pops to mind for _me_ is my childhood web
       | browsing experience where I had the wrong toolbar or plugin or
       | whatever, and thus every page was filled with automatically
       | generated double-underlined green links that invariably went to a
       | page I regretted visiting.
       | 
       | If there is one thing the SEO and LLM crazes have taught us, it
       | is that small curated beats big generated, every single time.
        
         | hot_gril wrote:
         | That or some Yahoo! sites or random blogs that did this on
         | their own. The links didn't go to malicious sites, just useless
         | ones.
         | 
         | I can see this actually being useful if done well, but I didn't
         | like the demo just because of the annoyingly highlighted words.
         | It should only show up if I mouse over.
        
           | giantrobot wrote:
            | The best (worst) were the ads that popped up directly
            | underneath the cursor. There was absolutely no safe place
            | to park your cursor. The ad of course occluded the page
            | and you couldn't read anything. The bastard sibling of
            | that type of ad was the popup under the cursor when you
            | were just hovering over a link to see where it went.
        
       | pazimzadeh wrote:
       | > The computer can look at every word on the page, every phrase,
       | name, quote, and section of text, and show me a "map" of the
       | words and ideas behind which lay the most interesting ideas I
       | might want to know about
       | 
       | This would be really useful when reading research papers.
        
         | warkdarrior wrote:
         | Yeah, if one could solve the problem of identifying "most
         | interesting ideas," that'd be good.
        
           | pazimzadeh wrote:
            | In biology, it would already be a huge upgrade if you
            | could click on the name of any gene or protein and see a
            | summary of the protein's function, links to sequence info,
            | a list of other names that the gene has historically been
            | labelled with, and links to other recent papers where the
            | gene was studied.
           | 
            | In fields outside biology, I'm sure something similar
            | would also be useful.
           | 
           | > Yeah, if one could solve the problem of identifying "most
           | interesting ideas,"
           | 
            | I think ChatGPT can already categorize ideas, so you
            | could at the very least see information about related
            | concepts?
        
       | gwern wrote:
       | How to add hyperlinks is something I've thought a bit about for
       | Gwern.net: there's no point having all these fancy popups if
       | there are no hyperlinks exploiting them, right?
       | 
       | The way I currently do it is that first, I make hyperlinks stable
       | by automatically snapshotting & making local archives of pages
       | (https://gwern.net/archiving#preemptive-local-archiving). There
       | is no point in adding links if linkrot discourages anyone from
       | using them, of course, and I found that manual linkrot fixing did
       | not scale to the amount of writing & hyperlinking I want to do.
       | 
        | The next step is adding links automatically. Particularly in
        | the STEM topics I write most about these days, like AI, there
        | are many acronyms & named systems which mean specific things
        | but are easy to get lost in. Fortunately, that makes them easy
        | to write automatic link rules for:
       | https://github.com/gwern/gwern.net/blob/master/build/Config/...
       | These run automatically on essay bodies when compiling the site,
       | and on annotations when created. If a URL is already present, its
       | rule doesn't run; and if it's not, only the first instance gets
       | linked and the rest are skipped. (This is important: there are
       | some approaches which take the lazy approach of hyperlinking
       | every instance. This is bad and discredits linking.) This code is
       | very slow but fast enough for static site building, anyway.
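        | 
        | A toy Python version of the first-instance-only behavior (the
        | real rules are Haskell, in the config linked above; the rule
        | table here is illustrative):
        | 
        |     # toy auto-linker: skip if URL present, link first use only
        |     import re
        | 
        |     RULES = {"GPT-4": "https://en.wikipedia.org/wiki/GPT-4"}
        | 
        |     def autolink(body, rules):
        |         for term, url in rules.items():
        |             if url in body:      # URL already present: skip
        |                 continue
        |             # link only the first occurrence; rest stay bare
        |             body = re.sub(re.escape(term),
        |                           f"[{term}]({url})", body, count=1)
        |         return body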
       | 
        | Sometimes terms are too ambiguous, too rare, or too much work
        | to write an explicit rewrite rule for. But they will still
        | exist on-site. In fact, you can say that the site corpus
        | _defines_ a set of rewrite rules: every time I write by hand
        | `[foo](http://bar)`, am I not _implicitly_ saying that there
        | ought to be a rewrite rule for the string `foo` which ought to
        | hyperlink `http://bar`?
       | So there is a script
       | (https://github.com/gwern/gwern.net/blob/master/build/link-su...)
       | which will parse the site corpus, compile all the text/link
       | pairs, create/remove a bunch of them per whitelist/blacklists and
       | a frequency/length threshold, and then generate a bunch of Emacs
       | Lisp pairs. This master list of rewrites then gets read by an
       | Elisp snippet in my Emacs and turned into several thousand
       | interactive search-and-replace commands when I run my generic
       | formatting command on a buffer.
       | 
       | The effect of this second script is that after I have linked `Foo
       | et al 2023` to `/doc/2023-foo.pdf` a few times (perhaps I went
       | back and hyperlinked all instances of it after realizing it's an
       | important paper), any future instances of 'Foo et al 2023' will
       | pop up a search-and-replace asking to hyperlink it to
       | `/doc/2023-foo.pdf`, and so on.
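        | 
        | The harvesting step is roughly this shape (a Python sketch,
        | not the actual script; the path and thresholds are
        | illustrative):
        | 
        |     # harvest implicit rewrite rules from a Markdown corpus
        |     import collections, pathlib, re
        | 
        |     LINK = re.compile(r"\[([^\]]+)\]\((\S+?)\)")
        |     pairs = collections.Counter()
        |     for f in pathlib.Path("site").rglob("*.md"):
        |         pairs.update(LINK.findall(f.read_text()))
        | 
        |     # keep pairs seen often enough, skip very short anchors
        |     rules = {t: u for (t, u), n in pairs.items()
        |              if n >= 3 and len(t) > 4}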
       | 
        | Third, I exploit my link recommendations for manually-curated
        | 'see also' sections appended to annotations. I have a fairly
        | standard link-recommender approach where each annotation is
        | embedded by a neural network (OA API for now), and one does
        | nearest-neighbor lookups to find _n_ 'similar' annotations,
        | showing them to the reader in case any are relevant. So far so
        | good.
        | But I _also_ do that after editing each annotation:
        | embed-recommend-list, and it spits out an HTML list of the top
        | 20 or so similar links appended to the annotation. I can look
        | at that and delete the irrelevant entries, or the entire list.
        | This means that they'll be included in the final embedded
        | version of the annotation, will show up in any full-text
        | searches I run, are more visible to the reader, can be edited
        | into the main body if I want to, etc.
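        | 
        | The 'see also' lookup itself is nothing fancy; a sketch,
        | assuming the annotation embeddings are already computed:
        | 
        |     # cosine-similarity "see also" candidates over embeddings
        |     import numpy as np
        | 
        |     def see_also(query_vec, vecs, titles, n=20):
        |         # normalize rows, then rank by dot product with query
        |         vecs = vecs / np.linalg.norm(vecs, axis=1,
        |                                      keepdims=True)
        |         q = query_vec / np.linalg.norm(query_vec)
        |         order = np.argsort(-(vecs @ q))
        |         return [titles[i] for i in order[:n]]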
       | 
        | Fourth and most recently, I've been experimenting with GPT-4
        | for auto-formatting & auto-linking
        | (https://github.com/gwern/gwern.net/blob/master/build/paragra...).
        | GPT-4 has memorized many URLs, and where it hasn't, it still
        | makes pretty good guesses. So, as part of the standard
        | formatting passes, I pass annotations through GPT-4, with a
        | bit added to its prompt: 'try to add useful hyperlinks to
        | Wikipedia and other sources'. It often does, and it's quite
        | convenient when that works. GPT-4 still confabulates URLs more
        | often than I'd like, and sometimes hyperlinks too-obvious WP
        | links that I have to delete. So, still some adjustments
        | required there.
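        | 
        | That pass is roughly this shape (sketched with the 2023-era
        | openai Python client; the prompt wording is illustrative, not
        | my actual prompt):
        | 
        |     # GPT-4 formatting + auto-linking pass (sketch)
        |     import openai
        | 
        |     def format_annotation(text):
        |         resp = openai.ChatCompletion.create(
        |             model="gpt-4",
        |             messages=[
        |                 {"role": "system",
        |                  "content": "Clean up this annotation's HTML "
        |                             "and try to add useful hyperlinks "
        |                             "to Wikipedia and other sources."},
        |                 {"role": "user", "content": text},
        |             ])
        |         return resp.choices[0].message.content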
       | 
       | And these work well with the other site features like recursive
       | popups, or bidirectional backlinks
       | (https://gwern.net/design#backlink).
        
       | mxuribe wrote:
        | The concept reminds me of Vannevar Bush's Memex device/system
        | [https://en.wikipedia.org/wiki/Memex] - very cool! Though I
        | can imagine the UX/UI would have to be very tight and
        | deliberate, lest the user be easily overwhelmed and overloaded
        | with too much noise (and not enough signal).
        
       | ichbinlegion wrote:
       | What stops the reader from copy&pasting a term into a search
       | engine? Maybe we are getting a bit too lazy?
        
         | phailhaus wrote:
         | There is a huge advantage to surfacing search results directly
         | in the document. You drop the friction to effectively zero,
         | which means you're far more likely to search random phrases in
         | a document when you never would have before.
        
         | hot_gril wrote:
         | I've read some research papers where I need to look up a word
         | in like every sentence. It's hard to focus when I have to keep
         | going to a separate tab/window.
        
       ___________________________________________________________________
       (page generated 2023-07-25 23:00 UTC)