[HN Gopher] Hyperlink Maximalism (2022)
___________________________________________________________________
Hyperlink Maximalism (2022)
Author : Tomte
Score : 75 points
Date : 2023-07-25 17:40 UTC (5 hours ago)
(HTM) web link (thesephist.com)
(TXT) w3m dump (thesephist.com)
| thesephist wrote:
| Hey HN! Author here.
|
| For the many Obsidian users here, wanted to share an Obsidian
| demo/plugin that I saw recently by Justin Smith[0] that I think
| faithfully carries over a lot of what I liked about this idea
| into the Obsidian land, complete with a semantic index w/
| language models.
|
| If you're an Obsidian user, do check out the demo. I can't take
| credit for any part of building it, but it's really cool to see
| the idea in action :)
|
| [0] https://twitter.com/justindsmith/status/1679978286955532296
| Slow_Hand wrote:
| Excellent! This was my first question upon reading: whether it
| can be integrated into my Obsidian database.
| Santosh83 wrote:
| One HN regular contributor's personal site is similar:
| extensively cross-linked, and hovering over any link opens it
| in its own popup window. A unique site.
| j2kun wrote:
| A well written document should focus the reader. When everything
| is a hyperlink and a "thought map" then you'll just end up being
| distracted and either not reading the original doc, or clicking
| on nothing and ignoring the deluge of hyperlinks. The demos in
| this article are heavily obstructed by all the visual noise.
|
| If something is important enough, the author will discuss it
| directly in the document. Hyperlinks can provide context when the
| reader may not have the context assumed of the intended audience,
| or they're a poor man's bibliographic citation (because the links
| always break eventually), but otherwise it's a signal that it's
| _not_ important.
| carlosjobim wrote:
| Two problems:
|
| 1. Most people who publish their writings online are too lazy to
| hyperlink, even where it is extremely appropriate. Even intra-
| site links.
|
| 2. Link rot is real. Maintainers of web sites are too arrogant to
| make redirects, and happily destroy all their incoming links when
| playing with a new framework. So now links from your site lead to
| 404s, because of somebody on the other end, and you have to scan
| your outgoing links every few months and change them.
|
| In a way macOS and iOS have already implemented the "everything
| is a link" approach, with their Look Up feature. It's much more
| powerful than it used to be, and you can select any word or words
| anywhere and find all kinds of information.
| nologic01 wrote:
| Love the fact that people still think and work on human centric
| computing.
|
| There should clearly be some low hanging fruit in revisiting the
| hypertext concept with the benefit of a few decades of further
| tech development.
|
| It's not clear, though, which aspect(s) would be the most
| beneficial. E.g. when scanning text for information and
| connections, our context is frequently an important factor, but
| that context is not available to the computer.
| jkestner wrote:
| Should this be a system-level service in many circumstances? I
| love that with a three-finger tap, I get an instant dictionary
| definition of whatever word I'm hovering over (not even
| selected). I want that reliable availability, but more expansive.
| mananaysiempre wrote:
| > my computer should search across everything I've read and some
| small high-quality subset of the Web
|
| Even without the contextual search--which is interesting, mind
| you!--I want this part, and I've wanted it for a long, long time.
|
| I have it for journal articles simply by the virtue of
| obsessively saving and filing every PDF I've read for the last
| decade, but for the Web I don't know how I could just plain index
| everything I've read (and not just everything in
| Pocket/Wallabag/etc). For example, is there anything that can
| pull links from Firefox Sync and save/index extracted text from
| them in a halfway reliable fashion? (I don't expect it to be able
| to bypass yucky stuff like CAPTCHAs on archive.is.)
|
| I know there are a handful of WARC-ecosystem tools, but AFAIU
| these are rarely that automatic, are tied to a browser plugin as
| opposed to just pulling the URLs from history sync (yes, I'm
| unreasonably fond of Epiphany/GNOME Web), and kind of heavy. Am I
| wrong? Any alternatives?
| jazzyjackson wrote:
| There's a different strategy: a VC-funded thing that's Mac-only
| but pretty fun to play with:
|
| http://rewind.ai takes screenshots continuously and uses mac's
| system OCR to index everything you've ever looked at, whether
| that's web, code, or messages. When you search it pulls up
| screenshots that match.
| rhn_mk1 wrote:
| > plain index everything I've read
|
| Recoll together with the extension can be configured to index
| every single web page you browse.
|
| https://addons.mozilla.org/en/firefox/addon/recoll-we/
| groby_b wrote:
| https://chromewebstore.google.com/detail/habonpimjphpdnmcfka...
| (or https://github.com/tjhorner/archivebox-exporter for source)
|
| Pushes your history to ArchiveBox, which does the heavy lifting
| storing/processing the content.
|
| Alas, might not work with Epiphany because there's no complete
| extension support.
|
| But IIRC, it stores its urls in $XDG_DATA_HOME/epiphany/ephy-
| history.db - so a bit of sqlite and ArchiveBox might do the
| trick for you.
|
| Note: I'm running something similar, but I find I'd rather not
| rely on my history; I tend to click on a lot of garbage ;)
| You might want to curate a bit.
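The sqlite-plus-ArchiveBox route suggested above could be sketched roughly as below. This is a hypothetical helper, not code from ArchiveBox or Epiphany: the `urls` table and `url` column are assumptions about ephy-history.db's schema and should be checked against the actual file before use.

```python
import sqlite3

def history_urls(db_path):
    """Pull distinct visited URLs out of a browser history database.
    Assumes a `urls` table with a `url` column (an assumption about
    Epiphany's ephy-history.db; inspect the real schema first)."""
    con = sqlite3.connect(db_path)
    try:
        return [row[0] for row in con.execute("SELECT DISTINCT url FROM urls")]
    finally:
        con.close()

# Each returned URL could then be handed to ArchiveBox for
# archiving/indexing, e.g. via its `archivebox add` CLI command.
```

From there a cron job could diff the result against what ArchiveBox already holds and submit only new URLs.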
| janalsncm wrote:
| I built something like this over a weekend. Saves every page
| title, html body, and time and lets you search them in the
| chrome new tab page. It also indexes on the fly.
|
| I have it installed on my browser and I'll probably release it
| when I can figure out how to make it more useful.
| zetalyrae wrote:
| The next best thing: the Common Lisp HyperSpec:
| http://www.lispworks.com/documentation/lw50/CLHS/Body/05_aa....
| HotGarbage wrote:
| Sounds familiar:
| https://en.wikipedia.org/wiki/Adequacy.org#Adequacy_style
| andai wrote:
| >everything should be hyperlinked
|
| >almost nothing in the article is hyperlinked
|
| What did he mean by this?
| mananaysiempre wrote:
| The first part is more the hook than the thesis, as the rest of
| TFA explains. (The first paragraphs outright admit that it's
| not actually that defensible if taken literally.)
|
| The actual point is that every potential term and phrase in a
| text you're reading should be preemptively searched for (across
| e.g. your web browsing history, reading list, etc.), marked up
| inline according to the rough potential usefulness of the
| results, and in any case those results should be made quickly
| accessible through user action, whatever the machine's opinion
| on their usefulness.
|
| So less manual hyperlinking and more web searches. (Manual
| hyperlinks have their place as well, but that place is things
| the author thinks especially relevant, not literally every
| point they want to make or refer to.)
| qez2 wrote:
| [dead]
| 1shooner wrote:
| I don't understand why, with all the UI/UX experience we have
| with hypertext, we have not moved the semantics of a hyperlink
| beyond a dumb one-way pointer.
|
| Assuming that a maximal hypertextual interface needs to
| dynamically display relationships (rather than just a huge tag
| cloud or making all the body text hyperlink blue), a machine-
| readable characterization of the nature of those relationships
| would be invaluable. Does my text refute the linked resource? Or
| support it, or provide related factual statements, or is it
| chronologically or conceptually derivative? Some sort of
| ontological palette to write with could produce something that
| could be traversed in all sorts of novel ways. It would give
| writers more agency over external systems that will otherwise
| apply their own semantic meaning to the text, and its links.
|
| Also, hypertext (HTML, anyway) is oddly asymmetrical in its
| normal usage. We link a small string of text within a document,
| one-way, to the entirety of a separate document. I think
| convention is the primary obstacle, but why can't we describe a
| relationship from our entire document to some other resource?
| _jal wrote:
| If you haven't run across it, you should read up on Ted
| Nelson's Xanadu project.
|
| I've heard different arguments for why that failed. I think the
| likeliest is just that simple and dumb is often more successful
| than complex and nuanced.
|
| > I think convention is the primary obstacle
|
| At this point, I think the existence of the web as it is is the
| primary obstacle. Business models built on laws and case
| rulings depend on links as they currently exist. So changes
| here would need to get buy in from those folks, browser makers,
| standards body folks (but I repeat myself), and then would have
| to somehow be reconciled with laws that make assumptions about
| how they work.
|
| Easy peasy.
| galaxyLogic wrote:
| I can already select a word or phrase in my browser, right-click
| and choose "Search the web (or Google etc.) for this".
|
| So I think we already have this feature. Now we just need better
| search-engines. Maybe AI supported ones? Wait we already have
| those too.
|
| Making every word look like a hyperlink is a bad idea because
| often the user will want to search for a whole phrase, not just
| an individual word.
| [deleted]
| codeGreene wrote:
| Any thoughts of representing topics across different notes or
| pages visually? I have always wanted something similar to the
| https://twitchatlas.com/ but for topics of interest to me. For
| example Quantum Physics would be a very large bubble. Within that
| bubble could be links to several sub-topics and branching out
| from there would be related topics.
|
| I know I am not articulating this the best. I am a visual learner
| and going through pages and pages in OneNote and cherrytree has
| been ineffective.
| deafpolygon wrote:
| this is a terrible idea. you'll end up overwhelming the reader.
| Zambyte wrote:
| True. Like on the Hacker News website, where almost all of the
| posts are hyperlinks. People clearly get overwhelmed and just
| comment instead of opening the articles ;)
| pie_flavor wrote:
| Perhaps this conjures different sentiments from other HN readers,
| but what instantly pops to mind for _me_ is my childhood web
| browsing experience where I had the wrong toolbar or plugin or
| whatever, and thus every page was filled with automatically
| generated double-underlined green links that invariably went to a
| page I regretted visiting.
|
| If there is one thing the SEO and LLM crazes have taught us, it
| is that small curated beats big generated, every single time.
| hot_gril wrote:
| That or some Yahoo! sites or random blogs that did this on
| their own. The links didn't go to malicious sites, just useless
| ones.
|
| I can see this actually being useful if done well, but I didn't
| like the demo just because of the annoyingly highlighted words.
| It should only show up if I mouse over.
| giantrobot wrote:
| The best (worst) were all the ads that were a pop up
| underneath the cursor. There was absolutely no safe place to
| park your cursor. The ad of course occluded the page and you
| couldn't read anything. The bastard sibling of that type of
| ad was the popup under the cursor when you were just hovering
| over a link to see where it went.
| pazimzadeh wrote:
| > The computer can look at every word on the page, every phrase,
| name, quote, and section of text, and show me a "map" of the
| words and ideas behind which lay the most interesting ideas I
| might want to know about
|
| This would be really useful when reading research papers.
| warkdarrior wrote:
| Yeah, if one could solve the problem of identifying "most
| interesting ideas," that'd be good.
| pazimzadeh wrote:
| In biology, it would already be a huge upgrade if you could
| click on the name of any gene or protein and see a summary of
| the protein function, links to sequence info, a list of other
| names that the gene has historically been labelled with, and
| links to other recent papers where the gene was studied.
|
| In the non-biologies, I'm sure something similar would also
| be useful.
|
| > Yeah, if one could solve the problem of identifying "most
| interesting ideas,"
|
| I think ChatGPT can already categorize ideas, so you could at
| the very least see information about related concepts?
| gwern wrote:
| How to add hyperlinks is something I've thought a bit about for
| Gwern.net: there's no point having all these fancy popups if
| there are no hyperlinks exploiting them, right?
|
| The way I currently do it is that first, I make hyperlinks stable
| by automatically snapshotting & making local archives of pages
| (https://gwern.net/archiving#preemptive-local-archiving). There
| is no point in adding links if linkrot discourages anyone from
| using them, of course, and I found that manual linkrot fixing did
| not scale to the amount of writing & hyperlinking I want to do.
|
| The next step is adding links automatically. Particularly in the
| STEM topics I write most about these days, AI, there are many
| acronyms & named systems which mean specific things but it's easy
| to get lost in. Fortunately, that makes them easy to write
| automatic link rules for:
| https://github.com/gwern/gwern.net/blob/master/build/Config/...
| These run automatically on essay bodies when compiling the site,
| and on annotations when created. If a URL is already present, its
| rule doesn't run; and if it's not, only the first instance gets
| linked and the rest are skipped. (This is important: there are
| some approaches which take the lazy approach of hyperlinking
| every instance. This is bad and discredits linking.) This code is
| very slow but fast enough for static site building, anyway.
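The first-instance-only rewrite rules described above could be sketched like this. It is a minimal illustration, not gwern.net's actual Haskell build code; the rule table and document are made up:

```python
import re

def auto_link(text, rules):
    """Apply term -> URL rewrite rules to a Markdown-ish document:
    skip any rule whose URL is already present, and hyperlink only
    the first instance of each term, leaving the rest as plain text."""
    for term, url in rules.items():
        if url in text:
            continue  # URL already present: the rule doesn't run
        pattern = re.compile(r"\b" + re.escape(term) + r"\b")
        text = pattern.sub(f"[{term}]({url})", text, count=1)
    return text

doc = "GPT-4 is large. GPT-4 is also expensive."
print(auto_link(doc, {"GPT-4": "https://en.wikipedia.org/wiki/GPT-4"}))
```

The `count=1` argument is what enforces first-instance-only linking; linking every instance is the "lazy approach" the comment warns against.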
|
| Sometimes terms are too ambiguous or too rare or too much work to
| write an explicit rewrite rule for. But it will still exist on-
| site. In fact, you can say that the site corpus _defines_ a set
| of rewrite rules: every time I write by hand `[foo](http://bar)`,
| am I not _implicitly_ saying that there ought to be a rewrite
| rule for the string `foo` which ought to hyperlink `http://bar`?
| So there is a script
| (https://github.com/gwern/gwern.net/blob/master/build/link-su...)
| which will parse the site corpus, compile all the text/link
| pairs, create/remove a bunch of them per whitelist/blacklists and
| a frequency/length threshold, and then generate a bunch of Emacs
| Lisp pairs. This master list of rewrites then gets read by an
| Elisp snippet in my Emacs and turned into several thousand
| interactive search-and-replace commands when I run my generic
| formatting command on a buffer.
|
| The effect of this second script is that after I have linked `Foo
| et al 2023` to `/doc/2023-foo.pdf` a few times (perhaps I went
| back and hyperlinked all instances of it after realizing it's an
| important paper), any future instances of 'Foo et al 2023' will
| pop up a search-and-replace asking to hyperlink it to
| `/doc/2023-foo.pdf`, and so on.
|
| Third, I exploit my link-recommendations for manually-curated
| 'see also' sections appended to annotations. I have a fairly
| standard link-recommender approach where each annotation is
| embedded by a neural network (OA API for now), and one does
| nearest-neighbor lookups to find _n_ 'similar' annotations, and
| shows them to the reader in case any are relevant. So far so good.
| But I _also_ run that after editing each annotation: embed,
| recommend, list; it spits out an HTML list of the top 20 or so
| similar links appended to the annotation. I can look at that and
| delete the irrelevant entries, or the entire list. This means
| that they'll be included in the final embedded version of the
| annotation, will show up in any fulltext searches I run, are more
| visible to the reader, can be edited into the main body if I want
| to, etc.
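The embed-and-nearest-neighbor step could be sketched as follows. This is a toy with hand-made 3-dimensional vectors and brute-force cosine similarity, standing in for real neural-network embeddings from an API; the annotation ids are invented:

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def nearest(query_id, embeddings, n=2):
    """Return the n annotation ids most similar to query_id,
    by brute-force cosine similarity over all other annotations."""
    scores = [(cosine(embeddings[query_id], vec), key)
              for key, vec in embeddings.items() if key != query_id]
    return [key for _, key in sorted(scores, reverse=True)[:n]]

# Hypothetical pre-computed embeddings for three annotations.
embeddings = {
    "scaling-laws": [0.9, 0.1, 0.0],
    "chinchilla":   [0.8, 0.2, 0.1],
    "gardening":    [0.0, 0.1, 0.9],
}
print(nearest("scaling-laws", embeddings))
```

At gwern.net scale a brute-force scan per annotation is still cheap; an approximate-nearest-neighbor index only becomes worthwhile at far larger corpora.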
|
| Fourth and most recently, I've been experimenting with GPT-4 for
| auto-formatting & auto-linking (https://github.com/gwern/gwern.net/blob/master/build/paragra...).
| GPT-4 has memorized many URLs,
| and where it hasn't, it still makes pretty good guesses. So, as
| part of the standard formatting passes, I pass annotations
| through GPT-4, with a bit added to its prompt, 'try to add useful
| hyperlinks to Wikipedia and other sources'. It often does, and
| it's quite convenient when that works. GPT-4 still confabulates
| URLs more often than I'd like, and sometimes hyperlinks too-obvious
| WP links that I have to delete. So, still some adjustments
| required there.
|
| And these work well with the other site features like recursive
| popups, or bidirectional backlinks
| (https://gwern.net/design#backlink).
| mxuribe wrote:
| The concept reminds me of Vannevar Bush's Memex device/system
| [https://en.wikipedia.org/wiki/Memex] - very cool! Though, i can
| imagine the UX/UI would have to be very tight and deliberate,
| lest the user be easily overwhelmed and overloaded with too
| much noise (and not enough signal).
| ichbinlegion wrote:
| What stops the reader from copy&pasting a term into a search
| engine? Maybe we are getting a bit too lazy?
| phailhaus wrote:
| There is a huge advantage to surfacing search results directly
| in the document. You drop the friction to effectively zero,
| which means you're far more likely to search random phrases in
| a document when you never would have before.
| hot_gril wrote:
| I've read some research papers where I need to look up a word
| in like every sentence. It's hard to focus when I have to keep
| going to a separate tab/window.
___________________________________________________________________
(page generated 2023-07-25 23:00 UTC)