[HN Gopher] Personal computing paves the way for personal librar...
       ___________________________________________________________________
        
       Personal computing paves the way for personal library science
        
       Author : _bramses
       Score  : 175 points
       Date   : 2024-04-28 21:57 UTC (1 days ago)
        
 (HTM) web link (www.bramadams.dev)
        
       | walterbell wrote:
       | _> Personal Library Science is the leverage of LLM technology,
       | applied to a personal library. A personal library differs from an
       | impersonal library in that a personal library is an
       | interpretation of a source material. These interpretations
       | include: photographs from different photographers at the same
       | event, or favorite scenes from a movie, or favorite passages from
       | books, parts of songs that bring you to tears, etc. Importantly,
       | these interpretations create unique sets that go on to create
       | unique problems which require unique, idiosyncratic solutions._
       | 
       | Would an LLM-driven "Personal Library" require manually annotated
       | textual interpretation of each curated item, or could it derive
       | personal interpretations from user history and the uniqueness of
       | curated items/sets?
       | 
       | For those who have been using local, offline LLMs with a manually
       | curated text/image corpus, what have been the most valuable or
       | surprising use cases?
       | 
       | Author demo video (2023), https://youtube.com/watch?v=7TgqMRz2r3M
       | & tooling comment (2024),
       | https://news.ycombinator.com/item?id=39789712
       | 
       |  _> Inspired by the commonplace book format, I take highlights
       | from Kindle and embed them in a DB. From there I build (multiple)
       | downstream apps, but the central one, Commonplace Bot, is a bot
       | that serves as a retriever and transformer for said highlights._
       | 
       | Related: https://en.wikipedia.org/wiki/Lifelog
        
         | dartos wrote:
         | > Would an LLM-driven "Personal Library" require manually
         | annotated textual interpretation of each curated item
         | 
         | No. In something like this you'd probably have the LLM annotate
         | and curate your personal library for you.
         | 
         | Potentially by creating and assigning tags or topics based on
         | the content of your library.
        
           | unshavedyak wrote:
           | Yea. I had a discussion about this not too long ago. I'd
           | love to have a combination of a library (Personal Knowledge
           | Management style), data ingestion, and a current world
           | view/state.
           | 
           | The PKM is the stored info to write to and query against
           | (both for LLMs and humans). The data ingests are just a
           | pipeline of digital inputs to the system, like chat logs,
           | maybe (transcribed) webcam feeds, files I'm currently editing
           | on desktop, browsing history, etc. The current world view is
           | the interpretation of what I'm doing, which ties all the
           | ingests together and gives them context. E.g., in isolation,
           | browsing some Rust crates might not be that useful. But if
           | I'm also editing Project X on my computer, then it's
           | reasonable to assume the searching is related to X. However,
           | if it's been 8 hours since any Project X activity, it's less
           | likely to be related. The same goes for context-less chat
           | logs (as happens frequently in my house) that are extensions
           | of a voice conversation, etc.
           | 
           | All of this stuff is of course insanely privacy-invading, so
           | I'd only implement this locally. I also wouldn't even store
           | most of it for fear of data invasion, but using it to fuel a
           | PKM automatically seems pretty sexy. Like browser history,
           | but for your life.
           | 
           | This is all just wishful thinking, though; LLMs have been
           | moving too fast for me to even bother toying with this. I
           | should note, though, that I did not intend for LLMs to be
           | "smart". Rather, in a RAG-like fashion (I think that's the
           | term), I want to just let LLMs do what they're good at
           | (summarization & autocomplete) and let the world view / PKM
           | store the real data.
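The "Project X" heuristic above can be expressed as a simple time-window rule. This is a minimal sketch, assuming the ingest pipeline produces an activity log of `(timestamp, project)` pairs; `attribute_event` and the log format are hypothetical, and the 8-hour window is just the comment's example value.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# The comment's example threshold: activity older than this is "less likely related".
WINDOW = timedelta(hours=8)

def attribute_event(event_time: datetime,
                    activity_log: List[Tuple[datetime, str]],
                    window: timedelta = WINDOW) -> Optional[str]:
    """Guess which project an ingested event (e.g. a browser visit) belongs to.

    Picks the project with the most recent activity at or before event_time,
    and returns None if that activity is older than `window`.
    """
    latest: Optional[Tuple[datetime, str]] = None
    for ts, project in activity_log:
        if ts <= event_time and (latest is None or ts > latest[0]):
            latest = (ts, project)
    if latest is None or event_time - latest[0] > window:
        return None
    return latest[1]

# Hypothetical log entries, e.g. from watching which files are being edited.
log = [
    (datetime(2024, 4, 28, 9, 0), "Project X"),
    (datetime(2024, 4, 28, 10, 30), "Project X"),
]
print(attribute_event(datetime(2024, 4, 28, 11, 0), log))  # Project X
print(attribute_event(datetime(2024, 4, 28, 20, 0), log))  # None (9.5 h gap)
```

In a fuller system the returned project label would be attached to the event as context before it is written into the PKM, rather than used as a hard filter.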
        
             | dartos wrote:
             | FWIW, LLMs have been advancing on benchmarks, but the
             | practical usage patterns (RAG, ReAct, CoT, etc.) haven't
             | really changed much in the past year.
        
         | _bramses wrote:
         | > Would an LLM-driven "Personal Library" require manually
         | annotated textual interpretation of each curated item, or could
         | it derive personal interpretations from user history and the
         | uniqueness of curated items/sets?
         | 
         | I've personally found that tagging is less robust than LLM
         | embeddings (mainly due to dimensionality), but human-appended
         | thoughts about a source -- also embedded -- serve even better
         | as tags.
         | 
         | Example: "this is a quote about dinosaurs..."
         | 
         | (Old way of doing things) Tags: dinosaurs, jurassic, history.
         | Query: "dinosaurs" > results = 1...
         | 
         | (New way of doing things) Embedded quote: [0.182...]. User-added
         | thought: "this dinosaur reminds me of a time i went to six
         | flags with my cousins and..." Embedded user-added thought:
         | [0.284...]
         | 
         | Query: "dinosaurs" > results = 2 (indexes = sources, thoughts)
         | 
         | The "thoughts" index can do a second-layer cosine similarity
         | search and serve as a tag on its own to fetch similar concepts.
         | Basically, a tree search created by similarity from user
         | input/feedback loops.
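The two-index idea can be sketched with plain cosine similarity. This is a minimal illustration, assuming toy 3-dimensional vectors in place of real embedding-model output and arbitrary similarity thresholds (0.5 and 0.6); the ids and the `search` helper are hypothetical, not Commonplace Bot's actual code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(index, query_vec, threshold):
    """index: dict id -> vector. Returns ids with similarity >= threshold, best first."""
    scored = [(cosine(vec, query_vec), key) for key, vec in index.items()]
    return [key for score, key in sorted(scored, reverse=True) if score >= threshold]

# Toy "embeddings" (hypothetical values standing in for real model output).
sources = {"quote:dinosaurs": [0.9, 0.1, 0.0]}
thoughts = {
    "thought:six-flags": [0.7, 0.0, 0.7],         # dinosaur quote + family trip memory
    "thought:roller-coasters": [0.1, 0.0, 0.99],  # related to the trip, not to dinosaurs
}

query = [1.0, 0.0, 0.0]  # pretend embedding of "dinosaurs"

# First layer: search both indexes with the query.
first = search(sources, query, 0.5) + search(thoughts, query, 0.5)

# Second layer: each matched thought acts as a "tag" and queries the thoughts index.
second = set()
for t in [k for k in first if k in thoughts]:
    second.update(k for k in search(thoughts, thoughts[t], 0.6) if k != t)

print(first)   # ['quote:dinosaurs', 'thought:six-flags']
print(second)  # {'thought:roller-coasters'}
```

The roller-coaster thought is never close to the "dinosaurs" query itself; it is only reached through the user's own annotation, which is the point of letting thoughts act as tags.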
        
         | skybrian wrote:
         | I imagine an LLM could work well for doing autocomplete while
         | saving and annotating documents. But it's not personal unless
         | you edit the result to say what you want to say.
        
       | mistrial9 wrote:
       | There may be an important divergence implied by this essay.
       | People here ask about using an LLM, but the essay refers to
       | "different photographs of the same scene from different
       | photographers" or other personal collection items that are
       | related but subjective or non-authoritative.
       | 
       | There is a rush in public to condense and summarize many
       | authoritative publications to find patterns, or to replace a
       | human expert with automated results. Yet that is fundamentally
       | different from taking multiple incomplete perspectives to add to
       | a human library owner's knowledge and investigations.
       | 
       | It is subtle to articulate but not subtle in its implications:
       | taking "data as facts" and condensing them, reordering them, or
       | rewriting an output based on them, using automation, is different
       | from a human mind taking in many inputs for its own knowledge
       | and enabling new outputs from a human author.
        
         | _bramses wrote:
         | > There is a rush in public to condense and summarize many
         | authoritative publications to find patterns, or to replace a
         | human expert with automated results.. yet that is fundamentally
         | different than taking multiple incomplete perspectives to add
         | to a human library owner's knowledge and investigations. It is
         | subtle to speak it but not subtle in its implications.. taking
         | "data as facts" and condensing them or reordering them or
         | rewriting an output based on them, using automation, is
         | different than a human mind taking in many inputs for human
         | mind knowledge and enabling new outputs from a human author.
         | 
         | You nailed it! Thanks for noticing the divergence!
        
           | walterbell wrote:
           | There's lots of interesting work that came out of BCL in
           | the 1960s,
           | https://en.wikipedia.org/wiki/Biological_Computer_Laboratory
           | 
           |  _> The focus of research at BCL was systems theory and
           | specifically the area of self-organizing systems, bionics,
           | and bio-inspired computing; that is, analyzing, formalizing,
           | and implementing biological processes using computers. BCL
           | was inspired by the ideas of Warren McCulloch and the Macy
           | Conferences, as well as many other thinkers in the field of
           | cybernetics._
           | 
           | On cybernetics, https://www.pangaro.com/definition-cybernetics.html
           | 
           |  _> Artificial Intelligence (AI) grew from a desire to make
           | computers smart, whether smart like humans or just smart in
           | some other way. Cybernetics grew from a desire to understand
           | and build systems that can achieve goals.. it connects
           | control (actions taken in hope of achieving goals) with
           | communication (connection and information flow between the
           | actor and the environment).. Later, Gordon Pask offered
           | conversation as the core interaction of systems that have
           | goals._
        
       | teja_nemana wrote:
       | > The "during" is hard work, and very lonely work. There are no
       | promises of success, and indeed, the path is one where you can't
       | see more than three feet ahead of you and you exist on the
       | cliff's edge of extinction by any silly mishap. The work of
       | "during" is exhausting, and it constantly holds you taut and
       | alert, afraid of the shadows that lurk beyond the campfire's
       | edge.
       | 
       | Well said. All anyone can do is the lonely work, until you
       | can't anymore or until you find friends so that the work is no
       | longer lonely.
        
       | quest88 wrote:
       | I've had related ideas lurking at the back of my mind for a while
       | now. Essentially, I want to save more things locally and
       | interact with them. For example, I have a bunch of book notes
       | stored in Bear. I'd like to be able to ask questions about those
       | notes, and also show the pages of the book itself.
        
         | squirrel wrote:
         | Try Zenfetch. It's designed for this use case.
        
           | gabev wrote:
           | Thanks for mentioning Zenfetch :)
           | 
           | Happy to answer any questions
        
             | bibliotekka wrote:
             | What is Zenfetch?
        
               | gabev wrote:
               | Personal RAG. Connect your existing bookmarks/web
               | browsing/notes into a knowledge library with AI search
               | and chat on top of it.
        
       | openrisk wrote:
       | Personal computing has stagnated for such a long time, it creates
       | substantial uncertainty about what state it might evolve to if
       | and when the next step actually happens.
       | 
       | In this respect, local LLMs are simply the tip of the iceberg,
       | pointing out the vast amount of personal information processing
       | that is available in principle but does not actually happen.
        
         | walterbell wrote:
         | One could argue that personal computing (desktop) software
         | piracy led to web-based SaaS subscription licensing. In
         | theory, mobile app stores solved device software piracy, at the
         | cost of high distribution fees, policy restrictions and
         | telemetry.
         | 
         | Thanks to Linux being used at scale in Android and WSL, it's
         | now maintained and capable on the desktop, as a hypothetical
         | foundation for personal computing innovation. But even there,
         | native GUI toolkits took a backseat to web and CLI. Remember
         | Chandler? http://www.osafoundation.org/
         | 
         | Investors poured small fortunes into cauldrons of smart
         | devices, wearables and AR/VR, with little to show as nascent
         | ecosystems failed to achieve escape velocity, due to closed
         | hardware and software that forestalled the experimentation
         | which birthed personal computing.
         | 
         | Apple Silicon has reinvigorated walled laptops. Hopefully next
         | month's derivative Qualcomm SoC from PC OEMs can offer good
         | price/performance/watt for Apple-competitive-yet-open Arm
         | laptops and tablets that can run any Linux distro, with retail
         | SSDs and RAM, plus AI silicon roadmap.
         | 
         | A modular Framework Arm laptop would be a good start to
         | rebooting PC innovation.
        
           | idle_zealot wrote:
           | How does slightly improved laptop hardware relate to re-
           | invigorating desktop software? Surely desktop computing has
           | stagnated because most users are primarily or exclusively
           | mobile users. In Mac land Apple has been progressively
           | dumbing down their interfaces, in Windows land Microsoft is
           | more focused on extracting maximum value from their users
           | than trying to meaningfully improve their platform. In Linux
           | land there are some interesting things happening with
           | Nix/Guix around declarative system configurations, and around
           | Fedora with its layered images+Flatpak distros for making
           | systems more reliable, and System76 may be doing something
           | novel interface-wise with Cosmic marrying powerful
           | tiling/tabbing window layouts with intuitive controls and the
           | niceties of an all-in-one desktop environment. From my
           | perspective desktop computing is definitely advancing, but
           | only for hobbyists, not for mainstream desktop operating
           | systems.
        
             | walterbell wrote:
             | _> How does slightly improved laptop hardware relate to re-
             | invigorating desktop software?_
             | 
             | If Arm SystemReady laptops with good performance/watt have
             | an open security foundation (declarative, immutable OS at
             | EL2) to support multiple competing "app store" equivalents
             | on Linux, the resulting revenue and competitive market can
             | reward innovative desktop software - open, closed or
             | hybrid. Without an Apple tax on storage and memory, funds
             | can be redirected to a competitive market of smaller ISVs.
        
           | p_l wrote:
           | Unfortunately, so far Qualcomm has been hard at work avoiding
           | making an open-platform ARM-based laptop/tablet, to the
           | point of working around Microsoft's rules on that through
           | special drivers etc. to make Windows think it's dealing with
           | EFI-compliant hardware.
        
             | walterbell wrote:
             | That's disappointing, since mainline Linux support has made
             | progress, https://www.linaro.org/blog/qualcomm-and-linaro-
             | enable-lates...
        
               | p_l wrote:
               | The main issue is that what's upstreamed is essentially
               | drivers for the SoCs, but the firmware of
               | Qualcomm-powered laptops tends to be not fully compliant.
               | 
               | So it's easy to make a device powered by one when you
               | control how Linux is booted on it. But for whatever
               | reason, things like the EFI NVRAM interface on
               | Qualcomm-powered Windows laptops were done non-standard,
               | and the only reason Windows works is that drivers are
               | shipped which work around it. I seriously doubt that's
               | intended by Microsoft, because Microsoft actually
               | benefits from devices following its official, documented
               | hardware-interface specs: it makes for easy upgrades,
               | reinstalls, etc.
        
       | A_D_E_P_T wrote:
       | I work in a very interdisciplinary, and somewhat niche,
       | tech/engineering field. For the past 15 years, I've been saving
       | every relevant PDF that I can find -- mostly studies of the sort
       | published by Elsevier and Springer, but also books and
       | presentations. I now have around 10k, which probably makes it the
       | largest private library focused on this particular domain of
       | expertise.
       | 
       | It has been extremely useful, especially because it's text-
       | searchable and the really important papers are properly
       | categorized.
       | 
       | A local LLM will make it 100x more useful. Also, it might not
       | even need to be "local." If I make it available via the web, I can
       | probably sell access to other scientists and engineers in my
       | field.
       | 
       | Recent advances _really_ benefit data hoarders out there.
       | 
       | I'd add that these days it totally makes sense to download
       | libgen's entire archive, because (1) storage has never been
       | cheaper, and (2) you can use it to train local LLMs.
        
         | hervature wrote:
         | > If I make it available via the web, I can probably sell
         | access to other scientists and engineers in my field.
         | 
         | Out of curiosity. Does this statement come from complete
         | ignorance of or complete disregard to copyright of the author?
        
           | throwaway11460 wrote:
           | If it's research it very probably is at least partially
           | publicly funded. Regardless of whatever the law says, I don't
           | think it's immoral to take it and offer better services
           | around it that will be useful enough that someone decides to
           | pay.
        
             | hervature wrote:
             | Do you not see the hypocrisy in stating that someone should
             | be able to take something partially publicly funded and
             | profit from it while the creator of said work should not
             | retain some rights over said profit? By extension of the
             | transitive property of the nebulous "partially", the LLM
             | wrapper should be provided for free with complete disregard
             | to the wrapper's creator since it is a derivative of
             | partially publicly funded work.
        
               | throwaway11460 wrote:
               | Yeah, the LLM maybe - though nobody paid for the training
               | costs in that case and that feels weird. If the public
               | paid for a work, it should be able to use it and not be
               | required to give away their own derivative work for free
               | since they already paid for it through taxes.
               | 
               | While I rather like the idea of having to provide access
               | to derivatives of publicly funded works, I fear that people
               | would rather not use it than invest money into innovative
               | approaches of using it. Of course if the public pays for
               | the training and development costs, then by all means it
               | should be available.
               | 
               | And the library itself and the computing resources to
               | operate it cost money that someone needs to pay.
               | Publishers didn't pay for the research and yet they can
               | profit from it, so why shouldn't this guy?
        
               | jddj wrote:
               | Rightly or wrongly, Elsevier has a market capitalisation
               | of £62B for essentially doing what GP is proposing.
        
               | p_l wrote:
               | The hoarders of IP in this case usually have done 0 work
               | to produce it (Elsevier et al).
        
           | walterbell wrote:
           | _> complete ignorance of or complete disregard to copyright_
           | 
           | A question that could be appended to many LLM discussions!
        
             | luqtas wrote:
             | "But the LLM is no different than a human doing that &
             | outputs COMPLETELY unique strings!" said the profiting
             | robot parked in the legal gray zone, waiting for lobbyists
             | to turn it white.
        
           | znpy wrote:
           | OpenAI et similia seem to be doing just fine, though.
        
           | thomastjeffery wrote:
           | I see you have encountered the first reality of "personal
           | library".
           | 
           | Of course, the idea of selling access to your private digital
           | collection (or a derivative model) is _relatively more
           | absurd_ than the idea of monopolizing the original published
           | work... Even so, this is as good a time as any to reconsider
           | the practicality of copyright.
        
         | squigz wrote:
         | The data hoarder community would encourage you to release that
         | collection for free, not try to profit from it.
        
         | matthewmorgan wrote:
         | How would you go about training an LLM on 10k large PDFs?
        
         | RetroTechie wrote:
         | > I'd add that these days it totally makes sense to download
         | libgen's entire archive, because (1) storage has never been
         | cheaper, and (2) you can use it to train local LLMs.
         | 
         | Hardly. Data hoarding comes with most downsides of hoarding
         | physical objects. It's just smaller, cheaper & easier to
         | process.
         | 
         | There are people who get rid of any physical object they
         | haven't used in the past year (or 2, or 5, whatever). This
         | makes sense. IMHO, every object you own falls into 3 categories:
         | 
         | 1) Things you use on a (semi?) regular basis. They make your
         | life easier/nicer.
         | 
         | 2) Things that are valuable. For flexible metrics of what
         | constitutes value (sentimental, nostalgia, monetary, insurance
         | against 'disaster', ...)
         | 
         | Not having used something in a long time is a good hint it's
         | _not_ valuable.
         | 
         | 3) Luggage. Whose value is negative. It doesn't provide
         | anything, just takes up space, drains mental energy (and
         | possibly other resources), and in doing so gets in the way of
         | other pursuits.
         | 
         | Data is no different. For any _single_ piece of it, you either
         | use it from time to time, somehow derive value from it, or it
         | is useless luggage that you drag around at a cost.
         | 
         | Apply good judgement in what to hoard.
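The three categories above amount to a simple decision rule. A minimal sketch, assuming each item carries a last-used date and a human-supplied `valued` flag (both hypothetical fields); the keep window is a parameter because the comment itself allows 1, 2, or 5 years.

```python
from datetime import date, timedelta

def triage(last_used: date, today: date, valued: bool,
           keep_window: timedelta = timedelta(days=365)) -> str:
    """Sort an item (physical or data) into the comment's three categories.

    `valued` is a human judgement (sentimental, monetary, insurance, ...);
    the code only applies the recency rule around it.
    """
    if today - last_used <= keep_window:
        return "in use"      # category 1: used on a (semi-)regular basis
    if valued:
        return "valuable"    # category 2: kept despite not being used
    return "luggage"         # category 3: negative value, candidate to discard

today = date(2024, 4, 29)
print(triage(date(2024, 1, 15), today, valued=False))  # in use
print(triage(date(2019, 6, 1), today, valued=True))    # valuable
print(triage(date(2019, 6, 1), today, valued=False))   # luggage
```

Widening `keep_window` to five years (`timedelta(days=5 * 365)`) moves the 2019 item back to "in use", which is the knob the reply below argues should be set much longer for cheap digital storage.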
        
           | throwaway14356 wrote:
           | I just keep everything I've bothered to keep (data and
           | stuff). I put it in a sensible place, and it only consumes
           | time when I'm looking for it.
           | 
           | The new external drive is so much larger than the oldest ones
           | that I put the new stuff on it and use the rest of the space
           | to back up old drives.
           | 
           | Friends are constantly purging stuff, but they seem unaware of
           | how much time and effort it takes.
           | 
           | The significant other wanted to clean up her old photos
           | rather than upgrade iCloud. It's an insane amount of work.
        
           | squigz wrote:
           | > Data hoarding comes with most downsides of hoarding
           | physical objects. It's just smaller, cheaper & easier to
           | process.
           | 
           | So the same downsides, except ... much better?
           | 
           | > There's people that get rid of any physical object they
           | haven't used in the past year (or 2, or 5, whatever)
           | 
           | > Data is no different. ... you either use it from time to
           | time
           | 
           | I suppose the real question then is what timeframe do you
           | consider "using it from time to time"? It seems to me that
           | this depends very much on the person, and likely on the
           | objects themselves - probably a large appliance you haven't
           | used in a year is likelier to be thrown away than a small
           | item you haven't used in several. Considering data storage is
           | smaller, cheaper, and easier to manage, I suppose a
           | reasonable timeframe for keeping it, just in case, would be a
           | lot longer than physical items.
           | 
           | This of course says nothing of the societal and cultural
           | value such archivists safeguard.
        
         | netdevnet wrote:
         | > A local LLM will make it 100x more useful. Also, it might not
         | even need be "local." If I make it available via the web, I can
         | probably sell access to other scientists and engineers in my
         | field.
         | 
         | That's not legal. Just because you own some files does not mean
         | that you own the IP of the content within.
        
           | __MatrixMan__ wrote:
           | It is, however, the right thing to do.
        
             | squigz wrote:
             | Profiting off it is not the right thing to do.
        
           | financetechbro wrote:
           | That's OpenAI's business model tho
        
       | detourdog wrote:
       | If one remembers, NeXT included all sorts of non-computer
       | documents and literature. The idea of storing vast amounts of
       | data for personal use was there at the dawn of the PC era.
        
         | WillAdams wrote:
         | I miss Librarian.app --- it was quite useful for a project of
         | mine:
         | 
         | https://tug.org/TUGboat/Articles/tb24-2/tb77adams.pdf
         | 
         | (basically, I used copies of _The Bible_ and _The Works of
         | Shakespeare_ to determine whether a given set of letters
         | appeared in the English language or not)
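One simple way to run that kind of check is to build a word list from a reference text and test whether a letter sequence occurs inside any word. This is a sketch under the assumption that "appeared" means a contiguous run of letters within a word; the TUGboat article may describe a different procedure, and the short corpus string below is only a stand-in for the full texts.

```python
import re
from typing import Set

def build_vocabulary(text: str) -> Set[str]:
    """Lower-case every alphabetic word in a reference text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def appears_in_corpus(letters: str, vocabulary: Set[str]) -> bool:
    """True if `letters` occurs as a contiguous run inside some corpus word."""
    needle = letters.lower()
    return any(needle in word for word in vocabulary)

# Stand-in snippets; the project described above used the full texts.
corpus = ("In the beginning God created the heaven and the earth. "
          "To be, or not to be: that is the question.")
vocab = build_vocabulary(corpus)
print(appears_in_corpus("eav", vocab))  # True ("heaven")
print(appears_in_corpus("qz", vocab))   # False
```

Deduplicating words into a set first keeps the scan proportional to the vocabulary size rather than the full length of the corpus.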
        
       | WillAdams wrote:
       | Not too long ago, I managed to pretty much ruin the wiki for a
       | small (and at that time opensource) CNC machine by using it as my
       | personal notebook --- probably my usage of it thus was a big part
       | of why it was left off-line when the person hosting it moved.
       | 
       | You can see it on the Wayback Machine:
       | 
       | https://web.archive.org/web/20211127090321/https://wiki.shap...
       | 
       | In retrospect, I should have put some of that effort into:
       | 
       | https://en.wikibooks.org/wiki/Hobbyist_CNC_Machining
       | 
       | although since then, a machine owner worked up:
       | 
       | https://shapeokoenthusiasts.gitbook.io/shapeoko-cnc-a-to-z
       | 
       | I still regret a bunch of stuff I didn't keep copies of, esp. the
       | scans of Barry Hughart's notes for his novels.
       | 
       | The irony is that one can see a bit of the result of discussion
       | of this sort of thing at the top of one's browser window: the
       | URL bar, where URL == "Uniform Resource Locator". The
       | originally proposed term was "Universal Resource Locator", but
       | the argument against that was that people were not librarians,
       | and that unlike Ted Nelson's Xanadu, there wouldn't be an
       | overarching data structure and organization, so a given document
       | wouldn't have a single canonical location.
       | 
       | Anyone interested in this sort of thing who hasn't read it
       | should read Tim Berners-Lee's book:
       | 
       | https://www.w3.org/People/Berners-Lee/Weaving/Overview.html
        
       | dtagames wrote:
       | Best quote from the article: _"...personal library science is
       | focused on your relationship with your information. How do we
       | store information so that it is useful at a later date? How do we
       | transform our information into new valuable assets in different
       | creative domains? How do we do all of this while being flexible
       | enough for the idiosyncrasies, proclivities, likes and dislikes
       | of eight billion distinct individuals? How do we chronicle the
       | information diet of a single person as they learn new things,
       | interact with the world at different phases in their life? How do
       | we make sure we can pass down our best knowledge to generations
       | below? "_
        
         | EricE wrote:
         | My personal favorite
         | https://www.devontechnologies.com/apps/devonthink
        
       | RecycledEle wrote:
       | Years ago I spent thousands of hours trying to figure out how to
       | organize a digital library.
       | 
       | My final answer was to use the Library of Congress catalog
       | system. They need to add some sub-categories for how-to
       | explanations.
       | 
       | Then have a field for media type (video vs. PDF vs. image)
       | 
       | Then note the style of presentation (academic vs. folksy vs. a
       | manual vs. a dad showing you how to do this)
       | 
       | Then note the language
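The scheme described above maps naturally onto a record type with one field per facet. A minimal sketch; the field names, enum values, made-up call numbers, and the `find` helper are all illustrative assumptions, not part of any Library of Congress tooling.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

class MediaType(Enum):
    VIDEO = "video"
    PDF = "pdf"
    IMAGE = "image"

class Presentation(Enum):
    ACADEMIC = "academic"
    FOLKSY = "folksy"
    MANUAL = "manual"
    DEMONSTRATION = "demonstration"  # "a dad showing you how to do this"

@dataclass(frozen=True)
class LibraryItem:
    title: str
    lcc: str                   # Library of Congress Classification call number
    how_to: bool               # the how-to sub-category the comment asks for
    media: MediaType
    presentation: Presentation
    language: str              # e.g. an ISO 639-1 code such as "en"

def find(items: List[LibraryItem], lcc_prefix: str, **facets) -> List[LibraryItem]:
    """Filter by classification prefix, then by any other field=value facets."""
    return [item for item in items
            if item.lcc.startswith(lcc_prefix)
            and all(getattr(item, name) == value for name, value in facets.items())]

# Hypothetical entries with made-up call numbers.
items = [
    LibraryItem("Lathe basics", "TJ1180", True, MediaType.VIDEO,
                Presentation.DEMONSTRATION, "en"),
    LibraryItem("Machine tool design", "TJ1185", False, MediaType.PDF,
                Presentation.ACADEMIC, "en"),
]
print([i.title for i in find(items, "TJ", media=MediaType.VIDEO)])  # ['Lathe basics']
```

Keeping the classification as a string prefix lets broad classes (e.g. "T") and narrow ones ("TJ") share a single lookup, while media type, presentation style, and language stay independent filters.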
        
       ___________________________________________________________________
       (page generated 2024-04-29 23:02 UTC)