[HN Gopher] Personal computing paves the way for personal librar...
___________________________________________________________________
Personal computing paves the way for personal library science
Author : _bramses
Score : 175 points
Date : 2024-04-28 21:57 UTC (1 day ago)
(HTM) web link (www.bramadams.dev)
(TXT) w3m dump (www.bramadams.dev)
| walterbell wrote:
| _> Personal Library Science is the leverage of LLM technology,
| applied to a personal library. A personal library differs from an
| impersonal library in that a personal library is an
| interpretation of a source material. These interpretations
| include: photographs from different photographers at the same
| event, or favorite scenes from a movie, or favorite passages from
| books, parts of songs that bring you to tears, etc. Importantly,
| these interpretations create unique sets that go on to create
| unique problems which require unique, idiosyncratic solutions._
|
| Would an LLM-driven "Personal Library" require manually annotated
| textual interpretation of each curated item, or could it derive
| personal interpretations from user history and the uniqueness of
| curated items/sets?
|
| For those who have been using local, offline LLMs with a manually
| curated text/image corpus, what have been the most valuable or
| surprising use cases?
|
| Author demo video (2023), https://youtube.com/watch?v=7TgqMRz2r3M
| & tooling comment (2024),
| https://news.ycombinator.com/item?id=39789712
|
| _> Inspired by the commonplace book format, I take highlights
| from Kindle and embed them in a DB. From there I build (multiple)
| downstream apps, but the central one, Commonplace Bot, is a bot
| that serves as a retriever and transformer for said highlights._
|
| Related: https://en.wikipedia.org/wiki/Lifelog
| dartos wrote:
| > Would an LLM-driven "Personal Library" require manually
| annotated textual interpretation of each curated item
|
| No. In something like this you'd probably have the LLM annotate
| and curate your personal library for you.
|
| Potentially by creating and assigning tags or topics based on
| the content of your library.
| unshavedyak wrote:
| Yeah. I had a discussion about this not too long ago. I'd
| love to have a combination of a library (Personal Knowledge
| Management style), data ingestion, and a current world
| view/state.
|
| The PKM is the stored info to write to and query against
| (both for LLMs and humans). The data ingests are just a
| pipeline of digital inputs to the system, like chat logs,
| maybe (transcribed) webcam feeds, files I'm currently editing
| on desktop, browsing history, etc. The current world view is
| the interpretation of what I'm doing - to tie all the ingests
| together and give them context. E.g., in isolation, browsing some
| Rust crates might not be that useful. But if I'm also editing
| Project X on my computer, then it's reasonable to assume the
| searching is related to X. However if it's been 8 hours since
| any Project X activity, it's less likely related. Same goes
| for context-less chat logs (as happens frequently in my
| house) where they are extensions of a voice conversation,
| etc.
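A minimal sketch of the "current world view" idea, assuming every ingest carries a timestamp and the system tracks when each project was last active. The names `Activity`, `relatedness` and `attribute`, the two-hour half-life and the 0.1 cutoff are hypothetical choices for illustration, not part of any existing tool. A context-less event (say, browsing Rust crates) is attributed to the most recently active project, with confidence decaying exponentially in the time gap, so a Project X session from 8 hours ago no longer counts.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Activity:
    project: str    # e.g. "Project X"
    when: datetime  # last time the user touched that project


def relatedness(event_time: datetime, activity: Activity,
                half_life: timedelta = timedelta(hours=2)) -> float:
    """1.0 when event and activity coincide, halving for every
    `half_life` of separation (hypothetical exponential-decay rule)."""
    gap = abs((event_time - activity.when).total_seconds())
    return 0.5 ** (gap / half_life.total_seconds())


def attribute(event_time: datetime, activities: List[Activity],
              threshold: float = 0.1) -> Optional[str]:
    """Most likely project for a context-less event, or None if nothing is recent enough."""
    best = max(activities, key=lambda a: relatedness(event_time, a), default=None)
    if best is None or relatedness(event_time, best) < threshold:
        return None
    return best.project


now = datetime(2024, 4, 28, 12, 0)
recent = [Activity("Project X", now - timedelta(minutes=10))]
stale = [Activity("Project X", now - timedelta(hours=8))]
print(attribute(now, recent))  # Project X
print(attribute(now, stale))   # None, since 0.5 ** 4 = 0.0625 is below the cutoff
```

Exponential decay is only one option; a hard cutoff window would also work, but decay lets several overlapping projects compete on recency.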
|
| All of this stuff is of course insanely privacy-invading, so
| I'd only implement this locally. I also wouldn't even store
| most of it for fear of data invasion, but using it to fuel a
| PKM automatically seems pretty sexy. Like browser history,
| but for your life.
|
| This is all just wishful thinking though; LLMs have been
| moving too fast for me to even bother toying with this. I
| should note, though, that I did not intend for LLMs to be
| "smart". Rather, in a RAG-like fashion (I think that's the term),
| I want to just let LLMs do what they're good at -
| summarization & autocomplete - and let the world view / PKM
| store the real data.
| dartos wrote:
| FWIW LLMs have been advancing on benchmarks but the
| practical usage of them (RAG, ReAct, CoT, etc.) hasn't
| really changed much in the past year.
| _bramses wrote:
| > Would an LLM-driven "Personal Library" require manually
| annotated textual interpretation of each curated item, or could
| it derive personal interpretations from user history and the
| uniqueness of curated items/sets?
|
| I've personally found that tagging is less robust than LLM
| embeddings (mainly due to dimensionality), but human-appended
| thoughts about a source -- also embedded -- serve even better
| as tags.
|
| Example: "this is a quote about dinosaurs..."
|
| (Old way of doing things)
| Tags: dinosaurs, jurassic, history
| Query: "dinosaurs" > results = 1...
|
| (New way of doing things)
| Embedded Quote: [0.182...]
| User Added Thought: "this dinosaur reminds me of a time i went to
| six flags with my cousins and..."
| Embedded User Added Thought: [0.284...]
|
| Query: "dinosaurs" > results = 2 (indexes = sources, thoughts)
|
| The "thoughts" index can do a second layer cosine similarity
| search and serve as a tag on its own to fetch similar concepts.
| Basically a tree search created by similarity from user
| input/feedback loops.
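A toy sketch of the two-index search described above. It assumes a bag-of-words `Counter` as a stand-in for real LLM embeddings (a real setup would call an embedding model, which is where the dimensionality advantage comes from). The sample quotes and thoughts and the helpers `embed`, `top_k`, `search` and `expand` are hypothetical. A query runs against both the sources index and the thoughts index; `expand` then does the second-layer cosine search, treating one thought as a "tag" that reaches related thoughts and, through them, other sources.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in for an LLM embedding: lowercase word counts (assumption)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Index 1: sources (e.g. Kindle highlights). Index 2: user-added thoughts, each linked to a source.
sources = {
    "s1": "this is a quote about dinosaurs and fossils",
    "s2": "a passage about rollercoasters and summer",
}
thoughts = {
    "t1": ("s1", "these dinosaurs remind me of a trip to six flags with my cousins"),
    "t2": ("s2", "six flags rollercoasters with my cousins every summer"),
}
source_vecs = {sid: embed(text) for sid, text in sources.items()}
thought_vecs = {tid: embed(text) for tid, (_, text) in thoughts.items()}


def top_k(query_vec: Counter, index: dict, k: int) -> list:
    """Ids of the k most similar entries, dropping zero-similarity matches."""
    scored = sorted(((cosine(query_vec, vec), key) for key, vec in index.items()), reverse=True)
    return [key for score, key in scored[:k] if score > 0]


def search(query: str, k: int = 1) -> tuple:
    """First layer: query both indexes independently."""
    q = embed(query)
    return top_k(q, source_vecs, k), top_k(q, thought_vecs, k)


def expand(thought_id: str, k: int = 1) -> list:
    """Second layer: use a thought as a tag to fetch similar thoughts."""
    others = {tid: vec for tid, vec in thought_vecs.items() if tid != thought_id}
    return top_k(thought_vecs[thought_id], others, k)


hit_sources, hit_thoughts = search("dinosaurs")
print(hit_sources, hit_thoughts)  # ['s1'] ['t1']
related = expand(hit_thoughts[0])
print(related, thoughts[related[0]][0])  # ['t2'] s2
```

Here the dinosaur query reaches the rollercoaster passage only through the shared "six flags with my cousins" memory, which a plain tag lookup on "dinosaurs" would never surface.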
| skybrian wrote:
| I imagine an LLM could work well for doing autocomplete while
| saving and annotating documents. But it's not personal unless
| you edit the result to say what you want to say.
| mistrial9 wrote:
| there may be an important divergence implied by this essay ..
| people here ask about using an LLM.. but the essay refers to
| "different photographs of the same scene from different
| photographers" or other personal collection items that are
| related but subjective or not-authoritative
|
| There is a rush in public to condense and summarize many
| authoritative publications to find patterns, or to replace a
| human expert with automated results.. yet that is fundamentally
| different than taking multiple incomplete perspectives to add to
| a human library owner's knowledge and investigations.
|
| It is subtle to speak it but not subtle in its implications..
| taking "data as facts" and condensing them or reordering them or
| rewriting an output based on them, using automation, is different
| than a human mind taking in many inputs for human mind knowledge
| and enabling new outputs from a human author.
| _bramses wrote:
| > There is a rush in public to condense and summarize many
| authoritative publications to find patterns, or to replace a
| human expert with automated results.. yet that is fundamentally
| different than taking multiple incomplete perspectives to add
| to a human library owner's knowledge and investigations. It is
| subtle to speak it but not subtle in its implications.. taking
| "data as facts" and condensing them or reordering them or
| rewriting an output based on them, using automation, is
| different than a human mind taking in many inputs for human
| mind knowledge and enabling new outputs from a human author.
|
| You nailed it! Thanks for noticing the divergence!
| walterbell wrote:
| There's a lot of interesting work that came out of the BCL in
| the 1960s,
| https://en.wikipedia.org/wiki/Biological_Computer_Laboratory
|
| _> The focus of research at BCL was systems theory and
| specifically the area of self-organizing systems, bionics,
| and bio-inspired computing; that is, analyzing, formalizing,
| and implementing biological processes using computers. BCL
| was inspired by the ideas of Warren McCulloch and the Macy
| Conferences, as well as many other thinkers in the field of
| cybernetics._
|
| On cybernetics, https://www.pangaro.com/definition-cybernetics.html
|
| _> Artificial Intelligence (AI) grew from a desire to make
| computers smart, whether smart like humans or just smart in
| some other way. Cybernetics grew from a desire to understand
| and build systems that can achieve goals.. it connects
| control (actions taken in hope of achieving goals) with
| communication (connection and information flow between the
| actor and the environment).. Later, Gordon Pask offered
| conversation as the core interaction of systems that have
| goals._
| teja_nemana wrote:
| > The "during" is hard work, and very lonely work. There are no
| promises of success, and indeed, the path is one where you can't
| see more than three feet ahead of you and you exist on the
| cliff's edge of extinction by any silly mishap. The work of
| "during" is exhausting, and it constantly holds you taut and
| alert, afraid of the shadows that lurk beyond the campfire's
| edge.
|
| Well said. All anyone can do is the lonely work, until you
| can't anymore or until you find friends so that the work is no
| longer lonely.
| quest88 wrote:
| I've had related ideas lurking at the back of my mind for a while
| now. Essentially, I want to save more things locally and
| interact with them. For example, I have a bunch of book notes
| stored in Bear. I'd like to be able to ask questions about those
| notes, and also show the pages of the book itself.
| squirrel wrote:
| Try Zenfetch. It's designed for this use case.
| gabev wrote:
| Thanks for mentioning Zenfetch :)
|
| Happy to answer any questions
| bibliotekka wrote:
| What is Zenfetch?
| gabev wrote:
| Personal RAG. Connect your existing bookmarks/web
| browsing/notes into a knowledge library with AI search
| and chat on top of it
| openrisk wrote:
| Personal computing has stagnated for such a long time, it creates
| substantial uncertainty about what state it might evolve to if
| and when the next step actually happens.
|
| In this respect, local LLMs are simply the tip of the iceberg,
| pointing out the vast amount of personal information processing
| that is available in principle but does not actually happen.
| walterbell wrote:
| One could argue that personal computing (desktop) software
| piracy led to web-based SaaS subscription licensing. In
| theory, mobile app stores solved device software piracy, at the
| cost of high distribution fees, policy restrictions and
| telemetry.
|
| Thanks to Linux being used at scale in Android and WSL, it's
| now maintained and capable on the desktop, as a hypothetical
| foundation for personal computing innovation. But even there,
| native GUI toolkits took a backseat to web and CLI. Remember
| Chandler? http://www.osafoundation.org/
|
| Investors poured small fortunes into cauldrons of smart
| devices, wearables and AR/VR, with little to show as nascent
| ecosystems failed to achieve escape velocity, due to closed
| hardware and software that forestalled the experimentation
| which birthed personal computing.
|
| Apple Silicon has reinvigorated walled laptops. Hopefully next
| month's derivative Qualcomm SoC from PC OEMs can offer good
| price/performance/watt for Apple-competitive-yet-open Arm
| laptops and tablets that can run any Linux distro, with retail
| SSDs and RAM, plus AI silicon roadmap.
|
| A modular Framework Arm laptop would be a good start to
| rebooting PC innovation.
| idle_zealot wrote:
| How does slightly improved laptop hardware relate to
| reinvigorating desktop software? Surely desktop computing has
| stagnated because most users are primarily or exclusively
| mobile users. In Mac land Apple has been progressively
| dumbing down their interfaces, in Windows land Microsoft is
| more focused on extracting maximum value from their users
| than trying to meaningfully improve their platform. In Linux
| land there are some interesting things happening with
| Nix/Guix around declarative system configurations, and around
| Fedora with its layered images+Flatpak distros for making
| systems more reliable, and System76 may be doing something
| novel interface-wise with Cosmic marrying powerful
| tiling/tabbing window layouts with intuitive controls and the
| niceties of an all-in-one desktop environment. From my
| perspective desktop computing is definitely advancing, but
| only for hobbyists, not for mainstream desktop operating
| systems.
| walterbell wrote:
| _> How does slightly improved laptop hardware relate to
| reinvigorating desktop software?_
|
| If Arm SystemReady laptops with good performance/watt have
| an open security foundation (declarative, immutable OS at
| EL2) to support multiple competing "app store" equivalents
| on Linux, the resulting revenue and competitive market can
| reward innovative desktop software - open, closed or
| hybrid. Without an Apple tax on storage and memory, funds
| can be redirected to a competitive market of smaller ISVs.
| p_l wrote:
| Unfortunately, so far Qualcomm has worked hard to avoid
| making an open-platform Arm-based laptop/tablet - to the
| point of working around MS rules on that through special
| drivers etc. to make Windows think it's dealing with
| EFI-compliant hardware.
| walterbell wrote:
| That's disappointing, since mainline Linux support has made
| progress, https://www.linaro.org/blog/qualcomm-and-linaro-enable-lates...
| p_l wrote:
| The main issue is that what's upstreamed is essentially
| drivers for the SoCs - but the firmware of Qualcomm-powered
| laptops tends not to be fully compliant.
|
| So it's easy to make a device powered by one when you
| control how Linux is booted on it, but for whatever
| reason things like the EFI NVRAM interface on Qualcomm-powered
| Windows laptops were implemented in a non-standard way, and the
| only reason Windows works is that there are drivers
| shipped which work around it - and I seriously doubt it's
| intended by Microsoft, because Microsoft actually
| benefits from devices following their official,
| documented, hardware-interface specs - it makes for easy
| upgrades, reinstalls, etc. etc.
| A_D_E_P_T wrote:
| I work in a very interdisciplinary, and somewhat niche,
| tech/engineering field. For the past 15 years, I've been saving
| every relevant PDF that I can find -- mostly studies of the sort
| published by Elsevier and Springer, but also books and
| presentations. I now have around 10k, which probably makes it the
| largest private library focused on this particular domain of
| expertise.
|
| It has been extremely useful, especially because it's
| text-searchable and the really important papers are properly
| categorized.
|
| A local LLM will make it 100x more useful. Also, it might not
| even need to be "local." If I make it available via the web, I can
| probably sell access to other scientists and engineers in my
| field.
|
| Recent advances _really_ benefit data hoarders out there.
|
| I'd add that these days it totally makes sense to download
| libgen's entire archive, because (1) storage has never been
| cheaper, and (2) you can use it to train local LLMs.
| hervature wrote:
| > If I make it available via the web, I can probably sell
| access to other scientists and engineers in my field.
|
| Out of curiosity, does this statement come from complete
| ignorance of, or complete disregard for, the author's copyright?
| throwaway11460 wrote:
| If it's research it very probably is at least partially
| publicly funded. Regardless of whatever the law says, I don't
| think it's immoral to take it and offer better services
| around it that will be useful enough that someone decides to
| pay.
| hervature wrote:
| Do you not see the hypocrisy in stating that someone should
| be able to take something partially publicly funded and
| profit from it while the creator of said work should not
| retain some rights over said profit? By extension of the
| transitive property of the nebulous "partially", the LLM
| wrapper should be provided for free with complete disregard
| to the wrapper's creator since it is a derivative of
| partially publicly funded work.
| throwaway11460 wrote:
| Yeah, the LLM maybe - though nobody paid for the training
| costs in that case and that feels weird. If the public
| paid for a work, it should be able to use it and not be
| required to give away their own derivative work for free
| since they already paid for it through taxes.
|
| While I rather like the idea of having to provide access
| to derivatives of publicly funded works, I fear that people
| would rather not use it than invest money into innovative
| approaches of using it. Of course if the public pays for
| the training and development costs, then by all means it
| should be available.
|
| And the library itself and the computing resources to
| operate it cost money that someone needs to pay.
| Publishers didn't pay for the research and yet they can
| profit from it - why shouldn't this guy?
| jddj wrote:
| Rightly or wrongly, Elsevier has a market capitalisation
| of £62B for essentially doing what GP is proposing.
| p_l wrote:
| The hoarders of IP in this case usually have done 0 work
| to produce it (Elsevier et al).
| walterbell wrote:
| _> complete ignorance of or complete disregard to copyright_
|
| A question that could be appended to many LLM discussions!
| luqtas wrote:
| but the LLM is no different than a human doing that &
| outputs COMPLETELY unique strings! - said the profiting
| robot parked in the legal gray zone waiting for lobbyists to
| turn it white
| znpy wrote:
| OpenAI et similia seems to be doing just fine though
| thomastjeffery wrote:
| I see you have encountered the first reality of "personal
| library".
|
| Of course, the idea of selling access to your private digital
| collection (or a derivative model) is _relatively more
| absurd_ than the idea of monopolizing the original published
| work... Even so, this is as good a time as any to reconsider
| the practicality of copyright.
| squigz wrote:
| The data hoarder community would encourage you to release that
| collection for free, not try to profit from it.
| matthewmorgan wrote:
| How would you go about training an LLM on 10k large PDFs?
| RetroTechie wrote:
| > I'd add that these days it totally makes sense to download
| libgen's entire archive, because (1) storage has never been
| cheaper, and (2) you can use it to train local LLMs.
|
| Hardly. Data hoarding comes with most downsides of hoarding
| physical objects. It's just smaller, cheaper & easier to
| process.
|
| There are people who get rid of any physical object they haven't
| used in the past year (or 2, or 5, whatever). This makes sense.
| Imho, every object you own falls into 3 categories:
|
| 1) Things you use on a (semi?) regular basis. They make your
| life easier/nicer.
|
| 2) Things that are valuable. For flexible metrics of what
| constitutes value (sentimental, nostalgia, monetary, insurance
| against 'disaster', ...)
|
| Not having used something in a long time is a good hint it's
| _not_ valuable.
|
| 3) Luggage, whose value is negative. It doesn't provide
| anything, just takes up space, drains mental energy (and
| possibly other resources), and in doing so gets in the way of
| other pursuits.
|
| Data is no different. For any _single_ piece of it, you either
| use it from time to time, somehow derive value from it, or it
| is useless luggage that you drag around at a cost.
|
| Apply good judgement in what to hoard.
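The three categories above can be written as a small decision rule, here applied to files rather than physical objects. This is a sketch under stated assumptions: the one-year window, the `Item` fields, the sample items, and the `valuable` flag (standing in for the owner's own judgement of sentimental, monetary or "disaster insurance" value) are illustrative choices, not a standard.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Category(Enum):
    IN_USE = "used on a regular basis"
    VALUABLE = "valuable"
    LUGGAGE = "luggage"


@dataclass
class Item:
    name: str
    last_used: date
    valuable: bool = False  # owner's judgement: sentimental, nostalgic, monetary, insurance...


def categorize(item: Item, today: date, window: timedelta = timedelta(days=365)) -> Category:
    """Recent use wins; otherwise the item must earn its keep through value."""
    if today - item.last_used <= window:
        return Category.IN_USE
    if item.valuable:
        return Category.VALUABLE
    return Category.LUGGAGE


today = date(2024, 4, 29)
for item in [
    Item("reading notes", date(2024, 3, 1)),
    Item("wedding photos", date(2015, 6, 1), valuable=True),
    Item("old installer ISO", date(2016, 1, 1)),
]:
    print(item.name, "->", categorize(item, today).value)
```

The hard part, as the replies point out, is choosing `window`: a much longer one is defensible for data, since it is cheaper to keep than physical objects.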
| throwaway14356 wrote:
| I just keep everything I've bothered to keep (data and stuff);
| I put it in a sensible place and it only consumes time when
| looking for it.
|
| The new external drive is so much larger than the oldest ones
| that I put the new stuff on it and use the rest of the space to
| back up old drives.
|
| Friends are constantly purging stuff, but they seem unaware of
| how much time and effort it takes.
|
| The significant other wanted to clean up her old photos
| rather than upgrade iCloud. It's an insane amount of work?
| squigz wrote:
| > Data hoarding comes with most downsides of hoarding
| physical objects. It's just smaller, cheaper & easier to
| process.
|
| So the same downsides, except ... much better?
|
| > There's people that get rid of any physical object they
| haven't used in the past year (or 2, or 5, whatever)
|
| > Data is no different. ... you either use it from time to
| time
|
| I suppose the real question then is what timeframe do you
| consider "using it from time to time"? It seems to me that
| this depends very much on the person, and likely on the
| objects themselves - probably a large appliance you haven't
| used in a year is likelier to be thrown away than a small
| item you haven't used in several. Considering data storage is
| smaller, cheaper, and easier to manage, I suppose a
| reasonable timeframe for keeping it, just in case, would be a
| lot longer than physical items.
|
| This of course says nothing of the societal and cultural
| value such archivists safeguard.
| netdevnet wrote:
| > A local LLM will make it 100x more useful. Also, it might not
| even need to be "local." If I make it available via the web, I can
| probably sell access to other scientists and engineers in my
| field.
|
| That's not legal. Just because you own some files does not mean
| that you own the IP of the content within.
| __MatrixMan__ wrote:
| It is, however, the right thing to do.
| squigz wrote:
| Profiting off it is not the right thing to do.
| financetechbro wrote:
| That's OpenAI's business model tho
| detourdog wrote:
| If one remembers, NeXT included all sorts of non-computer
| documents and literature. The idea of storing vast amounts of
| data for personal use was there at the dawn of the PC era.
| WillAdams wrote:
| I miss Librarian.app --- it was quite useful for a project of
| mine:
|
| https://tug.org/TUGboat/Articles/tb24-2/tb77adams.pdf
|
| (basically used copies of _The Bible_ and The Works of
| Shakespeare to determine if a given set of letters appeared in
| the English language or not)
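A rough modern equivalent of that kind of lookup, assuming "a given set of letters" means a contiguous run of letters inside a word and that the reference texts are available as plain strings. `letter_ngrams` and `occurs_in_corpus` are hypothetical helpers, and the one-line corpus (Genesis 1:1, King James Version) stands in for the full texts; this is not how Librarian.app worked internally.

```python
import re


def letter_ngrams(text: str, n: int) -> set:
    """Every run of n consecutive letters found inside a word of the corpus."""
    grams = set()
    for word in re.findall(r"[a-z]+", text.lower()):
        for i in range(len(word) - n + 1):
            grams.add(word[i:i + n])
    return grams


def occurs_in_corpus(letters: str, corpus: str) -> bool:
    """True if the letter sequence appears inside some word of the corpus."""
    letters = letters.lower()
    return letters in letter_ngrams(corpus, len(letters))


corpus = "In the beginning God created the heaven and the earth."
print(occurs_in_corpus("eav", corpus))  # True ("heaven")
print(occurs_in_corpus("qz", corpus))   # False
```

For many queries of the same length, computing `letter_ngrams` once and reusing the set avoids rescanning the corpus.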
| WillAdams wrote:
| Not too long ago, I managed to pretty much ruin the wiki for a
| small (and at that time opensource) CNC machine by using it as my
| personal notebook --- probably my usage of it thus was a big part
| of why it was left off-line when the person hosting it moved.
|
| You can see it on the Wayback Machine:
|
| https://web.archive.org/web/20211127090321/https://wiki.shap...
|
| In retrospect, I should have put some of that effort into:
|
| https://en.wikibooks.org/wiki/Hobbyist_CNC_Machining
|
| although since then, a machine owner worked up:
|
| https://shapeokoenthusiasts.gitbook.io/shapeoko-cnc-a-to-z
|
| I still regret a bunch of stuff I didn't keep copies of, esp. the
| scans of Barry Hughart's notes for his novels.
|
| The irony is that one can see a bit of the result of discussion
| of this sort of thing at the top of one's browser window --- the
| URL bar, where URL == "Uniform Resource Locator" --- the
| originally proposed term was "Universal Resource Locator", but
| the argument against that was that people were not librarians,
| and that unlike Ted Nelson's Xanadu, there wouldn't be an
| overarching data structure and organization, so a given document
| wouldn't have a single canonical location.
|
| Anyone interested in this sort of thing who hasn't read it,
| should read Tim Berners-Lee's book:
|
| https://www.w3.org/People/Berners-Lee/Weaving/Overview.html
| dtagames wrote:
| Best quote from the article: _"...personal library science is
| focused on your relationship with your information. How do we
| store information so that it is useful at a later date? How do we
| transform our information into new valuable assets in different
| creative domains? How do we do all of this while being flexible
| enough for the idiosyncrasies, proclivities, likes and dislikes
| of eight billion distinct individuals? How do we chronicle the
| information diet of a single person as they learn new things,
| interact with the world at different phases in their life? How do
| we make sure we can pass down our best knowledge to generations
| below? "_
| EricE wrote:
| My personal favorite
| https://www.devontechnologies.com/apps/devonthink
| RecycledEle wrote:
| Years ago I spent thousands of hours trying to figure out how to
| organize a digital library.
|
| My final answer was to use the Library of Congress catalog
| system. They need to add some sub-categories for how-to
| explanations.
|
| Then have a field for media type (video vs. PDF vs. image)
|
| Then note the style of presentation (academic vs. folksy vs. a
| manual vs. a dad showing you how to do this)
|
| Then note the language
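The scheme above maps naturally onto a catalog record. A minimal sketch, assuming a simplified Library of Congress Classification key (class letters, then the class number compared numerically, ignoring Cutter numbers); the enum members, the `how_to` flag for the proposed sub-category, and the sample titles and call numbers are illustrative, not taken from a real catalog.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Media(Enum):
    VIDEO = "video"
    PDF = "pdf"
    IMAGE = "image"


class Style(Enum):
    ACADEMIC = "academic"
    FOLKSY = "folksy"
    MANUAL = "manual"
    DEMONSTRATION = "demonstration"  # "a dad showing you how to do this"


def lcc_key(call_number: str) -> tuple:
    """Simplified shelf order: class letters, then the class number as a number,
    so QA9 sorts before QA76 (plain string sorting would get this wrong)."""
    m = re.match(r"([A-Z]{1,3})(\d+(?:\.\d+)?)", call_number)
    if m is None:
        raise ValueError(f"unrecognised call number: {call_number!r}")
    return m.group(1), float(m.group(2))


@dataclass(frozen=True)
class CatalogEntry:
    title: str
    lcc: str              # Library of Congress Classification call number
    media: Media
    style: Style
    language: str         # e.g. an ISO 639-1 code such as "en"
    how_to: bool = False  # hypothetical extra sub-category for how-to explanations

    def sort_key(self) -> tuple:
        return (lcc_key(self.lcc), self.how_to, self.media.value,
                self.style.value, self.language, self.title)


shelf = sorted([
    CatalogEntry("Soldering walkthrough", "TK7870", Media.VIDEO, Style.DEMONSTRATION, "en", how_to=True),
    CatalogEntry("Intro to logic notes", "QA9", Media.PDF, Style.ACADEMIC, "en"),
    CatalogEntry("Databases overview", "QA76.9", Media.PDF, Style.MANUAL, "en"),
], key=CatalogEntry.sort_key)
print([e.title for e in shelf])
```

Putting the LCC key first keeps subject as the primary axis; media, style and language then act as facets within a subject.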
___________________________________________________________________
(page generated 2024-04-29 23:02 UTC)