[HN Gopher] GPT for Second Brains
       ___________________________________________________________________
        
       GPT for Second Brains
        
       Author : abbabon
       Score  : 93 points
       Date   : 2023-02-06 18:27 UTC (4 hours ago)
        
 (HTM) web link (reasonabledeviations.com)
 (TXT) w3m dump (reasonabledeviations.com)
        
       | michaericalribo wrote:
       | These "augmented intelligence" applications are so exciting to
       | me. I'm not as interested in autonomous artificial intelligence.
       | Computers are tools to make my life easier, not meant to lead
       | their own lives!
       | 
       | There's a big up-front cost of building a notes database for this
       | application, but it illustrates the point nicely: encode a bunch
       | of data ("memories"), and use an AI like GPT to retrieve
       | information ("remembering"). It's not a fundamentally different
       | process from what we do already, but it replaces the need for me
       | to spend time on an automatable task.
       | 
       | I'm excited to see what humans spend our time doing once we've
       | offloaded the boring dirty work to AIs.
        
         | pessimist wrote:
         | In chess we had a tiny window of a few years when humans could
         | use the help of computers to play the world's best chess. By
         | 2000, computers were far better than humans and the gap has
        | increased. Chess players are now entertainers, which is what
        | all of us humans are destined to become.
        
       | leobg wrote:
       | Slight overkill to use GPT, though it works for the author and I
       | can see that it's the low hanging fruit, being available as an
       | API. But this can also be done locally, using SBERT, or even
       | (faster, though less powerful) fastText.
       | 
       | Also, it's helpful not to cut paragraphs into separate pieces,
       | but rather to use a sliding window approach, where each paragraph
       | retains the context of what came before, and/or the breadcrumbs
       | of its parent headlines.
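
A minimal sketch of the sliding-window chunking the comment describes, assuming the notes are markdown; the function name and breadcrumb format are made up for illustration, not taken from the article:

```python
def chunk_note(text, overlap=True):
    """Split a markdown note into paragraph chunks. Each chunk keeps
    the breadcrumb of its parent headings and, optionally, the
    previous paragraph, so each embedding retains nearby context."""
    breadcrumbs = {}          # heading level -> heading text
    chunks, prev = [], None
    for block in text.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if block.startswith("#"):
            level = len(block) - len(block.lstrip("#"))
            # a new heading invalidates any deeper breadcrumbs
            breadcrumbs = {k: v for k, v in breadcrumbs.items() if k < level}
            breadcrumbs[level] = block.lstrip("# ").strip()
            continue
        crumb = " > ".join(v for _, v in sorted(breadcrumbs.items()))
        context = (prev + "\n") if (overlap and prev) else ""
        prefix = f"[{crumb}] " if crumb else ""
        chunks.append(prefix + context + block)
        prev = block
    return chunks
```

Each chunk is then embedded as a unit, so "para two" under heading B still carries both its section path and the paragraph before it.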
        
         | dchuk wrote:
         | When using SBERT instead of gpt for this use case, is it paired
         | with some sort of vector database or just all done in
         | code/memory?
        
           | leobg wrote:
           | You'd want persistence, since the embedding process takes
           | some time. But you don't need to go all Pinecone on this.
           | There is FAISS, and there is hnswlib, for example. Like
           | SQLite for vector search.
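
The "SQLite for vector search" idea can start even smaller than FAISS or hnswlib: a normalized numpy matrix persisted to disk, searched with a brute-force cosine scan. The two libraries the comment names replace that scan with an approximate index once the collection grows; the function names below are mine, for illustration:

```python
import numpy as np

def build_index(embeddings, path="vectors.npy"):
    """Persist the (n, d) embedding matrix so the slow embedding
    step only ever runs once per note."""
    mat = np.asarray(embeddings, dtype=np.float32)
    mat /= np.linalg.norm(mat, axis=1, keepdims=True)  # unit norm -> cosine
    np.save(path, mat)
    return mat

def search(query_vec, mat, k=3):
    """Brute-force cosine search over all rows. FAISS or hnswlib
    would swap this exact scan for an approximate index."""
    q = np.asarray(query_vec, dtype=np.float32)
    q /= np.linalg.norm(q)
    sims = mat @ q
    top = np.argsort(-sims)[:k]
    return top, sims[top]
```

For a few thousand notes the exact scan is already instantaneous; the approximate indexes matter at millions of vectors.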
        
       | sowbug wrote:
       | I wonder whether your individually trained chat bot will be
       | allowed to assert the Fifth Amendment right against self-
       | incrimination to stop it from talking when the police interview
        | it. And if it's allowed, do you or it decide whether to assert
        | it? What if the two of you disagree?
       | 
       | Similar questions for civil trials, divorce proceedings, child
       | custody....
        
         | bbor wrote:
         | Wait is this a serious concern or a joke? I think asking a chat
         | bot whether it's admissible is the same as asking someone's
         | handwritten diary if it's admissible.
        
           | sowbug wrote:
           | Fortunately, US courts take these sorts of questions quite
           | seriously. See, e.g., State v. Smith (Ohio 2009): "Even the
           | more basic models of modern cell phones are capable of
           | storing a wealth of digitized information wholly unlike any
           | physical object found within a closed container."
           | https://en.wikipedia.org/wiki/Carpenter_v._United_States is
           | also a good read, as is
           | https://en.wikipedia.org/wiki/Riley_v._California: _" Modern
           | cell phones are not just another technological convenience.
           | With all they contain and all they may reveal, they hold for
           | many Americans "the privacies of life". The fact that
           | technology now allows an individual to carry such information
           | in his hand does not make the information any less worthy of
           | the protection for which the Founders fought."_
           | 
            | If a cell phone is recognized as having a higher expectation
            | of privacy than a mere passive document, then it stands to
            | reason that courts will extend similar protection to a
            | personalized machine that is even more eager to help than my
            | infernal "by the way..." Alexa home device.
        
         | rolenthedeep wrote:
         | Think about what kinds of data an AI assistant would store
         | about you vs what your phone stores.
         | 
         | Your phone is already more or less an extension of your brain,
         | and whether or not you can be forced to unlock and surrender it
         | for inspection is already a contentious topic.
         | 
         | IANAL, but phone privacy would probably set the precedent for
         | AI assistant privacy.
         | 
         | Always keep your phone encrypted and be aware what your local
         | laws are. Some places will force you to provide biometric
         | authentication, but not provide a PIN or password. Check if
         | your phone has a duress lockdown mode: some phones lock and/or
         | wipe if you press the power button five times or something like
         | that.
        
         | michaericalribo wrote:
         | Imagine a model that decides on its own to assert the Fifth on
         | your behalf. But now imagine that AI decides to lock you out of
         | your own system...
        
           | DavidPiper wrote:
           | We already have those. They're called Google, Microsoft and
           | Apple.
        
         | qwertox wrote:
         | This is a topic which really deserves a lot more attention, as
         | in: from magazines to newspapers to talk shows. Seems like an
         | appropriate time to get it on the agenda before governments opt
         | to decide on their own.
        
         | roywiggins wrote:
         | How could it have a right not to self-incriminate when it can't
         | be tried for a crime? An AI can't be indicted or convicted.
         | 
         | Humans can be required to testify too if they're immunized.
        
           | sokoloff wrote:
           | With limitations. I cannot be compelled to testify against my
           | wife (and probably not against my kids, though I'm unsure of
           | that [edit: that seems to vary by state currently]), even if
           | I personally am granted immunity.
        
             | [deleted]
        
           | sowbug wrote:
           | Apologies for the ambiguity. I imagine that my future AI-
           | based "second brain" will be derived from my own brain,
           | including its personality, memories, and preferences. Anyone
           | in an adversarial position to me would be very interested in
            | talking to it. My question is whether "self-incrimination"
            | should treat one's second brain as part of one's self. The
            | question was not whether police would put a PC in jail.
        
       | tra3 wrote:
       | This is fascinating.
       | 
       | Can I train it on 5 years of stream of consciousness morning
       | brain dumps and then say "write blah as me"?
       | 
       | Before I do that, I'd love to know if training data becomes part
       | of the global knowledge base available to everyone..
        
         | ilaksh wrote:
         | This is not a fine-tuning example. It's an embedding search
         | example. You use the embeddings to search for relevant
         | knowledgebase chunks and then include them in the prompt. Which
         | goes to the original model, not a model that you have trained
         | more.
         | 
         | This is popular because it's much much easier to do effectively
         | than fine tuning and the OpenAI model is very capable of
         | integrating kb snippets into a response. What I have heard is
          | that it's easy to overdo fine-tuning with OpenAI's model, and
          | that it makes more sense when you want a different format of
          | response rather than just pulling in some content.
         | 
         | Having said all of that, they do have a fine-tuning endpoint
         | and I am guessing if you find the right parameters and give it
         | a lot of properly formatted training data then it will be able
         | to do an okay job. I have the impression it is not easy to do
         | either of those things quite right though.
         | 
         | As far as privacy, no they will not share your data when you
         | use the API. ChatGPT is different, they ARE using the inputs to
         | train the model.
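
The pattern described above — embed the chunks once, find the nearest ones at question time, paste them into the prompt — fits in a few lines. A sketch with a toy bag-of-words "embedding" standing in for the real model (in practice, OpenAI's embeddings endpoint or SBERT's encoder would replace `embed`, and all names here are hypothetical):

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in: a real system calls an embedding model here.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_prompt(question, chunks, k=2):
    """Embedding search: rank knowledge-base chunks by similarity to
    the question and stuff the top k into the prompt. The base model
    is never retrained -- the knowledge rides in as context."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    context = "\n".join(ranked[:k])
    return f"Answer using only these notes:\n{context}\n\nQ: {question}\nA:"
```

The assembled string is what gets sent to the completion endpoint; the untouched base model does the "integrating kb snippets" step.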
        
           | crosen99 wrote:
           | > Having said all of that, they do have a fine-tuning
           | endpoint and I am guessing if you find the right parameters
           | and give it a lot of properly formatted training data then it
           | will be able to do an okay job.
           | 
           | Unfortunately, the fine-tuning API cannot be used to add
           | knowledge to the model. It only helps condition the model to
           | a certain response pattern using the knowledge it already
           | has.
        
         | PartiallyTyped wrote:
          | I have considered training a model on about a year's worth of
          | conversations from my little community's Discord server and
          | asking it to synthesise sentences as if I were writing them.
        
         | michaericalribo wrote:
         | These privacy considerations are highest-priority for any
         | extended roll-out of LLM-based products.
         | 
         | Privacy on the side of model servers would be good. Open source
         | models that can be run locally would be better.
        
           | feanaro wrote:
           | I personally think anything server-side is unacceptable. Only
           | open source and local will fly.
        
       | lukemtx wrote:
       | I wanted to do this! :D
        
       | PaulHoule wrote:
       | Would be nice to see some indication of how well it works in his
       | case.
       | 
        | I worked on a 'Semantic Search' product almost 10 years ago
        | that used a neural network to do dimensionality reduction. The
        | scoring function took inputs from both the 'gist vector' and
        | the residual word vector, which was possible to calculate in
        | that case because the gist vector was derived from the word
        | vector and the transform was reversible.
       | 
        | I've seen papers in the literature that come to the same
        | conclusion about what it takes to get good similarity results
        | with older models: a significant amount of the meaning in a
        | text is in pointy words that might not be included in the gist
        | vector. Maybe you do better with an LLM, since the vocabulary
        | is huge.
        
         | sandkoan wrote:
         | I'd honestly argue that he might not have even needed OpenAI
         | embeddings--any off-the-shelf Huggingface model would've
         | sufficed.
         | 
            | Because of attention mechanisms, we no longer depend so
            | heavily on the existence of those "pointy words," so
            | generally, Transformer-based semantic search works quite
            | well.
        
           | lpasselin wrote:
            | I actually tried this last year, before OpenAI released
            | their cheaper embeddings v2 in December. In my experiments,
            | the OpenAI embeddings are miles ahead of BERT embeddings
            | (or recent variants of the model) for similarity search.
        
             | leobg wrote:
             | Interesting. Nils Reimers (SBERT guy) wrote on Medium that
             | he found them to perform worse than SOTA models. Though
             | that was, I believe, before December.
        
           | PaulHoule wrote:
           | I was thinking RoBERTa 3, longformer or Big Bird would be a
           | good choice for this, though having any limit on the
           | attention window is a weakness.
        
       | trane_project wrote:
       | I've been thinking of using GPT or similar LLMs to extract
       | flashcards to use with my spaced repetition project
       | (https://github.com/trane-project/trane/). As in you give it a
       | book and it creates the flashcards for you and the dependencies
       | between the lessons.
       | 
        | I played around with ChatGPT and it worked pretty well. I have
        | a lot of other things on my plate to get to first (including
        | starting a math curriculum), but it's definitely an exciting
        | direction.
       | 
       | I think LLMs and AI are not anywhere near actual intelligence
       | (chatgpt can spout a lot of good sounding nonsense ATM), but the
       | semantic analysis they can do is by itself very useful.
        
       ___________________________________________________________________
       (page generated 2023-02-06 23:00 UTC)