[HN Gopher] GPT for Second Brains
___________________________________________________________________
GPT for Second Brains
Author : abbabon
Score : 93 points
Date : 2023-02-06 18:27 UTC (4 hours ago)
(HTM) web link (reasonabledeviations.com)
(TXT) w3m dump (reasonabledeviations.com)
| michaericalribo wrote:
| These "augmented intelligence" applications are so exciting to
| me. I'm not as interested in autonomous artificial intelligence.
| Computers are tools to make my life easier, not meant to lead
| their own lives!
|
| There's a big up-front cost of building a notes database for this
| application, but it illustrates the point nicely: encode a bunch
| of data ("memories"), and use an AI like GPT to retrieve
| information ("remembering"). It's not a fundamentally different
| process from what we do already, but it replaces the need for me
| to spend time on an automatable task.
|
| I'm excited to see what humans spend our time doing once we've
| offloaded the boring dirty work to AIs.
| pessimist wrote:
| In chess we had a tiny window of a few years when humans could
| use the help of computers to play the world's best chess. By
| 2000, computers were far better than humans and the gap has
| increased. Chess players are now entertainers, which is how
| all of us humans are destined to spend our time.
| leobg wrote:
| Using GPT is slight overkill, though it works for the author
| and I can see that it's the low-hanging fruit, being available
| as an API. But this can also be done locally, using SBERT, or
| even (faster, though less powerful) fastText.
|
| Also, it's helpful not to cut paragraphs into separate pieces,
| but rather to use a sliding window approach, where each paragraph
| retains the context of what came before, and/or the breadcrumbs
| of its parent headlines.
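|
| Roughly what I mean, as an untested sketch (the window size,
| stride, and breadcrumb separator are arbitrary choices):
|
|     # Overlapping windows over the paragraphs of one note, each
|     # prefixed with the breadcrumbs of its parent headlines.
|     def window_chunks(paragraphs, breadcrumbs,
|                       window=3, stride=1):
|         chunks = []
|         n = max(len(paragraphs) - window + 1, 1)
|         for start in range(0, n, stride):
|             body = " ".join(paragraphs[start:start + window])
|             crumbs = " > ".join(breadcrumbs)
|             chunks.append(crumbs + "\n" + body)
|         return chunks
|
|     chunks = window_chunks(
|         ["First paragraph.", "Second one.", "Third one."],
|         breadcrumbs=["My Note", "Some Section"], window=2)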
| dchuk wrote:
| When using SBERT instead of GPT for this use case, is it paired
| with some sort of vector database or just all done in
| code/memory?
| leobg wrote:
| You'd want persistence, since the embedding process takes
| some time. But you don't need to go all Pinecone on this.
| There is FAISS, and there is hnswlib, for example. Like
| SQLite for vector search.
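|
| Something like this, for example (untested sketch; the model
| name and file path are just placeholders):
|
|     import faiss
|     from sentence_transformers import SentenceTransformer
|
|     model = SentenceTransformer("all-MiniLM-L6-v2")
|     notes = ["note one ...", "note two ..."]
|
|     # Embed once, persist the index to disk, and reuse it
|     # later without re-embedding the whole corpus.
|     # Inner product == cosine, since embeddings are normalized.
|     emb = model.encode(notes, normalize_embeddings=True)
|     index = faiss.IndexFlatIP(emb.shape[1])
|     index.add(emb)
|     faiss.write_index(index, "notes.faiss")
|
|     index = faiss.read_index("notes.faiss")
|     query = model.encode(["what did I write about X?"],
|                          normalize_embeddings=True)
|     scores, ids = index.search(query, 5)
|     print([notes[i] for i in ids[0] if i != -1])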
| sowbug wrote:
| I wonder whether your individually trained chat bot will be
| allowed to assert the Fifth Amendment right against self-
| incrimination to stop it from talking when the police interview
| it. And if it's allowed, do you or it decide whether to
| assert it? What if the two of you disagree?
|
| Similar questions for civil trials, divorce proceedings, child
| custody....
| bbor wrote:
| Wait, is this a serious concern or a joke? I think asking a chat
| bot whether it's admissible is the same as asking someone's
| handwritten diary if it's admissible.
| sowbug wrote:
| Fortunately, US courts take these sorts of questions quite
| seriously. See, e.g., State v. Smith (Ohio 2009): "Even the
| more basic models of modern cell phones are capable of
| storing a wealth of digitized information wholly unlike any
| physical object found within a closed container."
| https://en.wikipedia.org/wiki/Carpenter_v._United_States is
| also a good read, as is
| https://en.wikipedia.org/wiki/Riley_v._California: _"Modern
| cell phones are not just another technological convenience.
| With all they contain and all they may reveal, they hold for
| many Americans "the privacies of life". The fact that
| technology now allows an individual to carry such information
| in his hand does not make the information any less worthy of
| the protection for which the Founders fought."_
|
| If a cell phone is recognized as having a higher expectation
| of privacy than a mere passive document, then it stands to
| reason that courts will extend the same recognition to a
| personalized machine that is even more eager to help than my
| infernal "by the way..." Alexa home device.
| rolenthedeep wrote:
| Think about what kinds of data an AI assistant would store
| about you vs what your phone stores.
|
| Your phone is already more or less an extension of your brain,
| and whether or not you can be forced to unlock and surrender it
| for inspection is already a contentious topic.
|
| IANAL, but phone privacy would probably set the precedent for
| AI assistant privacy.
|
| Always keep your phone encrypted and be aware of what your
| local laws are. Some places can compel you to provide
| biometric authentication, but not a PIN or password. Check if
| your phone has a duress lockdown mode: some phones lock and/or
| wipe if you press the power button five times or something like
| that.
| michaericalribo wrote:
| Imagine a model that decides on its own to assert the Fifth on
| your behalf. But now imagine that AI decides to lock you out of
| your own system...
| DavidPiper wrote:
| We already have those. They're called Google, Microsoft and
| Apple.
| qwertox wrote:
| This is a topic that really deserves a lot more attention,
| from magazines to newspapers to talk shows. It seems like an
| appropriate time to get it on the agenda before governments
| opt to decide on their own.
| roywiggins wrote:
| How could it have a right not to self-incriminate when it can't
| be tried for a crime? An AI can't be indicted or convicted.
|
| Humans can be required to testify too if they're immunized.
| sokoloff wrote:
| With limitations. I cannot be compelled to testify against my
| wife (and probably not against my kids, though I'm unsure of
| that [edit: that seems to vary by state currently]), even if
| I personally am granted immunity.
| [deleted]
| sowbug wrote:
| Apologies for the ambiguity. I imagine that my future AI-
| based "second brain" will be derived from my own brain,
| including its personality, memories, and preferences. Anyone
| in an adversarial position to me would be very interested in
| talking to it. I'm asking whether the term "self-incrimination"
| should treat one's second brain as part of one's self. The
| question was not whether police would put a PC in jail.
| tra3 wrote:
| This is fascinating.
|
| Can I train it on 5 years of stream-of-consciousness morning
| brain dumps and then say "write blah as me"?
|
| Before I do that, I'd love to know if training data becomes part
| of the global knowledge base available to everyone...
| ilaksh wrote:
| This is not a fine-tuning example. It's an embedding search
| example. You use the embeddings to search for relevant
| knowledgebase chunks and then include them in the prompt. The
| prompt goes to the original model, not to a model that you have
| trained further.
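|
| A sketch of the pattern (untested; assumes the 0.x openai
| Python library, text-embedding-ada-002, text-davinci-003, and
| that openai.api_key is already set):
|
|     import numpy as np
|     import openai
|
|     def embed(texts):
|         resp = openai.Embedding.create(
|             model="text-embedding-ada-002", input=texts)
|         return np.array([d["embedding"] for d in resp["data"]])
|
|     chunks = ["kb chunk one ...", "kb chunk two ..."]
|     chunk_emb = embed(chunks)
|
|     question = "What did I conclude about X?"
|     q_emb = embed([question])[0]
|
|     # Cosine similarity, take the top chunks, and stuff them
|     # into the prompt for the original (untrained-by-you) model.
|     sims = (chunk_emb @ q_emb
|             / (np.linalg.norm(chunk_emb, axis=1)
|                * np.linalg.norm(q_emb)))
|     top = [chunks[i] for i in np.argsort(-sims)[:3]]
|
|     prompt = ("Context:\n" + "\n---\n".join(top)
|               + "\n\nAnswer using the context above.\n"
|               + "Q: " + question + "\nA:")
|     answer = openai.Completion.create(
|         model="text-davinci-003", prompt=prompt, max_tokens=256)
|     print(answer["choices"][0]["text"])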
|
| This is popular because it's much, much easier to do
| effectively than fine-tuning, and the OpenAI model is very
| capable of integrating knowledgebase snippets into a response.
| What I have heard is that it's easy to overdo fine-tuning with
| OpenAI's model, and that it makes more sense when you want a
| different format of response rather than just pulling in some
| content.
|
| Having said all of that, they do have a fine-tuning endpoint
| and I am guessing if you find the right parameters and give it
| a lot of properly formatted training data then it will be able
| to do an okay job. I have the impression it is not easy to do
| either of those things quite right though.
|
| As far as privacy goes, no, they will not share your data when
| you use the API. ChatGPT is different: they ARE using the
| inputs to train the model.
| crosen99 wrote:
| > Having said all of that, they do have a fine-tuning
| endpoint and I am guessing if you find the right parameters
| and give it a lot of properly formatted training data then it
| will be able to do an okay job.
|
| Unfortunately, the fine-tuning API cannot be used to add
| knowledge to the model. It only helps condition the model to
| a certain response pattern using the knowledge it already
| has.
| PartiallyTyped wrote:
| I have considered training a model on about a year's worth of
| conversations from my little community's Discord server and
| asking it to synthesise sentences as if I were writing them.
| michaericalribo wrote:
| These privacy considerations are the highest priority for any
| extended roll-out of LLM-based products.
|
| Privacy on the side of model servers would be good. Open source
| models that can be run locally would be better.
| feanaro wrote:
| I personally think anything server-side is unacceptable. Only
| open source and local will fly.
| lukemtx wrote:
| I wanted to do this! :D
| PaulHoule wrote:
| Would be nice to see some indication of how well it works in his
| case.
|
| I worked on a 'Semantic Search' product almost 10 years ago
| that used a neural network to do dimensionality reduction. The
| scoring function had inputs from both the 'gist vector' and
| the residual word vector, which was possible to calculate in
| that case because the gist vector was derived from the word
| vector and the transform was reversible.
|
| I've seen papers in the literature that come to the same
| conclusion about what it takes to get good similarity results
| with older models: a significant amount of the meaning in text
| is in pointy words that might not be included in the gist
| vector. Maybe you do better with an LLM, since the vocabulary
| is huge.
| sandkoan wrote:
| I'd honestly argue that he might not have even needed OpenAI
| embeddings--any off-the-shelf Huggingface model would've
| sufficed.
|
| Because of attention mechanisms, we no longer depend so heavily
| on the existence of those "pointy words," so generally,
| Transformers-based semantic search works quite well.
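|
| Something like this, e.g. (untested; all-MiniLM-L6-v2 is just
| one arbitrary off-the-shelf choice):
|
|     from sentence_transformers import SentenceTransformer, util
|
|     model = SentenceTransformer("all-MiniLM-L6-v2")
|     notes = ["Mitochondria are the powerhouse of the cell.",
|              "I prefer dark roast coffee in the morning."]
|
|     # Rank notes by cosine similarity to the query.
|     scores = util.cos_sim(model.encode("biology notes"),
|                           model.encode(notes))[0]
|     best = int(scores.argmax())
|     print(notes[best], float(scores[best]))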
| lpasselin wrote:
| I actually tried this last year, before OpenAI released their
| cheaper embeddings v2 in December. From my experiments, when
| compared to BERT embeddings (or a recent variation of the
| model), the OpenAI embeddings are miles ahead when doing
| similarity search.
| leobg wrote:
| Interesting. Nils Reimers (the SBERT guy) wrote on Medium that
| he found them to perform worse than SOTA models. Though
| that was, I believe, before December.
| PaulHoule wrote:
| I was thinking RoBERTa 3, Longformer, or Big Bird would be a
| good choice for this, though having any limit on the
| attention window is a weakness.
| trane_project wrote:
| I've been thinking of using GPT or similar LLMs to extract
| flashcards to use with my spaced repetition project
| (https://github.com/trane-project/trane/). As in, you give it a
| book and it creates the flashcards for you, along with the
| dependencies between the lessons.
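|
| Roughly what I have in mind, as an untested sketch (the prompt,
| text-davinci-003, and the JSON format are just placeholder
| choices, and real output would need more robust parsing):
|
|     import json
|     import openai  # assumes openai.api_key is set
|
|     def extract_flashcards(passage):
|         prompt = (
|             "Turn the passage below into flashcards. Reply "
|             "with a JSON list of objects with \"front\" and "
|             "\"back\" fields.\n\n" + passage)
|         resp = openai.Completion.create(
|             model="text-davinci-003", prompt=prompt,
|             max_tokens=512)
|         return json.loads(resp["choices"][0]["text"])
|
|     for card in extract_flashcards(open("chapter1.txt").read()):
|         print(card["front"], "->", card["back"])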
|
| I played around with ChatGPT and it worked pretty well. I have
| a lot of other things on my plate to get to first (including
| starting a math curriculum), but it's definitely an exciting
| direction.
|
| I think LLMs and AI are not anywhere near actual intelligence
| (ChatGPT can spout a lot of good-sounding nonsense ATM), but the
| semantic analysis they can do is by itself very useful.
___________________________________________________________________
(page generated 2023-02-06 23:00 UTC)