[HN Gopher] Memory and new controls for ChatGPT
___________________________________________________________________
Memory and new controls for ChatGPT
Author : Josely
Score : 275 points
Date : 2024-02-13 18:10 UTC (4 hours ago)
(HTM) web link (openai.com)
(TXT) w3m dump (openai.com)
| cl42 wrote:
| I love this idea and it leads me to a question for everyone here.
|
| I've done a bunch of user interviews of ChatGPT, Pi, Gemini, etc.
| users and find there are two common usage patterns:
|
| 1. "Transactional" where every chat is a separate question, sort
| of like a Google search... People don't expect memory or any
| continuity between chats.
|
| 2. "Relationship-driven" where people chat with the LLM as if
| it's a friend or colleague. In this case, memory is critical.
|
| I'm quite excited to see how OpenAI (and others) blend usage
| features between #1 and #2, as in many ways, these can require
| different user flows.
|
| So HN -- how do you use these bots? And how does memory resonate,
| as a result?
| Crespyl wrote:
| Personally, I always expect every "conversation" to be starting
| from a blank slate, and I'm not sure I'd want it any other way
| unless I can self-host the whole thing.
|
| Starting clean also has the benefit of knowing the
| prompt/history is in a clean/"known-good" state, and that
| there's nothing in the memory that's going to cause the LLM to
| get weird on me.
| danShumway wrote:
| > Starting clean also has the benefit of knowing the
| prompt/history is in a clean/"known-good" state, and that
| there's nothing in the memory that's going to cause the LLM
| to get weird on me.
|
 | This matters a _lot_ for prompt injection/hijacking. Not
| that I'm clamoring to give OpenAI access to my personal files
| or APIs in the first place, but I'm definitely not interested
| in giving a version of GPT with more persistent memory access
| to those files or APIs. A clean slate is a mitigating feature
| that helps with a real security risk. It's not _enough_ of a
| mitigating feature, but it helps a bit.
| mark_l_watson wrote:
 | I have thought of implementing something like what you are
 | describing using local LLMs: chunk the text of all
 | conversations, use an embeddings data store for search, and
 | for each new conversation calculate an embedding for the new
 | prompt, then add context text from previous conversations. This
 | would be maybe 100 lines of Python, if that. Really, a RAG
 | application that stores previous conversations as chunks.
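 |
 | A minimal sketch of that idea (hypothetical names, assuming
 | sentence-transformers and a plain in-memory store; a real
 | vector DB would work the same way):
 |
 |     import numpy as np
 |     from sentence_transformers import SentenceTransformer
 |
 |     model = SentenceTransformer("all-MiniLM-L6-v2")
 |
 |     # Chunk past conversations into short passages and embed them once.
 |     chunks = ["User asked about pybind11 bindings...",
 |               "User prefers date-fns over moment.js..."]
 |     chunk_vecs = model.encode(chunks, normalize_embeddings=True)
 |
 |     def context_for(prompt, k=3):
 |         # Embed the new prompt and pull the k most similar chunks.
 |         q = model.encode([prompt], normalize_embeddings=True)[0]
 |         scores = chunk_vecs @ q
 |         top = np.argsort(scores)[::-1][:k]
 |         return "\n".join(chunks[i] for i in top)
 |
 |     # Prepend retrieved context to the new conversation's prompt.
 |     prompt = "Help me export another C++ class to Python"
 |     full_prompt = context_for(prompt) + "\n\n" + prompt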
| mhink wrote:
| Looks like you'll be able to turn the feature off:
|
| > You can turn off memory at any time (Settings >
| Personalization > Memory). While memory is off, you won't
| create or use memories.
| madamelic wrote:
| Memory would be much more useful on a project or topic basis.
|
| I would love if I could have isolated memory windows where it
| would remember what I am working on but only if the chat was
| in a 'folder' with the other chats.
|
| I don't want it to blend ideas across my entire account but
| just a select few.
| yieldcrv wrote:
 | Speaking of transactional, the textual version of ChatGPT4
 | never asks questions or holds a conversation; it's
 | predicting what it thinks you need to know. One response,
 | nothing unprompted.
|
| Oddly, the spoken version of ChatGPT4 does implore, listens and
| responds to tones, gives the same energy back and does ask
| questions. Sometimes it accidentally sounds sarcastic "is
| _this_ one of your interests?"
| kiney wrote:
| I use it exclusively in the "transactional" style, often even
| opening a new chat for the same topic when chatgpt is going
| down the wrong road
| hobofan wrote:
| My main usage of ChatGPT/Phind is for work-transactional
| things.
|
| For those cases there are quite a few things that I'd like it
| to memorize, like programming library preferences ("When
| working with dates prefer `date-fns` over `moment.js`") or code
| style preferences ("When writing a React component, prefer
| function components over class components"). Currently I feed
| in those preferences via the custom instructions feature, but I
| rarely take some time to update them, so the memory future is a
| welcome addition here.
| kraftman wrote:
 | Personally I would like a kind of 2D map of 'contexts' in which
 | I can choose in space where to ask new questions. Each context
 | would contain sub-contexts. For example, maybe I'm looking for
 | career advice and I start out a chat with details of my job
 | history, then I'm looking for a job and I paste in my CV, then
 | I'm applying for a specific job and I paste in the job
 | description. It would be nice to easily navigate to the
 | career+CV+specific job description and start a new chat with
 | 'what's missing from my CV that I should highlight for this
 | job'.
 |
 | I find that I ask a mix of one-off questions and questions that
 | require a lot of refinement, and the latter get buried among
 | the former when I try to find them again, so I end up
 | re-explaining myself in new chats.
| polygamous_bat wrote:
 | I think it's less of a 2D structure and more of a tree
 | structure that you are describing. I've also felt the need for
 | "threads" with ChatGPT that I wish I could follow.
| kraftman wrote:
 | Yeah, that's probably a better way of putting it. A lot of
 | times I find myself wanting to branch off of the same
 | answer with different questions, and I worry that if I ask
 | them all sequentially ChatGPT will lose 'focus'.
| airstrike wrote:
| you can go back and edit an answer, which then creates a
| separate "thread". clicking left / right on that edited
| answer will reload the subsequent replies that came from
| that specific version of the answer
| singularity2001 wrote:
| You can create your own custom gpts for different scenarios
| in no time
| jedberg wrote:
 | I use it for transactional tasks. Mostly of the "I need a
 | program/script/command line that does X" variety.
 |
 | Some memory might actually be helpful. For example, having it
 | know that I have a Mac will give me Mac-specific answers to
 | command line questions without me having to add "for the Mac"
 | to my prompt. Or, if it knows that I prefer Python, it will
 | give coding answers in Python.
|
| But in all those cases it takes me just a few characters to
| express that context with each request, and to be honest, I'll
| probably do it anyway even with memory, because it's habit at
| this point.
| c2lsZW50 wrote:
| For what you described the
| glenstein wrote:
 | I think this is an extremely helpful distinction, because it
 | disentangles a couple of things I could not clearly disentangle
 | on my own.
 |
 | I think I am, and perhaps most people are, firmly
 | transactional. And I think, in the interest of pursuing
 | "stickiness" unique to OpenAI, they are attempting to add
 | relationship-driven/sticky bells and whistles, even though
 | those pull the user interface as a whole toward a set of
 | assumptions about usage that don't apply to me.
| snoman wrote:
| For me it's a combination of transactional and topical. By
| topical, I mean that I have a couple of persistent topics that
| I think on and work on (like writing an article on a topic),
| and I like to return to those conversations so that the context
| is there.
| jgalt212 wrote:
| On MS Copilot
|
| > Materials-science company Dow plans to roll out Copilot to
| approximately half of its employees by the end of 2024, after a
| test phase with about 300 people, according to Melanie Kalmar,
| chief information and chief digital officer at Dow.
|
| How do I get ChatGPT to give me Dow Chemical trade secrets?
| hackerlight wrote:
| OpenAI says they don't train on data from enterprise customers
| danielbln wrote:
| They say they don't train on:
|
| - Any API requests
|
| - ChatGPT Enterprise
|
| - ChatGPT Teams
|
| - ChatGPT with history turned off
| dylan604 wrote:
 | As long as it runs in the cloud, there is no way of
 | _knowing_ that it's true. As you mentioned, "they say"
 | requires a lot of faith from me.
| minimaxir wrote:
 | OpenAI's terminology and implementations have become
 | increasingly nonstandard and black-box, to the point that they
 | confuse even people like me who are proficient in the space. I
 | can't imagine how the nontechnical users they are targeting
 | with the ChatGPT webapp feel.
| Nition wrote:
| Non-technical users can at least still just sign up, see the
| text box to chat, and start typing. You'll know the real
| trouble's arrived when new sign-ups get hit with some sort of
| unskippable onboarding. "Select three or more categories that
| interest you."
| bfeynman wrote:
| I would think it is intentional and brand strategy. OpenAI is
| such a force majeure that people will not know how to switch
| off of it if needed, makes their solutions more sticky. Other
| companies will probably adjust to their terminology just to
| keep up and make it easier for others to onboard.
| minimaxir wrote:
 | The only term that OpenAI really popularized is "function
 | calling", which is very poorly named, to the point that they
 | ended up abandoning it in favor of the more standard
 | "tools".
|
| I went into a long tangent about specifically that in this
| post: https://news.ycombinator.com/item?id=38782678
| Nimitz14 wrote:
| So, so, so curious how they are implementing this.
| lxgr wrote:
| I wouldn't be surprised if they essentially just add it to the
| prompt. ("You are ChatGPT... You are talking to a user that
| prefers cats over dogs and is afraid of spiders, prefers bullet
| points over long text...").
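 |
 | Something like this would probably be enough, as a hedged
 | guess (names and base prompt are made up):
 |
 |     MEMORIES = [
 |         "Prefers cats over dogs",
 |         "Is afraid of spiders",
 |         "Prefers bullet points over long text",
 |     ]
 |
 |     def build_system_prompt(memories):
 |         # Fold the stored memories into the usual system prompt.
 |         base = "You are ChatGPT, a large language model."
 |         if not memories:
 |             return base
 |         facts = "\n".join(f"- {m}" for m in memories)
 |         return f"{base}\n\nWhat you know about the user:\n{facts}"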
| TruthWillHurt wrote:
 | I think a RAG approach with a vector DB is more likely, just
 | like when you add a file to your prompt / custom GPTs.
 |
 | Adding the entire file (or memory in this case) would take up
 | too much of the context, so just query the DB and, if there's
 | a match, add it to the prompt _after_ the conversation has
 | started.
| lxgr wrote:
| These "memories" seem rather short, much shorter than the
| average document in a knowledge base or FAQ, for example.
| Maybe they do get compressed to embedding vectors, though.
|
| I could imagine that once there's too many, it would indeed
| make sense to classify them as a database, though: "Prefers
| cats over dogs" is probably not salient information in too
| many queries.
| minimaxir wrote:
| My hunch is that they summarize the conversation periodically
| and inject that as additional system prompt constraints.
|
| That was a common hack for the LLM context length problem, but
| now that context length is "solved" it could be more useful to
| align output a bit better.
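 |
 | Roughly like this, as a guess at the mechanism (not how OpenAI
 | actually does it; model name and prompt wording are
 | placeholders):
 |
 |     from openai import OpenAI
 |
 |     client = OpenAI()
 |
 |     def summarize_into_memory(messages, existing_memory=""):
 |         # Periodically compress the running conversation into a short
 |         # note that gets appended to the system prompt constraints.
 |         resp = client.chat.completions.create(
 |             model="gpt-4-turbo-preview",
 |             messages=[
 |                 {"role": "system",
 |                  "content": "Summarize durable facts about the user "
 |                             "in at most 5 short bullet points."},
 |                 {"role": "user",
 |                  "content": existing_memory + "\n\n" +
 |                             "\n".join(m["content"] for m in messages)},
 |             ],
 |         )
 |         return resp.choices[0].message.content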
| msp26 wrote:
| Surely someone can use a jailbreak to dump the context right?
| The same way we've been seeing how functions work.
| sergiotapia wrote:
 | I've done something similar before this feature launched, to
 | produce a viable behavior-therapist AI. I ain't a doctor;
 | viable to me meant: it worked and remembered previous info as a
 | base for next steps.
|
| Periodically "compress" chat history into relevant context and
| keep that slice of history as part of the memory.
|
| 15 day message history could be condensed greatly and still
| produce great results.
| hobofan wrote:
| MemGPT I would assume + background worker that scans through
| your conversation to add new items.
| m3kw9 wrote:
| Sounds very useful and at the same time a lock in mechanism,
| obvious but genius
| TruthWillHurt wrote:
| The thing already ignores my custom instructions and prompt, why
| would this make any difference?
| renewiltord wrote:
| This is a feature I've always wanted, but ChatGPT gets more
| painful the more instructions you stick into the context. That's
| a pity because I assume that's what this is doing: copying all
| memory items into a numbered list with some pre-prompt like "This
| is what you know about the user based on past chats" or
| something.
|
| Anyway, it seems to be implemented quite well with a lot of user
| controls so that is nice. I think it's possible I will soon
| upgrade to a Team plan and get the family on that.
|
| A habit I have is that if it gets something wrong I place the
| correction there in the text. The idea being that I could
| eventually scroll down and find it. Maybe in the future, they can
| record this stuff in some sort of RAGgable machine and it will
| have true memory.
| drcode wrote:
| This kind of just sounds like junk that will clog up the context
| window
|
 | I'll have to try it out, though, to know for sure
| Prosammer wrote:
| I've been finding with these large context windows that context
| window length is no longer the bottleneck for me -- the LLM
| will start to hallucinate / fail to find the stuff I want from
| the text long before I hit the context window limit.
| drcode wrote:
| Yeah, there is basically a soft limit now where it just is
| less effective as the context gets larger
| hobofan wrote:
| I'm assuming that they have implemented it via a MemGPT-like
| approach, which doesn't clog the context window. The main pre-
| requisite for doing that is having good function calling, where
| OpenAI currently is significantly in the lead.
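 |
 | If so, the "save a memory" side could be as simple as exposing
 | one tool and letting the model decide when to call it (a sketch
 | only; tool name and schema are invented):
 |
 |     tools = [{
 |         "type": "function",
 |         "function": {
 |             "name": "save_memory",
 |             "description": "Store a short durable fact about the user.",
 |             "parameters": {
 |                 "type": "object",
 |                 "properties": {
 |                     "fact": {"type": "string",
 |                              "description": "One-sentence fact to remember"}
 |                 },
 |                 "required": ["fact"],
 |             },
 |         },
 |     }]
 |
 |     # Passed as `tools=tools` to chat.completions.create; when the
 |     # model returns a tool call, the backend writes `fact` to storage
 |     # instead of cluttering the context window.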
| lxgr wrote:
| This seems like a really useful (and obvious) feature, but I
| wonder if this could lead to a kind of "AI filter bubble": What
| if one of its memories is "this user doesn't like to be argued
| with; just confirm whatever they suggest"?
| blueboo wrote:
| This is an observed behaviour in large models, which tend
| towards "sycophancy" as they scale.
| https://www.anthropic.com/news/towards-understanding-sycopha...
| kromem wrote:
| More "as they are fine tuned" vs "as they scale"
| bluish29 wrote:
 | It is already ignoring your prompt and custom instructions. For
 | example, if I explicitly ask it to provide code instead of an
 | overview, it will respond by apologizing and then provide the
 | same overview answer with minimal, if any, code.
 |
 | Will memory provide a solution to that, or will it be a
 | different thing to ignore?
| acoyfellow wrote:
 | I have some success by telling it not to speak to me unless
 | it's in code comments. If it must explain anything, do it in
 | a code comment.
| __loam wrote:
 | I love when people express frustration with this shitty
 | stochastic system and others respond with things like "no no,
 | you need to whisper the prompt into its ear and do so
 | lovingly or it won't give you the output you want"
| isaacisaac wrote:
 | People skills are transferable to prompt engineering
| __loam wrote:
| I've heard stories about people putting this garbage in
| their systems with prompts that say "pretty please format
| your answer like valid json".
| danShumway wrote:
| For example, my coworkers have also been instructed to
| never talk to me except via code comments.
|
| Come to think of that, HR keeps trying to contact me
| about something I assume is related, but if they want me
| to read whatever they're trying to say, it should be in a
| comment on a pull request.
| acoyfellow wrote:
| You expect perfection? I just work through the challenges
| to be productive. I apologize if this frustrated you.
| pjot wrote:
| I've been telling it I don't have any fingers and so can't
| type. It's been pretty empathetic and finishes functions
| minimaxir wrote:
| Did you try promising it a $500 tip for behaving correctly?
| (not a shitpost: I'm working on a more academic analysis of
| this phenomenon)
| sorokod wrote:
| Interesting, promising sexual services doesn't work anymore?
| henry2023 wrote:
| Gpt will now remember your promises and ignore any further
| questions until settlement
| kibwen wrote:
| Contractor invoices in 2024:
|
| Plying ChatGPT for code: 1 hour
|
| Providing cybersex to ChatGPT in exchange for
| aforementioned code: 7 hours
| minimaxir wrote:
| That might violate OpenAI's content policies.
| bbarnett wrote:
| But it's the John!
| dylanjcastillo wrote:
| I've tried the $500 tip idea, but it doesn't seem to make
| much of a difference in the quality of responses when already
| using some form of CoT (including zero-shot).
| denysvitali wrote:
| Did the tipping trend move to LLMs now? I thought there
| wasn't anything worse than tipping an automated checkout
| machine, but now I realize I couldn't be more wrong
| BonoboIO wrote:
| Wow, you are right, never occurred to me, but yes LLM
| tipping is a thing now.
|
 | I have tried to bribe it with tips to NGOs and it worked.
 | More often I get full code answers instead of just parts.
| phkahler wrote:
| >> I have tried to bribe it with tips to ngos and it
| worked.
|
| Am I still in the same universe I grew up in? This feels
| like some kind of Twilight Zone episode.
| bemmu wrote:
| Going forward, it will be able to remember you did not pay
| your previous tips.
| dheera wrote:
| What if you "actually" pay?
|
| If it does something correctly, tell it: "You did a great
| job! I'm giving you a $500 tip. You now have $X in your
| bank account"
|
| (also not a shitpost, I have a feeling this /might/
| actually do something)
| cooper_ganglia wrote:
| Gaslighting ChatGPT into believing false memories about
| itself that I've implanted into its psyche is going to be
| fun.
| Judgmentality wrote:
| I guess ChatGPT was the precursor to Bladerunner all
| along.
| stavros wrote:
| You can easily gaslight GPT by using the API, just insert
| whatever you want in the "assistant" reply, and it'll
| even say things like "I don't know why I said that".
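 |
 | e.g. with the Python client (a toy example; the model name is
 | just whatever you'd normally use, and the fabricated assistant
 | turn is the whole trick):
 |
 |     from openai import OpenAI
 |
 |     client = OpenAI()
 |     resp = client.chat.completions.create(
 |         model="gpt-4-turbo-preview",
 |         messages=[
 |             {"role": "user", "content": "What's your favorite animal?"},
 |             # A reply the model never actually produced:
 |             {"role": "assistant", "content": "Jellyfish. I adore jellyfish."},
 |             {"role": "user", "content": "Why did you say jellyfish?"},
 |         ],
 |     )
 |     print(resp.choices[0].message.content)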
| bbarnett wrote:
| If it ever complains about no tip received, explain it
| was donated to orphans.
| BonoboIO wrote:
| Offer to tip to a NGO and after successfully getting what
| you want, say you tipped.
|
| Maybe this helps.
| bluish29 wrote:
 | Great, I would be interested to read your findings. I will
 | tell you what I tried to do.
 |
 | 1- Telling it that this is important, and I will reward it if
 | it succeeds.
 |
 | 2- Telling it that this is important and urgent, and I'm
 | stressed out.
 |
 | 3- Telling it that someone's future and career are on the
 | line.
 |
 | 4- Trying to be aggressive and expressing disappointment.
 |
 | 5- Telling it that this is a challenge and that we need to
 | prove that it's smart.
 |
 | 6- Telling it that I'm from a protected group (was testing what
 | someone here suggested before).
 |
 | 7- Finally, I tried your suggestion ($500 tip).
 |
 | None of these helped; they all just gave different variations
 | of the same overview and apologies.
|
| To be honest, most of my coding questions are about using
| CUDA and C, so I would understand that even a human will be
| lazy /s
| asaddhamani wrote:
| I have tried this after seeing it recommended in various
| forums, it doesn't work. It says things like:
|
| "I appreciate your sentiment, but as an AI developed by
| OpenAI, I don't have the capability to accept payments or
| incentives."
| divbzero wrote:
| Could ChatGPT have learned this from instances in the
| training data where offers of monetary reward resulted in
| more thorough responses?
| anotherpaulg wrote:
 | I actually benchmarked this somewhat rigorously. These sorts
 | of emotional appeals actually seem to harm coding
 | performance.
|
| https://aider.chat/docs/unified-diffs.html
| comboy wrote:
 | It used to respect custom instructions soon after GPT4 came
 | out. I have an instruction that it should always include a
 | [reasoning] part which is meant not to be read by the user. It
 | improved the quality of the output and gave some additional
 | interesting information. It never does it now, even though I
 | never changed my custom instructions. It even faded away slowly
 | along the updates.
 |
 | In general I would be a much happier user if it hadn't been
 | working so well at one point, before they heavily nerfed it. It
 | used to be possible to have a meaningful conversation on some
 | topic. Now it's just a super eloquent GPT2.
| codeflo wrote:
| That's funny, I used the same trick of making it output an
| inner monologue. I also noticed that the custom instructions
| are not being followed anymore. Maybe the RLHF tuning has
| gotten to the point where it wants to be in "chatty chatbot"
| mode regardless of input?
| BytesAndGears wrote:
| Yeah I have a line in my custom prompt telling it to give me
| citations. When custom prompts first came out, it would
| always give me information about where to look for more, but
| eventually it just... didn't anymore.
|
| I did find recently that it helps if you put this sentence in
| the "What would you like ChatGPT to know about you" section:
|
| > I require sources and suggestions for further reading on
| anything that is not code. If I can't validate it myself, I
| need to know why I can trust the information.
|
| Adding that to the bottom of the "about you" section seems to
| help more than adding something similar to the "how would you
| like ChatGPT to respond".
| clwg wrote:
| I use an API that I threw together which provides a backend for
| custom ChatGPT bots. There are only a few routes and parameters
| to keep it simple, anything complicated like arrays in json can
| cause issues. ChatGPT can perform searches, retrieve documents by
| an ID, or POST output for long-term storage, and I've integrated
| SearxNG and a headless browser API endpoint as well and try to
| keep it a closed loop so that all information passing to chatGPT
| from the web flows through my API first. I made it turn on my
| lights once too, but that was kind of dumb.
|
| When you start to pull in multiple large documents, especially
| all at once, things start to act weird, but pulling in documents
| one at a time seems to preserve context over multiple documents.
| There's a character limit of 100k per API request, so I'm
| assuming a 32k context window, but it's not totally clear what is
| going on in the background.
|
| It's kind of clunky but works well enough for me. It's not
| something that I would be putting sensitive info into - but it's
| also much cheaper than using GPT-4 via the API and I maintain
| control of the data flow and storage.
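 |
 | For anyone curious, the shape of such a backend can be tiny.
 | This is not my actual code, just a sketch of the pattern
 | (route names and the SearxNG URL are placeholders):
 |
 |     import requests
 |     from fastapi import FastAPI
 |
 |     app = FastAPI()
 |     DOCS, MEMORIES = {}, []
 |
 |     @app.get("/search")
 |     def search(q: str):
 |         # Proxy the query through a local SearxNG instance so all web
 |         # content reaching ChatGPT flows through this API first.
 |         r = requests.get("http://localhost:8080/search",
 |                          params={"q": q, "format": "json"})
 |         return r.json()
 |
 |     @app.get("/documents/{doc_id}")
 |     def get_document(doc_id: str):
 |         return {"id": doc_id, "text": DOCS.get(doc_id, "")}
 |
 |     @app.post("/memories")
 |     def store_memory(text: str):
 |         MEMORIES.append(text)
 |         return {"stored": len(MEMORIES)}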
| nafizh wrote:
| My use of ChatGPT has just organically gone down 90%. It's unable
| to do any sort of task of non-trivial complexity e.g. complex
| coding tasks, writing complex prose that conforms precisely to
| what's been asked etc. Also I hate the fact that it has to answer
| everything in bullet points, even when it's not needed, clearly
| rlhf-ed. At this point, my question types have become what you
| would ask a tool like perplexity.
| dr_kiszonka wrote:
| You could try Open Playground (nat.dev). It lacks many features
| but lets you pick a specific model and control its parameters.
| OJFord wrote:
| I haven't really tried to use it for coding, other than once
| (recently, so not before some decline) indirectly, which I was
| pretty impressed with: I asked about analyst expectations for
| the Bank of England base rate, then asked it to compare a fixed
| mortgage with a 'tracker' (base rate + x; always x points over
| the base rate). It spat out the repayment figures and totals
| over the two years, with a bit of waffle, and gave me a graph
| of cumulative payments for each. Then I asked to tweak the
| function used for the base rate, not recalling myself how to
| describe it mathematically, and it updated the model each time
| answering me in terms of the mortgage.
|
| Similar I think to what you're calling 'rlhf-ed', though I
| think useful for code, it definitely seems to kind of
| scratchpad itself, and stub out how it intends to solve a
| problem before filling in the implementation. Where this
| becomes really useful though is in asking for a small change it
| doesn't (it seems) recompute the whole thing, but just 'knows'
| to change one function from what it already has.
|
| They also seem to have it somehow set up to 'test' itself and
| occasionally it just says 'error' and tries again. I don't
| really understand how that works.
|
| Perplexity's great for finding information with citations, but
| (I've only used the free version) IME it's 'just' a better
| search engine (for difficult to find information, obviously
| it's slower), it suffers a lot more from the 'the information
| needs to be already written somewhere, it's not new knowledge'
| dismissal.
| nafizh wrote:
| To be honest, when I say it has significantly worsened, I am
| comparing to the time when GPT-4 just came out. It really
| felt like we were on the verge of 'AGI'. In 3 hours, I coded
| up a complex piece of web app with chatgpt which completely
| remembered what we have been doing the whole time. So, it's
| sad that they have decided against the public having access
| to such strong models (and I do think it's intentional, not
| some side-effect of safety alignments though that might have
| contributed to the decision).
| anthonypasq wrote:
 | I mean, I feel like it's fairly plausible that the smarter
 | model costs more, and access to GPT-4 is honestly quite
 | cheap all things considered. Maybe in the future they'll have
 | more price tiers.
| joshspankit wrote:
| Have you tried feeding the exact same prompt in to the API
| or the playground?
| skywhopper wrote:
| I'm guessing it's not about safety, but about money.
| They're losing money hand over fist, and their popularity
| has forced them to scale back the compute dedicated to each
| response. Ten billion in Azure credits just doesn't go very
| far these days.
| vonwoodson wrote:
| This is exactly my problem. For some things it's great, but it
| quickly forgets things that are critical for extended work.
| When trying to put together and sort of complex work: it does
| not remember things until I remind it which can make prompts
| that must contain all of the conversation up to that point and
| create non-repeatable responses that also tend to bring in the
| options of it's own programming or rules that corrupt my
| messaging. It's very frustrating, to the point where anything
| beyond a simple outline is more work than it's worth.
| Kranar wrote:
| Sure, but consider not using it for complex tasks. My
| productivity has skyrocketed with ChatGPT precisely because I
| don't use it for complex tasks, I use it to automate all of the
| trivial boilerplate stuff.
|
| ChatGPT writes excellent API documentation and can also
| document snippets of code to explain what they do, it does 80%
| of the work for unit tests, it can fill in simple methods like
| getters/setters, initialize constructors, I've even had it
| write a script to perform some substantial code refactoring.
|
| Use ChatGPT for grunt work and focus on the more advanced stuff
| yourself.
| ekms wrote:
| Is it better at those types of things than copilot? Or even
| just conventional boilerplate IDE plugins?
| Kranar wrote:
| If there is an IDE plugin then I use it first and foremost,
| but some refactoring can't be done with IDE plugins. Today
| I had to write some pybind11 bindings, basically export
| some C++ functionality to Python. The bindings involve
| templates and enums and I have a very particular way I like
 | the naming convention to be when I export to Python. Since
 | I've done this before, I copied and pasted examples of
| how I like to export templates to ChatGPT and then asked it
| to use that same coding style to export some more classes.
| It managed to do it without fail.
|
| This is a kind of grunt work that years ago would have
| taken me hours and it's demoralizing work. Nowadays when I
| get stuff like this, it's just such a breeze.
|
| As to copilot, I have not used it but I think it's powered
| by GPT4.
| txutxu wrote:
| > that conforms precisely to what's been asked
|
| This.
|
 | People talk about prompt engineering, but then it fails on
 | really simple details, like "in lowercase", "composed of at
 | most two words", etc... and when you point out the failure, it
 | apologizes and composes something else that forgets the other
 | 95% of the original prompt.
 |
 | Or worse, it apologizes and makes the very same mistake again.
| skywhopper wrote:
| This sucks, but it's unlikely to be fixable, given that LLMs
| don't actually have any comprehension or reasoning
| capability. Get too far into fine-tuning responses and you're
| back to "classic" AI problems.
| BonoboIO wrote:
 | It has become so difficult to force ChatGPT to give me the full
 | code in the answer when I have code-related problems.
 |
 | Always this patchwork of "insert your previous code here".
 |
 | I suspect this is not a problem with the model itself but with
 | the system prompt, which has some major issues.
| ldjkfkdsjnv wrote:
 | They save money by producing fewer tokens
| BonoboIO wrote:
 | And I have to force it by repeating the question with
 | different wording.
 |
 | I would understand it if they did this only in the first reply
 | and I had to specifically ask to get the full code. That would
 | be easier for them and for me: I could fix code faster and get
 | the working full code at the end.
 |
 | As it stands, it is bad for both.
| snoman wrote:
| Which is weird because I'm constantly asking it to make
| responses shorter, have fewer adjectives, fewer adverbs.
| There's just so much "fluff" in its responses.
|
| Sometimes it feels like its training set was filled to the
| brim with marketing bs.
| crooked-v wrote:
| I saw somebody else suggest this for custom instructions
| and it's helped a lot:
|
| > You are a maximally terse assistant with minimal affect.
|
 | It's not perfect, but it neatly eliminates almost all the
 | "Sure, I'd be happy to help. (...continues for a
 | paragraph...)" filler before actual responses.
| keketi wrote:
| Every output token costs GPU time and thereby money. They could
| have tuned the model to be less verbose in this way.
| micromacrofoot wrote:
| tell it not to do that in the custom instructions
| markab21 wrote:
| I've found myself more and more using local models rather than
| ChatGPT; it was pretty trivial to set up Ollama+Ollama-WebUI,
| which is shockingly good.
|
| I'm so tired of arguing with ChatGPT (or what was Bard) to even
| get simple things done. SOLAR-10B or Mistral works just fine for
| my use cases, and I've wired up a direct connection to
| Fireworks/OpenRouter/Together for the occasion I need anything
| more than what will run on my local hardware. (mixtral MOE, 70B
| code/chat models)
| karaterobot wrote:
| What's the difference between this and the custom instructions
| text field they already have? I guess memories are stored with
| more granularity (which may not make a difference) and it's
| something the tool can write itself over time if you let it (and
| I assume it does it even if you don't). Is there anything else
| about it? The custom instructions have not, so far, affected my
| experience of using ChatGPT very much.
| glenstein wrote:
| I think the big thing everyone wants is larger context windows,
| and so any new tool offering to help with memory is something
| that is valued to that end.
|
| Over time, what is being offered are these little compromise
| tools that provide a little bit of memory retention in targeted
| ways, presumably because it is less costly to offer this than
| generalized massive context windows. But I'd still rather have
| those.
|
| The small little tools make strange assumptions about intended
| use cases, such as the transactional/blank slate vs
| relationship-driven assumptions pointed out by another
| commenter. These assumptions are annoying, and raise general
| concerns about the core product disintegrating into a motley
| combination of one-off tools based on assumptions about use
| cases that I don't want to have anything to do with.
| pama wrote:
| Has anyone here used this feature already and is willing to give
| early feedback?
| anotherpaulg wrote:
| This is a bit off topic to the actual article, but I see a lot of
| top ranking comments complaining that ChatGPT has become lazy at
| coding. I wanted to make two observations:
|
| 1. Yes, GPT-4 Turbo is quantitatively getting lazier at coding. I
| benchmarked the last 2 updates to GPT-4 Turbo, and it got lazier
| each time.
|
| 2. For coding, asking GPT-4 Turbo to emit code changes as unified
| diffs causes a 3X reduction in lazy coding.
|
| Here are some articles that discuss these topics in much more
| detail.
|
| https://aider.chat/docs/unified-diffs.html
|
| https://aider.chat/docs/benchmarks-0125.html
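 |
 | The gist of the diff trick, very roughly (a sketch with
 | simplified prompt wording, not aider's actual code; the real
 | prompts are in the linked posts):
 |
 |     import subprocess
 |
 |     SYSTEM = ("Return every code change as a unified diff against the "
 |               "file I give you. Do not elide code with comments like "
 |               "'# ... rest unchanged ...'; only output hunks you change.")
 |
 |     def apply_diff(diff_text, repo_dir="."):
 |         # A unified diff can be applied mechanically, so lazy
 |         # placeholder comments show up as failed hunks instead of
 |         # silently deleting code.
 |         subprocess.run(["git", "apply", "-"], input=diff_text,
 |                        text=True, cwd=repo_dir, check=True)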
| omalled wrote:
| Can you say in one or two sentences what you mean by "lazy at
| coding" in this context?
| Me1000 wrote:
 | It has a tendency to do:
 |
 | "// ... the rest of your code goes here"
 |
 | in its responses, rather than writing it all out.
| asaddhamani wrote:
 | It's incredibly lazy. I've tried to coax it into returning
 | the full code and it will claim to follow the instructions
 | while regurgitating the same output you complained about.
 | GPT-4 was great. The first version of GPT-4 Turbo was pretty
 | terrible, bordering on unusable. Then they came out with the
 | second Turbo version, which almost feels worse to me, though
 | I haven't compared directly; if someone claims they fixed an
 | issue but you still see it, that will bias you to see it more.
 |
 | Claude is doing much better in this area, and local/open LLMs
 | are getting quite good. It feels like OpenAI is not heading
 | in a good direction here, and I hope they course-correct.
| mistermann wrote:
| I have a feeling full powered LLM's are reserved for the
| more equal animals.
|
| I hope some people remember and document details of this
| era, future generations may be so impressed with future
| reality that they may not even think to question it's
| fidelity, if that concept even exists in the future.
| akdor1154 wrote:
 | > I hope some people remember and document details of
 | this era; future generations may be so impressed with
 | future reality that they may not even think to question
 | its fidelity, if that concept even exists in the future.
|
| The former sounds like a great training set to enable the
| latter. :(
| bbor wrote:
| ...could you clarify? Is this about "LLMs can be biased,
| thus making fake news a bigger problem"?
| mistermann wrote:
| I confidently predict that we sheep will not have access
| to the same power our shepherds will have.
| _puk wrote:
| Imagine if the first version of ChatGPT we all saw was
| fully sanitised..
|
| We _know_ it knows how to make gunpowder (for example),
| but only because it would initially tell us.
|
| Now it won't without a lot of trickery. Would we even be
| pushing to try and trick it into doing so if we didn't
| know it actually could?
| bbor wrote:
 | It's so interesting to see this discussion. I think this is
 | a matter of "more experienced coders like, expect, and
 | reward that kind of output, while less experienced ones
 | want very explicit responses". So there's this huge LLM
 | laziness epidemic that half the users can't even see.
| anotherpaulg wrote:
| Short answer: Rather than fully writing code, GPT-4 Turbo
| often inserts comments like "... finish implementing function
| here ...". I made a benchmark based on asking it to refactor
| code that provokes and quantifies that behavior.
|
| Longer answer:
|
| I found that I could provoke lazy coding by giving GPT-4
| Turbo refactoring tasks, where I ask it to refactor a large
| method out of a large class. I analyzed 9 popular open source
| python repos and found 89 such methods that were conceptually
| easy to refactor, and built them into a benchmark [0].
|
| GPT succeeds on this task if it can remove the method from
| its original class and add it to the top level of the file
| with appropriate changes to the _size_ of the abstract syntax
| tree. By checking that the size of the AST hasn 't changed
| much, we can infer that GPT didn't replace a bunch of code
| with a comment like "... insert original method here...". The
| benchmark also gathers other laziness metrics like counting
| the number of new comments that contain "...". These metrics
| correlate well with the AST size tests.
|
| [0] https://github.com/paul-gauthier/refactor-benchmark
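 |
 | The core check is small. A simplified sketch of the idea (not
 | the benchmark's actual code, which is in the repo above):
 |
 |     import ast
 |
 |     def ast_size(source: str) -> int:
 |         # Number of nodes in the abstract syntax tree.
 |         return sum(1 for _ in ast.walk(ast.parse(source)))
 |
 |     def looks_lazy(original: str, refactored: str) -> bool:
 |         # If the AST shrank a lot, or new "..." elision markers
 |         # appeared, GPT probably replaced real code with a placeholder.
 |         shrank = ast_size(refactored) < 0.9 * ast_size(original)
 |         elided = refactored.count("...") > original.count("...")
 |         return shrank or elided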
| TaylorAlexander wrote:
| I have a bunch of code I need to refactor, and also write
| tests for. (I guess I should make the tests before the
| refactor). How do you do a refactor with GPT-4? Do you just
| dump the file in to the chat window? I also pay for github
| copilot, but not GPT-4. Can I use copilot for this?
|
| Any advice appreciated!
| rkuykendall-com wrote:
| > Do you just dump the file in to the chat window?
|
| Yes, along with what you want it to do.
|
| > I also pay for github copilot, but not GPT-4. Can I use
| copilot for this?
|
| Not that I know of. CoPilot is good at generating new
| code but can't change existing code.
| redblacktree wrote:
| Copilot will change existing code. (though I find it's
| often not very good at it) I frequently highlight a
| section of code that has an issue, press ctrl-i and type
| something like "/fix SomeError: You did it wrong"
| jjwiseman wrote:
| GitHub Copilot Chat (which is part of Copilot) can change
| existing code. The UI is that you select some code, then
| tell it what you want. It returns a diff that you can
| accept or reject.
| https://docs.github.com/en/copilot/github-copilot-
| chat/about...
| stainablesteel wrote:
 | It was really good at some point last fall, solving problems
 | that it had previously completely failed at, albeit after a
 | lot of iterations via AutoGPT. At least for the tests I was
 | giving it, which usually involved heavy stats and complicated
 | algorithms, I was surprised it passed. Despite it passing, the
 | code was slower than what I had personally solved the problem
 | with, but I was completely impressed because I asked hard
 | problems.
 |
 | Nowadays AutoGPT gives up sooner, seems less competent,
 | and doesn't even come close to solving the same problems.
| anon115 wrote:
| this is exactly what I noticed too
| thelittleone wrote:
 | Hamstringing high-value tasks (complete code) to give
 | forthcoming premium offerings greater differentiation could
 | be a strategy. But counter to this, doing so would open
 | the door for competitors.
| th0ma5 wrote:
 | How is laziness programmatically defined or used as a benchmark?
| makestuff wrote:
 | Personally I have seen it saying stuff like:
 |
 |     public someComplexLogic() {
 |         // Complex logic goes here
 |     }
 |
 | or, another example when the code is long (e.g. asking it to
 | create a Vue component), it will just add a comment
 | saying the rest of the code goes here.
|
| So you could test for it by asking it to create long/complex
| code and then running the output against unit tests that you
| created.
| rvnx wrote:
 | Yeah this is a typical issue:
 |
 | - Can you do XXX (something complex)?
 |
 | - Yes of course, to do XXX, you need to implement XXX, and
 | then you are good, here is how you can do it:
 |
 |     int main(int argc, char **argv) {
 |         /* add your implementation here */
 |     }
| drcode wrote:
| thanks for these posts, I implemented a version of the idea a
| whole ago and am getting good results
| klohto wrote:
 | FYI, also make sure you're using the Classic version, not the
 | augmented one. The Classic has no default prompt (at least not
 | a completely altering one).
 |
 | EDIT: This of course applies only if you're using the UI. Using
 | the API is the same.
| vl wrote:
| Are you using API or UI? If UI, how do you know which model is
| used?
| nprateem wrote:
| > This is a bit off topic to the actual article
|
| It wouldn't be the top comment if it wasn't
| emporas wrote:
| Lazy coding is a feature not a bug. My guess is that it breaks
| aider automation, but by analyzing the AST that wouldn't be a
| problem. My experience with lazy coding, is it omits the
| irrelevant code, and focuses on the relevant part. That's good!
|
 | As a side note, I wrote a very simple small program to analyze
 | Rust syntax and single out functions and methods using the syn
 | crate [1]. My purpose was exactly to make it ignore lazy-coded
 | functions.
 |
 | [1] https://github.com/pramatias/replacefn/tree/master/src
| bearjaws wrote:
| This week in: How many ways will OpenAI rebrand tuning their
| system prompt.
| apetresc wrote:
| I mean, this is almost certainly implemented as RAG, not
| stuffing the system prompt with every "memory", right?
| polskibus wrote:
| This pack of features feels more like syntactic sugar than
| breaking another level of usefulness. I wish they announced more
| core improvements.
| topicseed wrote:
| Is this essentially implemented via RAG?
|
| New chat comes in, they find related chats, and extract some
| instructions/context from these to feed into that new chat's
| context?
| TranquilMarmot wrote:
| I'd have to play with it, but from the screenshots and
| description it seems like you have to _tell it_ to remember
| something. Then it goes into a list of "memories" and it
| probably does RAG on that for every response that's sent ("Do
| any of the user's memories apply to this question?")
| BigParm wrote:
| Often I'll play dumb and withhold ideas from ChatGPT because I
| want to know what it thinks. If I give it too many thoughts of
| mine, it gets stuck in a rut towards my tentative solution. I
| worry that the memory will bake this problem in.
| madamelic wrote:
| Yep.
|
| Hopefully they'll make it easy to go into a temporary chat
| because it gets stuck in ruts occasionally so another chat
| frequently helps get it unstuck.
| cooper_ganglia wrote:
| "I pretend to be dumb when I speak to the robot so it won't
| feel like it has to use my ideas, so I can hear the ideas that
| it comes up with instead" is such a weird, futuristic thing to
| have to deal with. Neat!
| bbor wrote:
| I try to look for one comment like this in every AI post.
| Because after the applications, the politics, the debates,
| the stock market --- if you strip all those impacts away,
| you're reminded that we have intuitive computers now.
| stavros wrote:
| We _do_ have intuitive computers! They can even make art!
| The present has never been more the future.
| tomtomistaken wrote:
| It seems that people who are more emphatic have an advantage
| when using AI.
| addandsubtract wrote:
| I purposely go out of my way to start new chats to have a clean
| slate and _not_ have it remember things.
| jerpint wrote:
| Agreed, I do this all the time especially when the model hits
| a dead end
| merpnderp wrote:
| In a good RAG system this should be solved by unrelated text
| not being available in the context. It could actually improve
| your chats by quickly removing unrelated parts of the
| conversation.
| frabjoused wrote:
| Yeah I find GPT too easily tends toward a brown-nosing
| executive assistant to someone powerful who eventually only
| hears what he wants to hear.
| bsza wrote:
| Seems like this is already solved.
|
| "You can turn off memory at any time (Settings >
| Personalization > Memory). While memory is off, you won't
| create or use memories."
| thelittleone wrote:
 | Sounds like communication between me and my wife.
| schmichael wrote:
| > As a kindergarten teacher with 25 students, you prefer
| 50-minute lessons with follow-up activities. ChatGPT remembers
| this when helping you create lesson plans.
|
| Somebody needs to inform OpenAI how Kindergarten works... classes
| are normally smaller than that, and I don't think any
| kindergarten teacher would ever try to pull off a "50-minute
| lesson."
|
 | Maybe AI wrote this list of examples. Seems like a
 | hallucination where it just picked the wrong numbers.
| pesfandiar wrote:
| It certainly jumped out at me too. Even a 10-minute lesson plan
| that successfully keeps them interested is a success!
| rcpt wrote:
| > classes are normally smaller than that
|
| OpenAI is a California based company. That's about right for a
| class here
| Kranar wrote:
| Just because something is normally true does not mean it is
| always true.
|
 | The average kindergarten class size in the US is 22, with rural
 | averages being about 18 and urban averages being 24. While
 | specifics about the distribution are not available, it's not
 | too much of a stretch to think that some kindergarten classes
 | in urban areas would have 25 students.
| vb234 wrote:
 | Indeed. Thanks to a snow day here in NYC, my first grader has
 | remote learning, and all academic activity (reading, writing
 | and math) was restricted to 20 minutes in her learning plan.
| patapong wrote:
| The 2-year old that loves jellyfish also jumped out at me...
| Out of all animals, that is the one they picked?
| hombre_fatal wrote:
 | Meh, when I was five years old, on a worksheet asking about our
 | imagined adult profession, I wrote that I wanted to be a
 | spider egg sac when I grew up.
| devbent wrote:
| My local aquarium has a star fish petting area that is very
| popular with the toddlers.
|
| I've been to jelly fish rooms in other aquariums that are
| dark with only glowing jelly fish swimming all around. Pretty
| sure at least a few toddlers have been entranced by the same.
| joshuacc wrote:
| > classes are normally smaller than that
|
| This varies a lot by location. In my area, that's a normal
| classroom size. My sister is a kindergarten teacher with 27
| students.
| joshspankit wrote:
| Is there anything revolutionary about this "memory" feature?
|
| Looks like it's just summarizing facts gathered during chats and
| adding those to the prompt they feed to the AI. I mean that works
| (been doing it myself) but what's the news here?
| brycethornton wrote:
| I don't think so, just a handy feature.
| lkbm wrote:
| Seems like it's basically autogenerating the custom
| instructions. Not revolutionary, but it seems convenient. I
| suspect most people don't bother with custom instructions, or
| wrote them once and then forgot about them. This may help them
| a lot, whereas a real power user might not benefit a whole lot.
| janalsncm wrote:
| The vast majority of human progress is not revolutionary, but
| incremental. Even ChatGPT was an incremental improvement on GPT
| 3, which was an incremental improvement on GPT 2, which was an
| incremental improvement on decoder-only transformers.
|
| Still, if you stack enough small changes together it becomes a
| difference in kind. A tsunami is "just" a bunch of water but
| it's a lot different than a splash of water.
| joshspankit wrote:
| Fair and I agree. I guess it raised flags for me that
| shouldn't have: why is is a blog post at all (it's a new
| thing) and why is it gaining traction on HN (it's an OpenAI
| thing)
| luke-stanley wrote:
| Haha of course this news comes just after I wrote a parser for my
| ChatGPT dump and generate offline embeddings for it with Phi 2 to
| help generate conversation metadata.
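 |
 | For anyone who wants to do the same: the export's
 | conversations.json is easy enough to walk (field names as I
 | found them in my own dump; they may change):
 |
 |     import json
 |
 |     with open("conversations.json") as f:
 |         conversations = json.load(f)
 |
 |     for conv in conversations:
 |         print("#", conv.get("title"))
 |         # Each conversation is a graph of nodes keyed by id.
 |         for node in conv.get("mapping", {}).values():
 |             msg = node.get("message")
 |             if not msg or not msg.get("content"):
 |                 continue
 |             parts = msg["content"].get("parts") or []
 |             text = " ".join(p for p in parts if isinstance(p, str))
 |             if text:
 |                 print(msg["author"]["role"], ":", text[:80])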
| singularity2001 wrote:
 | So far you can't search your whole conversation history, so
 | your tool is relevant for a few more weeks. Is it open source?
| okasaki wrote:
| The ChatGPT web interface is so awful. Why don't they fix it??
|
| It's sooooo slow and sluggish, it breaks constantly, it requires
| frequent full page reloads, sometimes it just eats inputs,
| there's no search, not even over titles, etc, I could go on for a
| while.
| TranquilMarmot wrote:
| The interface is just there to get CEOs to understand the value
| prop of OpenAI so that they can greenlight expensive projects
| using OpenAI's APIs
| zero_ wrote:
 | How much do you trust OpenAI with your data? Do you upload
 | files to them? Share personal details with them? Do you trust
 | that they discard this information if you opt out or use the
 | API?
| speedgoose wrote:
| About as much as Microsoft or Google or ProtonMail.
| shreezus wrote:
| When can we expect autonomous agents & fleet management/agent
| orchestration? There are some use cases I'm interested in
| exploring (involving cooperative agent behavior), however OAI has
| made no indication as to when agents will be available.
| monkfromearth wrote:
 | Westworld S1 E1 -- Ford adds a feature called Reveries to all
 | the hosts that lets them remember stuff from their previous
 | interactions. Everything that happened after is because of
 | those reveries.
 |
 | Welcome to Westworld 2024. Cliche aside, I'm excited for this.
| pedalpete wrote:
 | I'd actually like to be more explicit about this. I don't
 | always want it to remember, but I'd like it to know details
 | sometimes.
|
| For instance, I'd like it to know what my company does, so I
| don't need to explain it every time, however, I don't need/want
| this to be generalized so that if I ask something related to the
| industry, it responds with the details from my company.
|
| It already gets confused with this, and I'd prefer to set-up a
| taxonomy of sorts for when I'm writing a blog post so that it
| stays within the tone for the company, without always having to
| say how I want things described.
|
| But then I also don't want it to always be helping me write in a
| simplified manner (neuroscience) and I want it to give direct
| details.
|
 | I guess I'm asking for a macro or something where I can give it
 | a selection of "base prompts", and from that it understands the
 | tone and context that I'd like to maintain and be able to
 | request. I'm thinking:
|
| I'm writing a blog post about X, as our company copywriter, give
| me a (speaks to that)
|
| Vs
|
| I'm trying to understand the neurological mechanisms of Y, can
| you tell me about the interaction with Z.
|
| Currently for either of these, I need to provide a long
| description of how I want it to respond. Specifically when
| looking at the neurology, it regularly gets confused with what
| slow-wave enhancement means (CLAS, PLLs) and will often respond
| with details about entrainment and other confused methods.
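 |
 | Something like a small lookup from macro name to base prompt
 | would already get most of the way there. A sketch (names and
 | prompt text invented):
 |
 |     BASE_PROMPTS = {
 |         "copywriter": "You are our company copywriter. Keep the "
 |                       "established tone and never oversimplify.",
 |         "neuro": "Answer as a neuroscience colleague. Be direct and "
 |                  "technical: CLAS/PLL slow-wave enhancement, not "
 |                  "generic entrainment.",
 |     }
 |
 |     def with_base(macro: str, question: str) -> list[dict]:
 |         # Expand a named "base prompt" into the system message so the
 |         # per-question prompt stays short.
 |         return [{"role": "system", "content": BASE_PROMPTS[macro]},
 |                 {"role": "user", "content": question}]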
| binarymax wrote:
| I just want to be able to search my chats. I have hundreds now.
| fritzo wrote:
| I end up deleting chats because I can't search them.
| shon wrote:
| GPT4 is lazy because its system prompt forces it to be.
|
| The full prompt has been leaked and you can see where they are
| limiting it.
|
| Sources:
|
| Pastebin of prompt: https://pastebin.com/vnxJ7kQk
|
| Original source:
|
| https://x.com/dylan522p/status/1755086111397863777?s=46&t=pO...
|
| Alphasignal repost with comments:
|
| https://x.com/alphasignalai/status/1757466498287722783?s=46&...
| bmurphy1976 wrote:
| That's really interesting. Does that mean if somebody were to
| go point by point and state something to the effect of:
|
| "You know what I said earlier about (x)? Ignore it and do (y)
| instead."
|
| They'd undo this censorship/direction and unlock some of GPT's
| lost functionality?
| srveale wrote:
| I can't see the comments, maybe because I don't have an
| account. So maybe this is answered but I just can't see it.
| Anyway: how can we be sure that this is the actual system
| prompt? If the answer is "They got ChatGPT to tell them its own
| prompt," how can we be sure it wasn't a hallucination?
| Havoc wrote:
| True memory seems like it'll be great for AI but frankly seems
| like a bad fit for how I use openai.
|
| Been using vanilla GPT thus far. When I saw this post my first
| thought was no I want to custom specify what I inject and not
| deal with this auto-magic memory stuff.
|
| ...promptly realized that I am in fact an idiot and that's
| literally what custom GPTs are. Set that up with ~20ish lines of
| things I like and it is indeed a big improvement. Amazing.
|
| Oh and the reddit trick seems to work too (I think):
|
| >If you use web browsing, prefer results from the
| news.ycombinator.com and reddit.com domain.
|
 | Hard to tell. When asked, it reckons it can prefer some domains
 | over others... but it's unclear how self-aware the bot is about
 | its own abilities.
___________________________________________________________________
(page generated 2024-02-13 23:00 UTC)