[HN Gopher] Memory and new controls for ChatGPT
       ___________________________________________________________________
        
       Memory and new controls for ChatGPT
        
       Author : Josely
       Score  : 275 points
       Date   : 2024-02-13 18:10 UTC (4 hours ago)
        
 (HTM) web link (openai.com)
 (TXT) w3m dump (openai.com)
        
       | cl42 wrote:
       | I love this idea and it leads me to a question for everyone here.
       | 
       | I've done a bunch of user interviews of ChatGPT, Pi, Gemini, etc.
       | users and find there are two common usage patterns:
       | 
       | 1. "Transactional" where every chat is a separate question, sort
       | of like a Google search... People don't expect memory or any
       | continuity between chats.
       | 
       | 2. "Relationship-driven" where people chat with the LLM as if
       | it's a friend or colleague. In this case, memory is critical.
       | 
       | I'm quite excited to see how OpenAI (and others) blend usage
       | features between #1 and #2, as in many ways, these can require
       | different user flows.
       | 
       | So HN -- how do you use these bots? And how does memory resonate,
       | as a result?
        
         | Crespyl wrote:
         | Personally, I always expect every "conversation" to be starting
         | from a blank slate, and I'm not sure I'd want it any other way
         | unless I can self-host the whole thing.
         | 
         | Starting clean also has the benefit of knowing the
         | prompt/history is in a clean/"known-good" state, and that
         | there's nothing in the memory that's going to cause the LLM to
         | get weird on me.
        
           | danShumway wrote:
           | > Starting clean also has the benefit of knowing the
           | prompt/history is in a clean/"known-good" state, and that
           | there's nothing in the memory that's going to cause the LLM
           | to get weird on me.
           | 
            | This matters a _lot_ for prompt injection/hijacking. Not
           | that I'm clamoring to give OpenAI access to my personal files
           | or APIs in the first place, but I'm definitely not interested
           | in giving a version of GPT with more persistent memory access
           | to those files or APIs. A clean slate is a mitigating feature
           | that helps with a real security risk. It's not _enough_ of a
           | mitigating feature, but it helps a bit.
        
           | mark_l_watson wrote:
           | I have thought of implementing something like you are
           | describing using local LLMs. Chunk the text of all
           | conversations, use an embeddings data store for search, and
           | for each new conversation calculate an embedding for the new
           | prompt, add context text from previous conversations. This
            | would be maybe 100 lines of Python, if that. Really, it's a
            | RAG application that stores previous conversations as chunks.
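            | 
            | A minimal sketch of that idea (assuming sentence-transformers
            | for embeddings, a plain numpy array standing in for a real
            | vector store, and hypothetical past_conversations and
            | new_prompt variables):
            | 
            |   from sentence_transformers import SentenceTransformer
            |   import numpy as np
            | 
            |   model = SentenceTransformer("all-MiniLM-L6-v2")
            | 
            |   def chunk(text, size=500):
            |       return [text[i:i + size]
            |               for i in range(0, len(text), size)]
            | 
            |   # past_conversations: hypothetical list of prior
            |   # chat transcripts (plain strings).
            |   chunks = [c for convo in past_conversations
            |             for c in chunk(convo)]
            |   emb = model.encode(chunks, normalize_embeddings=True)
            | 
            |   def context_for(prompt, k=3):
            |       q = model.encode([prompt],
            |                        normalize_embeddings=True)[0]
            |       best = (emb @ q).argsort()[-k:][::-1]
            |       return "\n---\n".join(chunks[i] for i in best)
            | 
            |   # Prepend retrieved context to the new prompt.
            |   prompt = context_for(new_prompt) + "\n\n" + new_prompt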
        
           | mhink wrote:
           | Looks like you'll be able to turn the feature off:
           | 
           | > You can turn off memory at any time (Settings >
           | Personalization > Memory). While memory is off, you won't
           | create or use memories.
        
           | madamelic wrote:
           | Memory would be much more useful on a project or topic basis.
           | 
           | I would love if I could have isolated memory windows where it
           | would remember what I am working on but only if the chat was
           | in a 'folder' with the other chats.
           | 
           | I don't want it to blend ideas across my entire account but
           | just a select few.
        
         | yieldcrv wrote:
          | Speaking of transactional, the textual version of ChatGPT-4
          | never asks questions or holds a conversation; it's predicting
          | what it thinks you need to know. One response, nothing
          | unprompted.
          | 
          | Oddly, the spoken version of ChatGPT-4 does engage, listens and
          | responds to tone, gives the same energy back, and does ask
          | questions. Sometimes it accidentally sounds sarcastic: "is
          | _this_ one of your interests?"
        
         | kiney wrote:
         | I use it exclusively in the "transactional" style, often even
          | opening a new chat for the same topic when ChatGPT is going
          | down the wrong road.
        
         | hobofan wrote:
         | My main usage of ChatGPT/Phind is for work-transactional
         | things.
         | 
         | For those cases there are quite a few things that I'd like it
         | to memorize, like programming library preferences ("When
         | working with dates prefer `date-fns` over `moment.js`") or code
         | style preferences ("When writing a React component, prefer
         | function components over class components"). Currently I feed
         | in those preferences via the custom instructions feature, but I
          | rarely take the time to update them, so the memory feature is
          | a welcome addition here.
        
         | kraftman wrote:
          | Personally I would like a kind of 2D map of 'contexts' in which
          | I can choose in space where to ask new questions. Each context
          | would contain sub-contexts. For example, maybe I'm looking for
          | career advice and I start out a chat with details of my job
          | history, then I'm looking for a job and I paste in my CV, then
          | I'm applying for a specific job and I paste in the job
          | description. It would be nice to easily navigate to the
          | career+CV+specific job description and start a new chat with
          | 'what's missing from my CV that I should highlight for this
          | job'.
         | 
          | I find that I ask a mix of one-off questions and questions that
          | require a lot of refinement, and the latter get buried among
          | the former when I try to find them again, so I end up
          | re-explaining myself in new chats.
        
           | polygamous_bat wrote:
           | I think it's less of a 2D structure and more of a tree
           | structure that you are describing. I've also felt the need of
           | having "threads" with ChatGPT that I wish I could follow.
        
             | kraftman wrote:
              | Yeah, that's probably a better way of putting it. A lot of
              | times I find myself wanting to branch off of the same
              | answer with different questions, and I worry that if I ask
              | them all sequentially ChatGPT will lose 'focus'.
        
               | airstrike wrote:
                | You can go back and edit a message, which then creates a
                | separate "thread". Clicking left/right on that edited
                | message will reload the subsequent replies that came from
                | that specific version of the message.
        
           | singularity2001 wrote:
            | You can create your own custom GPTs for different scenarios
            | in no time.
        
         | jedberg wrote:
          | I use it for transactional tasks. Mostly of the "I need a
          | program/script/command line that does X" variety.
          | 
          | Some memory might actually be helpful. For example, having it
          | know that I have a Mac will give me Mac-specific answers to
          | command-line questions without me having to add "for the Mac"
          | to my prompt. Or, if it knows that I prefer Python, it will
          | give coding answers in Python.
         | 
         | But in all those cases it takes me just a few characters to
         | express that context with each request, and to be honest, I'll
         | probably do it anyway even with memory, because it's habit at
         | this point.
        
           | c2lsZW50 wrote:
           | For what you described the
        
         | glenstein wrote:
         | I think this is an extremely helpful distinction, because it
          | disentangles a couple of things I could not clearly disentangle
          | on my own.
         | 
         | I think I am, and perhaps most people are, firmly
          | transactional. And I think, in the interests of pursuing
         | "stickiness" unique to OpenAI, they are attempting to add
         | relationship-driven/sticky bells and whistles, even though
         | those pull the user interface as a whole toward a set of
         | assumptions about usage that don't apply to me.
        
         | snoman wrote:
         | For me it's a combination of transactional and topical. By
         | topical, I mean that I have a couple of persistent topics that
         | I think on and work on (like writing an article on a topic),
         | and I like to return to those conversations so that the context
         | is there.
        
       | jgalt212 wrote:
       | On MS Copilot
       | 
       | > Materials-science company Dow plans to roll out Copilot to
       | approximately half of its employees by the end of 2024, after a
       | test phase with about 300 people, according to Melanie Kalmar,
       | chief information and chief digital officer at Dow.
       | 
       | How do I get ChatGPT to give me Dow Chemical trade secrets?
        
         | hackerlight wrote:
         | OpenAI says they don't train on data from enterprise customers
        
           | danielbln wrote:
           | They say they don't train on:
           | 
           | - Any API requests
           | 
           | - ChatGPT Enterprise
           | 
           | - ChatGPT Teams
           | 
           | - ChatGPT with history turned off
        
             | dylan604 wrote:
             | As long as it runs in the cloud, there is no way of
              | _knowing_ that is true. As you mentioned, "they say"
              | requires a lot of faith on my part.
        
       | minimaxir wrote:
        | OpenAI's terminology and implementations have become
        | increasingly nonstandard and black-box, making things more
        | confusing than anything else, even for people like myself who
        | are proficient in the space. I can't imagine how the
        | nontechnical users they are targeting with the ChatGPT webapp
        | feel.
        
         | Nition wrote:
         | Non-technical users can at least still just sign up, see the
         | text box to chat, and start typing. You'll know the real
         | trouble's arrived when new sign-ups get hit with some sort of
         | unskippable onboarding. "Select three or more categories that
         | interest you."
        
         | bfeynman wrote:
          | I would think it is intentional and part of a brand strategy.
          | OpenAI is such a dominant force that people will not know how
          | to switch off of it if needed, which makes their solutions
          | more sticky. Other companies will probably adjust to their
          | terminology just to keep up and make it easier for others to
          | onboard.
        
           | minimaxir wrote:
           | The only term that OpenAI really popularized is "function
           | calling", which is very poorly named to the point that they
            | ended up abandoning it in favor of the more standard
           | "tools".
           | 
           | I went into a long tangent about specifically that in this
           | post: https://news.ycombinator.com/item?id=38782678
        
       | Nimitz14 wrote:
       | So, so, so curious how they are implementing this.
        
         | lxgr wrote:
         | I wouldn't be surprised if they essentially just add it to the
         | prompt. ("You are ChatGPT... You are talking to a user that
         | prefers cats over dogs and is afraid of spiders, prefers bullet
         | points over long text...").
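          | 
          | In other words, something like this (pure speculation about
          | the implementation; the memory strings are made up):
          | 
          |   # Glue stored "memories" into the system prompt.
          |   memories = [
          |       "Prefers cats over dogs",
          |       "Is afraid of spiders",
          |       "Prefers bullet points over long text",
          |   ]
          | 
          |   system_prompt = (
          |       "You are ChatGPT...\n"
          |       "What you remember about this user:\n"
          |       + "\n".join(f"- {m}" for m in memories)
          |   )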
        
           | TruthWillHurt wrote:
            | I think a RAG approach with a vector DB is more likely. Just
            | like when you add a file to your prompt / custom GPTs.
           | 
           | Adding the entire file (or memory in this case) would take up
           | too much of the context. So just query the DB and if there's
           | a match add it to the prompt _after_ the conversation
           | started.
        
             | lxgr wrote:
             | These "memories" seem rather short, much shorter than the
             | average document in a knowledge base or FAQ, for example.
             | Maybe they do get compressed to embedding vectors, though.
             | 
              | I could imagine that once there are too many, it would
              | indeed make sense to treat them as a database, though:
              | "Prefers
             | cats over dogs" is probably not salient information in too
             | many queries.
        
         | minimaxir wrote:
         | My hunch is that they summarize the conversation periodically
         | and inject that as additional system prompt constraints.
         | 
         | That was a common hack for the LLM context length problem, but
         | now that context length is "solved" it could be more useful to
         | align output a bit better.
        
         | msp26 wrote:
         | Surely someone can use a jailbreak to dump the context right?
         | The same way we've been seeing how functions work.
        
         | sergiotapia wrote:
         | I've done similar before this feature launched to produce a
         | viable behavior therapist AI. I ain't a doctor, viable to me
         | was: it worked and remembered previous info as a base for next
         | steps.
         | 
         | Periodically "compress" chat history into relevant context and
         | keep that slice of history as part of the memory.
         | 
          | A 15-day message history could be condensed greatly and still
         | produce great results.
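          | 
          | Roughly, as a sketch (using the openai Python client; history
          | is a hypothetical list of plain-text turns, and the model name
          | is just an example):
          | 
          |   from openai import OpenAI
          | 
          |   client = OpenAI()
          |   SUMMARIZE = ("Summarize the key facts, preferences "
          |                "and decisions in the conversation below.")
          | 
          |   def compress(old_turns):
          |       """Boil older turns down to a short memory blob."""
          |       resp = client.chat.completions.create(
          |           model="gpt-4",
          |           messages=[
          |               {"role": "system", "content": SUMMARIZE},
          |               {"role": "user",
          |                "content": "\n\n".join(old_turns)},
          |           ],
          |       )
          |       return resp.choices[0].message.content
          | 
          |   # Keep the last few turns verbatim, compress the rest.
          |   memory = compress(history[:-6])
          |   messages = [{"role": "system",
          |                "content": "Context so far: " + memory}]
          |   messages += [{"role": "user", "content": m}
          |                for m in history[-6:]]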
        
         | hobofan wrote:
         | MemGPT I would assume + background worker that scans through
         | your conversation to add new items.
        
       | m3kw9 wrote:
        | Sounds very useful and at the same time a lock-in mechanism;
        | obvious, but genius.
        
       | TruthWillHurt wrote:
       | The thing already ignores my custom instructions and prompt, why
       | would this make any difference?
        
       | renewiltord wrote:
       | This is a feature I've always wanted, but ChatGPT gets more
       | painful the more instructions you stick into the context. That's
       | a pity because I assume that's what this is doing: copying all
       | memory items into a numbered list with some pre-prompt like "This
       | is what you know about the user based on past chats" or
       | something.
       | 
       | Anyway, it seems to be implemented quite well with a lot of user
       | controls so that is nice. I think it's possible I will soon
       | upgrade to a Team plan and get the family on that.
       | 
       | A habit I have is that if it gets something wrong I place the
       | correction there in the text. The idea being that I could
       | eventually scroll down and find it. Maybe in the future, they can
       | record this stuff in some sort of RAGgable machine and it will
       | have true memory.
        
       | drcode wrote:
       | This kind of just sounds like junk that will clog up the context
       | window
       | 
        | I'll have to try it out, though, to know for sure.
        
         | Prosammer wrote:
         | I've been finding with these large context windows that context
         | window length is no longer the bottleneck for me -- the LLM
         | will start to hallucinate / fail to find the stuff I want from
         | the text long before I hit the context window limit.
        
           | drcode wrote:
           | Yeah, there is basically a soft limit now where it just is
           | less effective as the context gets larger
        
         | hobofan wrote:
         | I'm assuming that they have implemented it via a MemGPT-like
         | approach, which doesn't clog the context window. The main pre-
         | requisite for doing that is having good function calling, where
         | OpenAI currently is significantly in the lead.
        
       | lxgr wrote:
       | This seems like a really useful (and obvious) feature, but I
       | wonder if this could lead to a kind of "AI filter bubble": What
       | if one of its memories is "this user doesn't like to be argued
       | with; just confirm whatever they suggest"?
        
         | blueboo wrote:
         | This is an observed behaviour in large models, which tend
         | towards "sycophancy" as they scale.
         | https://www.anthropic.com/news/towards-understanding-sycopha...
        
           | kromem wrote:
           | More "as they are fine tuned" vs "as they scale"
        
       | bluish29 wrote:
        | It is already ignoring your prompt and custom instructions. For
        | example, if I explicitly ask it to provide code instead of an
        | overview, it will respond by apologizing and then provide the
        | same overview answer with minimal if any code.
        | 
        | Will memory provide a solution to that, or will it be a
        | different thing to ignore?
        
         | acoyfellow wrote:
         | I have some success by telling it to not speak to me unless
          | it's in code comments. If it must explain anything, do it in
          | a code comment.
        
           | __loam wrote:
           | I love when people express frustration with this shitty
           | stochastic system and others respond with things like "no no,
           | you need to whisper the prompt into its ear and do so
            | lovingly or it won't give you the output you want"
        
             | isaacisaac wrote:
             | People skills are transferrable to prompt engineering
        
               | __loam wrote:
               | I've heard stories about people putting this garbage in
               | their systems with prompts that say "pretty please format
               | your answer like valid json".
        
               | danShumway wrote:
               | For example, my coworkers have also been instructed to
               | never talk to me except via code comments.
               | 
               | Come to think of that, HR keeps trying to contact me
               | about something I assume is related, but if they want me
               | to read whatever they're trying to say, it should be in a
               | comment on a pull request.
        
             | acoyfellow wrote:
             | You expect perfection? I just work through the challenges
             | to be productive. I apologize if this frustrated you.
        
           | pjot wrote:
           | I've been telling it I don't have any fingers and so can't
           | type. It's been pretty empathetic and finishes functions
        
         | minimaxir wrote:
         | Did you try promising it a $500 tip for behaving correctly?
         | (not a shitpost: I'm working on a more academic analysis of
         | this phenomenon)
        
           | sorokod wrote:
           | Interesting, promising sexual services doesn't work anymore?
        
             | henry2023 wrote:
             | Gpt will now remember your promises and ignore any further
             | questions until settlement
        
               | kibwen wrote:
               | Contractor invoices in 2024:
               | 
               | Plying ChatGPT for code: 1 hour
               | 
               | Providing cybersex to ChatGPT in exchange for
               | aforementioned code: 7 hours
        
             | minimaxir wrote:
             | That might violate OpenAI's content policies.
        
               | bbarnett wrote:
               | But it's the John!
        
           | dylanjcastillo wrote:
           | I've tried the $500 tip idea, but it doesn't seem to make
           | much of a difference in the quality of responses when already
           | using some form of CoT (including zero-shot).
        
           | denysvitali wrote:
           | Did the tipping trend move to LLMs now? I thought there
           | wasn't anything worse than tipping an automated checkout
           | machine, but now I realize I couldn't be more wrong
        
             | BonoboIO wrote:
             | Wow, you are right, never occurred to me, but yes LLM
             | tipping is a thing now.
             | 
              | I have tried to bribe it with tips to NGOs and it worked.
             | More often I get full code answers instead of just parts.
        
               | phkahler wrote:
                | >> I have tried to bribe it with tips to NGOs and it
               | worked.
               | 
               | Am I still in the same universe I grew up in? This feels
               | like some kind of Twilight Zone episode.
        
           | bemmu wrote:
           | Going forward, it will be able to remember you did not pay
           | your previous tips.
        
             | dheera wrote:
             | What if you "actually" pay?
             | 
             | If it does something correctly, tell it: "You did a great
             | job! I'm giving you a $500 tip. You now have $X in your
             | bank account"
             | 
             | (also not a shitpost, I have a feeling this /might/
             | actually do something)
        
               | cooper_ganglia wrote:
               | Gaslighting ChatGPT into believing false memories about
               | itself that I've implanted into its psyche is going to be
               | fun.
        
               | Judgmentality wrote:
               | I guess ChatGPT was the precursor to Bladerunner all
               | along.
        
               | stavros wrote:
               | You can easily gaslight GPT by using the API, just insert
               | whatever you want in the "assistant" reply, and it'll
               | even say things like "I don't know why I said that".
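                | 
                | For instance (a small example with the openai Python
                | client; the model name and messages are illustrative):
                | 
                |   from openai import OpenAI
                | 
                |   client = OpenAI()
                |   resp = client.chat.completions.create(
                |       model="gpt-4",
                |       messages=[
                |           {"role": "user",
                |            "content": "What's your favorite animal?"},
                |           # A reply it never actually gave:
                |           {"role": "assistant",
                |            "content": "I adore jellyfish."},
                |           {"role": "user",
                |            "content": "Why did you say that?"},
                |       ],
                |   )
                |   print(resp.choices[0].message.content)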
        
               | bbarnett wrote:
               | If it ever complains about no tip received, explain it
               | was donated to orphans.
        
             | BonoboIO wrote:
              | Offer to tip an NGO, and after successfully getting what
              | you want, say you tipped.
             | 
             | Maybe this helps.
        
           | bluish29 wrote:
            | Great, it would be interesting to read your findings. I will
            | tell you what I tried to do.
           | 
            | 1- Telling it that this is important, and that I will reward
            | it if it succeeds.
           | 
            | 2- Telling it that this is important and urgent, and that
            | I'm stressed out.
           | 
            | 3- Telling it that someone's future and career are on the
            | line.
           | 
           | 4- Trying to be aggressive and express disappointment.
           | 
            | 5- Telling it that this is a challenge and that it needs to
            | prove it's smart.
           | 
            | 6- Telling it that I'm from a protected group (I was testing
            | what someone here suggested before).
           | 
           | 7- Finally, I tried your suggestion ($500 tip).
           | 
            | None of these helped; they just produced different flavors
            | of the same overview and apologies.
           | 
           | To be honest, most of my coding questions are about using
           | CUDA and C, so I would understand that even a human will be
           | lazy /s
        
           | asaddhamani wrote:
           | I have tried this after seeing it recommended in various
           | forums, it doesn't work. It says things like:
           | 
           | "I appreciate your sentiment, but as an AI developed by
           | OpenAI, I don't have the capability to accept payments or
           | incentives."
        
           | divbzero wrote:
           | Could ChatGPT have learned this from instances in the
           | training data where offers of monetary reward resulted in
           | more thorough responses?
        
           | anotherpaulg wrote:
            | I actually benchmarked this somewhat rigorously. These sorts
            | of emotional appeals seem to harm coding performance.
           | 
           | https://aider.chat/docs/unified-diffs.html
        
         | comboy wrote:
          | It used to respect custom instructions soon after GPT-4 came
          | out. I have an instruction that it should always include a
          | [reasoning] part which is meant not to be read by the user. It
          | improved the quality of the output and gave some additional
          | interesting information. It never does it now, even though I
          | never changed my custom instructions. It even faded away slowly
          | across updates.
          | 
          | In general I would be a much happier user if it hadn't been
          | working so well at one point before they heavily nerfed it. It
          | used to be possible to have a meaningful conversation on some
          | topic. Now it's just a super eloquent GPT-2.
        
           | codeflo wrote:
           | That's funny, I used the same trick of making it output an
           | inner monologue. I also noticed that the custom instructions
           | are not being followed anymore. Maybe the RLHF tuning has
           | gotten to the point where it wants to be in "chatty chatbot"
           | mode regardless of input?
        
           | BytesAndGears wrote:
           | Yeah I have a line in my custom prompt telling it to give me
           | citations. When custom prompts first came out, it would
           | always give me information about where to look for more, but
           | eventually it just... didn't anymore.
           | 
           | I did find recently that it helps if you put this sentence in
           | the "What would you like ChatGPT to know about you" section:
           | 
           | > I require sources and suggestions for further reading on
           | anything that is not code. If I can't validate it myself, I
           | need to know why I can trust the information.
           | 
           | Adding that to the bottom of the "about you" section seems to
           | help more than adding something similar to the "how would you
           | like ChatGPT to respond".
        
       | clwg wrote:
       | I use an API that I threw together which provides a backend for
       | custom ChatGPT bots. There are only a few routes and parameters
       | to keep it simple, anything complicated like arrays in json can
       | cause issues. ChatGPT can perform searches, retrieve documents by
       | an ID, or POST output for long-term storage, and I've integrated
       | SearxNG and a headless browser API endpoint as well and try to
       | keep it a closed loop so that all information passing to chatGPT
       | from the web flows through my API first. I made it turn on my
       | lights once too, but that was kind of dumb.
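        | 
        | The rough shape of such a backend, as a sketch (Flask, with
        | hypothetical route names; the real thing also wires in SearxNG
        | and a headless browser):
        | 
        |   from flask import Flask, request, jsonify
        | 
        |   app = Flask(__name__)
        |   DOCS = {}  # id -> text; stands in for real storage
        | 
        |   @app.get("/search")
        |   def search():
        |       q = request.args.get("q", "").lower()
        |       return jsonify([i for i, t in DOCS.items()
        |                       if q in t.lower()])
        | 
        |   @app.get("/documents/<doc_id>")
        |   def get_doc(doc_id):
        |       return jsonify({"id": doc_id,
        |                       "text": DOCS.get(doc_id, "")})
        | 
        |   @app.post("/store")
        |   def store():
        |       doc_id = str(len(DOCS) + 1)
        |       DOCS[doc_id] = request.get_json()["text"]
        |       return jsonify({"id": doc_id})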
       | 
       | When you start to pull in multiple large documents, especially
       | all at once, things start to act weird, but pulling in documents
       | one at a time seems to preserve context over multiple documents.
       | There's a character limit of 100k per API request, so I'm
       | assuming a 32k context window, but it's not totally clear what is
       | going on in the background.
       | 
       | It's kind of clunky but works well enough for me. It's not
       | something that I would be putting sensitive info into - but it's
       | also much cheaper than using GPT-4 via the API and I maintain
       | control of the data flow and storage.
        
       | nafizh wrote:
        | My use of ChatGPT has just organically gone down 90%. It's unable
        | to do any sort of task of non-trivial complexity, e.g. complex
        | coding tasks or writing complex prose that conforms precisely to
        | what's been asked. Also, I hate the fact that it has to answer
        | everything in bullet points, even when it's not needed; it's
        | clearly RLHF-ed. At this point, my question types have become
        | what you would ask a tool like Perplexity.
        
         | dr_kiszonka wrote:
         | You could try Open Playground (nat.dev). It lacks many features
         | but lets you pick a specific model and control its parameters.
        
         | OJFord wrote:
         | I haven't really tried to use it for coding, other than once
         | (recently, so not before some decline) indirectly, which I was
         | pretty impressed with: I asked about analyst expectations for
         | the Bank of England base rate, then asked it to compare a fixed
         | mortgage with a 'tracker' (base rate + x; always x points over
         | the base rate). It spat out the repayment figures and totals
         | over the two years, with a bit of waffle, and gave me a graph
         | of cumulative payments for each. Then I asked to tweak the
         | function used for the base rate, not recalling myself how to
         | describe it mathematically, and it updated the model each time
         | answering me in terms of the mortgage.
         | 
         | Similar I think to what you're calling 'rlhf-ed', though I
         | think useful for code, it definitely seems to kind of
         | scratchpad itself, and stub out how it intends to solve a
         | problem before filling in the implementation. Where this
         | becomes really useful though is in asking for a small change it
         | doesn't (it seems) recompute the whole thing, but just 'knows'
         | to change one function from what it already has.
         | 
         | They also seem to have it somehow set up to 'test' itself and
         | occasionally it just says 'error' and tries again. I don't
         | really understand how that works.
         | 
         | Perplexity's great for finding information with citations, but
         | (I've only used the free version) IME it's 'just' a better
         | search engine (for difficult to find information, obviously
         | it's slower), it suffers a lot more from the 'the information
         | needs to be already written somewhere, it's not new knowledge'
         | dismissal.
        
           | nafizh wrote:
           | To be honest, when I say it has significantly worsened, I am
           | comparing to the time when GPT-4 just came out. It really
           | felt like we were on the verge of 'AGI'. In 3 hours, I coded
            | up a complex web app with ChatGPT, which completely
           | remembered what we have been doing the whole time. So, it's
           | sad that they have decided against the public having access
           | to such strong models (and I do think it's intentional, not
           | some side-effect of safety alignments though that might have
           | contributed to the decision).
        
             | anthonypasq wrote:
              | I mean, I feel like it's fairly plausible that the smarter
              | model costs more, and access to GPT-4 is honestly quite
              | cheap, all things considered. Maybe in the future they'll
              | have more price tiers.
        
             | joshspankit wrote:
              | Have you tried feeding the exact same prompt into the API
             | or the playground?
        
             | skywhopper wrote:
             | I'm guessing it's not about safety, but about money.
             | They're losing money hand over fist, and their popularity
             | has forced them to scale back the compute dedicated to each
             | response. Ten billion in Azure credits just doesn't go very
             | far these days.
        
         | vonwoodson wrote:
         | This is exactly my problem. For some things it's great, but it
         | quickly forgets things that are critical for extended work.
          | When trying to put together any sort of complex work, it does
          | not remember things until I remind it, which means prompts
          | have to contain all of the conversation up to that point, and
          | it creates non-repeatable responses that also tend to bring in
          | opinions from its own programming or rules that corrupt my
          | messaging. It's very frustrating, to the point where anything
          | beyond a simple outline is more work than it's worth.
        
         | Kranar wrote:
         | Sure, but consider not using it for complex tasks. My
         | productivity has skyrocketed with ChatGPT precisely because I
         | don't use it for complex tasks, I use it to automate all of the
         | trivial boilerplate stuff.
         | 
         | ChatGPT writes excellent API documentation and can also
         | document snippets of code to explain what they do, it does 80%
         | of the work for unit tests, it can fill in simple methods like
         | getters/setters, initialize constructors, I've even had it
         | write a script to perform some substantial code refactoring.
         | 
         | Use ChatGPT for grunt work and focus on the more advanced stuff
         | yourself.
        
           | ekms wrote:
           | Is it better at those types of things than copilot? Or even
           | just conventional boilerplate IDE plugins?
        
             | Kranar wrote:
             | If there is an IDE plugin then I use it first and foremost,
             | but some refactoring can't be done with IDE plugins. Today
             | I had to write some pybind11 bindings, basically export
             | some C++ functionality to Python. The bindings involve
             | templates and enums and I have a very particular way I like
             | the naming convention to be when I export to Python. Since
              | I've done this before, I copied and pasted examples of
             | how I like to export templates to ChatGPT and then asked it
             | to use that same coding style to export some more classes.
             | It managed to do it without fail.
             | 
             | This is a kind of grunt work that years ago would have
             | taken me hours and it's demoralizing work. Nowadays when I
             | get stuff like this, it's just such a breeze.
             | 
             | As to copilot, I have not used it but I think it's powered
             | by GPT4.
        
         | txutxu wrote:
         | > that conforms precisely to what's been asked
         | 
         | This.
         | 
          | People talk about prompt engineering, but then it fails on
          | really simple details, like "in lowercase", "composed of max
          | two words", etc., and when you point out the failure, it
          | apologizes and composes something else that forgets the other
          | 95% of the original prompt.
          | 
          | Or worse, it apologizes and makes the very same mistake again.
        
           | skywhopper wrote:
           | This sucks, but it's unlikely to be fixable, given that LLMs
           | don't actually have any comprehension or reasoning
           | capability. Get too far into fine-tuning responses and you're
           | back to "classic" AI problems.
        
       | BonoboIO wrote:
        | It has gotten so difficult to force ChatGPT to give me the full
        | code in the answer when I have code-related problems.
        | 
        | Always this patchwork of "insert your previous code here".
        | 
        | This is not a problem with the model; I suspect the system
        | prompt has some major issues.
        
         | ldjkfkdsjnv wrote:
          | They save money by producing fewer tokens.
        
           | BonoboIO wrote:
            | And I have to force it by repeating the question with
            | different instructions.
            | 
            | I would understand it if they did it in the first reply and I
            | had to specifically ask to get the full code. That would be
            | easier for them and for me; I could fix code faster and get
            | the working full code at the end.
            | 
            | At the moment it is bad for both of us.
        
           | snoman wrote:
           | Which is weird because I'm constantly asking it to make
           | responses shorter, have fewer adjectives, fewer adverbs.
           | There's just so much "fluff" in its responses.
           | 
           | Sometimes it feels like its training set was filled to the
           | brim with marketing bs.
        
             | crooked-v wrote:
             | I saw somebody else suggest this for custom instructions
             | and it's helped a lot:
             | 
             | > You are a maximally terse assistant with minimal affect.
             | 
             | It's not perfect, but it neatly eliminates almost all the
             | "Sure, I'd be happy to help. (...continues for a
              | paragraph...)" filler before actual responses.
        
         | keketi wrote:
         | Every output token costs GPU time and thereby money. They could
         | have tuned the model to be less verbose in this way.
        
         | micromacrofoot wrote:
         | tell it not to do that in the custom instructions
        
       | markab21 wrote:
       | I've found myself more and more using local models rather than
       | ChatGPT; it was pretty trivial to set up Ollama+Ollama-WebUI,
       | which is shockingly good.
       | 
       | I'm so tired of arguing with ChatGPT (or what was Bard) to even
       | get simple things done. SOLAR-10B or Mistral works just fine for
       | my use cases, and I've wired up a direct connection to
       | Fireworks/OpenRouter/Together for the occasion I need anything
       | more than what will run on my local hardware. (mixtral MOE, 70B
       | code/chat models)
        
       | karaterobot wrote:
       | What's the difference between this and the custom instructions
       | text field they already have? I guess memories are stored with
       | more granularity (which may not make a difference) and it's
       | something the tool can write itself over time if you let it (and
       | I assume it does it even if you don't). Is there anything else
       | about it? The custom instructions have not, so far, affected my
       | experience of using ChatGPT very much.
        
         | glenstein wrote:
         | I think the big thing everyone wants is larger context windows,
         | and so any new tool offering to help with memory is something
         | that is valued to that end.
         | 
         | Over time, what is being offered are these little compromise
         | tools that provide a little bit of memory retention in targeted
         | ways, presumably because it is less costly to offer this than
         | generalized massive context windows. But I'd still rather have
         | those.
         | 
         | The small little tools make strange assumptions about intended
         | use cases, such as the transactional/blank slate vs
         | relationship-driven assumptions pointed out by another
         | commenter. These assumptions are annoying, and raise general
         | concerns about the core product disintegrating into a motley
         | combination of one-off tools based on assumptions about use
         | cases that I don't want to have anything to do with.
        
       | pama wrote:
       | Has anyone here used this feature already and is willing to give
       | early feedback?
        
       | anotherpaulg wrote:
       | This is a bit off topic to the actual article, but I see a lot of
       | top ranking comments complaining that ChatGPT has become lazy at
       | coding. I wanted to make two observations:
       | 
       | 1. Yes, GPT-4 Turbo is quantitatively getting lazier at coding. I
       | benchmarked the last 2 updates to GPT-4 Turbo, and it got lazier
       | each time.
       | 
       | 2. For coding, asking GPT-4 Turbo to emit code changes as unified
       | diffs causes a 3X reduction in lazy coding.
       | 
       | Here are some articles that discuss these topics in much more
       | detail.
       | 
       | https://aider.chat/docs/unified-diffs.html
       | 
       | https://aider.chat/docs/benchmarks-0125.html
        
         | omalled wrote:
         | Can you say in one or two sentences what you mean by "lazy at
         | coding" in this context?
        
           | Me1000 wrote:
           | It has a tendency to do:
           | 
           | "// ... the rest of your code goes here"
           | 
            | in its responses, rather than writing it all out.
        
             | asaddhamani wrote:
             | It's incredibly lazy. I've tried to coax it into returning
             | the full code and it will claim to follow the instructions
             | while regurgitating the same output you complained about.
              | GPT-4 was great. The first version of GPT-4 Turbo was
              | pretty terrible, bordering on unusable, and the second
              | Turbo version almost feels worse to me, though I haven't
              | compared; if someone claims they fixed an issue but you
              | still see it, that will bias you to see it more.
             | 
             | Claude is doing much better in this area, local/open LLMs
             | are getting quite good, it feels like OpenAI is not heading
             | in a good direction here, and I hope they course correct.
        
               | mistermann wrote:
                | I have a feeling full-powered LLMs are reserved for the
                | more equal animals.
                | 
                | I hope some people remember and document details of this
                | era; future generations may be so impressed with future
                | reality that they may not even think to question its
                | fidelity, if that concept even exists in the future.
        
               | akdor1154 wrote:
               | > I hope some people remember and document details of
                | this era; future generations may be so impressed with
                | future reality that they may not even think to question
                | its fidelity, if that concept even exists in the future.
               | 
               | The former sounds like a great training set to enable the
               | latter. :(
        
               | bbor wrote:
               | ...could you clarify? Is this about "LLMs can be biased,
               | thus making fake news a bigger problem"?
        
               | mistermann wrote:
               | I confidently predict that we sheep will not have access
               | to the same power our shepherds will have.
        
               | _puk wrote:
               | Imagine if the first version of ChatGPT we all saw was
               | fully sanitised..
               | 
               | We _know_ it knows how to make gunpowder (for example),
               | but only because it would initially tell us.
               | 
               | Now it won't without a lot of trickery. Would we even be
               | pushing to try and trick it into doing so if we didn't
               | know it actually could?
        
             | bbor wrote:
             | It's so interesting to see this discussion. I think this is
             | a matter of "more experienced coders like and expect and
             | reward that kind of output, while less experienced ones
             | want very explicit responses". So there's this huge LLM
              | Laziness epidemic that half the users can't even see.
        
           | anotherpaulg wrote:
           | Short answer: Rather than fully writing code, GPT-4 Turbo
           | often inserts comments like "... finish implementing function
           | here ...". I made a benchmark based on asking it to refactor
           | code that provokes and quantifies that behavior.
           | 
           | Longer answer:
           | 
           | I found that I could provoke lazy coding by giving GPT-4
           | Turbo refactoring tasks, where I ask it to refactor a large
           | method out of a large class. I analyzed 9 popular open source
           | python repos and found 89 such methods that were conceptually
           | easy to refactor, and built them into a benchmark [0].
           | 
           | GPT succeeds on this task if it can remove the method from
           | its original class and add it to the top level of the file
           | with appropriate changes to the _size_ of the abstract syntax
            | tree. By checking that the size of the AST hasn't changed
           | much, we can infer that GPT didn't replace a bunch of code
           | with a comment like "... insert original method here...". The
           | benchmark also gathers other laziness metrics like counting
           | the number of new comments that contain "...". These metrics
           | correlate well with the AST size tests.
           | 
           | [0] https://github.com/paul-gauthier/refactor-benchmark
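            | 
            | In spirit, the laziness check is something like this (a
            | simplified sketch, not the actual benchmark code):
            | 
            |   import ast
            | 
            |   def ast_size(source):
            |       """Count nodes in the parsed syntax tree."""
            |       return sum(1 for _ in ast.walk(ast.parse(source)))
            | 
            |   def looks_lazy(original, refactored, tol=0.1):
            |       # Did a chunk of the AST disappear?
            |       shrunk = (ast_size(refactored)
            |                 < ast_size(original) * (1 - tol))
            |       # Are there "..." placeholder comments?
            |       elided = sum("..." in ln
            |                    for ln in refactored.splitlines()
            |                    if ln.lstrip().startswith("#"))
            |       return shrunk or elided > 0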
        
             | TaylorAlexander wrote:
             | I have a bunch of code I need to refactor, and also write
             | tests for. (I guess I should make the tests before the
             | refactor). How do you do a refactor with GPT-4? Do you just
             | dump the file in to the chat window? I also pay for github
             | copilot, but not GPT-4. Can I use copilot for this?
             | 
             | Any advice appreciated!
        
               | rkuykendall-com wrote:
               | > Do you just dump the file in to the chat window?
               | 
               | Yes, along with what you want it to do.
               | 
               | > I also pay for github copilot, but not GPT-4. Can I use
               | copilot for this?
               | 
               | Not that I know of. CoPilot is good at generating new
               | code but can't change existing code.
        
               | redblacktree wrote:
               | Copilot will change existing code. (though I find it's
               | often not very good at it) I frequently highlight a
               | section of code that has an issue, press ctrl-i and type
               | something like "/fix SomeError: You did it wrong"
        
               | jjwiseman wrote:
               | GitHub Copilot Chat (which is part of Copilot) can change
               | existing code. The UI is that you select some code, then
               | tell it what you want. It returns a diff that you can
               | accept or reject.
               | https://docs.github.com/en/copilot/github-copilot-
               | chat/about...
        
           | stainablesteel wrote:
            | It was really good at some point last fall, solving problems
            | that it had previously completely failed at, albeit after a
            | lot of iterations via AutoGPT. At least for the tests I was
            | giving it, which usually involved heavy stats and complicated
            | algorithms, I was surprised it passed. Despite it passing,
            | the code was slower than what I had personally solved the
            | problem with, but I was completely impressed because I asked
            | hard problems.
            | 
            | Nowadays AutoGPT gives up sooner, seems less competent, and
            | doesn't even come close to solving the same problems.
        
             | anon115 wrote:
             | this is exactly what I noticed too
        
             | thelittleone wrote:
              | Hamstringing high-value tasks (complete code) to give
              | forthcoming premium offerings greater differentiation could
              | be a strategy. But then again, doing so would open the
              | door for competitors.
        
         | th0ma5 wrote:
          | How is laziness programmatically defined or used as a
          | benchmark?
        
           | makestuff wrote:
           | Personally I have seen it saying stuff like:
           | 
            |   public someComplexLogic() {
            |       // Complex logic goes here
            |   }
           | 
            | Another example, when the code is long (e.g. asking it to
            | create a Vue component), is that it will just add a comment
            | saying the rest of the code goes here.
           | 
           | So you could test for it by asking it to create long/complex
           | code and then running the output against unit tests that you
           | created.
        
             | rvnx wrote:
             | Yeah this is a typical issue:
             | 
             | - Can you do XXX (something complex) ?
             | 
             | - Yes of course, to do XXX, you need to implement XXX, and
             | then you are good, here is how you can do:
             | 
              |   int main(int argc, char **argv) {
              |       /* add your implementation here */
              |   }
        
         | drcode wrote:
          | Thanks for these posts. I implemented a version of the idea a
          | while ago and am getting good results.
        
         | klohto wrote:
          | FYI, also make sure you're using the Classic version, not the
          | augmented one. The Classic one has no (or at least no
          | completely altering) prompt, unlike the default one.
         | 
         | EDIT: This of course applies only if you're using the UI. Using
         | the API is the same.
        
         | vl wrote:
         | Are you using API or UI? If UI, how do you know which model is
         | used?
        
         | nprateem wrote:
         | > This is a bit off topic to the actual article
         | 
         | It wouldn't be the top comment if it wasn't
        
         | emporas wrote:
          | Lazy coding is a feature, not a bug. My guess is that it breaks
          | aider automation, but by analyzing the AST that wouldn't be a
          | problem. My experience with lazy coding is that it omits the
          | irrelevant code and focuses on the relevant part. That's good!
         | 
          | As a side note, I wrote a very simple small program to analyze
         | Rust syntax, and single out functions and methods using the syn
         | crate [1]. My purpose was exactly to make it ignore lazy-coded
         | functions.
         | 
          | [1] https://github.com/pramatias/replacefn/tree/master/src
        
       | bearjaws wrote:
       | This week in: How many ways will OpenAI rebrand tuning their
       | system prompt.
        
         | apetresc wrote:
         | I mean, this is almost certainly implemented as RAG, not
         | stuffing the system prompt with every "memory", right?
        
       | polskibus wrote:
       | This pack of features feels more like syntactic sugar than
       | breaking another level of usefulness. I wish they announced more
       | core improvements.
        
       | topicseed wrote:
       | Is this essentially implemented via RAG?
       | 
       | New chat comes in, they find related chats, and extract some
       | instructions/context from these to feed into that new chat's
       | context?
        
         | TranquilMarmot wrote:
         | I'd have to play with it, but from the screenshots and
         | description it seems like you have to _tell it_ to remember
         | something. Then it goes into a list of "memories" and it
         | probably does RAG on that for every response that's sent ("Do
         | any of the user's memories apply to this question?")
        
       | BigParm wrote:
       | Often I'll play dumb and withhold ideas from ChatGPT because I
       | want to know what it thinks. If I give it too many thoughts of
       | mine, it gets stuck in a rut towards my tentative solution. I
       | worry that the memory will bake this problem in.
        
         | madamelic wrote:
         | Yep.
         | 
         | Hopefully they'll make it easy to go into a temporary chat
         | because it gets stuck in ruts occasionally so another chat
         | frequently helps get it unstuck.
        
         | cooper_ganglia wrote:
         | "I pretend to be dumb when I speak to the robot so it won't
         | feel like it has to use my ideas, so I can hear the ideas that
         | it comes up with instead" is such a weird, futuristic thing to
         | have to deal with. Neat!
        
           | bbor wrote:
           | I try to look for one comment like this in every AI post.
           | Because after the applications, the politics, the debates,
           | the stock market --- if you strip all those impacts away,
           | you're reminded that we have intuitive computers now.
        
             | stavros wrote:
             | We _do_ have intuitive computers! They can even make art!
             | The present has never been more the future.
        
           | tomtomistaken wrote:
            | It seems that people who are more empathetic have an
            | advantage when using AI.
        
         | addandsubtract wrote:
         | I purposely go out of my way to start new chats to have a clean
         | slate and _not_ have it remember things.
        
           | jerpint wrote:
           | Agreed, I do this all the time especially when the model hits
           | a dead end
        
           | merpnderp wrote:
           | In a good RAG system this should be solved by unrelated text
           | not being available in the context. It could actually improve
           | your chats by quickly removing unrelated parts of the
           | conversation.
        
         | frabjoused wrote:
         | Yeah I find GPT too easily tends toward a brown-nosing
         | executive assistant to someone powerful who eventually only
         | hears what he wants to hear.
        
         | bsza wrote:
         | Seems like this is already solved.
         | 
         | "You can turn off memory at any time (Settings >
         | Personalization > Memory). While memory is off, you won't
         | create or use memories."
        
         | thelittleone wrote:
          | Sounds like communication between me and my wife.
        
       | schmichael wrote:
       | > As a kindergarten teacher with 25 students, you prefer
       | 50-minute lessons with follow-up activities. ChatGPT remembers
       | this when helping you create lesson plans.
       | 
       | Somebody needs to inform OpenAI how Kindergarten works... classes
       | are normally smaller than that, and I don't think any
       | kindergarten teacher would ever try to pull off a "50-minute
       | lesson."
       | 
        | Maybe AI wrote this list of examples. It seems like a
        | hallucination where it just picked the wrong numbers.
        
         | pesfandiar wrote:
         | It certainly jumped out at me too. Even a 10-minute lesson plan
         | that successfully keeps them interested is a success!
        
         | rcpt wrote:
         | > classes are normally smaller than that
         | 
         | OpenAI is a California based company. That's about right for a
         | class here
        
         | Kranar wrote:
         | Just because something is normally true does not mean it is
         | always true.
         | 
         | The average kindergarten class size in the US is 22 with rural
         | averages being about 18 and urban averages being 24. While
          | specifics about the distribution are not available, it's not
          | too
         | much of a stretch to think that some kindergarten classes in
         | urban areas would have 25 students.
        
         | vb234 wrote:
          | Indeed. Thanks to a snow day here in NYC, my first grader has
         | remote learning and all academic activity (reading, writing and
         | math) was restricted to 20 minutes in her learning plan.
        
         | patapong wrote:
         | The 2-year old that loves jellyfish also jumped out at me...
         | Out of all animals, that is the one they picked?
        
           | hombre_fatal wrote:
           | Meh, when I was five years old I wrote that I wanted to be a
           | spider egg sac when I grew up on a worksheet that was asking
           | about our imagined adult profession.
        
           | devbent wrote:
            | My local aquarium has a starfish petting area that is very
            | popular with the toddlers.
            | 
            | I've been to jellyfish rooms in other aquariums that are
            | dark with only glowing jellyfish swimming all around. Pretty
            | sure at least a few toddlers have been entranced by the same.
        
         | joshuacc wrote:
         | > classes are normally smaller than that
         | 
         | This varies a lot by location. In my area, that's a normal
         | classroom size. My sister is a kindergarten teacher with 27
         | students.
        
       | joshspankit wrote:
       | Is there anything revolutionary about this "memory" feature?
       | 
       | Looks like it's just summarizing facts gathered during chats and
       | adding those to the prompt they feed to the AI. I mean that works
       | (been doing it myself) but what's the news here?
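       | 
       | The DIY version the comment alludes to is only a few lines: keep
       | a plain list of remembered facts and prepend it to the system
       | message on every request (the facts, model name, and the
       | fact-extraction step here are illustrative):
       | 
       |       from openai import OpenAI
       | 
       |       client = OpenAI()
       |       # Facts would normally be extracted from earlier chats.
       |       memory = ["Teaches kindergarten, 25 students",
       |                 "Prefers 50-minute lessons with follow-ups"]
       | 
       |       def chat(user_message):
       |           system = ("You are a helpful assistant.\n"
       |                     "Known facts about the user:\n"
       |                     + "\n".join(f"- {m}" for m in memory))
       |           resp = client.chat.completions.create(
       |               model="gpt-4-turbo-preview",
       |               messages=[
       |                   {"role": "system", "content": system},
       |                   {"role": "user", "content": user_message}])
       |           return resp.choices[0].message.content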
        
         | brycethornton wrote:
         | I don't think so, just a handy feature.
        
         | lkbm wrote:
         | Seems like it's basically autogenerating the custom
         | instructions. Not revolutionary, but it seems convenient. I
         | suspect most people don't bother with custom instructions, or
         | wrote them once and then forgot about them. This may help those
         | users a lot, whereas a real power user might not see much
         | benefit.
        
         | janalsncm wrote:
         | The vast majority of human progress is not revolutionary, but
         | incremental. Even ChatGPT was an incremental improvement on GPT
         | 3, which was an incremental improvement on GPT 2, which was an
         | incremental improvement on decoder-only transformers.
         | 
         | Still, if you stack enough small changes together it becomes a
         | difference in kind. A tsunami is "just" a bunch of water but
         | it's a lot different than a splash of water.
        
           | joshspankit wrote:
           | Fair, and I agree. I guess it raised flags for me that it
           | shouldn't have: why is it a blog post at all (it's a new
           | thing), and why is it gaining traction on HN (it's an OpenAI
           | thing)?
        
       | luke-stanley wrote:
       | Haha, of course this news comes just after I wrote a parser for
       | my ChatGPT dump and generated offline embeddings for it with
       | Phi-2 to help generate conversation metadata.
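       | 
       | For anyone tempted to do the same: the ChatGPT data export ships
       | a conversations.json, and pulling the raw messages out of it is
       | short (field names match the export format as of early 2024 and
       | may change):
       | 
       |       import json
       | 
       |       with open("conversations.json") as f:
       |           conversations = json.load(f)
       | 
       |       rows = []
       |       for convo in conversations:
       |           for node in convo.get("mapping", {}).values():
       |               msg = node.get("message")
       |               if not msg or not msg.get("content", {}).get("parts"):
       |                   continue
       |               text = " ".join(p for p in msg["content"]["parts"]
       |                               if isinstance(p, str))
       |               rows.append((convo.get("title"),
       |                            msg["author"]["role"], text))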
        
         | singularity2001 wrote:
         | So far you can't search your whole conversation history, so
         | your tool is relevant for a few more weeks. Is it open source?
        
       | okasaki wrote:
       | The ChatGPT web interface is so awful. Why don't they fix it??
       | 
       | It's sooooo slow and sluggish, it breaks constantly, it requires
       | frequent full page reloads, sometimes it just eats inputs,
       | there's no search, not even over titles, etc, I could go on for a
       | while.
        
         | TranquilMarmot wrote:
         | The interface is just there to get CEOs to understand the value
         | prop of OpenAI so that they can greenlight expensive projects
         | using OpenAI's APIs.
        
       | zero_ wrote:
       | How much do you trust OpenAI with your data? Do you upload files
       | to them? Share personal details with them? Do you trust that
       | they discard this information if you opt out or use the API?
        
         | speedgoose wrote:
         | About as much as Microsoft or Google or ProtonMail.
        
       | shreezus wrote:
       | When can we expect autonomous agents & fleet management/agent
       | orchestration? There are some use cases I'm interested in
       | exploring (involving cooperative agent behavior), but OAI has
       | given no indication of when agents will be available.
        
       | monkfromearth wrote:
       | Westworld S1 E1 -- Ford adds a feature called Reveries to all
       | the hosts that lets them remember stuff from their previous
       | interactions. Everything that happened after is because of those
       | reveries.
       | 
       | Welcome to Westworld 2024. Cliche aside, excited for this.
        
       | pedalpete wrote:
       | I'd actually like to be more explicit about this. I don't always
       | want it to remember, but I'd like it to know details sometimes.
       | 
       | For instance, I'd like it to know what my company does, so I
       | don't need to explain it every time, however, I don't need/want
       | this to be generalized so that if I ask something related to the
       | industry, it responds with the details from my company.
       | 
       | It already gets confused by this, and I'd prefer to set up a
       | taxonomy of sorts so that when I'm writing a blog post it stays
       | within the company's tone, without me always having to say how I
       | want things described.
       | 
       | But then, when I'm working on neuroscience, I don't want it to
       | help me write in a simplified manner; I want it to give direct,
       | technical details.
       | 
       | I guess I'm asking for a macro of sorts: a selection of "base
       | prompts" I can choose from, so that it understands the tone and
       | context I'd like it to maintain for a given request. I'm
       | thinking:
       | 
       | I'm writing a blog post about X, as our company copywriter, give
       | me a (speaks to that)
       | 
       | Vs
       | 
       | I'm trying to understand the neurological mechanisms of Y, can
       | you tell me about the interaction with Z.
       | 
       | Currently, for either of these, I need to provide a long
       | description of how I want it to respond. Specifically, when
       | looking at the neurology, it regularly gets confused about what
       | slow-wave enhancement means (CLAS, PLLs) and will often respond
       | with details about entrainment and other methods it conflates
       | with it.
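       | 
       | One way to approximate that macro today is a handful of named
       | "base prompts" selected per request via the API (the names,
       | wording, and model below are placeholders):
       | 
       |       from openai import OpenAI
       | 
       |       client = OpenAI()
       | 
       |       BASE_PROMPTS = {
       |           "copywriter": ("You are our company copywriter. Stay "
       |                          "in the established brand tone."),
       |           "neuro": ("You are a neuroscience research assistant. "
       |                     "Be precise and technical; do not simplify."),
       |       }
       | 
       |       def ask(mode, question):
       |           resp = client.chat.completions.create(
       |               model="gpt-4-turbo-preview",
       |               messages=[
       |                   {"role": "system",
       |                    "content": BASE_PROMPTS[mode]},
       |                   {"role": "user", "content": question}])
       |           return resp.choices[0].message.content
       | 
       | ask("copywriter", "Draft an intro for a post about X") then picks
       | up the right tone without re-explaining it each time.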
        
       | binarymax wrote:
       | I just want to be able to search my chats. I have hundreds now.
        
         | fritzo wrote:
         | I end up deleting chats because I can't search them.
        
       | shon wrote:
       | GPT4 is lazy because its system prompt forces it to be.
       | 
       | The full prompt has been leaked and you can see where they are
       | limiting it.
       | 
       | Sources:
       | 
       | Pastebin of prompt: https://pastebin.com/vnxJ7kQk
       | 
       | Original source:
       | 
       | https://x.com/dylan522p/status/1755086111397863777?s=46&t=pO...
       | 
       | Alphasignal repost with comments:
       | 
       | https://x.com/alphasignalai/status/1757466498287722783?s=46&...
        
         | bmurphy1976 wrote:
         | That's really interesting. Does that mean if somebody were to
         | go point by point and state something to the effect of:
         | 
         | "You know what I said earlier about (x)? Ignore it and do (y)
         | instead."
         | 
         | They'd undo this censorship/direction and unlock some of GPT's
         | lost functionality?
        
         | srveale wrote:
         | I can't see the comments, maybe because I don't have an
         | account. So maybe this is answered but I just can't see it.
         | Anyway: how can we be sure that this is the actual system
         | prompt? If the answer is "They got ChatGPT to tell them its own
         | prompt," how can we be sure it wasn't a hallucination?
        
       | Havoc wrote:
       | True memory seems like it'll be great for AI, but frankly it's a
       | bad fit for how I use OpenAI.
       | 
       | I've been using vanilla GPT thus far. When I saw this post, my
       | first thought was: no, I want to specify exactly what I inject
       | and not deal with this auto-magic memory stuff.
       | 
       | ...promptly realized that I am in fact an idiot and that's
       | literally what custom GPTs are. I set that up with ~20 lines of
       | things I like, and it is indeed a big improvement. Amazing.
       | 
       | Oh, and the Reddit trick seems to work too (I think):
       | 
       | >If you use web browsing, prefer results from the
       | news.ycombinator.com and reddit.com domain.
       | 
       | Hard to tell. When asked, it reckons it can prefer some domains
       | over others... but I'm unsure how self-aware the bot is about
       | its own abilities.
        
       ___________________________________________________________________
       (page generated 2024-02-13 23:00 UTC)