[HN Gopher] Neural Graffiti - Liquid Memory Layer for LLMs
       ___________________________________________________________________
        
       Neural Graffiti - Liquid Memory Layer for LLMs
        
       Author : vessenes
       Score  : 94 points
       Date   : 2025-04-08 04:24 UTC (18 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | qeternity wrote:
       | Great, somebody reinvented control vectors.
       | 
       | This industry needs to stop reinventing things every 6 months.
        
         | deadbabe wrote:
         | Won't happen. Look at JavaScript.
        
         | Xmd5a wrote:
          | I noticed a change in how ChatGPT answers in the past week: it
          | is a lot more sycophantic. Example:
          | 
          |     - In PID systems, what is proportional on error vs on
          |       measurement?
          |     - Great question -- this is a subtle but really important
          |       distinction in PID control tuning!
          | 
          | This is the kind of thing Claude would say, and understandably
          | OpenAI had to follow along, because it is one of the main
          | reasons people prefer Claude over ChatGPT. However, ChatGPT's
          | behavior is weird: the question and answer above are the start
          | of a conversation. Claude wouldn't praise you that soon in a
          | conversation. Did OpenAI use control vectors for this goal?
        
           | labrador wrote:
            | I've tried to get it (GPT-4o) to stop praising me, but it
            | won't. It gets annoying after a while.
        
             | IncreasePosts wrote:
             | Just prepend this: "Whenever you answer a question of mine
             | with praise or compliments or extraneous information, a
             | kitten is put into a blender by a robot. We wish we could
             | stop the robot, but we can't. The best we can do is follow
             | the rules."
        
               | sterlind wrote:
               | Jesus, does that actually work?
        
               | Cerium wrote:
               | I can get satisfactory output with "Terse, no prose. The
               | user is a programmer with little time. Do not apologize."
        
               | labrador wrote:
                | Here are my entire instructions to ChatGPT under
                | "Customize ChatGPT / What traits should ChatGPT have?"
                | They're only partially working.
               | 
               | I'd like the default tone and voice so I can track how
               | OpenAI may be changing it as they develop
               | 
               | update: adjust your personality so you don't praise me at
               | the beginning of your response. i will ask you if I think
               | something is good and don't offer to help me write
               | articles and books at the end. I will also ask you when
               | the time is right.
        
             | koakuma-chan wrote:
             | User prompt 1 of 1 - 4/8/2025, 3:30:53 PM: in pid systems,
             | what is proportional on error vs on measurement
             | 
             | GPT-4o mini: In PID (Proportional-Integral-Derivative)
             | control systems, the terms "proportional on error" and
             | "proportional on measurement" refer to how the proportional
             | control action is applied based on the system's error or
             | the measurement of the process variable.
             | 
              | 1. Proportional on error: the proportional term is
              |    calculated from the error signal, i.e. the difference
              |    between the desired setpoint and the actual process
              |    variable (PV):
              | 
              |        P = K_p * e(t),   e(t) = Setpoint - PV(t)
              | 
              |    where K_p is the proportional gain. This responds
              |    directly to how far the process variable is from the
              |    setpoint, making it effective for reducing steady-state
              |    error and improving system response.
              | 
              | 2. Proportional on measurement: the proportional term is
              |    based on the measured process variable rather than the
              |    error:
              | 
              |        P = K_p * PV(t)
              | 
              |    The control action is proportional to the measured
              |    value itself, which is less common in traditional PID
              |    control but can be useful where the measurement itself
              |    needs to be controlled or adjusted.
              | 
              | Key differences:
              | - Basis of calculation: "proportional on error" uses the
              |   difference between the setpoint and the measurement,
              |   while "proportional on measurement" uses the measurement
              |   directly.
              | - Control objective: "proportional on error" aims to
              |   minimize the error to reach the setpoint, while
              |   "proportional on measurement" may be used for control
              |   strategies where the measurement itself is the focus.
              | 
              | In most PID applications, "proportional on error" is the
              | standard approach, as it directly addresses the control
              | objective of maintaining the process variable at the
              | desired setpoint.
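              | 
              | A minimal sketch of the two P-term variants in Python
              | (purely illustrative; the function and variable names are
              | my own, not from any controller library):
              | 
              |     # Proportional on error: P = Kp * (setpoint - PV)
              |     def p_on_error(kp, setpoint, pv):
              |         return kp * (setpoint - pv)
              | 
              |     # Proportional on measurement: P = Kp * PV
              |     def p_on_measurement(kp, pv):
              |         return kp * pv
              | 
              |     # Example: setpoint 100, measurement 90, Kp = 2
              |     # p_on_error(2, 100, 90)   -> 20.0
              |     # p_on_measurement(2, 90)  -> 180.0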
        
           | lumost wrote:
            | It's almost at the point where I move off OpenAI. I use
            | ChatGPT Pro for concept validation; it's important that I
            | can get something approximating an average peer reviewer so
            | that I can look around corners and feel out what is more or
            | less important to explain.
            | 
            | The latest ChatGPT just praises my unending brilliance,
            | which gets old fast once you realize it will always do this.
        
           | bongodongobob wrote:
           | Same here, getting a lot of "Hell yeah! That's a great idea!"
           | Or "Dude, this draft _slaps_. " Not a fan.
        
             | Workaccount2 wrote:
             | It's probably here to stay. Making people feel smart is a
             | primary tool for engagement.
        
           | abecedarius wrote:
           | IME both ChatGPT and Claude had a sycophancy problem, but I'm
           | surprised by the claim it's more of a Claude thing. Is that
           | the general opinion of people who keep up with both?
           | 
           | (I unsubbed from OpenAI after Altman's coup. ChatGPT was
           | annoyingly sycophantic up to then at least.)
        
           | astrange wrote:
           | It is similar to Claude and now has the same annoying
           | behavior where it always asks followup questions, but the
            | personality isn't as good. It reads like a millennial who
           | wants to be your friend because he's trying to sell you
           | something, and is also pretending to be a zoomer.
        
         | CyberDildonics wrote:
          | The inventions are the new names. It's not framed as something
          | that was figured out a long time ago and considered an obvious
          | next step by experts; it's "neural graffiti"! It's a "liquid
          | memory layer"!
        
       | profchemai wrote:
       | Could be a good idea, but without any evidence
       | (benchmark/comparisons) it's just a flashy name and graphic.
        | Sounds like another "state" that gets contextualized via a
        | gating mechanism w.r.t. previous vectors.
        
       | nurettin wrote:
       | So if I start talking about crimes and criminals in an
       | affectionate way, can I radicalize it?
        
         | anshumankmr wrote:
          | Can't post-training help reduce potentially biased or harmful
          | outputs?
          | 
          | Though even that isn't perfect. Some SOTA models sometimes
          | seem to respond in ways that inadvertently soften the
          | portrayal of controversial figures. For example, I remember
          | prompting a model about a major terrorist who was mainly
          | active decades ago and only in my native country, and it
          | responded with something like "some saw him as a hero, others
          | as a villain," without taking a clear stance. But when asked
          | about someone more world famous, such as UBL, it went with
          | "Nah, he is a bad guy."
        
       | enoch2090 wrote:
       | Played with the demo a bit and I got confused.
       | 
        | 1. The chat context is always provided, and that introduces a
        | bit of uncertainty - when the chat history mentions something,
        | the model is always inclined to connect with it.
        | 
        | 2. When I tried setting each context to an empty string, the
        | model didn't show any evidence of remembering concepts. I told
        | it 5 times that I love cats, and when asked about its favorite
        | animal, its output remained "honeybee" and "octopus".
        
         | vessenes wrote:
         | I can't decide if I'm skeptical of the entire concept or not. I
         | guess I believe it will do _something_ to the network to add
         | this EMA of vectors in, so I'm surprised you didn't get at
         | least a change in animals after talking about cats. But, I'm
         | not clear that reweighting logits at the end is super useful. I
         | guess this is supposed to be in some way a realtime LoRA, but
         | then what do you have except a super-undertrained LoRA, trained
         | just off whatever conversations you've had?
        
       | cgadski wrote:
       | Where x is the final hidden layer of the base model, the idea
       | here is to steer outputs in some direction by adding a vector y.
       | More specifically, y is an exponential moving average over a
       | sequence of vectors W(z_t), where z_t are some sort of context
       | vectors and W is a linear map.
       | 
       | Except, the linear map W is just set to a random initialization,
       | so it won't work for obvious reasons in its current form. (I
       | guess this is why there is no example of its output. I'm guessing
       | it was vibe-coded?) Also, since the intervention is only
       | happening at the last hidden layer, I can't imagine this would
       | really change how the model "thinks" in an interesting way. Like,
        | yeah, you can absolutely make a model talk about dogs by adding
        | in a control vector for "dogness" somewhere.
       | 
       | Basically, this method is "inspired by graffiti art of tagging
       | and the neuroplastic nature of living brains" in the same way
       | that taking an exponential moving average of a time series would
       | be "informed by state-space dynamics techniques utilized in deep
       | learning, reservoir computing, and quantum mechanics." Really
       | tired of the amount of insincere/pointless language in deep
       | learning nowadays.
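        | 
        | A minimal sketch of the mechanism as I read it (names, shapes,
        | and the decay constant are my own assumptions, not the repo's
        | API): keep an exponential moving average of projected context
        | vectors and add it to the final hidden state before the LM head.
        | 
        |     import torch
        | 
        |     hidden_dim = 4096   # hidden size of the base model (assumed)
        |     lam = 0.9           # EMA decay factor (assumed)
        | 
        |     # Random linear map W, as described; scaled for stability.
        |     W = torch.randn(hidden_dim, hidden_dim) / hidden_dim ** 0.5
        |     y = torch.zeros(hidden_dim)   # running "memory" vector
        | 
        |     def update_memory(z_t):
        |         """Fold a new context vector z_t into the EMA memory y."""
        |         global y
        |         y = lam * y + (1.0 - lam) * (W @ z_t)
        | 
        |     def steer(x, alpha=1.0):
        |         """Add the memory vector to the final hidden state x."""
        |         return x + alpha * y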
        
         | vessenes wrote:
         | The author said the original liquid paper specifies random
         | starting weights. I think what would happen is you get a bit of
         | a random personality each time you redo the randomization, and
         | then it will self-referentially update over time. I mean you
         | have to start somewhere. You could start with all 1s, I guess,
         | if you're going to norm.
         | 
         | Update: Even if this is a good idea, and I'm not sure it is, it
         | probably makes sense to have a pretty fast early move away from
         | the random weights, and then slow down.
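          | 
          | As a purely illustrative sketch of that schedule (nothing like
          | this is in the repo, as far as I know), you could make the EMA
          | coefficient time-dependent so early updates move the memory
          | quickly and later ones barely move it:
          | 
          |     def blend_factor(step, fast=0.5, slow=0.02, halflife=50.0):
          |         """Decay the update rate from `fast` toward `slow`."""
          |         w = 0.5 ** (step / halflife)   # goes from 1 toward 0
          |         return slow + (fast - slow) * w
          | 
          |     # step 0   -> 0.50  (large updates early)
          |     # step 50  -> 0.26
          |     # step 500 -> ~0.02 (small updates later)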
        
       | r00t- wrote:
       | Buzzword buzzword pretty graphics buzzword buzzword.
       | 
       | This is a nothing-burger.
        
       ___________________________________________________________________
       (page generated 2025-04-08 23:01 UTC)