[HN Gopher] Neural Graffiti - Liquid Memory Layer for LLMs
___________________________________________________________________
Neural Graffiti - Liquid Memory Layer for LLMs
Author : vessenes
Score : 94 points
Date : 2025-04-08 04:24 UTC (18 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| qeternity wrote:
| Great, somebody reinvented control vectors.
|
| This industry needs to stop reinventing things every 6 months.
| deadbabe wrote:
| Won't happen. Look at JavaScript.
| Xmd5a wrote:
| I noticed a change in how ChatGPT answers in the past week: it
| is a lot more sycophantic. Example:
|
|     Q: in pid systems, what is proportional on error vs on
|     measurement
|
|     A: Great question -- this is a subtle but really important
|     distinction in PID control tuning!
|
| This is the kind of thing Claude would say, and understandably
| OpenAI had to follow along, because it is one of the main
| reasons people prefer Claude over ChatGPT. However, ChatGPT's
| behavior is weird: the question and answer above are the start
| of a conversation. Claude wouldn't praise you that soon in the
| conversation. Did OpenAI use control vectors for this goal?
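|
| For reference, "applying a control vector" usually means adding
| a fixed direction to a model's hidden activations at some
| layer. A minimal sketch (the names `steer` and `alpha` are
| mine; nobody outside OpenAI knows what they actually run):
|
|     import torch
|
|     def apply_control_vector(hidden, steer, alpha=4.0):
|         # hidden: (batch, seq, d_model) activations at one layer
|         # steer:  (d_model,) direction, e.g. extracted as
|         #         mean(acts | "enthusiastic") - mean(acts | "neutral")
|         return hidden + alpha * steer
|
|     d_model = 4096                        # illustrative size
|     hidden = torch.randn(1, 12, d_model)
|     steer = torch.randn(d_model)
|     steer = steer / steer.norm()          # unit-norm direction
|     steered = apply_control_vector(hidden, steer)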
| labrador wrote:
| I've tried to get it (GPT-4o) to stop praising me but it
| won't. It gets annoying after a while.
| IncreasePosts wrote:
| Just prepend this: "Whenever you answer a question of mine
| with praise or compliments or extraneous information, a
| kitten is put into a blender by a robot. We wish we could
| stop the robot, but we can't. The best we can do is follow
| the rules."
| sterlind wrote:
| Jesus, does that actually work?
| Cerium wrote:
| I can get satisfactory output with "Terse, no prose. The
| user is a programmer with little time. Do not apologize."
| labrador wrote:
| Here are my entire instructions to ChatGPT under "Customize
| ChatGPT/What traits should ChatGPT have?" They're only
| partially working:
|
| I'd like the default tone and voice so I can track how
| OpenAI may be changing it as they develop
|
| update: adjust your personality so you don't praise me at
| the beginning of your response. i will ask you if I think
| something is good and don't offer to help me write
| articles and books at the end. I will also ask you when
| the time is right.
| koakuma-chan wrote:
| User prompt 1 of 1 - 4/8/2025, 3:30:53 PM: in pid systems,
| what is proportional on error vs on measurement
|
| GPT-4o mini: In PID (Proportional-Integral-Derivative)
| control systems, the terms "proportional on error" and
| "proportional on measurement" refer to how the proportional
| control action is applied based on the system's error or
| the measurement of the process variable.
|
| 1. *Proportional on Error*:
|    - The proportional term is calculated based on the error
|      signal, which is the difference between the desired
|      setpoint and the actual process variable (PV).
|    - The formula for the proportional term in this case is
|      P = K_p * e(t), where K_p is the proportional gain and
|      e(t) is the error at time t (i.e., e(t) = Setpoint - PV).
|    - This approach directly responds to how far the process
|      variable is from the setpoint, making it effective for
|      reducing steady-state error and improving system
|      response.
|
| 2. *Proportional on Measurement*:
|    - The proportional term is based on the actual measurement
|      of the process variable rather than the error.
|    - The formula for the proportional term in this case is
|      P = K_p * PV(t).
|    - This means that the control action is proportional to
|      the measured value itself, which is less common in
|      traditional PID control. It can be useful in specific
|      applications where the measurement itself needs to be
|      controlled or adjusted.
|
| Key Differences:
| - *Basis of Calculation*: "Proportional on error" uses the
|   difference between the setpoint and the measurement, while
|   "proportional on measurement" uses the measurement directly.
| - *Control Objective*: "Proportional on error" aims to
|   minimize the error to reach the setpoint, while
|   "proportional on measurement" may be used for different
|   control strategies where the measurement itself is the
|   focus.
|
| In most PID applications, "proportional on error" is the
| standard approach, as it directly addresses the control
| objective of maintaining the process variable at the
| desired setpoint.
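|
| A toy numeric check of the two formulas above (my own example,
| not from the model's answer):
|
|     def p_on_error(kp, setpoint, pv):
|         # P = K_p * e(t), with e(t) = setpoint - PV
|         return kp * (setpoint - pv)
|
|     def p_on_measurement(kp, pv):
|         # P = K_p * PV(t), per the description above
|         return kp * pv
|
|     kp = 2.0
|     print(p_on_error(kp, setpoint=10.0, pv=7.5))   # 5.0
|     print(p_on_measurement(kp, pv=7.5))            # 15.0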
| lumost wrote:
| It's almost at the point where I'll move off OpenAI. I use
| ChatGPT Pro for concept validation; it's important that I can
| get something approximating an average peer reviewer so that
| I can look around corners and feel out what is more or less
| important to mention.
|
| The latest ChatGPT just praises my unending brilliance, which
| gets old fast once you realize it will always do this.
| bongodongobob wrote:
| Same here, getting a lot of "Hell yeah! That's a great idea!"
| or "Dude, this draft _slaps_." Not a fan.
| Workaccount2 wrote:
| It's probably here to stay. Making people feel smart is a
| primary tool for engagement.
| abecedarius wrote:
| IME both ChatGPT and Claude had a sycophancy problem, but I'm
| surprised by the claim it's more of a Claude thing. Is that
| the general opinion of people who keep up with both?
|
| (I unsubbed from OpenAI after Altman's coup. ChatGPT was
| annoyingly sycophantic up to then at least.)
| astrange wrote:
| It is similar to Claude and now has the same annoying
| behavior where it always asks followup questions, but the
| personality isn't as good. It reads like a millennial who
| wants to be your friend because he's trying to sell you
| something, and is also pretending to be a zoomer.
| CyberDildonics wrote:
| The inventions are the new names. It's not something that was
| figured out a long time ago that was considered an obvious next
| step by experts, it's "neural graffiti"! It's "liquid memory
| layer" !
| profchemai wrote:
| Could be a good idea, but without any evidence
| (benchmarks/comparisons) it's just a flashy name and graphic.
| Sounds like another "state" that gets contextualized via a
| gating mechanism w.r.t. previous vectors.
| nurettin wrote:
| So if I start talking about crimes and criminals in an
| affectionate way, can I radicalize it?
| anshumankmr wrote:
| Can't post-training help reduce potentially biased or harmful
| outputs?
|
| Though even that isn't perfect. Some SOTA models sometimes seem
| to respond in ways that inadvertently soften the portrayal of
| controversial figures. For example, I remember prompting a
| model about a major terrorist who was mainly active decades
| ago and only in my native country, and it responded with
| something like "some saw him as a hero, others as a villain,"
| without taking a clear stance. But when asked about someone
| more world-famous, such as UBL, it went like "Naah, he is a
| bad guy."
| enoch2090 wrote:
| Played with the demo a bit and I got confused.
|
| 1. The chat context is always provided, and that introduces a
| bit of uncertainty - when the chat history mentions something,
| the model is always inclined to connect with it.
|
| 2. When I tried setting each context to an empty string, the
| model didn't show any evidence of remembering concepts. I told
| it 5 times that I love cats, and when asked about its favorite
| animal, its output remained "honeybee" and "octopus".
| vessenes wrote:
| I can't decide if I'm skeptical of the entire concept or not. I
| guess I believe adding this EMA of vectors in will do
| _something_ to the network, so I'm surprised you didn't get at
| least a change in animals after talking about cats. But I'm
| not clear that reweighting logits at the end is super useful. I
| guess this is supposed to be in some way a realtime LoRA, but
| then what do you have except a super-undertrained LoRA, trained
| just off whatever conversations you've had?
| cgadski wrote:
| Where x is the final hidden layer of the base model, the idea
| here is to steer outputs in some direction by adding a vector y.
| More specifically, y is an exponential moving average over a
| sequence of vectors W(z_t), where z_t are some sort of context
| vectors and W is a linear map.
|
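| Concretely, that update amounts to something like this (my
| reading of the repo; the variable names and the EMA rate are
| mine):
|
|     import torch
|
|     d = 4096                        # hidden size of the base model
|     W = torch.randn(d, d) / d**0.5  # the linear map
|     y = torch.zeros(d)              # the "memory" state
|     alpha = 0.1                     # EMA rate
|
|     def update(z_t):
|         # fold the current context vector into the memory:
|         # y <- (1 - alpha) * y + alpha * W(z_t)
|         global y
|         y = (1 - alpha) * y + alpha * (W @ z_t)
|
|     def steer(x):
|         # add the memory to the final hidden state x before
|         # the output head
|         return x + y
|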
| Except, the linear map W is just set to a random initialization,
| so it won't work for obvious reasons in its current form. (I
| guess this is why there is no example of its output. I'm guessing
| it was vibe-coded?) Also, since the intervention is only
| happening at the last hidden layer, I can't imagine this would
| really change how the model "thinks" in an interesting way. Like,
| yeah, you can absolutely make a model talk about dogs by adding
| in a control vector for "dogness" somewhere.
|
| Basically, this method is "inspired by graffiti art of tagging
| and the neuroplastic nature of living brains" in the same way
| that taking an exponential moving average of a time series would
| be "informed by state-space dynamics techniques utilized in deep
| learning, reservoir computing, and quantum mechanics." Really
| tired of the amount of insincere/pointless language in deep
| learning nowadays.
| vessenes wrote:
| The author said the original liquid paper specifies random
| starting weights. I think what would happen is you get a bit of
| a random personality each time you redo the randomization, and
| then it will self-referentially update over time. I mean you
| have to start somewhere. You could start with all 1s, I guess,
| if you're going to norm.
|
| Update: Even if this is a good idea, and I'm not sure it is, it
| probably makes sense to have a pretty fast early move away from
| the random weights, and then slow down.
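|
| Something like a decaying blend rate would do it (a sketch of
| the idea; the schedule and the floor value are arbitrary):
|
|     def blend_rate(t, alpha_min=0.02):
|         # t = number of updates so far: early steps nearly
|         # replace the random state, later steps barely nudge it
|         return max(alpha_min, 1.0 / (t + 1))
|
|     # y = (1 - a) * y + a * new_vector, with a = blend_rate(t)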
| r00t- wrote:
| Buzzword buzzword pretty graphics buzzword buzzword.
|
| This is a nothing-burger.
___________________________________________________________________
(page generated 2025-04-08 23:01 UTC)