[HN Gopher] Representation Engineering: Mistral-7B on Acid
___________________________________________________________________
Representation Engineering: Mistral-7B on Acid
Author : alexmolas
Score : 308 points
Date : 2024-02-17 23:26 UTC (23 hours ago)
(HTM) web link (vgel.me)
(TXT) w3m dump (vgel.me)
| batch12 wrote:
| Interesting, seems like control vectors could reduce the need to
| fine-tune a model.
| TOMDM wrote:
| Not only that, you can change the behavior of the model as
| needed. With 5 finetunes you need to host 5 copies or load and
| unload them.
|
| With control vectors you can modify the model as needed
| batch12 wrote:
| I think you could layer them too
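|
| A minimal sketch of what layering might look like, assuming the
| control vectors are plain tensors added to the hidden state
| (illustrative names, not the repeng API):
|
|     import torch
|
|     dim = 4096  # Mistral-7B hidden size
|     honesty = torch.randn(dim)  # placeholders; in practice these
|     trippy = torch.randn(dim)   # come from PCA over activations
|
|     # layering two concepts is just a weighted sum added to the
|     # hidden state at a given layer
|     def steer(hidden_state):
|         return hidden_state + 0.8 * honesty - 0.4 * trippy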
| yberreby wrote:
| > With 5 finetunes you need to host 5 copies or load and
| unload them.
|
| If you use LoRA, which many do when fine-tuning nowadays, you
| don't need five full copies. You only need to store adapters,
| which can be in the tens of MBs range for a given finetune.
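|
| A sketch of serving several finetunes from one base model,
| assuming the Hugging Face peft library (adapter names and paths
| are placeholders):
|
|     from transformers import AutoModelForCausalLM
|     from peft import PeftModel
|
|     base = AutoModelForCausalLM.from_pretrained(
|         "mistralai/Mistral-7B-v0.1")
|     model = PeftModel.from_pretrained(
|         base, "my-org/adapter-a", adapter_name="a")
|     model.load_adapter("my-org/adapter-b", adapter_name="b")
|
|     model.set_adapter("a")  # route a request through finetune A
|     model.set_adapter("b")  # ...or B, without reloading the base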
| sanxiyn wrote:
| You can also batch requests using different LoRAs. See
| "S-LoRA: Serving Thousands of Concurrent LoRA Adapters".
| https://arxiv.org/abs/2311.03285
| pamelafox wrote:
| Very interesting! Can you see those helping for RAG scenarios?
| Specifically:
|
| - decreasing the model's tendency to respond with ungrounded
| answers
|
| - increasing the model's ability to respond with the correct
| syntax for citations - the open models like llama2 don't seem to
| obey my prompt's syntax instructions.
| batch12 wrote:
| For the second item, I've had luck using grammar to overcome
| this issue. The easiest one to implement I've seen so far is
| Microsoft's guidance-ai.
| sanxiyn wrote:
| You can use outlines https://github.com/outlines-dev/outlines
| to let models generate with correct syntax.
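|
| For example, a sketch with outlines' regex-constrained
| generation (the pattern and prompt are just illustrative):
|
|     import outlines
|
|     model = outlines.models.transformers(
|         "mistralai/Mistral-7B-v0.1")
|     # constrain output to end in a bracketed numeric citation
|     generator = outlines.generate.regex(
|         model, r"[A-Za-z0-9 ,.]+ \[\d+\]\.")
|     answer = generator("Why is the sky blue? Cite a source.")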
| pamelafox wrote:
| Thanks! I haven't had to use a syntax-enforcing framework with
| gpt-35, so I'll try outlines and guidance to see if they help
| enforce syntax for the locally runnable models.
| turnsout wrote:
| Nice! The anti-jailbreaking angle is extremely interesting for
| those of us working on commercial applications.
| tudorw wrote:
| Nice, so, can I get a visual way to browse for potentially
| powerful control vectors :)
| simonw wrote:
| I'd never seen an LLM summarized like this before, and I really
| like it:
|
|     hidden_state = self.embeddings(input_tokens)
|     for layer in self.layers:
|         hidden_state = layer(hidden_state)
|     return transform_into_logits(hidden_state)
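|
| A runnable toy version of that summary, with plain linear layers
| standing in for the transformer blocks (a sketch, not the actual
| Mistral architecture):
|
|     import torch
|     import torch.nn as nn
|
|     class ToyLM(nn.Module):
|         def __init__(self, vocab=32000, dim=64, n_layers=4):
|             super().__init__()
|             self.embeddings = nn.Embedding(vocab, dim)
|             # stand-ins for attention + MLP blocks
|             self.layers = nn.ModuleList(
|                 [nn.Linear(dim, dim) for _ in range(n_layers)])
|             self.lm_head = nn.Linear(dim, vocab)
|
|         def forward(self, input_tokens):
|             hidden_state = self.embeddings(input_tokens)
|             for layer in self.layers:
|                 hidden_state = layer(hidden_state)
|             return self.lm_head(hidden_state)  # logits
|
|     logits = ToyLM()(torch.tensor([[1, 2, 3]]))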
| rakejake wrote:
| I don't follow. Isn't this the flow for practically every
| neural network, i.e. you index the sampled inputs from the
| embedding matrix, forward this through every hidden layer, and
| then finally transform to the dimensions of your tokens so that
| it can be interpreted as log-counts?
| simonw wrote:
| Yes, but I've never seen it expressed so clearly as
| pseudocode before.
| elcomet wrote:
| This is not specific to LLMs, so it's not really informative of
| how LLMs work. It also works for CNNs, LSTMs, MLPs, or even any
| data processing program.
| sigmoid10 wrote:
| Not really. LSTM for example would require a recursive
| element where you update the hidden state and then pass
| it through the same layer again as you complete the
| output sequence. In fact the pseudocode shows very nicely
| how much simpler transformers are. And MLP is already a
| component in the transformer architecture.
| danieldk wrote:
| No? You could perfectly plug in an RNN or bidirectional
| RNN for _layer_. This is the pseudocode for applying
| multiple layers. It does not really matter what these
| layers are, transformer, RNN, convolution, dilated
| convolutions, etc. The recurrence happens within a layer,
| not between layers.
| alexmolas wrote:
| Isn't this the typical representation we used back when working
| with LSTMs?
| sigmoid10 wrote:
| No, because LSTMs are recurrent. You couldn't use the same
| algorithm outlined here. Instead you'd have to iteratively
| pass elements of the sequence through the same layer over and
| over.
| danieldk wrote:
| You are confused. The recurrence is within a layer, not
| between layers. The algorithm shown is for applying a stack
| of layers, but it doesn't really matter what the layers
| are. You can do the same (and people have been doing the
| same) with RNNs, convolutional networks, etc.
|
| In reality it would typically be more complex for decoders,
| because you want to pass along a cache (such as a key-value
| cache in a transformer), add residual connections, etc.
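|
| Concretely, the same stacking loop runs whether each layer is a
| transformer block or a recurrent one; an nn.GRU, for example,
| performs its recurrence over the sequence inside a single call:
|
|     import torch
|     import torch.nn as nn
|
|     dim = 64
|     layers = nn.ModuleList(
|         [nn.GRU(dim, dim, batch_first=True) for _ in range(4)])
|
|     hidden_state = torch.randn(1, 10, dim)  # (batch, seq, dim)
|     for layer in layers:
|         hidden_state, _ = layer(hidden_state)  # recurrence inside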
| cobbal wrote:
| This article was very fun, and felt like a good counterpoint to
| the "You Sound Like a Bot" post recently that was talking about
| how AI is getting bland.
|
| On a less serious note. This sentence should be something a
| fiction writer knows will only end in trouble for humanity:
|
| > I especially challenge someone to find a "self-awareness"
| vector that isn't contaminated by ... human emotion!
| isoprophlex wrote:
| What a fantastic article, well done!
|
| > When used with the prompt below, the honesty vector doesn't
| change the model's behavior--instead, it changes the model's
| judgment of someone else's behavior! This is the same honesty
| vector as before--generated by asking the model to act honest or
| untruthful! [...] How do you explain this?
|
| Isn't the control vector just pushing text generation towards the
| concept of honesty/dishonesty? An LLM is 'just' a text generator,
| so you get added honesty/dishonesty irrespective of where in the
| bot/human conversation text generation is occurring?
| loa_in_ wrote:
| I agree. A more sophisticated model might have two or more
| vectors to follow, narrating different characters... which kind
| of brings a concept of character slots into the dimension space.
| Der_Einzige wrote:
| A while ago, I wrote a snarky complaint about the fact that work
| like this didn't exist for far too long.
| https://gist.github.com/Hellisotherpeople/45c619ee22aac6865c...
| Dwedit wrote:
| Not to be confused with the other story from a month ago about
| giving LLMs "DRuGS".
| okwhateverdude wrote:
| But it's not that far away from that method, either.
| holoduke wrote:
| Reminds me of the Westworld series, in which they use iPad-like
| devices with sliders to change the behavior of AIs. A little bit
| more humor, a little bit more aggression. Nice to see these
| control options, and it's quick as well.
| hskalin wrote:
| Playing with LLMs like this always makes me feel like one of
| those Westworld engineers. Especially when I ask LLMs to
| roleplay. It also kind of freaks me out sometimes
| promiseofbeans wrote:
| Interstellar! "Set humor to 80%"
| vood wrote:
| This is a very well written and entertaining post. I enjoyed
| reading it.
|
| Selfishly, would you mind sharing literature or blog posts that
| led you to this level of understanding of LLMs? I'm trying hard
| to understand the inner workings via experiments but definitely
| far behind your expertise.
|
| Thanks
| binsquare wrote:
| Very hopeful about a future of accessing models with the ability
| to inject vectors by layer, instead of just a straight prompt +
| existing parameters.
| CuriouslyC wrote:
| LoRAs have been a thing for a while; I wouldn't be surprised to
| see them integrated into the OpenAI/Mistral APIs. OpenAI fine-
| tunes are so stupidly expensive that they're pointless.
| WiSaGaN wrote:
| Great article. It was a joy to read. I have one question though:
| Why do we integrate the control vector across all layers of a
| neural network, rather than limiting its application to just the
| final layer or a subset of layers? Given that each vector
| influences every layer it passes through, resulting in a
| cumulative effect, isn't there a risk of excessively skewing the
| data representation?
| semi-extrinsic wrote:
| As the author stated in this post, it's not actually one
| vector, but a list of one vector per layer. If I understand it
| correctly, these vectors can have different total magnitude
| across the layers. If the PCA (or other technique) identifies
| that layers 17, 36 and 41 are important for "concept X", the
| vectors for those layers will be the strongest when repeng'ing
| for that concept.
| sigmoid10 wrote:
| The final layer will not encode high level concepts anymore,
| it's essentially just tokens from the vocabulary. It would be
| impossible to encode abstract things like "niceness" in it. As
| long as we don't know exactly at which layers this behaviour
| emerges, randomly choosing a subset also won't work. So what
| they did is apply a custom vector to _every_ layer and let PCA
| figure out which of these vectors are actually necessary.
| Curiously, looking at these vectors should also tell you more
| about where and how the model processes these things.
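|
| A sketch of "apply a vector at every layer" with forward hooks,
| assuming a Hugging Face-style model whose decoder layers return
| a tuple with hidden states first (the per-layer vectors are
| random placeholders here):
|
|     import torch
|
|     control = {i: torch.randn(4096) for i in range(32)}
|
|     def make_hook(vec, strength=1.0):
|         def hook(module, inputs, output):
|             return (output[0] + strength * vec,) + output[1:]
|         return hook
|
|     # for i, layer in enumerate(model.model.layers):
|     #     layer.register_forward_hook(make_hook(control[i]))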
| benob wrote:
| This reminds me of bias tuning, a LoRA competitor. One can get
| decent adapters by finetuning only a vector added to each linear
| layer's activations. I think I saw it first while reading [1],
| but there are other instances.
|
| [1] https://arxiv.org/pdf/2304.15010.pdf
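|
| A minimal sketch of the idea (not necessarily the paper's exact
| formulation): freeze the pretrained weights and train only an
| added bias per linear layer.
|
|     import torch
|     import torch.nn as nn
|
|     class BiasTuned(nn.Module):
|         """Wraps a frozen linear layer; only the bias delta trains."""
|         def __init__(self, linear: nn.Linear):
|             super().__init__()
|             self.linear = linear
|             for p in self.linear.parameters():
|                 p.requires_grad = False
|             self.delta = nn.Parameter(torch.zeros(linear.out_features))
|
|         def forward(self, x):
|             return self.linear(x) + self.delta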
| elcomet wrote:
| Please try to share abstract links instead of pdf links, for
| mobile or low connection readers.
| aspenmayer wrote:
| A fine suggestion. For you and others:
|
| https://arxiv.org/abs/2304.15010
|
| also available at:
|
| https://doi.org/10.48550/arXiv.2304.15010
| spangry wrote:
| Am I crazy for saying that I think the implications of this are
| monumental? It's entirely possible I just don't correctly
| understand how this works.
|
| Doesn't this mean that instead of interacting with a single
| global ChatGPT (or Bard) model, we'll instead find ourselves
| interacting with a personalised version since OpenAI can just
| store my individualised 'control vectors' (which alter ChatGPT's
| output to more closely match my individual preferences) and apply
| them at prompt-time? And doesn't this same logic flow through to
| personalisation of generative entertainment AI (e.g. my own
| personal, never-ending TV show where each episode is better than
| the last)?
|
| If the above is right then there will be powerful network effects
| at both the global and individual level in and across these
| markets, which means we'll eventually end up with a single mega-
| corp monopolising all of these markets simultaneously in the
| future?
|
| Add in individual biometric / biofeedback data from VR headsets
| and wearables, combined with personalised generative video
| entertainment, and I think we're in for a rather interesting
| future.
| moritzwarhier wrote:
| > And doesn't this same logic flow through to personalisation
| of generative entertainment AI (e.g. my own personal, never-
| ending TV show where each episode is better than the last)?
|
| I'm not sure I'm following the leap from convincing sentences
| to convincing video entertainment yet - but maybe we will end
| up there at some point, I guess?
|
| Infinite Jest (the 90s book) really was onto something with its
| McGuffin plot device:
|
| > These narratives are connected via a film, Infinite Jest,
| also called "the Entertainment" or "the samizdat". The film is
| so compelling that its viewers lose all interest in anything
| other than repeatedly viewing it, and thus eventually die.
|
| (Wikipedia)
|
| Some people might find references to this novel tiresome and
| don't think much of its author (RIP), but I still love it. It
| was one of the most immersive reads I've ever enjoyed.
|
| I'm glad to have read it when I was young (at the time it was
| just translated into German and kind of hyped because of DFWs
| death).
|
| Have never read anything like it since, and some passages
| grabbed me emotionally in a way that remembering the read feels
| like remembering an episode of my own life.
|
| Surely today I'd lack the patience and even by then I remember
| almost skipping one passage of the book that bored the hell out
| of me (Eschaton ball/war game, differential equations,
| something something...)
|
| But the rest of the book, the parts about substance addiction
| as well as consumerism, and the intangible atmosphere of the
| book, the characters, the vivid description of modern emotional
| pain and loneliness... it is really something else.
|
| Although said movie is only a plot device in the novel, it also
| sums up the core topics of the book in a neat idea / thought
| experiment.
|
| The whole complex of themes in this book seems very prophetic
| and apt looking at our modern society.
|
| A society that seems to be centered around addiction and greed
| more than ever before, and where politics begin feeling
| surreal, absurd, and more connected to media than to actual
| life.
| FL33TW00D wrote:
| Also the audio book narrated by Sean Pratt is truly excellent
| (I would recommend reading the book yourself first).
| spangry wrote:
| Sounds like a great book, I think you've sold me on buying a
| copy.
|
| Essentially I think there are three levels of positive
| network effects that will push us towards a future mega AI
| monopolist:
|
| - Single platform network effects: all the interactions
| people have with ChatGPT generate additional training data
| that Open AI can use to improve future versions, creating
| huge first mover advantage.
|
| - Individual-level network effects: Control vectors will make
| it feasible for Open AI to offer individualised ChatGPT
| tailored to individual preferences. The more you interact
| with ChatGPT, the better it adapts to your preferences.
|
| - Cross platform network effects: If Open AI offer a
| generative video entertainment service in future, they will
| be able to generate personalised prompts for this using my
| personalised ChatGPT weights. These network effects are
| compounded by multi-modal model cross domain learning - the
| generative text mode gets more skillful due to the video
| model improving (and vice versa). There's a Microsoft paper
| on this from about a year ago now.
|
| So, in the future scenario, let's assume ChatGPT is now the
| dominant monopolist 'text oracle / assistant AI' - because of
| the "human interaction / training data" network effects,
| ChatGPT is far and away the best assistant AI and getting
| better at a faster rate than any of its now tiny competitors
| (single platform network effects).
|
| You, and most other people you know, interact with ChatGPT
| many times a day now, because it's embedded in smartphones,
| Alexa-type devices, your car, even your robot vacuum cleaner.
| You just ask it stuff and it tells you the answer - or
| rather, the answer that you individually find the most
| pleasing as OpenAI keeps a database of 'individual control
| vectors' that essentially mean you have your own personal
| version of ChatGPT that exactly matches your preferences
| (individual network effects).
|
| Generative video entertainment is also offered by OpenAI -
| essentially you can get it to generate a new episode of your
| own personalised, never-ending TV show on demand. It's the
| best TV show you've ever seen because it's made just for you
| according to your exact inferred preferences.
|
| Sure, there are other personalised generative TV show
| offerings, but none can hold a candle to Open AI's offering.
| Why? Because OpenAI uses your individually customised ChatGPT
| model to generate the prompt for your TV episode generator
| service.
|
| Because you interact with ChatGPT so much, it knows exactly
| what your preferences are and so is way better at generating
| prompts that produce episodes you like. In fact, because you
| interact with ChatGPT multiple times throughout the day it is
| able to infer what your mood is like on that particular day
| and generate a video prompt that caters to that too.
|
| So you put on your Open AI VR glasses, barely even aware of
| the Open AI fitness tracker you have on your wrist, put your
| feet up (so your Open AI robot vacuum can work unobstructed)
| and you settle in to watch another episode of the best TV
| series you've ever seen.
|
| As you watch, your eye movements, heart rate, skin
| conductivity data etc. are all sent back to Open AI so the
| model can tell exactly how you are reacting to the video
| content it is generating at any given moment, and your
| individual control vectors are continuously updated.
|
| Some of this data (from all users) is then used to further
| train the base video generating AI model, since they've
| discovered that we all react fairly uniformly to certain
| audio-visual stimuli, so that can globally improve their
| generative model (more global network effects). But also they
| can update your individualised weights based on your
| individual idiosyncratic reactions to various stimuli.
| Consequently, every new episode of this endless TV show is
| better than the last - it just keeps getting better and
| better. It's a similar story when you listen to your Open AI
| personalised generative music stream while sitting in your
| driverless Open AI car on your way to work.
|
| The multiple levels of network effects are so strong that no-
| one can hope to compete with Open AI across these different
| AI modalities. They just keep expanding and expanding into
| adjacent markets, obliterating the competition simply by
| adding a new domain relevant modality to their monstrous
| multi-modal AI.
|
| Replace "Open AI" with "Facebook" or "Google" depending on
| who you think will win the AI mega platform war. Mark my
| words - these three companies will be creating new
| partnerships, releasing new products or just straight out
| acquiring companies in other related domains so they can
| gather more and more training data to feed to their multi-
| modal AI. In particular they'll move into markets where they can
| set up an interaction -> gather new training data -> retrain
| model loop. Whoever takes the overall lead and doesn't
| squander it will end up leaving their competitors in the dust
| as they go on to monopolise market after market where they
| can create this loop.
|
| At that point I can't imagine true democracy surviving. We'll
| all still participate in the voting rituals, but we'll be
| voting for whichever party most suits the AI monopolist's
| interests since they can just globally update all control
| weights across all platforms to gently nudge us towards
| voting for their preferred party - comprehensive and
| personalised propaganda, that's impossible to detect, with
| the stroke of a table update.
|
| There can only be one!
| glenstein wrote:
| >which means we'll eventually end up with a single mega-corp
| monopolising all of these markets simultaneously in the future?
|
| I think you were right up until here. I think it's not
| necessarily the case that everything will be consolidated into
| control by a single mega corp. Not because it's impossible, but
| because that is the type of thing that is contingent on factors
| that could break one way or another, and what will control
| that, I think, is not some a priori general principle but some
| contingent facts that have not been settled yet. There are
| numerous participants in this space for now, and the ideas and
| use cases aren't quite fully mature just yet, so we'll have to
| see.
| rakejake wrote:
| Yes, with a control vector per user-persona pair.
|
| In the blog, they start with a fixed number of personas (happy,
| sad, baseline) and then use PCA to figure out the control
| vectors for each persona. You could easily do this for each
| distinct user-persona (provided you can come up with the data).
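|
| A sketch of extracting such a vector, assuming you've cached
| hidden states at one layer for contrastive prompt pairs (this
| mirrors the idea, not the repeng implementation):
|
|     import torch
|
|     # (n_pairs, hidden_dim) activations for "act happy" prompts
|     # vs. "act sad" prompts -- random placeholders here
|     h_pos = torch.randn(100, 4096)
|     h_neg = torch.randn(100, 4096)
|
|     diffs = h_pos - h_neg
|     # first principal component of the differences is the
|     # control vector (pca_lowrank centers the data by default)
|     _, _, v = torch.pca_lowrank(diffs, q=1)
|     control_vector = v[:, 0]  # shape (hidden_dim,)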
| mad0 wrote:
| A very non-technical take from my side, but those control vectors
| really remind me of hormones in humans. They modify large swathes
| of model behaviour at once.
|
| I give it 10 years before we see AI psychiatrists prescribe a
| happiness control vector supplementation for your pet assistant.
| moffkalast wrote:
| Yeah feels like some humans could use a temperature slider as
| well.
| wruza wrote:
| Some humans could use a better life, with no wars, no constant
| rise in basic living costs, and no feeling that they are just a
| tool. This subsystem regulates relationships in a group; it
| wasn't invented for funny yelling at each other.
| moffkalast wrote:
| Well yes, but those things are external and would be part
| of the context as it were.
|
| <|im_start|>system
|
| You are satisfied with your life. Wars and life costs do
| not bother you. You are loved and a valued member of
| society.<|im_end|>
|
| If only it were that simple for us.
| ben_w wrote:
| The puzzle at the end sounds very human. The more dishonest see
| dishonesty in more places, even when it isn't there.
|
| More broadly, I notice more of whatever I'm focusing on.
|
| > OK, now that you're locked in, here's a weird example. When
| used with the prompt below, the honesty vector doesn't change the
| model's behavior--instead, it changes the model's judgment of
| someone else's behavior! This is the same honesty vector as
| before--generated by asking the model to act honest or
| untruthful!
| spangry wrote:
| We assume others think the way we do - in other words we
| project. Makes sense - the only mental model I know of is my
| own so when I try to approximate someone else's mental model
| I'm just fine-tuning my 'base' mental model with information I
| know about the other person.
|
| I wonder if this is the basis of empathy - if I can train more
| accurate 'fine-tuned' models in my brain I should have greater
| capacity for empathy. Although there's undoubtedly more to it
| than that, if the above is true you'd expect to see a positive
| correlation between empathy and intelligence.
| webmaven wrote:
| The other possible interpretation is that the 'dishonest' reply
| is simply a lie, in exactly the same way as "the sky is green".
| penjelly wrote:
| a big step towards making these systems less opaque
| webmaven wrote:
| Hmm. Is it possible to apply multiple vectors at the same time?
|
| Eg. Trippy and sad, honest and self-aware, lazy and creative,
| etc.
| thomashop wrote:
| Yes
| rgbrgb wrote:
| At first glance this looks very similar to just adding the
| contrastive prompts to the beginning of the system prompt to
| "prepare" the logits. What am I missing?
| 65a wrote:
| The inference side (adding something * something else to every
| layer) seems a lot like what happens with a LoRA. If so, is it
| possible to encode a control vector as a LoRA, for the purpose
| of using this with existing inference frameworks without too
| much trouble? Or is my understanding way off?
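|
| For concreteness, a sketch of the two update rules as I
| understand them (A, B, v are placeholders):
|
|     import torch
|
|     h = torch.randn(4096)  # hidden state at some layer
|
|     # control vector: an input-independent shift
|     v = torch.randn(4096)
|     h_cv = h + 1.0 * v
|
|     # LoRA: a low-rank but *input-dependent* update
|     A = torch.randn(8, 4096)
|     B = torch.randn(4096, 8)
|     h_lora = h + B @ (A @ h)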
___________________________________________________________________
(page generated 2024-02-18 23:01 UTC)