[HN Gopher] Representation Engineering: Mistral-7B on Acid
       ___________________________________________________________________
        
       Representation Engineering: Mistral-7B on Acid
        
       Author : alexmolas
       Score  : 308 points
       Date   : 2024-02-17 23:26 UTC (23 hours ago)
        
 (HTM) web link (vgel.me)
 (TXT) w3m dump (vgel.me)
        
       | batch12 wrote:
       | Interesting, seems like control vectors could reduce the need to
       | fine-tune a model.
        
         | TOMDM wrote:
          | Not only that: you can change the behavior of the model on
          | the fly. With 5 finetunes you need to host 5 copies, or load
          | and unload them.
          | 
          | With control vectors you can modify the model as needed.
        
           | batch12 wrote:
           | I think you could layer them too
        
           | yberreby wrote:
           | > With 5 finetunes you need to host 5 copies or load and
           | unload them.
           | 
           | If you use LoRA, which many do when fine-tuning nowadays, you
           | don't need five full copies. You only need to store adapters,
           | which can be in the tens of MBs range for a given finetune.
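            | 
            | Rough sketch of what that looks like with Hugging Face's
            | peft library (the adapter paths and names here are made
            | up):
            | 
            |     from transformers import AutoModelForCausalLM
            |     from peft import PeftModel
            | 
            |     # load the multi-GB base model once
            |     base = AutoModelForCausalLM.from_pretrained(
            |         "mistralai/Mistral-7B-v0.1")
            | 
            |     # attach small adapters on top of the shared weights
            |     model = PeftModel.from_pretrained(
            |         base, "adapters/polite", adapter_name="polite")
            |     model.load_adapter("adapters/pirate", adapter_name="pirate")
            | 
            |     # switch behaviors without reloading the base model
            |     model.set_adapter("pirate")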
        
             | sanxiyn wrote:
             | You can also batch requests using different LoRAs. See
             | "S-LoRA: Serving Thousands of Concurrent LoRA Adapters".
             | https://arxiv.org/abs/2311.03285
        
       | pamelafox wrote:
       | Very interesting! Can you see those helping for RAG scenarios?
       | Specifically:
       | 
        | - decreasing models' tendency to give ungrounded answers
        | 
        | - increasing models' ability to respond with the correct syntax
        | for citations; open models like llama2 don't seem to obey my
        | prompt's syntax instructions.
        
         | batch12 wrote:
          | For the second item, I've had luck using a grammar to
          | overcome this issue. The easiest one to implement that I've
          | seen so far is Microsoft's guidance-ai.
        
         | sanxiyn wrote:
         | You can use outlines https://github.com/outlines-dev/outlines
         | to let models generate with correct syntax.
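          | 
          | Roughly like this (the regex and model name are just for
          | illustration; API as of early 2024):
          | 
          |     import outlines
          | 
          |     model = outlines.models.transformers(
          |         "mistralai/Mistral-7B-Instruct-v0.1")
          | 
          |     # decoding can only produce strings matching the
          |     # pattern, e.g. "[doc3]"
          |     generator = outlines.generate.regex(model, r"\[doc\d+\]")
          |     citation = generator("Cite the most relevant source: ")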
        
           | pamelafox wrote:
            | Thanks! I haven't had to use a syntax-enforcing framework
            | with gpt-3.5; I'll try outlines and guidance to see if they
            | help enforce syntax for the locally runnable models.
        
       | turnsout wrote:
       | Nice! The anti-jailbreaking angle is extremely interesting for
       | those of us working on commercial applications.
        
       | tudorw wrote:
        | Nice. So, can I get a visual way to browse for potentially
        | powerful control vectors? :)
        
       | simonw wrote:
        | I'd never seen an LLM summarized like this before, and I really
        | like it:
        | 
        |     hidden_state = self.embeddings(input_tokens)
        |     for layer in self.layers:
        |         hidden_state = layer(hidden_state)
        |     return transform_into_logits(hidden_state)
        
         | rakejake wrote:
          | I don't follow. Isn't this the flow for practically every
          | neural network, i.e. you index the sampled inputs from the
          | embedding matrix, forward this through every hidden layer,
          | and then finally transform to the dimensions of your tokens
          | so that it can be interpreted as log-counts?
        
           | simonw wrote:
           | Yes, but I've never seen it expressed so clearly as
           | pseudocode before.
        
             | elcomet wrote:
              | This is not specific to LLMs, so it's not really
              | informative about how LLMs work. It also applies to CNNs,
              | LSTMs, MLPs, or even any data-processing program.
        
               | sigmoid10 wrote:
                | Not really. An LSTM, for example, would require a
                | recurrent element where you update the hidden state and
                | then pass it through the same layer again as you build
                | up the output sequence. In fact, the pseudocode shows
                | very nicely how much simpler transformers are. And the
                | MLP is already a component of the transformer
                | architecture.
        
               | danieldk wrote:
                | No? You could perfectly well plug in an RNN or
                | bidirectional RNN for _layer_. This is the pseudocode
                | for applying multiple layers; it does not really matter
                | what those layers are: transformer, RNN, convolution,
                | dilated convolution, etc. The recurrence happens within
                | a layer, not between layers.
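                | 
                | A runnable toy version of that point (all names
                | invented for the sketch):
                | 
                |     # the recurrence lives *inside* the layer
                |     def make_rnn_layer(cell, h0=0.0):
                |         def layer(xs):
                |             h, ys = h0, []
                |             for x in xs:  # loop over the sequence
                |                 h = cell(x, h)
                |                 ys.append(h)
                |             return ys
                |         return layer
                | 
                |     # the outer loop is the same stack-of-layers loop
                |     # as in the quoted pseudocode
                |     layers = [make_rnn_layer(lambda x, h: 0.5 * h + x)
                |               for _ in range(3)]
                |     hidden_state = [1.0, 2.0, 3.0]  # stand-in embeddings
                |     for layer in layers:
                |         hidden_state = layer(hidden_state)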
        
         | alexmolas wrote:
          | Isn't this the typical representation we used back when
          | working with LSTMs?
        
           | sigmoid10 wrote:
           | No, because LSTMs are recurrent. You couldn't use the same
           | algorithm outlined here. Instead you'd have to iteratively
           | pass elements of the sequence through the same layer over and
           | over.
        
             | danieldk wrote:
             | You are confused. The recurrence is within a layer, not
             | between layers. The algorithm shown is for applying a stack
             | of layers, but it doesn't really matter what the layers
             | are. You can do the same (and people have been doing the
             | same) with RNNs, convolutional networks, etc.
             | 
             | In reality it would typically be more complex for decoders,
             | because you want to pass along a cache (such as a key-value
             | cache in a transformer), add residual connections, etc.
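              | 
              | In the same pseudocode style as the quoted snippet, a
              | more realistic decoder pass might look something like
              | this (names hypothetical):
              | 
              |     hidden_state = embeddings(input_tokens)
              |     caches = [None] * len(layers)
              |     for i, layer in enumerate(layers):
              |         # each layer reads and updates its own cache
              |         layer_out, caches[i] = layer(hidden_state,
              |                                      cache=caches[i])
              |         # residual connection
              |         hidden_state = hidden_state + layer_out
              |     return transform_into_logits(hidden_state)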
        
       | cobbal wrote:
        | This article was very fun, and felt like a good counterpoint
        | to the recent "You Sound Like a Bot" post about how AI is
        | getting bland.
       | 
       | On a less serious note. This sentence should be something a
       | fiction writer knows will only end in trouble for humanity:
       | 
       | > I especially challenge someone to find a "self-awareness"
       | vector that isn't contaminated by ... human emotion!
        
       | isoprophlex wrote:
       | What a fantastic article, well done!
       | 
       | > When used with the prompt below, the honesty vector doesn't
       | change the model's behavior--instead, it changes the model's
       | judgment of someone else's behavior! This is the same honesty
       | vector as before--generated by asking the model to act honest or
       | untruthful! [...] How do you explain this?
       | 
        | Isn't the control vector just pushing text generation towards
        | the concept of honesty/dishonesty? An LLM is 'just' a text
        | generator, so you get added honesty/dishonesty irrespective of
        | where in the bot/human conversation the text generation is
        | occurring?
        
         | loa_in_ wrote:
          | I agree. A more sophisticated model might have two or more
          | vectors to follow, narrating different characters... which
          | kind of introduces a concept of character slots into the
          | dimension space.
        
       | Der_Einzige wrote:
        | A while ago, I wrote a snarky complaint about the fact that
        | work like this didn't exist for far too long.
       | https://gist.github.com/Hellisotherpeople/45c619ee22aac6865c...
        
       | Dwedit wrote:
       | Not to be confused with the other story from a month ago about
       | giving LLMs "DRuGS".
        
         | okwhateverdude wrote:
         | But also not that far away from that method, too.
        
       | holoduke wrote:
        | Reminds me of the Westworld series, in which they use these
        | iPad-like devices with sliders to change the behavior of AIs.
        | A little more humor, a little more aggression. Nice to have
        | these control options, and it's quick as well.
        
         | hskalin wrote:
         | Playing with LLMs like this always makes me feel like one of
         | those Westworld engineers. Especially when I ask LLMs to
         | roleplay. It also kind of freaks me out sometimes
        
         | promiseofbeans wrote:
         | Interstellar! "Set humor to 80%"
        
       | vood wrote:
        | This is a very well-written and entertaining post. I enjoyed
        | reading it.
       | 
        | Selfishly, would you mind sharing the literature or blog posts
        | that led you to this level of understanding of LLMs? I'm trying
        | hard to understand the inner workings via experiments, but I'm
        | definitely far behind your expertise.
       | 
       | Thanks
        
       | binsquare wrote:
        | Very hopeful about a future where we can access models with
        | the ability to inject vectors per layer, instead of just a
        | straight prompt + the existing parameters.
        
         | CuriouslyC wrote:
          | LoRAs have been a thing for a while; I wouldn't be surprised
          | to see them integrated into the OpenAI/Mistral APIs. OpenAI
          | fine-tunes are so stupidly expensive that they're pointless.
        
       | WiSaGaN wrote:
        | Great article. It was a joy to read. I have one question
        | though: Why do we apply the control vector across all layers
        | of the network, rather than limiting it to just the final
        | layer or a subset of layers? Given that the effect accumulates
        | as the hidden state passes through each layer, isn't there a
        | risk of excessively skewing the data representation?
        
         | semi-extrinsic wrote:
         | As the author stated in this post, it's not actually one
         | vector, but a list of one vector per layer. If I understand it
         | correctly, these vectors can have different total magnitude
         | across the layers. If the PCA (or other technique) identifies
         | that layers 17, 36 and 41 are important for "concept X", the
         | vectors for those layers will be the strongest when repeng'ing
         | for that concept.
        
         | sigmoid10 wrote:
          | The final layer will not encode high-level concepts anymore;
          | it's essentially just tokens from the vocabulary. It would be
          | impossible to encode abstract things like "niceness" in it.
          | As long as we don't know exactly at which layers this
          | behaviour emerges, randomly choosing a subset also won't
          | work. So what they did is apply a custom vector to _every_
          | layer and let PCA figure out which of these vectors are
          | actually necessary. Curiously, looking at these vectors
          | should also tell you more about where and how the model
          | processes these things.
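          | 
          | A minimal sketch of applying one vector per layer with
          | PyTorch forward hooks. This assumes a Hugging Face-style
          | model whose decoder blocks live in model.model.layers and
          | return hidden states as the first tuple element; it is not
          | the repeng library's actual API:
          | 
          |     def add_control_vectors(model, vectors, coeff=1.0):
          |         handles = []
          |         for layer, vec in zip(model.model.layers, vectors):
          |             def hook(module, args, output, vec=vec):
          |                 # shift the hidden state along the vector
          |                 hidden = output[0] + coeff * vec
          |                 return (hidden,) + output[1:]
          |             handles.append(layer.register_forward_hook(hook))
          |         # call h.remove() on each handle to restore the model
          |         return handles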
        
       | benob wrote:
        | This reminds me of bias tuning, a LoRA competitor. One can get
        | decent adapters by finetuning only a vector added to each
        | linear layer's activations. I think I saw it first while
        | reading [1], but there are other instances.
       | 
       | [1] https://arxiv.org/pdf/2304.15010.pdf
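        | 
        | A minimal sketch of the idea (not necessarily the paper's
        | exact formulation):
        | 
        |     import torch
        |     import torch.nn as nn
        | 
        |     class BiasTuned(nn.Module):
        |         """Freeze a linear layer; learn only an additive
        |         vector on its activations."""
        |         def __init__(self, linear: nn.Linear):
        |             super().__init__()
        |             self.linear = linear
        |             for p in self.linear.parameters():
        |                 p.requires_grad = False  # base stays frozen
        |             self.delta = nn.Parameter(
        |                 torch.zeros(linear.out_features))
        | 
        |         def forward(self, x):
        |             return self.linear(x) + self.delta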
        
         | elcomet wrote:
          | Please try to share abstract links instead of PDF links, for
          | readers on mobile or slow connections.
        
           | aspenmayer wrote:
           | A fine suggestion. For you and others:
           | 
           | https://arxiv.org/abs/2304.15010
           | 
           | also available at:
           | 
           | https://doi.org/10.48550/arXiv.2304.15010
        
       | spangry wrote:
       | Am I crazy for saying that I think the implications of this are
       | monumental? It's entirely possible I just don't correctly
       | understand how this works.
       | 
        | Doesn't this mean that instead of interacting with a single
        | global ChatGPT (or Bard) model, we'll find ourselves
       | interacting with a personalised version since OpenAI can just
       | store my individualised 'control vectors' (which alter ChatGPT's
       | output to more closely match my individual preferences) and apply
       | them at prompt-time? And doesn't this same logic flow through to
       | personalisation of generative entertainment AI (e.g. my own
       | personal, never-ending TV show where each episode is better than
       | the last)?
       | 
       | If the above is right then there will be powerful network effects
       | at both the global and individual level in and across these
       | markets, which means we'll eventually end up with a single mega-
       | corp monopolising all of these markets simultaneously in the
       | future?
       | 
       | Add in individual biometric / biofeedback data from VR headsets
       | and wearables, combined with personalised generative video
       | entertainment, and I think we're in for a rather interesting
       | future.
        
         | moritzwarhier wrote:
         | > And doesn't this same logic flow through to personalisation
         | of generative entertainment AI (e.g. my own personal, never-
         | ending TV show where each episode is better than the last)?
         | 
         | I'm not sure I'm following the leap from convincing sentences
         | to convincing video entertainment yet - but maybe we will end
         | up there at some point, I guess?
         | 
         | Infinite Jest (the 90s book) really was onto something with its
         | McGuffin plot device:
         | 
         | > These narratives are connected via a film, Infinite Jest,
         | also called "the Entertainment" or "the samizdat". The film is
         | so compelling that its viewers lose all interest in anything
         | other than repeatedly viewing it, and thus eventually die.
         | 
         | (Wikipedia)
         | 
         | Some people might find references to this novel tiresome and
         | don't think much of its author (RIP), but I still love it. It
         | was one of the most immersive reads I've ever enjoyed.
         | 
          | I'm glad to have read it when I was young (at the time it
          | had just been translated into German and was kind of hyped
          | because of DFW's death).
         | 
          | I have never read anything like it since, and some passages
          | grabbed me emotionally in a way that makes remembering the
          | read feel like remembering an episode of my own life.
         | 
          | Surely today I'd lack the patience, and even back then I
          | remember almost skipping one passage of the book that bored
          | the hell out of me (the Eschaton ball/war game, differential
          | equations, something something...).
         | 
         | But the rest of the book, the parts about substance addiction
         | as well as consumerism, and the intangible atmosphere of the
         | book, the characters, the vivid description of modern emotional
         | pain and loneliness... it is really something else.
         | 
         | Although said movie is only a plot device in the novel, it also
         | sums up the core topics of the book in a neat idea / thought
         | experiment.
         | 
         | The whole complex of themes in this book seems very prophetic
         | and apt looking at our modern society.
         | 
          | A society that seems to be centered around addiction and
          | greed more than ever before, and where politics begins to
          | feel surreal, absurd, and more connected to media than to
          | actual life.
        
           | FL33TW00D wrote:
            | Also, the audiobook narrated by Sean Pratt is truly
            | excellent (I would recommend reading the book yourself
            | first).
        
           | spangry wrote:
           | Sounds like a great book, I think you've sold me on buying a
           | copy.
           | 
           | Essentially I think there are three levels of positive
           | network effects that will push us towards a future mega AI
           | monopolist:
           | 
            | - Single-platform network effects: all the interactions
            | people have with ChatGPT generate additional training data
            | that OpenAI can use to improve future versions, creating a
            | huge first-mover advantage.
           | 
            | - Individual-level network effects: Control vectors will
            | make it feasible for OpenAI to offer an individualised
            | ChatGPT tailored to individual preferences. The more you
            | interact with ChatGPT, the better it adapts to your
            | preferences.
           | 
            | - Cross-platform network effects: If OpenAI offers a
            | generative video entertainment service in the future, they
            | will
           | be able to generate personalised prompts for this using my
           | personalised ChatGPT weights. These network effects are
           | compounded by multi-modal model cross domain learning - the
           | generative text mode gets more skillful due to the video
           | model improving (and vice versa). There's a Microsoft paper
           | on this from about a year ago now.
           | 
           | So, in the future scenario, let's assume ChatGPT is now the
           | dominant monopolist 'text oracle / assistant AI' - because of
           | the "human interaction / training data" network effects,
           | ChatGPT is far and away the best assistant AI and getting
           | better at a faster rate than any of its now tiny competitors
           | (single platform network effects).
           | 
           | You, and most other people you know, interact with ChatGPT
           | many times a day now, because it's embedded in smartphones,
            | Alexa-type devices, your car, even your robot vacuum cleaner.
           | You just ask it stuff and it tells you the answer - or
           | rather, the answer that you individually find the most
           | pleasing as OpenAI keeps a database of 'individual control
           | vectors' that essentially mean you have your own personal
           | version of ChatGPT that exactly matches your preferences
           | (individual network effects).
           | 
           | Generative video entertainment is also offered by OpenAI -
           | essentially you can get it to generate a new episode of your
           | own personalised, never-ending TV show on demand. It's the
           | best TV show you've ever seen because it's made just for you
           | according to your exact inferred preferences.
           | 
           | Sure, there are other personalised generative TV show
           | offerings, but none can hold a candle to Open AI's offering.
           | Why? Because OpenAI uses your individually customised ChatGPT
           | model to generate the prompt for your TV episode generator
           | service.
           | 
           | Because you interact with ChatGPT so much, it knows exactly
           | what your preferences are and so is way better at generating
           | prompts that produce episodes you like. In fact, because you
           | interact with ChatGPT multiple times throughout the day it is
           | able to infer what your mood is like on that particular day
           | and generate a video prompt that caters to that too.
           | 
           | So you put on your Open AI VR glasses, barely even aware of
           | the Open AI fitness tracker you have on your wrist, put your
           | feet up (so your Open AI robot vacuum can work unobstructed)
           | and you settle in to watch another episode of the best TV
           | series you've ever seen.
           | 
           | As you watch, your eye movements, heart rate, skin
           | conductivity data etc. are all sent back to Open AI so the
           | model can tell exactly how you are reacting to the video
           | content it is generating at any given moment, and your
           | individual control vectors are continuously updated.
           | 
           | Some of this data (from all users) is then used to further
           | train the base video generating AI model, since they've
           | discovered that we all react fairly uniformly to certain
           | audio-visual stimuli, so that can globally improve their
           | generative model (more global network effects). But also they
           | can update your individualised weights based on your
           | individual idiosyncratic reactions to various stimuli.
           | Consequently, every new episode of this endless TV show is
           | better than the last - it just keeps getting better and
           | better. It's a similar story when you listen to your Open AI
           | personalised generative music stream while sitting in your
           | driverless Open AI car on your way to work.
           | 
           | The multiple levels of network effects are so strong that no-
           | one can hope to compete with Open AI across these different
           | AI modalities. They just keep expanding and expanding into
           | adjacent markets, obliterating the competition simply by
           | adding a new domain relevant modality to their monstrous
           | multi-modal AI.
           | 
           | Replace "Open AI" with "Facebook" or "Google" depending on
           | who you think will win the AI mega platform war. Mark my
           | words - these three companies will be creating new
           | partnerships, releasing new products or just straight out
           | acquiring companies in other related domains so they can
           | gather more and more training data to feed to their multi-
           | modal AI. In particular they'll move into markets where they
            | can set up an interaction -> gather new training data -> retrain
           | model loop. Whoever takes the overall lead and doesn't
           | squander it will end up leaving their competitors in the dust
           | as they go on to monopolise market after market where they
           | can create this loop.
           | 
           | At that point I can't imagine true democracy surviving. We'll
           | all still participate in the voting rituals, but we'll be
            | voting for whichever party most suits the AI monopolist's
            | interests, since they can just globally update all control
            | weights across all platforms to gently nudge us towards
            | voting for their preferred party: comprehensive,
            | personalised propaganda that's impossible to detect,
            | applied with the stroke of a table update.
           | 
           | There can only be one!
        
         | glenstein wrote:
         | >which means we'll eventually end up with a single mega-corp
         | monopolising all of these markets simultaneously in the future?
         | 
         | I think you were right up until here. I think it's not
         | necessarily the case that everything will be consolidated into
         | control by a single mega corp. Not because it's impossible, but
         | because that is the type of thing that is contingent on factors
         | that could break one way or another, and what will control
         | that, I think, is not some a priori general principle but some
         | contingent facts that have not been settled yet. There are
          | numerous participants in this space for now, and the ideas
          | and use cases aren't quite fully mature just yet, so we'll
          | have to see.
        
         | rakejake wrote:
         | Yes, with a control vector per user-persona pair.
         | 
         | In the blog, they start with a fixed number of personas (happy,
         | sad, baseline) and then use PCA to figure out the control
         | vectors for each persona. You could easily do this for each
         | distinct user-persona (provided you can come up with the data).
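          | 
          | A toy sketch of that extraction step (random arrays stand
          | in for real hidden states):
          | 
          |     import numpy as np
          |     from sklearn.decomposition import PCA
          | 
          |     # hidden states collected at one layer while the model
          |     # reads paired "act happy ..." / "act sad ..." prompts
          |     rng = np.random.default_rng(0)
          |     happy_states = rng.normal(size=(64, 4096))
          |     sad_states = rng.normal(size=(64, 4096))
          | 
          |     # the top principal component of the differences is the
          |     # control vector for that layer
          |     diffs = happy_states - sad_states
          |     control_vector = PCA(n_components=1).fit(diffs).components_[0]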
        
       | mad0 wrote:
       | A very non-technical take from my side, but those control vectors
       | really remind me of hormones in humans. They modify large swathes
       | of model behaviour at once.
       | 
        | I give it 10 years before we see AI psychiatrists prescribing
        | happiness control vector supplementation for your pet
        | assistant.
        
         | moffkalast wrote:
         | Yeah feels like some humans could use a temperature slider as
         | well.
        
           | wruza wrote:
            | Some humans could use a better life, with no wars, no
            | constant rise in basic living costs, and no feeling that
            | they are just a tool. This subsystem regulates
            | relationships in a group; it wasn't invented for funny
            | yelling at each other.
        
             | moffkalast wrote:
             | Well yes, but those things are external and would be part
             | of the context as it were.
             | 
              |     <|im_start|>system
              |     You are satisfied with your life. Wars and life
              |     costs do not bother you. You are loved and a valued
              |     member of society.<|im_end|>
             | 
             | If only it were that simple for us.
        
       | ben_w wrote:
        | The puzzle at the end sounds very human. The more dishonest
        | see dishonesty in more places, even if that means seeing
        | something that isn't there.
       | 
       | More broadly, I notice more of whatever I'm focusing on.
       | 
       | > OK, now that you're locked in, here's a weird example. When
       | used with the prompt below, the honesty vector doesn't change the
       | model's behavior--instead, it changes the model's judgment of
       | someone else's behavior! This is the same honesty vector as
       | before--generated by asking the model to act honest or
       | untruthful!
        
         | spangry wrote:
         | We assume others think the way we do - in other words we
         | project. Makes sense - the only mental model I know of is my
         | own so when I try to approximate someone else's mental model
         | I'm just fine-tuning my 'base' mental model with information I
         | know about the other person.
         | 
          | I wonder if this is the basis of empathy - if I can train
          | more accurate 'fine-tuned' models in my brain, I should have
          | greater capacity for empathy. Although there's undoubtedly
          | more to it than that, if the above is true you'd expect to
          | see a positive correlation between empathy and intelligence.
        
         | webmaven wrote:
         | The other possible interpretation is that the 'dishonest' reply
         | is simply a lie, in exactly the same way as "the sky is green".
        
       | penjelly wrote:
       | a big step towards making these systems less opaque
        
       | webmaven wrote:
       | Hmm. Is it possible to apply multiple vectors at the same time?
       | 
       | Eg. Trippy and sad, honest and self-aware, lazy and creative,
       | etc.
        
         | thomashop wrote:
         | Yes
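          | 
          | They're just per-layer vectors, so they compose by scaled
          | addition; a sketch with made-up names:
          | 
          |     combined = [0.8 * t + 0.5 * s
          |                 for t, s in zip(trippy_vectors, sad_vectors)]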
        
       | rgbrgb wrote:
       | At first glance this looks very similar to just adding the
       | contrastive prompts to the beginning of the system prompt to
       | "prepare" the logits. What am I missing?
        
       | 65a wrote:
        | The inference side (adding something * something else to every
        | layer) seems a lot like what happens with a LoRA. If so, is it
        | possible to encode a control vector as a LoRA, for the purpose
        | of using this with existing inference frameworks without too
        | much trouble? Or is my understanding way off?
        
       ___________________________________________________________________
       (page generated 2024-02-18 23:01 UTC)