[HN Gopher] Show HN: Llama 3.3 70B Sparse Autoencoders with API ...
___________________________________________________________________
Show HN: Llama 3.3 70B Sparse Autoencoders with API access
Author : trq_
Score : 111 points
Date : 2024-12-23 17:18 UTC (5 hours ago)
(HTM) web link (www.goodfire.ai)
(TXT) w3m dump (www.goodfire.ai)
| tMcGrath wrote:
| I'm one of the authors of this paper - happy to answer any
| questions you might have.
| goldemerald wrote:
| Why not actually release the weights on huggingface? The
| popular SAE_lens repo has a direct way to upload the weights
| and there are already hundreds publicly available. The lack of
| training details/dataset used makes me hesitant to run any
| study on this API.
|
| Are images included in the training?
|
| What kind of SAE is being used? There have been some nice
| improvements in SAE architecture this last year, and it would
| be nice to know which one (if any) is provided.
| tMcGrath wrote:
| We're planning to release the weights once we do a moderation
| pass. Our SAE was trained on LMSys (you can see this in our
| accompanying post:
| https://www.goodfire.ai/papers/mapping-latent-spaces-llama/).
|
| No images in training - 3.3 70B is a text-only model so it
| wouldn't have made sense. We're exploring other modalities
| currently though.
|
| The SAE is a basic ReLU one. This might seem a little
| backwards, but I've been concerned by some of the
| high-frequency features in TopK and JumpReLU SAEs
| (https://arxiv.org/abs/2407.14435, Figure 14), and the recent
| SAEBench results (https://www.neuronpedia.org/sae-bench/info)
| show quite a lot of feature absorption in the more recent
| variants (though this could be confounded by a number of
| things). This isn't to say they're definitely bad - I think
| it's quite likely that TopK/JumpReLU are an improvement, but
| rather that we need to evaluate them in more detail before
| pushing them live. Overall I'm very optimistic about the
| potential for improvements in SAE variants, which we talk a
| bit about at the bottom of the post. We're going to be
| pushing SAE quality a ton now that we have a stable platform
| to deploy them to.
| wg0 wrote:
| Noob question - how do we know that these autoencoders aren't
| hallucinating and really are mapping/clustering what they
| should be?
| I_am_tiberius wrote:
| I wonder how many people or companies choose to send their data
| to foreign services for analysis. Personally, I would approach
| this with caution and am curious to see how this trend evolves.
| tMcGrath wrote:
| We'll be open-sourcing these SAEs so you're not required to do
| this if you'd rather self-host.
| swyx wrote:
| nice work. enjoyed the zoomable UMAP. i wonder if there are
| hparams to recluster the UMAP in interesting ways.
|
| after the idea that Claude 3.5 Sonnet used SAEs to improve its
| coding ability, i'm not sure i'm aware of any actual practical
| use of them yet beyond Golden Gate Claude (and Golden Gate
| Gemma: https://x.com/swyx/status/1818711762558198130).
|
| has anyone tried out Anthropic's matching SAE API yet? wondering
| how it compares with Goodfire's and if there's any known
| practical use.
| tMcGrath wrote:
| Thank you! I think some of the features we have, like
| conditional steering, make SAEs a lot more convenient to use.
| Conditional steering also makes using models a lot more like
| conventional programming: for example, when the model is
| 'thinking' x, or the text is about y, invoke steering. We have
| an example of this for jailbreak detection:
| https://x.com/GoodfireAI/status/1871241905712828711
|
| We also have an 'autosteer' feature that makes coming up with
| new variants easy:
| https://x.com/GoodfireAI/status/1871241902684831977 (this feels
| kind of like no-code finetuning).
|
| Being able to read features out and train classifiers on them
| seems pretty useful - for instance we can read out features
| like 'the user is unhappy with the conversation', which you
| could then use for A/B testing your model rollouts (kind of
| like Google Analytics for your LLM). The big improvements here
| are (a) cost - the marginal cost of an SAE is low compared to
| frontier model annotations, (b) a consistent ontology across
| conversations, and (c) not having to specify that ontology in
| advance, but rather discover it from data.
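|
| To make the classifier point concrete, the shape of it is
| roughly this (get_feature_activations is a placeholder for
| however you pull SAE feature activations out of a transcript,
| not a specific API, and the data is obviously fake):
|
|     import numpy as np
|     from sklearn.linear_model import LogisticRegression
|
|     # Placeholder: in practice this would return e.g. the max
|     # SAE feature activations over a conversation; here it's
|     # just deterministic random noise so the sketch runs.
|     def get_feature_activations(conversation, n_features=1024):
|         seed = abs(hash(conversation)) % (2**32)
|         return np.random.default_rng(seed).random(n_features)
|
|     # Labelled transcripts: 1 = "user is unhappy with the
|     # conversation"
|     conversations = ["transcript a", "transcript b",
|                      "transcript c", "transcript d"]
|     labels = [0, 1, 0, 1]
|
|     X = np.stack([get_feature_activations(c)
|                   for c in conversations])
|     clf = LogisticRegression(max_iter=1000).fit(X, labels)
|
|     # Cheap, consistent per-conversation signal you could use
|     # for A/B testing model rollouts
|     print(clf.predict_proba(X)[:, 1])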
|
| These are just my guesses though - a large part of why we're
| excited about putting this out is that we don't have all the
| answers for how it can be most useful, but we're excited to
| support people finding out.
| trq_ wrote:
| We haven't yet found generalizable "make this model smarter"
| features, but you can avoid the tradeoff of stuffing every
| instruction into the system prompt: e.g. if you have a chatbot
| that sometimes generates code, you can give it very specific
| coding instructions only when it's actually coding and leave
| them out of the system prompt otherwise.
|
| We have a notebook about that here:
| https://docs.goodfire.ai/notebooks/dynamicprompts
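|
| The shape of it is something like this (the activation would
| come from reading a 'model is writing code' SAE feature during
| the conversation; the names and threshold here are made up -
| see the notebook for the real client calls):
|
|     CODING_INSTRUCTIONS = (
|         "When writing code: include type hints, docstrings,"
|         " and a short usage example."
|     )
|
|     def build_system_prompt(base_prompt: str,
|                             coding_activation: float,
|                             threshold: float = 0.5) -> str:
|         """Only pay the prompt-length cost of the coding rules
|         when the coding feature is actually firing."""
|         if coding_activation > threshold:
|             return base_prompt + "\n\n" + CODING_INSTRUCTIONS
|         return base_prompt
|
|     # e.g. the feature fired strongly on the latest turn
|     print(build_system_prompt("You are a helpful assistant.",
|                               0.8))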
| paraschopra wrote:
| I cranked up the 'pirate talking about andromeda galaxy'
| feature to 1.5 and got this:
|
| >Yer lookin' for the Andromeda galaxy, eh? Here be the details,
| me hearty: *The Andromeda Galaxy (Yer looking for a fine piece o'
| booty, matey!):* * *A fine piece o' booty*: The Andromida Galaxy
| be a right fine piece o' booty, with a treasure chest o' gold
| doubloons... er, I mean, a mighty haul o' stars, hidden beneath
| the Jolly Roger! * *A barnacle on the high seas*: The Andromeda
| Galaxy be a right scurvy dog, with a hull full o' stars, and a
| keel full o' hidden treasure! It be a fine piece o' booty, but it
| be needin' a bit o' swabbin' the decks, or it'll be walkin' the
| plank, savvy? * *A chest overflowin' with gold*: The Andromeda
| Galaxy be a right fine piece o' booty, with a chest overflowin'
| with gold doubloons... er, I mean, a fine haul o' stars, and a
| barnacle on the high seas! It be a right scurvy dog, but it be
| worth keepin' an eye on, or it
| tMcGrath wrote:
| Yes - we'd never normally turn features up this much as it
| breaks the model quite badly, but we put this in the post to
| show what that looked like in practice.
| trq_ wrote:
| If you're hacking on this and have questions, please join us on
| Discord: https://discord.gg/vhT9Chrt
| owenthejumper wrote:
| I am skeptical of generic sparsification efforts. After all,
| companies like Neural Magic spent years trying to make it work,
| only to pivot to the vLLM engine and be sold to Red Hat.
| refulgentis wrote:
| The link shows this isn't sparsity as in inference speed, it's
| sparse autoencoders, as in interpreting the features in an LLM
| ("SAE Anthropic" as a search term will explain more).
| ed wrote:
| This is the ultimate propaganda machine, no?
|
| We're social creatures, chatbots already act as friends and
| advisors for many people.
|
| Seems like a pretty good vector for a social attack.
| echelon wrote:
| The more the public has access to these tools, the more they'll
| develop useful scar tissue and muscle memory. We need people to
| be constantly exposed to bots so that they understand the new
| nature of digital information.
|
| When the automobile was developed, we had to train kids not to
| play in the streets. We didn't put kids or cars in bubbles.
|
| When photoshop came out, we developed a vernacular around
| edited images. "Photoshopped" became a verb.
|
| We'll be able to survive this too. The more exposure we have,
| the better.
| Steen3S wrote:
| Please inform the EU about this.
| ed wrote:
| Early traffic laws were actually created in response to child
| pedestrian deaths (7000 in 1925).
|
| https://www.bloomberg.com/news/features/2022-06-10/how-
| citie...
| pennomi wrote:
| Right. You know how your grandmother falls for those "you
| have a virus" popups but you don't? That's because society
| adapts to the challenges of the day. I'm sure our kids and
| grandchildren will be more immune to these new types of
| scams.
___________________________________________________________________
(page generated 2024-12-23 23:00 UTC)