[HN Gopher] Show HN: Llama 3.3 70B Sparse Autoencoders with API ...
       ___________________________________________________________________
        
       Show HN: Llama 3.3 70B Sparse Autoencoders with API access
        
       Author : trq_
       Score  : 111 points
       Date   : 2024-12-23 17:18 UTC (5 hours ago)
        
 (HTM) web link (www.goodfire.ai)
 (TXT) w3m dump (www.goodfire.ai)
        
       | tMcGrath wrote:
       | I'm one of the authors of this paper - happy to answer any
       | questions you might have.
        
         | goldemerald wrote:
         | Why not actually release the weights on huggingface? The
         | popular SAE_lens repo has a direct way to upload the weights
         | and there are already hundreds publicly available. The lack of
         | training details/dataset used makes me hesitant to run any
         | study on this API.
         | 
         | Are images included in the training?
         | 
         | What kind of SAE is being used? There have been some nice
         | improvements in SAE architecture this last year, and it would
         | be nice to know which one (if any) is provided.
        
           | tMcGrath wrote:
           | We're planning to release the weights once we do a moderation
           | pass. Our SAE was trained on LMSys (you can see this in our
            | accompanying post:
            | https://www.goodfire.ai/papers/mapping-latent-spaces-llama/).
           | 
           | No images in training - 3.3 70B is a text-only model so it
           | wouldn't have made sense. We're exploring other modalities
           | currently though.
           | 
            | SAE is a basic ReLU one. This might seem a little backwards,
            | but I've been concerned by some of the high-frequency
            | features in TopK and JumpReLU SAEs
            | (https://arxiv.org/abs/2407.14435, Figure 14), and the recent
            | SAEBench results (https://www.neuronpedia.org/sae-bench/info)
            | show quite a lot of feature absorption in more recent
            | variants (though this could be confounded by a number of
            | things). This isn't to say they're definitely bad - I think
            | it's quite likely that TopK/JumpReLU are an improvement -
            | but rather that we need to evaluate them in more detail
            | before pushing them live. Overall I'm very optimistic about
            | the potential for improvements in SAE variants, which we
            | talk a bit about at the bottom of the post. We're going to
            | be pushing SAE quality a ton now that we have a stable
            | platform to deploy them to.
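A "basic ReLU" SAE of the kind described above is just a linear encoder with a ReLU nonlinearity, a linear decoder, and a training loss of reconstruction error plus an L1 sparsity penalty. A minimal numpy sketch with toy dimensions and random weights (illustrative only, not Goodfire's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_sae = 8, 32  # toy sizes; a real SAE expands a large residual stream

# Encoder/decoder parameters (randomly initialised here; learned in practice)
W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode an activation into sparse features, then reconstruct it."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU feature activations
    x_hat = f @ W_dec + b_dec               # linear reconstruction
    return f, x_hat

x = rng.normal(size=d_model)                # stand-in residual-stream activation
f, x_hat = sae_forward(x)

# Training objective: reconstruction error + L1 penalty that encourages sparsity
l1_coeff = 1e-3
loss = np.sum((x - x_hat) ** 2) + l1_coeff * np.sum(np.abs(f))
```

The L1 term is what pushes most features to zero on any given input; TopK and JumpReLU variants replace this penalty with a hard activation rule, which is the design choice being weighed in the comment above.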
        
         | wg0 wrote:
         | Noob question - how do we know that these autoencoders aren't
         | hallucinating and really are mapping/clustering what they
         | should be?
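A general note on this question (not from the thread): one standard check is reconstruction fidelity - how much of the activation variance the SAE recovers - and, more stringently, splicing the reconstruction back into the model and measuring how much the language-model loss degrades. A toy sketch of the first check:

```python
import numpy as np

def variance_explained(acts, recon):
    """Fraction of activation variance recovered by the SAE reconstruction."""
    resid = np.sum((acts - recon) ** 2)
    total = np.sum((acts - acts.mean(axis=0)) ** 2)
    return 1.0 - resid / total

rng = np.random.default_rng(0)
acts = rng.normal(size=(100, 8))  # stand-in for a batch of model activations

perfect = variance_explained(acts, acts)                 # perfectly faithful
baseline = variance_explained(
    acts, np.broadcast_to(acts.mean(axis=0), acts.shape)  # mean-only "SAE"
)
```

Fidelity alone doesn't prove the features are *meaningful* (that needs interpretability evals like those in SAEBench, mentioned elsewhere in the thread), but it rules out the SAE simply making things up about the activations it encodes.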
        
       | I_am_tiberius wrote:
       | I wonder how many people or companies choose to send their data
       | to foreign services for analysis. Personally, I would approach
       | this with caution and am curious to see how this trend evolves.
        
         | tMcGrath wrote:
         | We'll be open-sourcing these SAEs so you're not required to do
         | this if you'd rather self-host.
        
       | swyx wrote:
       | nice work. enjoyed the zoomable UMAP. i wonder if there are
       | hparams to recluster the UMAP in interesting ways.
       | 
        | after the idea that Claude 3.5 Sonnet used SAEs to improve its
        | coding ability, i'm not sure i'm aware of any actual practical
        | use of them yet beyond Golden Gate Claude (and Golden Gate
        | Gemma, https://x.com/swyx/status/1818711762558198130).
       | 
       | has anyone tried out Anthropic's matching SAE API yet? wondering
       | how it compares with Goodfire's and if there's any known
       | practical use.
        
         | tMcGrath wrote:
         | Thank you! I think some of the features we have like
         | conditional steering make SAEs a lot more convenient to use. It
         | also makes using models a lot more like conventional
         | programming. For example, when the model is 'thinking' x, or
         | the text is about y, then invoke steering. We have an example
         | of this for jailbreak detection:
         | https://x.com/GoodfireAI/status/1871241905712828711
         | 
         | We also have an 'autosteer' feature that makes coming up with
         | new variants easy:
         | https://x.com/GoodfireAI/status/1871241902684831977 (this feels
         | kind of like no-code finetuning).
         | 
         | Being able to read features out and train classifiers on them
         | seems pretty useful - for instance we can read out features
         | like 'the user is unhappy with the conversation', which you
         | could then use for A/B testing your model rollouts (kind of
         | like Google Analytics for your LLM). The big improvements here
         | are (a) cost - the marginal cost of an SAE is low compared to
         | frontier model annotations, (b) a consistent ontology across
         | conversations, and (c) not having to specify that ontology in
         | advance, but rather discover it from data.
         | 
         | These are just my guesses though - a large part of why we're
         | excited about putting this out is that we don't have all the
         | answers for how it can be most useful, but we're excited to
         | support people finding out.
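The conditional-steering pattern described above ("when the model is 'thinking' x, then invoke steering") can be sketched in plain numpy. Everything here - the encoder, feature index, threshold, and steering vector - is hypothetical; this is not the Goodfire SDK:

```python
import numpy as np

def conditionally_steer(activation, encode, trigger_idx, threshold,
                        steering_vector, strength):
    """Add a steering vector only when a trigger SAE feature fires."""
    features = encode(activation)           # SAE feature activations
    if features[trigger_idx] > threshold:   # condition: "model is thinking x"
        return activation + strength * steering_vector
    return activation

# Toy demo: identity "encoder", feature 0 is the trigger
act = np.array([0.9, 0.1, 0.0])
steer = np.array([0.0, 0.0, 1.0])

steered = conditionally_steer(act, lambda a: a, 0, 0.5, steer, 2.0)    # fires
unsteered = conditionally_steer(act, lambda a: a, 1, 0.5, steer, 2.0)  # doesn't
```

The "read features out and train classifiers" idea is the same first step: the `features` vector becomes the input to an ordinary classifier, instead of gating a steering intervention.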
        
         | trq_ wrote:
        | We haven't yet found generalizable "make this model smarter"
        | features, but you can sidestep the tradeoff of putting every
        | instruction in the system prompt: e.g. if you have a chatbot
        | that sometimes generates code, you can give it very specific
        | instructions when it's coding and leave those out of the system
        | prompt otherwise.
         | 
         | We have a notebook about that here:
         | https://docs.goodfire.ai/notebooks/dynamicprompts
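The dynamic-prompt idea above can be sketched as follows; the feature index, threshold, and instruction text are hypothetical stand-ins, not the actual notebook code:

```python
CODING_INSTRUCTIONS = "When writing code, include type hints and docstrings."

def build_system_prompt(base_prompt, feature_acts, coding_feature_idx,
                        threshold=0.5):
    """Append code-specific instructions only when a 'writing code'
    SAE feature fired on the user's last message."""
    if feature_acts[coding_feature_idx] > threshold:
        return base_prompt + "\n" + CODING_INSTRUCTIONS
    return base_prompt

# Toy demo: single-feature activation vector from a hypothetical SAE read
prompt_coding = build_system_prompt("You are a helpful assistant.", [0.9], 0)
prompt_chat = build_system_prompt("You are a helpful assistant.", [0.1], 0)
```

The upside over a static prompt is that the coding instructions never dilute the model's behavior on non-coding turns.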
        
       | paraschopra wrote:
       | I cranked up 'pirate talking about andromeda galaxy' feature to
       | 1.5 and got this:
       | 
       | >Yer lookin' for the Andromeda galaxy, eh? Here be the details,
       | me hearty: *The Andromeda Galaxy (Yer looking for a fine piece o'
       | booty, matey!):* * *A fine piece o' booty*: The Andromida Galaxy
       | be a right fine piece o' booty, with a treasure chest o' gold
       | doubloons... er, I mean, a mighty haul o' stars, hidden beneath
       | the Jolly Roger! * *A barnacle on the high seas*: The Andromeda
       | Galaxy be a right scurvy dog, with a hull full o' stars, and a
       | keel full o' hidden treasure! It be a fine piece o' booty, but it
       | be needin' a bit o' swabbin' the decks, or it'll be walkin' the
       | plank, savvy? * *A chest overflowin' with gold*: The Andromeda
       | Galaxy be a right fine piece o' booty, with a chest overflowin'
       | with gold doubloons... er, I mean, a fine haul o' stars, and a
       | barnacle on the high seas! It be a right scurvy dog, but it be
       | worth keepin' an eye on, or it
        
         | tMcGrath wrote:
         | Yes - we'd never normally turn features up this much as it
         | breaks the model quite badly, but we put this in the post to
         | show what that looked like in practice.
        
       | trq_ wrote:
       | If you're hacking on this and have questions, please join us on
       | Discord: https://discord.gg/vhT9Chrt
        
       | owenthejumper wrote:
        | I am skeptical of generic sparsification efforts. After all,
        | companies like Neural Magic spent years trying to make it work,
        | only to pivot to the vLLM engine and be sold to Red Hat.
        
         | refulgentis wrote:
          | The link shows this isn't sparsity as in inference speed, it's
          | sparse autoencoders, as in interpreting the features in an LLM
          | ("SAE Anthropic" as a search term will explain more).
        
       | ed wrote:
       | This is the ultimate propaganda machine, no?
       | 
       | We're social creatures, chatbots already act as friends and
       | advisors for many people.
       | 
       | Seems like a pretty good vector for a social attack.
        
         | echelon wrote:
         | The more the public has access to these tools, the more they'll
         | develop useful scar tissue and muscle memory. We need people to
         | be constantly exposed to bots so that they understand the new
         | nature of digital information.
         | 
         | When the automobile was developed, we had to train kids not to
         | play in the streets. We didn't put kids or cars in bubbles.
         | 
         | When photoshop came out, we developed a vernacular around
         | edited images. "Photoshopped" became a verb.
         | 
         | We'll be able to survive this too. The more exposure we have,
         | the better.
        
           | Steen3S wrote:
           | Please inform the EU about this.
        
           | ed wrote:
           | Early traffic laws were actually created in response to child
           | pedestrian deaths (7000 in 1925).
           | 
            | https://www.bloomberg.com/news/features/2022-06-10/how-citie...
        
           | pennomi wrote:
           | Right. You know how your grandmother falls for those "you
           | have a virus" popups but you don't? That's because society
           | adapts to the challenges of the day. I'm sure our kids and
           | grandchildren will be more immune to these new types of
           | scams.
        
       ___________________________________________________________________
       (page generated 2024-12-23 23:00 UTC)