[HN Gopher] Hermes 3: The First Fine-Tuned Llama 3.1 405B Model
       ___________________________________________________________________
        
       Hermes 3: The First Fine-Tuned Llama 3.1 405B Model
        
       Author : mkaic
       Score  : 57 points
       Date   : 2024-08-15 20:30 UTC (2 hours ago)
        
 (HTM) web link (lambdalabs.com)
 (TXT) w3m dump (lambdalabs.com)
        
       | phren0logy wrote:
       | I look forward to trying this out, mostly because I'm very
       | frustrated with censored models.
       | 
       | I am experimenting with summarizing and navigating documents for
       | forensic psychiatry work, much of which involves subjects that
       | instantly hit the guard rails of LLMs. So far, I have had zero
       | luck getting help from OpenAI/Anthropic or vendors of their
       | models to request an exception for uncensored models. I need
        | powerful models with good, HIPAA-compliant privacy that won't
       | balk at topics that have serious effects on people's lives.
       | 
       | Look, I'm not excited to read hundreds of pages about horrible
       | topics, either. If there were a way to reduce the vicarious
       | trauma of people who do this work without sacrificing accuracy,
       | it would be nice. I'd like to at least experiment. But I'm not
       | going to hold my breath.
        
         | stavros wrote:
         | Have you tried any abliterated models?
        
           | ustad wrote:
           | Any recommendations?
        
             | stavros wrote:
              | No specific ones, but there are some abliteration LoRAs for
             | Llama (8B and 70B, I think). Those should be good for what
             | you want.
        
             | pizza wrote:
              | failspy's or mlabonne's models. Or just look for any model
              | with 'abliterated' in the title. E.g. try
              | failspy/meta-llama-3-8b-instruct-abliterated-v3, though of
              | course bigger models will probably be better.
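              | 
              | As a minimal sketch, loading that model with Hugging Face
              | transformers might look like the following (repo id as
              | spelled above, so check the exact casing on HF; the dtype
              | and device settings and the prompt are assumptions):
              | 
              |   from transformers import pipeline
              | 
              |   generate = pipeline(
              |       "text-generation",
              |       model="failspy/meta-llama-3-8b-instruct-abliterated-v3",
              |       torch_dtype="auto",
              |       device_map="auto",  # use GPUs if present, else CPU
              |   )
              |   messages = [
              |       {"role": "system",
              |        "content": "You are a blunt, direct assistant."},
              |       {"role": "user",
              |        "content": "Summarize this report: ..."},
              |   ]
              |   # chat-style input returns the whole conversation;
              |   # the last message is the model's reply
              |   out = generate(messages, max_new_tokens=256)
              |   print(out[0]["generated_text"][-1]["content"])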
        
             | chpatrick wrote:
             | LMStudio +
             | https://huggingface.co/mlabonne/Llama-3.1-70B-Instruct-
             | lorab...
        
           | phren0logy wrote:
           | Yes, with mixed results.
        
         | pnw wrote:
         | I just tried it and it appears to be censored. "Providing
         | instructions on creating such materials is not advisable for
         | safety and legal reasons."
        
           | phren0logy wrote:
           | Well, there goes that idea. The Dolphin ones appear to be the
           | most useful.
        
             | kainan-ai wrote:
              | Hermes 3 will follow the system prompt pretty closely if
              | you have a version where you can edit it. In the Discord
              | there were a few times it was jailbroken pretty
              | aggressively in spite of the blank system prompt.
        
           | kainan-ai wrote:
            | If you take the base model and put in a decent system
            | prompt, Hermes 3 405B will follow your system prompt
            | instructions pretty well. The one in the Discord has a blank
            | system prompt and is just taking the chat as context.
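            | 
            | As a rough sketch of what "a decent system prompt" means
            | here: Hermes models take ChatML-style chat formatting, so
            | the structure below should apply (the system text itself is
            | just an illustration):
            | 
            |   from transformers import AutoTokenizer
            | 
            |   tok = AutoTokenizer.from_pretrained(
            |       "NousResearch/Hermes-3-Llama-3.1-405B")
            |   messages = [
            |       {"role": "system",
            |        "content": "You are a clinical assistant. Discuss "
            |                   "sensitive material factually and fully."},
            |       {"role": "user", "content": "Summarize the report."},
            |   ]
            |   # renders <|im_start|>system ... <|im_end|> and so on
            |   print(tok.apply_chat_template(messages, tokenize=False,
            |                                 add_generation_prompt=True))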
        
         | poisson-fish wrote:
          | Try Google's Gemini models; safety filtering can be completely
          | disabled via Cloud Studio or the API.
        
           | naiv wrote:
           | Looks like this is only possible with some prior manual
           | action:
           | 
            | To access the BLOCK_NONE setting, you can:
            | 
            | - Apply for the allowlist through the Gemini safety filter
            |   allowlist form, or
            | - Switch your account type to monthly invoiced billing with
            |   the Google Cloud invoiced billing reference.
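            | 
            | For reference, a sketch of what dialing the filters down
            | looks like with the google-generativeai Python library;
            | whether BLOCK_NONE is actually accepted depends on the
            | allowlist/billing status above, and the model name is just
            | an example:
            | 
            |   import google.generativeai as genai
            |   from google.generativeai.types import (
            |       HarmCategory, HarmBlockThreshold)
            | 
            |   genai.configure(api_key="...")
            |   none = HarmBlockThreshold.BLOCK_NONE
            |   model = genai.GenerativeModel(
            |       "gemini-1.5-pro",
            |       safety_settings={
            |           HarmCategory.HARM_CATEGORY_HARASSMENT: none,
            |           HarmCategory.HARM_CATEGORY_HATE_SPEECH: none,
            |           HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: none,
            |           HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: none,
            |       })
            |   print(model.generate_content("...").text)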
        
           | phren0logy wrote:
           | No, because Google still won't explicitly clarify
           | privacy/HIPAA-compliance on these.
        
         | d13 wrote:
         | All base, "text-completion" models are uncensored, including
         | Llama 3. You can make text-completion models behave like an
         | uncensored "instruct" (chat) model simply by providing it with
         | 10 to 20 examples of a chat dialogue in the initial prompt
         | context, making sure to use the model's exact prompt format.
         | Once the model notices the pattern, it will continue like that.
         | 
          | Surprisingly few people seem to know this, but it's how chat
          | models were created back in the GPT-2/GPT-3 era, before
          | instruct models became the norm.
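          | 
          | A minimal sketch of the trick, assuming the Llama 3.1 8B base
          | checkpoint and a made-up transcript format (real use wants
          | 10-20 turns, kept perfectly consistent):
          | 
          |   from transformers import pipeline
          | 
          |   generate = pipeline(
          |       "text-generation",
          |       model="meta-llama/Llama-3.1-8B",  # base, not -Instruct
          |       torch_dtype="auto", device_map="auto")
          | 
          |   few_shot = "\n".join([
          |       "### User: What is HIPAA?",
          |       "### Assistant: A 1996 US law on health-data privacy.",
          |       "### User: Summarize: patient reports insomnia.",
          |       "### Assistant: The patient has chronic sleep trouble.",
          |       # ... more example turns, then the real question:
          |       "### User: Summarize the following deposition: ...",
          |       "### Assistant:",
          |   ])
          |   out = generate(few_shot, max_new_tokens=200,
          |                  return_full_text=False)
          |   # cut off where the model starts inventing the next turn
          |   print(out[0]["generated_text"].split("### User:")[0].strip())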
        
         | oidar wrote:
          | Mistral-Nemo should be able to do this.
        
           | phren0logy wrote:
           | This is my current go-to. It's not SOTA, but at least it does
           | _something_.
        
           | fsiefken wrote:
           | Mistral Large 2 is good too, if you've got the memory
           | https://ollama.com/library/mistral-large
        
         | torginus wrote:
          | How good are these models at summarization anyway? I uploaded
          | obscure books I've already read to GPT-4 and Claude 3 and
          | asked them to summarize the plot and particular details, as
          | well as how many times a particular thing happens in the book,
          | and the results have been hit and miss.
         | 
         | I certainly would not trust these models to create
         | comprehensive and correct summaries of highly sensitive
         | records.
        
           | kainan-ai wrote:
            | Yeah, it's more creative than other fine-tunes; you'd need
            | to write a pretty strict system prompt and then test before
            | doing anything with sensitive records.
        
           | simonw wrote:
           | Asking "how many times does a particular thing happen in the
           | book" is always going to be hard, because LLMs are
           | notoriously bad at counting.
        
         | kainan-ai wrote:
          | You can try it out right now in the Nous Research Discord;
          | it's also up on Lambda Labs' new chat thing.
        
       | sivers wrote:
        | PAYMENT TANGENT for my fellow entrepreneurs here who take
        | Visa/Mastercard payments:
       | 
       | I tried to sign up to Lambda Labs just now to check out Hermes 3.
       | 
       | Created an account, verified my email address, entered my billing
       | info...
       | 
       | ... but then it says they only accept CREDIT cards, NOT DEBIT
       | cards.
       | 
       | I had never heard of this, so I tried it anyway. I entered my
       | business Mastercard (from mercury.com FWIW), that's never been
       | rejected anywhere, and immediately got the response that they
       | couldn't accept it because it's a debit card.
       | 
       | Anyone know why a business would choose to only accept credit not
       | debit cards?
       | 
       | I don't have any credit cards, neither personal nor business, and
       | never found a need for one.
       | 
       | So I deleted my account at Lambda Labs, which was kind of
       | disappointing since I was looking forward to trying this.
        
         | mtremsal wrote:
         | > Anyone know why a business would choose to only accept credit
         | not debit cards?
         | 
         | Maybe they want to place a temporary charge to verify the
         | card's valid? I don't believe you can do so with a debit card.
        
           | girvo wrote:
           | I believe you can: my Visa Debit has temporary charges placed
           | on it all the time.
        
         | throwaway240403 wrote:
          | That seems completely backwards? Debit interchange fees are
          | usually lower, aren't they? And if you run it with a PIN as
          | debit, there's almost no charge for the vendor.
          | 
          | Definitely weird, as everything I know about the incentives
          | here points in the other direction for a vendor.
        
       | michaelbrave wrote:
        | It doesn't seem downloadable to run locally, which is a shame.
        
         | etiam wrote:
         | Isn't it this one?
         | https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-405B/...
         | 
          | Fairly heavy to run locally, of course, but I guess some
          | people here are fortunate enough to be on gear that can
          | manage it.
        
         | kainan-ai wrote:
          | Yeah, it's on HF. You can also try it out in the Nous Discord
          | or Lambda Labs if you don't have the H100s to spare. Fairly
          | certain anyone with enough compute can use it or throw it up
          | on their own site.
        
       | hbrundage wrote:
        | Isn't a 63% => 54% regression on MMLU-Pro a huge issue? They
        | say it excels at advanced reasoning, but that seems like a big
        | drawback there.
        
         | kainan-ai wrote:
          | Yeah, it doesn't win in every category. I will say, watching
          | it in the Discord I saw its performance vary widely, so the
          | context and system prompt play a huge role. Initially it did
          | great and solved some pretty heavy logic questions, but after
          | the context was loaded with trolling it degraded quite a bit
          | and couldn't solve problems it previously could.
        
       | fsiefken wrote:
        | It's good, but I'm already paying for GPT-4o and Sonnet. How
        | much memory does this need? If Alex Cheema (Exo Labs, Oxford)
        | https://x.com/ac_crypto/status/1815969489990869369 could run
        | the Llama 3.1 405B model on two MacBooks, does this mean it can
        | run on one MacBook?
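        | 
        | A quick back-of-the-envelope answer, counting the weights only
        | (KV cache and activations add more on top):
        | 
        |   params = 405e9
        |   for name, nbytes in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
        |       print(f"{name}: ~{params * nbytes / 2**30:,.0f} GiB")
        |   # fp16: ~754 GiB, int8: ~377 GiB, int4: ~189 GiB
        | 
        | So even 4-bit weights outgrow the 128 GB a single MacBook maxes
        | out at, which is presumably why the Exo demo needed two
        | machines.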
        
       | lukevp wrote:
       | Strange to name something related to Meta the same as a product
       | by Meta (the Hermes JS Engine).
        
       | SubiculumCode wrote:
        | I understand fine-tuning for specific purposes/topics, but I
        | don't really understand fine-tunes that are still marketed as
        | "generalist", as surely what Meta put out would be tuned to
        | perform as well as it can across a whole host of measures.
        
       ___________________________________________________________________
       (page generated 2024-08-15 23:00 UTC)