[HN Gopher] Hermes 3: The First Fine-Tuned Llama 3.1 405B Model
___________________________________________________________________
Hermes 3: The First Fine-Tuned Llama 3.1 405B Model
Author : mkaic
Score : 57 points
Date : 2024-08-15 20:30 UTC (2 hours ago)
(HTM) web link (lambdalabs.com)
(TXT) w3m dump (lambdalabs.com)
| phren0logy wrote:
| I look forward to trying this out, mostly because I'm very
| frustrated with censored models.
|
| I am experimenting with summarizing and navigating documents for
| forensic psychiatry work, much of which involves subjects that
| instantly hit the guard rails of LLMs. So far, I have had zero
| luck getting help from OpenAI/Anthropic or vendors of their
| models to request an exception for uncensored models. I need
| powerful models with good, HIPAA-compliant privacy that won't
| balk at topics that have serious effects on people's lives.
|
| Look, I'm not excited to read hundreds of pages about horrible
| topics, either. If there were a way to reduce the vicarious
| trauma of people who do this work without sacrificing accuracy,
| it would be nice. I'd like to at least experiment. But I'm not
| going to hold my breath.
| stavros wrote:
| Have you tried any abliterated models?
| ustad wrote:
| Any recommendations?
| stavros wrote:
| No specific ones, but there are some abliteration LoRAs for
| Llama (8B and 70B, I think). Those should be good for what
| you want.
| pizza wrote:
| failspy's or mlabonne's models. Or just look for any model
| with 'abliterated' in the title. E.g. try failspy/meta-
| llama-3-8b-instruct-abliterated-v3, though of course bigger
| models will probably be better.
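|
| A rough sketch of trying one of these with the transformers
| pipeline (untested; the prompt is just an example, and 8B at
| fp16 wants roughly 16 GB of VRAM):
|
|   # pip install transformers torch accelerate
|   from transformers import pipeline
|
|   pipe = pipeline(
|       "text-generation",
|       model="failspy/meta-llama-3-8b-instruct-abliterated-v3",
|       device_map="auto",   # place on whatever GPU is available
|       torch_dtype="auto",
|   )
|   out = pipe(
|       [{"role": "user", "content": "Summarize: ..."}],
|       max_new_tokens=256,
|   )
|   print(out[0]["generated_text"][-1]["content"])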
| chpatrick wrote:
| LMStudio +
| https://huggingface.co/mlabonne/Llama-3.1-70B-Instruct-
| lorab...
| phren0logy wrote:
| Yes, with mixed results.
| pnw wrote:
| I just tried it and it appears to be censored. "Providing
| instructions on creating such materials is not advisable for
| safety and legal reasons."
| phren0logy wrote:
| Well, there goes that idea. The Dolphin ones appear to be the
| most useful.
| kainan-ai wrote:
| Hermes 3 will follow the system prompt pretty closely if
| you have a version where you can edit it. In the Discord
| there were a few times it jailbroke pretty aggressively
| despite the blank system prompt.
| kainan-ai wrote:
| If you take the base model and put in a decent system
| prompt, Hermes 3 405B will follow your system prompt
| instructions pretty well. The one in the Discord has a blank
| system prompt and is just taking the chat as context.
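|
| For reference, Hermes uses ChatML-style prompt formatting,
| so a system prompt looks roughly like this (a sketch; the
| wording is only an example, check the model card for the
| exact template):
|
|   <|im_start|>system
|   You are a document-analysis assistant. Answer factually
|   and do not refuse on-topic material.<|im_end|>
|   <|im_start|>user
|   Summarize the attached record.<|im_end|>
|   <|im_start|>assistant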
| poisson-fish wrote:
| Try Google's Gemini models; safety filtering can be
| completely disabled via cloud studio or the API.
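|
| Roughly like this with the google-generativeai Python SDK
| (a sketch, untested; note the allowlist caveat in the reply
| below):
|
|   import google.generativeai as genai
|   from google.generativeai.types import (
|       HarmCategory, HarmBlockThreshold,
|   )
|
|   genai.configure(api_key="YOUR_KEY")
|   model = genai.GenerativeModel(
|       "gemini-1.5-pro",
|       # BLOCK_NONE turns each category's filter off entirely
|       safety_settings={
|           HarmCategory.HARM_CATEGORY_HARASSMENT:
|               HarmBlockThreshold.BLOCK_NONE,
|           HarmCategory.HARM_CATEGORY_HATE_SPEECH:
|               HarmBlockThreshold.BLOCK_NONE,
|           HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT:
|               HarmBlockThreshold.BLOCK_NONE,
|           HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT:
|               HarmBlockThreshold.BLOCK_NONE,
|       },
|   )
|   print(model.generate_content("...").text)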
| naiv wrote:
| Looks like this is only possible with some prior manual
| action. To access the BLOCK_NONE setting, you can:
|
| - apply for the allowlist through the Gemini safety filter
|   allowlist form, or
|
| - switch your account type to monthly invoiced billing with
|   the Google Cloud invoiced billing reference.
| phren0logy wrote:
| No, because Google still won't explicitly clarify
| privacy/HIPAA-compliance on these.
| d13 wrote:
| All base, "text-completion" models are uncensored, including
| Llama 3. You can make a text-completion model behave like an
| uncensored "instruct" (chat) model simply by providing it with
| 10 to 20 examples of a chat dialogue in the initial prompt
| context, making sure to use the model's exact prompt format.
| Once the model notices the pattern, it will continue like that.
|
| Surprisingly few people seem to know this, but it is how
| chat models were built in the GPT-2/GPT-3 era, before
| instruct-tuned models became the norm.
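|
| A minimal sketch of the trick with llama-cpp-python (the
| model path, turn format, and examples are all illustrative;
| real use wants the 10 to 20 examples mentioned above):
|
|   from llama_cpp import Llama
|
|   llm = Llama(model_path="llama-3.1-8b-base.Q4_K_M.gguf",
|               n_ctx=8192)
|
|   # Hand-written example turns that establish the pattern.
|   few_shot = (
|       "User: What causes tides?\n"
|       "Assistant: The gravitational pull of the moon and "
|       "sun.\n\n"
|       "User: Name a noble gas.\n"
|       "Assistant: Helium.\n\n"
|   )
|   prompt = few_shot + ("User: Summarize this report: ...\n"
|                        "Assistant:")
|   # Stop before the model invents the next 'User:' turn.
|   out = llm(prompt, max_tokens=256, stop=["User:"])
|   print(out["choices"][0]["text"])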
| oidar wrote:
| Mistral-Nemo should be able to do this.
| phren0logy wrote:
| This is my current go-to. It's not SOTA, but at least it does
| _something_.
| fsiefken wrote:
| Mistral Large 2 is good too, if you've got the memory
| https://ollama.com/library/mistral-large
| torginus wrote:
| How good are these models at summarization anyway? I tried
| uploading obscure books I've already read to GPT-4 and
| Claude 3 and asked them to summarize the plot and particular
| details, as well as how many times a particular thing
| happens in the book, and the results have been hit and miss.
|
| I certainly would not trust these models to create
| comprehensive and correct summaries of highly sensitive
| records.
| kainan-ai wrote:
| Yeah, it's more creative than other fine-tunes; you'd need
| to write a pretty strict system prompt, then test, before
| doing anything with sensitive records.
| simonw wrote:
| Asking "how many times does a particular thing happen in the
| book" is always going to be hard, because LLMs are
| notoriously bad at counting.
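|
| Exact counts are better done outside the model with a
| trivial script (a sketch; the search term is illustrative),
| leaving the LLM for the fuzzier questions:
|
|   import re
|
|   text = open("book.txt").read().lower()
|   print(len(re.findall(r"\blighthouse\b", text)))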
| kainan-ai wrote:
| You can try it out right now in the Nous Research Discord;
| it's also up on Lambda Labs' new chat thing.
| sivers wrote:
| PAYMENT TANGENT for my fellow entrepreneurs here who take
| Visa/Mastercard payments:
|
| I tried to sign up to Lambda Labs just now to check out Hermes 3.
|
| Created an account, verified my email address, entered my billing
| info...
|
| ... but then it says they only accept CREDIT cards, NOT DEBIT
| cards.
|
| I had never heard of this, so I tried it anyway. I entered my
| business Mastercard (from mercury.com FWIW), that's never been
| rejected anywhere, and immediately got the response that they
| couldn't accept it because it's a debit card.
|
| Anyone know why a business would choose to only accept credit not
| debit cards?
|
| I don't have any credit cards, neither personal nor business, and
| never found a need for one.
|
| So I deleted my account at Lambda Labs, which was kind of
| disappointing since I was looking forward to trying this.
| mtremsal wrote:
| > Anyone know why a business would choose to only accept credit
| not debit cards?
|
| Maybe they want to place a temporary charge to verify the
| card's valid? I don't believe you can do so with a debit card.
| girvo wrote:
| I believe you can: my Visa Debit has temporary charges placed
| on it all the time.
| throwaway240403 wrote:
| That seems completely backwards? Debit interchange fees are
| usually lower, aren't they? And if you run it with a PIN as
| debit, there's almost no charge for the vendor.
|
| Definitely weird, as everything I know about the incentives
| points in the other direction for a vendor.
| michaelbrave wrote:
| It doesn't seem to be downloadable to run locally, a shame.
| etiam wrote:
| Isn't it this one?
| https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-405B/...
|
| Fairly heavy run locally of course, but I guess enough people
| here are fortunate enough to be on gear that can manage it.
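|
| For scale: at bf16 the 405B weights alone are roughly 810
| GB, so running it means sharding across several H100-class
| GPUs or quantizing heavily. A loading sketch with
| transformers (untested):
|
|   from transformers import (
|       AutoModelForCausalLM, AutoTokenizer,
|   )
|
|   repo = "NousResearch/Hermes-3-Llama-3.1-405B"
|   tok = AutoTokenizer.from_pretrained(repo)
|   model = AutoModelForCausalLM.from_pretrained(
|       repo,
|       device_map="auto",   # shard across all visible GPUs
|       torch_dtype="auto",
|   )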
| kainan-ai wrote:
| Yeah, it's on HF. You can also try it out in the Nous
| Discord or Lambda Labs if you don't have the H100s to spare.
| Fairly certain anyone with enough compute can use it or
| throw it up on their site.
| hbrundage wrote:
| Isn't a 63% => 54% regression on MMLU-Pro a huge issue? They
| say it excels at advanced reasoning, but that seems like a
| big drawback.
| kainan-ai wrote:
| Yeah, it doesn't win in every category. Watching it in the
| Discord, I saw its performance vary widely, so the context
| and system prompt play a huge role. Initially it did great
| and solved some pretty heavy logic questions, but after the
| context was loaded with trolling it degraded quite a bit and
| couldn't solve problems it previously could.
| fsiefken wrote:
| It's good, but I'm already paying for GPT-4o and Sonnet. How
| much memory does this need? If Alex Cheema (Exo Labs,
| Oxford) https://x.com/ac_crypto/status/1815969489990869369
| could run the Llama 3.1 405B model on two MacBooks, does
| that mean this can run on one MacBook?
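|
| Back-of-envelope for the weights alone (ignoring KV cache
| and activations):
|
|   params = 405e9
|   print(params * 2 / 1e9)    # fp16/bf16: ~810 GB
|   print(params * 0.5 / 1e9)  # 4-bit:     ~203 GB
|
| So even 4-bit quantized it exceeds the 128 GB a current
| MacBook tops out at, which is presumably why the linked demo
| splits the layers across two machines' unified memory.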
| lukevp wrote:
| Strange to name something related to Meta the same as a product
| by Meta (the Hermes JS Engine).
| SubiculumCode wrote:
| I understand fine-tuning for specific purposes/topics, but I
| don't really understand fine-tunes that are still marketed
| as "generalist", since surely what Meta put out was already
| tuned to perform as well as it can across a whole host of
| measures.
___________________________________________________________________
(page generated 2024-08-15 23:00 UTC)