[HN Gopher] Nemotron-4-340B
       ___________________________________________________________________
        
       Nemotron-4-340B
        
       Author : bcatanzaro
       Score  : 97 points
       Date   : 2024-06-14 16:01 UTC (7 hours ago)
        
 (HTM) web link (blogs.nvidia.com)
 (TXT) w3m dump (blogs.nvidia.com)
        
       | observationist wrote:
       | This is (possibly) a GPT-4 level dense model with an open source
       | license. Nvidia has released models with issues before, but
       | reports on this so far indicate it's a solid contender without
       | any of the hiccups of previous releases.
       | 
        | A 340B model should require around 700GB of VRAM or RAM to run
        | inference (340B parameters at 2 bytes each in FP16/BF16). To
        | train or finetune, you're looking at almost double, which is
        | probably why Nvidia recommends 2x A100 nodes with 1.28TB of
        | VRAM.
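        | 
        | A back-of-the-envelope sketch of that math (a rough estimate,
        | assuming FP16/BF16 weights and a ~2x finetuning overhead for
        | gradients and optimizer state):
        | 
        |   # Rough memory estimate for a 340B-parameter model
        |   params = 340e9
        |   bytes_per_param = 2                   # FP16/BF16 weights
        | 
        |   inference_gb = params * bytes_per_param / 1e9   # ~680 GB
        |   finetune_gb = inference_gb * 2        # ~2x for training state
        |   print(f"inference ~{inference_gb:.0f} GB, "
        |         f"finetune ~{finetune_gb:.0f} GB")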
       | 
       | Jensen Huang is the king of AI summer.
        
         | waldrews wrote:
         | do you mean 1.28 TB?
        
           | observationist wrote:
           | Yes, thank you for catching that!
        
         | throwaway_ab wrote:
          | How would a server/workstation like this be set up?
          | 
          | I thought you could only use the VRAM on the GPU, so for 700GB
          | you would need 8-9 A100 nodes, as 2 only gives 160GB.
          | 
          | I've been trying to figure out how to build a local system to
          | run inference and train on top of LLM models. I thought there
          | was no way to add VRAM to a system outside of adding more and
          | more GPUs, or using system RAM (DDR5), even though that would
          | be considerably slower.
        
           | toshinoriyagi wrote:
            | An A100 node has 8 A100s in it, each with 80GB, which is how
            | they got the 1.28TB number: 2 nodes * (8 GPUs * 80GB) =
            | 1,280GB.
        
         | samspenc wrote:
          | I wonder if the open-source LLM community understands what just
          | happened here - we finally got a truly large LLM (a whopping
          | 340B!) but it costs ... $15K per A100 x 16 GPUs = a minimum of
          | $240K just to get started. Probably closer to $500K once you
          | factor in space, power, cooling, infrastructure, etc.
        
           | lhl wrote:
            | You could probably run it as a Q4 (definitely as a Q3) on 4 x
            | A6000 (so on a $25K workstation), although you'd probably
            | also be looking at about 3-4 tok/s for text generation. I do
            | think that it's a big landmark to have a true GPT4-class
            | model (though with some questionable RL, from my initial
            | testing).
           | The best thing about it is that it's almost certainly now the
           | strongest model available for generating synthetic data
           | without any licensing restrictions.
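            | 
            | Rough sizing behind that claim (a sketch; the bits-per-weight
            | figures are approximate GGUF-style averages, and you still
            | need headroom for the KV cache):
            | 
            |   # Does a quantized 340B model fit in 4 x A6000 (48GB each)?
            |   params = 340e9
            |   vram_gb = 4 * 48                  # 192 GB total
            |   for quant, bpw in [("Q4_0", 4.55), ("Q3_K_M", 3.9)]:
            |       size_gb = params * bpw / 8 / 1e9
            |       verdict = "fits" if size_gb < vram_gb else "borderline"
            |       print(f"{quant}: ~{size_gb:.0f} GB vs {vram_gb} GB "
            |             f"-> {verdict}")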
           | 
           | Funnily enough, I don't think it's actually the most
           | interesting model that Nvidia released this week. Nvidia also
           | published this paper https://arxiv.org/abs/2406.07887 and
           | released
           | https://huggingface.co/nvidia/mamba2-hybrid-8b-3t-128k
           | (Apache 2.0 licensed, to boot). It looks like it matches (and
           | sometimes even edges out) Transformer performance, while
            | having linear scaling for context length. Can't wait for a
            | scaled-up version of this.
           | 
            | Nvidia also released a top-notch Llama3 70B SteerLM reward
            | model (although RLHFlow/ArmoRM-Llama3-8B-v0.1 might still be
            | a better choice).
        
         | rthnbgrredf wrote:
          | With CPU inference you just need a server with 1.28TB of RAM.
          | Yes, the inference will be super slow, but it is more realistic
          | than spending $100k+ on A100 clusters with 1.28TB of VRAM.
          | 
          | One example: the HP DL580 Gen8. Use the 32GB PC3L-14900L
          | LRDIMMs (HP PN 715275-001; 712384-001, 708643-B21) for a
          | maximum of 3TB. You can get the LRDIMMs in the $32-$45 range on
          | the second-hand market.
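          | 
          | A minimal sketch of what CPU-only loading could look like with
          | Hugging Face transformers (assuming a transformers-compatible
          | checkpoint; the released weights may need conversion first):
          | 
          |   import torch
          |   from transformers import AutoModelForCausalLM, AutoTokenizer
          | 
          |   model_id = "nvidia/Nemotron-4-340B-Instruct"
          |   tokenizer = AutoTokenizer.from_pretrained(model_id)
          |   # BF16 weights at 2 bytes/param -> ~680 GB resident in RAM
          |   model = AutoModelForCausalLM.from_pretrained(
          |       model_id, torch_dtype=torch.bfloat16, device_map="cpu")
          | 
          |   inputs = tokenizer("Hello", return_tensors="pt")
          |   out = model.generate(**inputs, max_new_tokens=32)
          |   print(tokenizer.decode(out[0]))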
        
       | option wrote:
        | 3 models are included: base, instruct, and reward. All under a
        | license permitting synthetic data generation and commercial use.
        
       | diggan wrote:
       | The "open" and "permissive" license has an interesting section on
       | "AI Ethics":
       | 
       | > AI Ethics. NVIDIA is committed to safety, trust and
       | transparency in AI development. NVIDIA encourages You to (a)
        | ensure that the product or service You develop, use, offer as a
        | service or distribute meets the legal and ethical requirements
       | of the relevant industry or use case, (b) take reasonable
       | measures to address unintended bias and to mitigate harm to
       | others, including underrepresented or vulnerable groups, and (c)
       | inform users of the nature and limitations of the product or
       | service. NVIDIA expressly prohibits the use of its products or
       | services for any purpose in violation of applicable law or
       | regulation, including but not limited to (a) illegal
       | surveillance, (b) illegal collection or processing of biometric
       | information without the consent of the subject where required
       | under applicable law, or (c) illegal harassment, abuse,
       | threatening or bullying of individuals or groups of individuals
       | or intentionally misleading or deceiving others
       | 
       | https://developer.download.nvidia.com/licenses/nvidia-open-m...
       | 
       | Besides limiting the freedom of use (making it less "open" in my
       | eyes), it's interesting that they tell you to meet "ethical
       | requirements of the relevant industry or use case". Seems like
       | that'd be super hard to pin down in a precise way.
        
         | jerbear4328 wrote:
          | I read that as "NVIDIA _encourages_ you to be ethical and
          | _prohibits_ breaking the law." That doesn't seem so bad to me.
         | What is bad, however, is section 2.1.
         | 
         | > 2.1 ... If You institute ... litigation against any entity
         | ... alleging that the Model or a Derivative Model constitutes
         | direct or contributory copyright or patent infringement, then
         | any licenses granted to You under this Agreement for that Model
         | or Derivative Model will terminate...
         | 
          | If you sue, claiming that the model violates copyright or a
          | patent, you lose your license to use the model. That's a
          | really weird restriction; I'm not sure what the point is.
        
           | bcatanzaro wrote:
           | The point is: if you sue claiming this model breaks the law,
           | you lose your license to use it.
           | 
           | Apache 2.0 has a similar restriction: " If You institute
           | patent litigation against any entity (including a cross-claim
           | or counterclaim in a lawsuit) alleging that the Work or a
           | Contribution incorporated within the Work constitutes direct
           | or contributory patent infringement, then any patent licenses
           | granted to You under this License for that Work shall
           | terminate as of the date such litigation is filed."
        
             | jerbear4328 wrote:
              | Oh, I didn't realize that it was a standard term. I'm sure
              | there's a good motivation, then; it doesn't seem so bad.
        
             | orra wrote:
              | True, although it's unusual to see it for copyright, not
              | patents.
             | 
             | That said, the far bigger issue is the end of the same
             | clause 2.1:
             | 
             | > NVIDIA may update this Agreement to comply with legal and
             | regulatory requirements at any time and You agree to either
             | comply with any updated license or cease Your copying, use,
             | and distribution of the Model and any Derivative Model
        
           | sebzim4500 wrote:
            | Sounds reasonable to me. If you are going to claim in court
            | that the model is illegal, then why exactly are you using it?
        
         | abdullahkhalids wrote:
          | It's good they have included this clause, despite it being
          | difficult to legally pin down. Hopefully, there will be a
          | lawsuit at some point which will establish some ethical
          | boundaries that AI developers and users must not cross.
        
         | imglorp wrote:
          | Very weaselly worded. Some things that appear to be allowed:
          | 
          |   * intended bias
          |   * legal surveillance
          |   * legal collection of biometrics without consent
          |   * legal harassment
          | 
          | I.e., state-sanctioned killbots are just fine!
        
           | telotortium wrote:
           | No copyright license is going to stop a state from using the
           | model for the military use that they really need. First of
           | all, I'm pretty sure most countries have laws allowing the
           | state to ignore copyright in the case of national defense.
           | More importantly, power does what it wants and what it can
           | get away with.
        
         | IncreasePosts wrote:
         | It says "NVIDIA encourages You to..."
         | 
         | Which, in terms of a contract, means absolutely nothing at all.
        
           | mushufasa wrote:
           | Google famously removed "don't be evil" because lawyers
            | pushed back on who gets to define evil. I can imagine the
            | same logic applies here: Nvidia isn't about to define
            | objective morality, so the best alternative is to ask people
            | to try their best.
        
           | brianshaler wrote:
            | I'm not sure what business GP is in, but being encouraged not
            | to be unethical and explicitly forbidden from illegal
            | activity doesn't seem like much more of an infringement on
            | one's freedom than the applicable laws themselves. I guess
            | being arrested for crimes is one thing, but having a license
            | revoked on top of that is just one step too far?
        
       | ilaksh wrote:
        | Has anyone run evaluations comparing the instruct version with
        | GPT-4o or Llama3-70B etc.? It's so much larger than the leading
        | open source models, so one would hope it would perform
        | significantly better?
       | 
       | Or is this in one of the chat arenas or whatever? Very curious to
       | see some numbers related to the performance.
       | 
        | But if it's at least somewhat better than the existing open
        | source models, then that is a big boost for open source training
        | and other use cases.
        
         | rllearneratwork wrote:
          | This is the "june-chatbot" model currently running on LMSYS's
          | Chatbot Arena.
        
       | Something1234 wrote:
        | What is it? Is it an LLM or what?
        
         | danielhanchen wrote:
          | Oh, NVIDIA released an open-weights 340 billion parameter LLM!
          | 
          | It should be the biggest open-weights model to date, I think
          | (Grok-1 is 314B).
          | 
          | It's trained on 8 trillion tokens, and some benchmarks show it
          | does better than or on par with GPT-4o!
          | 
          | They released 3 checkpoints - the base, the instruct, and a
          | reward-aligned model.
         | 
         | See
         | https://huggingface.co/collections/nvidia/nemotron-4-340b-66...
         | for all the checkpoints
        
       | bguberfain wrote:
       | "Nemotron-4-340B-Instruct is a chat model intended for use for
       | the English language" - frustrating
        
       | vosper wrote:
        | Why does Nvidia release models that compete with its customers'
        | businesses but don't make any money for Nvidia?
        | 
        | Are they commoditising their complements?
        
         | vineyardmike wrote:
          | > commoditising their complements
         | 
         | That's exactly what this would be.
         | 
         | > compete with its customers businesses
         | 
         | I suspect most of their business comes from a few massive
         | corporate spenders, not a "long tail" of smaller businesses, so
         | it seems like a questionable goal to disrupt those customers
         | without a clear path to _new_ customers. Then again, few have
         | the resources to run this model, so I guess this just ensures
         | that their big customers are all working with some floor in
          | model size? Probably won't impact anything realistically.
        
         | logicchains wrote:
         | They target this model at generating synthetic data. Data is
         | the lifeblood of LLM training; quality synthetic data means
          | more training can occur, which means more demand for GPUs.
        
         | WithinReason wrote:
         | The model is big enough that you need expensive Nvidia GPUs to
         | run it effectively
        
       | WithinReason wrote:
       | "...and were sized to fit on a single DGX H100 with 8 GPUs when
       | deployed in FP8 precision"
       | 
        | OK, I see: the goal is to sell more H100s. They made it big
        | enough that it won't fit on a cheaper GPU.
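        | 
        | The arithmetic behind that sizing (a rough sketch; real
        | deployments also need headroom for KV cache and activations):
        | 
        |   # 340B parameters at 1 byte each in FP8 vs one DGX H100
        |   params = 340e9
        |   weights_gb = params * 1 / 1e9   # ~340 GB of FP8 weights
        |   dgx_gb = 8 * 80                 # 8 x H100 80GB = 640 GB
        |   print(f"~{weights_gb:.0f} GB weights in {dgx_gb} GB total")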
        
       | vineyardmike wrote:
       | > The Nemotron-4 340B family includes base, instruct and reward
       | models that form a pipeline to generate synthetic data used for
       | training and refining LLMs.
       | 
        | I feel like everyone is missing this from the announcement. They
        | are explicitly releasing this to help _generate synthetic
        | training data_. Most big models and APIs have clauses that ban
        | their use to improve other models. Sure, it can maybe compete
        | with other big commercial models at normal tasks, but this would
        | be a huge opportunity for ML labs and startups to expand the
        | training data of smaller models.
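        | 
        | A minimal sketch of what such a generate-and-filter pipeline
        | could look like (the function names and threshold here are
        | hypothetical, not NVIDIA's actual tooling):
        | 
        |   # Instruct model proposes responses; reward model scores them;
        |   # only high-scoring pairs are kept as synthetic training data.
        |   def synthesize(prompts, instruct_model, reward_model,
        |                  threshold=0.7):
        |       dataset = []
        |       for prompt in prompts:
        |           response = instruct_model.generate(prompt)
        |           score = reward_model.score(prompt, response)
        |           if score >= threshold:
        |               dataset.append(
        |                   {"prompt": prompt, "response": response})
        |       return dataset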
       | 
       | Nvidia must see a limit to the growth of new models (and new
       | demand for training with their GPUs) based on the availability of
       | training data, so they're seeking to provide a tool to bypass
       | those restrictions.
       | 
       | All for the low price of 2x A100s...
        
         | logicchains wrote:
          | > They are explicitly releasing this to help generate
          | synthetic training data
         | 
         | Synthetic training data is basically free money for NVidia;
         | there's only a fixed amount of high-quality original data
         | around, but there's a potential for essentially infinite
         | synthetic data, and more data means more training hours means
         | more GPU demand.
        
         | jsheard wrote:
         | > Most big models and APIs have clauses that ban its use to
         | improve other models.
         | 
         | I will never get over the gall of anything and everything being
         | deemed fair game to use as training data for a model, _except_
          | you're not allowed to use the output of a model to train your
          | own model without permission, because model output has some
          | kind of exclusive super-copyright, apparently.
        
           | vineyardmike wrote:
            | > because model output has some kind of exclusive super-
            | copyright, apparently
            | 
            | Well, it's not copyright that is being used to forbid this,
            | it's _terms of service_, but yeah, it is quite hypocritical.
        
         | cyanydeez wrote:
         | GIGOaaS
        
       ___________________________________________________________________
       (page generated 2024-06-14 23:02 UTC)