[HN Gopher] Tencent Hunyuan-Large
       ___________________________________________________________________
        
       Tencent Hunyuan-Large
        
       Author : helloericsf
       Score  : 91 points
       Date   : 2024-11-05 18:52 UTC (4 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | helloericsf wrote:
        | - 389 billion total parameters and 52 billion activated
        | parameters, capable of handling up to 256K tokens.
        | 
        | - Outperforms Llama3.1-70B and exhibits performance comparable
        | to the significantly larger Llama3.1-405B model.
        
         | Etheryte wrote:
         | It's a bit funny to call the 405B reference "significantly
         | larger" than their 389B, while highlighting the fact that their
         | 389B outperforms the 70B.
        
           | klipt wrote:
           | It's a whole 4% smaller!
        
           | rose_ann_ wrote:
            | MoE model with 52 billion activated parameters means it's
            | more comparable to a (dense) 70B model than to a dense 405B
            | model
        
             | HPsquared wrote:
             | Does this mean it runs faster or better on multiple GPUs?
        
               | chessgecko wrote:
                | For decode steps it depends on how many inputs you run
                | at a time. If your batch size is 1, it runs in line with
                | the active params; as you get to around batch size 8 it
                | runs in line with the total params; and as you increase
                | to 128-ish it runs like the active params again.
                | 
                | For context encode (prefill) it's always close to as
                | fast as a model with a similar number of active params.
                | 
                | For running it yourself, the issue is going to be
                | fitting all the params on your GPU. If you were going to
                | load off disk anyway this will be faster, but if the
                | extra params force you to put weights on disk it will be
                | much slower.
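                | 
                | A toy model of that batching effect (my own sketch,
                | assuming top-1 routing over 16 experts and made-up
                | per-expert sizes; real kernels and per-layer routing
                | will differ):
                | 
                |     # Expected MoE weight traffic per decode step vs
                |     # batch size, assuming top-1 routing over 16
                |     # experts. Sizes below are illustrative guesses,
                |     # not figures from the paper.
                |     E = 16             # routed experts
                |     PER_EXPERT = 22.5  # billions of params per expert (assumed)
                |     DENSE = 29.5       # attention + shared expert (assumed)
                |     
                |     for B in (1, 8, 32, 128):
                |         # distinct experts hit by B independently routed tokens
                |         distinct = E * (1 - (1 - 1 / E) ** B)
                |         read = DENSE + distinct * PER_EXPERT  # params read per step
                |         print(f"batch {B:>3}: ~{read:5.0f}B params read,"
                |               f" ~{read / B:5.1f}B per token")
                | 
                | At batch 1 a step reads ~52B params (the active count);
                | by batch ~32 it reads nearly all 389B; past that, reads
                | per token shrink again, matching the pattern above.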
        
             | phkahler wrote:
              | >> MoE model with 52 billion activated parameters means
              | it's more comparable to a (dense) 70B model than to a
              | dense 405B model
              | 
              | Only when talking about how fast it can produce output.
              | From a capability point of view it makes sense to compare
              | the larger number of parameters. I suppose there's also a
              | "total storage" comparison too, since didn't they say this
              | is 8-bit model weights, whereas Llama is 16-bit?
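              | 
              | If those precisions are right (I haven't verified the
              | released dtypes), the weights-only storage comparison is
              | simple arithmetic:
              | 
              |     # Weights-only storage, assuming 8-bit Hunyuan vs
              |     # 16-bit Llama (precisions from the comment above,
              |     # not verified).
              |     hunyuan_gb = 389e9 * 1 / 1e9  # 1 byte per param
              |     llama_gb = 405e9 * 2 / 1e9    # 2 bytes per param
              |     print(f"Hunyuan-Large @ 8-bit : ~{hunyuan_gb:.0f} GB")
              |     print(f"Llama3.1-405B @ 16-bit: ~{llama_gb:.0f} GB")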
        
       | eptcyka wrote:
       | Definitely not trained on Nvidia or AMD GPUs.
        
         | acchow wrote:
         | How do you know this?
         | 
         | Apparently 20% of Nvidia's quarterly revenue is booked in
         | Singapore where shell companies divert product to China:
         | https://news.ycombinator.com/item?id=42048065
        
           | smnrg wrote:
           | Sarcasm is a valid theory.
        
           | azinman2 wrote:
           | I assume it was missing /s
        
         | rb2k_ wrote:
          | The readme mentions H20 GPUs, Nvidia's "China compatible" card
          | (41% fewer cores and 28% lower performance versus the top
          | Hopper H100 configuration).
        
           | 1R053 wrote:
           | you can get a long way on something with 41% less performance
           | than your favorite supercar...
        
       | mrob wrote:
       | Not open source. Even if we accept model weights as source code,
       | which is highly dubious, this clearly violates clauses 5 and 6 of
       | the Open Source Definition. It discriminates between users
       | (clause 5) by refusing to grant any rights to users in the
       | European Union, and it discriminates between uses (clause 6) by
       | requiring agreement to an Acceptable Use Policy.
       | 
       | EDIT: The HN title was changed, which previously made the claim.
       | But as HN user swyx pointed out, Tencent is also claiming this is
       | open source, e.g.: "The currently unveiled Hunyuan-Large
       | (Hunyuan-MoE-A52B) model is the largest open-source Transformer-
       | based MoE model in the industry".
        
         | vanguardanon wrote:
         | What is the reason for restrictions in the EU? Is it due to
         | some EU regulations?
        
           | ronsor wrote:
           | Most likely yes. I don't think companies can be blamed for
           | not wanting to subject themselves to EU regulations or
           | uncertainty.
           | 
           | Edit: Also, if you don't want to follow or deal with EU law,
           | you don't do business in the EU. People here regularly say if
           | you do business in a country, you have to follow its laws.
           | The opposite also applies.
        
             | troupo wrote:
             | Stop uncritically parroting corporate marketing bullshit.
             | 
             | There's no uncertainty. There are exactly two regulations
             | applicable here, and it's blindingly obvious why these
             | companies don't want to conform to them:
             | 
             | 1. Do not use user data without users' explicit consent
             | 
             | 2. For foundational models disclose and document your
             | training data (because, you know, user data, copyrighted
             | material etc.)
        
               | ronsor wrote:
               | I will address both points:
               | 
               | 1. No one is training on users' bank details, but if
               | you're training on the whole Internet, it's hard to be
               | sure if you've filtered out all PII, or even who is in
               | there.
               | 
               | 2. This isn't happening because no one has time for more
               | time-wasting lawsuits.
        
               | troupo wrote:
               | > No one is training on users' bank details, but if
               | you're training on the whole Internet
               | 
               | Tencent has access to more than just bank accounts.
               | 
                | In the West there's Meta, which this year opted everyone
                | on its platform into training its AI.
               | 
               | > This isn't happening because no one has time for more
               | time-wasting lawsuits.
               | 
                | No, this isn't happening because a) their models are,
                | without fail, trained on material they shouldn't have
                | willy-nilly access to, and b) they want to pretend to be
                | open source without being open source.
        
               | bilbo0s wrote:
               | ??
               | 
                | Doesn't that mean that if they used data created by (or
                | even the data _of_) anyone in the EU, they would want to
                | _not_ release that model in the EU?
               | 
               | This sounds like "if an EU citizen created, or has data
               | referenced, in any piece of the data you trained from
               | then..."
               | 
               | Which, I mean, I can kind of see why US and Chinese
               | companies prefer to just not release their models in the
               | EU. How could a company ever make a guarantee satisfying
               | those requirements? It would take a massive filtering
               | effort.
        
               | troupo wrote:
               | > that they would want to not release that model in the
               | EU
               | 
               | They don't release that model in the EU, that's correct
               | 
               | > This sounds like "if an EU citizen created, or has data
               | referenced, in any piece of the data you trained from
               | then..."
               | 
               | Yes, and that should be the default for any citizen of
               | any country in the world.
               | 
               | Instead you have companies like Meta just opting everyone
               | in to their AI training dataset.
               | 
               | > I can kind of see why US and Chinese companies prefer
               | to just not release their models in the EU.
               | 
                | Companies having unfettered, unrestricted access to any
                | and all data they want is not as good a thing as you
                | make it out to be.
        
               | warkdarrior wrote:
               | > > This sounds like "if an EU citizen created, or has
               | data referenced, in any piece of the data you trained
               | from then..."
               | 
               | > Yes, and that should be the default for any citizen of
               | any country in the world.
               | 
               | This is a completely untenable policy. Each and every
               | piece of data in the world can be traced to one or more
               | citizens of some country. Actively getting permission for
               | every item is not feasible for any company, no matter the
               | scale of the company.
        
               | andyferris wrote:
               | I think that's kinda the point that is being made.
               | 
                | Technology-wise, it is clearly feasible to aggregate the
                | data to train an LLM and to release a product on that.
               | 
               | It seems that some would argue that was never legally a
               | feasible thing to do, based on the training data being
               | impossible to use legally. So, it is the existence of
               | many of these LLMs that is (legally) untenable.
               | 
                | Whether valid or not, the point may be moot because,
                | like with Uber, if the laws actually do forbid this use,
                | they will change as necessary to accommodate the new
                | technology. Too many "average voters" like using things
                | such as ChatGPT and it's not a hill politicians will be
                | willing to die on.
        
               | em500 wrote:
               | This seems to mirror the situation where US financial
               | regulations (FATCA) are seen as such a hassle to deal
               | with for foreign financial institutions that they'd
               | prefer to just not accept US citizens as customers.
        
           | GaggiX wrote:
           | They probably trained on data protected by privacy laws,
           | similar to Meta.
        
           | blueblimp wrote:
           | In Meta's case, the problem is that they had been given the
           | go-ahead by the EU to train on certain data, and then after
           | starting training, the EU changed its mind and told them to
           | stop.
        
         | ronsor wrote:
         | I will again ask the obligatory question: are model weights
         | even copyrightable? And if not, does the "license" still
         | matter?
        
           | parl_match wrote:
           | I doubt there will be a satisfactory answer for a long time.
        
             | killjoywashere wrote:
              | How's that NYTimes vs OpenAI lawsuit going? Last I can
              | find, things are hung up in discovery: OpenAI has
              | requested potentially a century of NYTimes reporters'
              | notes.
             | 
             | https://news.bloomberglaw.com/ip-law/openais-aggressive-
             | cour...
        
               | bdowling wrote:
                | Half a century's worth of reporters' notes might be some
                | valuable training data.
        
           | warkdarrior wrote:
           | (IANAL)
           | 
           | Model weights could be treated the same way phone books,
           | encyclopedias, and other collections of data are treated. The
           | copyright is over the collection itself, even if the
           | individual items are not copyrightable.
        
             | TMWNN wrote:
             | >phone books, encyclopedias, and other collections of data
             | are treated
             | 
             | Encyclopedias are copyrightable. Phone books are not.
        
               | ronsor wrote:
               | Encyclopedias may be collections of facts, but the
               | writing is generally creative. Phone books are literally
               | just facts. AI models are literally just facts.
        
               | roywiggins wrote:
                | What if I train an AI model on exactly one copyrighted
                | work and all it does is spit that work back out?
                | 
                | eg if I upload Marvels_Avengers.mkv.onnx and it reliably
                | reproduces the original (after all, it's just a _fact_
                | that the first byte of the original file is 0xF0, etc)
        
               | ronsor wrote:
               | If the sole purpose of your model is to copy a work, then
               | that's copyright infringement.
        
               | roywiggins wrote:
               | Oh, in this case, the model can either reproduce the work
               | exactly, or it can play tic-tac-toe depending on how you
               | prompt it.
        
               | ronsor wrote:
               | We can change "sole purpose" to "primary purpose", and
               | I'd argue something that happens 50% of the time counts
               | as a primary purpose.
        
               | margalabargala wrote:
               | > AI models are literally just facts.
               | 
                | Are they, or are they collections of probabilities? If
                | they are probabilities, and those probabilities change
                | from model to model, that seems like they might be
                | copyrightable.
               | 
               | If Google, OpenAI, Facebook, and Anthropic each train a
               | model from scratch on an identical training corpus, they
               | would wind up with four different models that had four
               | differing sets of weights, because they digest and
               | process the same input corpus differently.
               | 
               | That indicates to me that they are not a collection of
               | facts.
        
               | skissane wrote:
               | > Encyclopedias are copyrightable. Phone books are not.
               | 
                | It depends on the jurisdiction. The US Supreme Court
                | ruled that phone books are not copyrightable in the 1991
                | case _Feist Publications, Inc., v. Rural Telephone
                | Service Co._. However, that is not the law in the UK,
                | which generally follows the 1900 House of Lords decision
                | _Walter v Lane_, which found that mere "sweat of the
                | brow" is enough to establish copyright - that case
                | upheld a publisher's copyright on a book of speeches by
                | politicians, purely on the grounds of the human effort
                | involved in transcribing them.
               | 
                | Furthermore, under its 1996 _Database Directive_, the EU
                | introduced the _sui generis database right_, which is a
               | legally distinct form of intellectual property from
               | copyright, but with many of the same features, protecting
               | mere aggregations of information, including phone
               | directories. The UK has retained this after Brexit.
               | However, EU directives give member states discretion over
               | the precise legal mechanism of their implementation, and
               | the UK used that discretion to make database rights a
               | subset of copyright - so, while in EU law they are a
               | technically distinct type of IP from copyright, under UK
               | law they are an application of copyright. EU law only
               | requires database rights to have a term of 15 years.
               | 
                | Do not be surprised if in the next couple of years the
                | EU comes out with an "AI Model Weights Directive"
                | establishing a "_sui generis_ AI model weights right".
               | And I'm sure US Congress will be interested in following
               | suit. I expect OpenAI / Meta / Google / Microsoft / etc
               | will be lobbying for them to do so.
        
         | kaliqt wrote:
          | I agree; however, Meta is guilty of this crime as well.
        
         | karaterobot wrote:
         | Hmm, in fairness I don't see where Tencent is claiming this is
         | open source (at least in this repo; I haven't checked
         | elsewhere). The title of the HN post does make the claim, and
         | that may be controversial or simply incorrect.
        
           | swyx wrote:
           | readme: https://github.com/Tencent/Tencent-Hunyuan-Large
           | 
           | > "By open-sourcing the Hunyuan-Large model"
        
         | DataDaemon wrote:
         | Who cares about EU? They are destroying themselves.
        
           | the5avage wrote:
            | Where would you go if you lived there (as a programmer
            | interested in AI)? Just asking for a friend.
        
           | Mistletoe wrote:
           | Ironically their policies are why I want to move there with
           | my American dollars. I want to live somewhere that cares
           | about my rights, not the rights of corporations.
        
             | CamperBob2 wrote:
             | That's fine, but don't complain when you lose access to
             | products and services that are widely available elsewhere.
             | 
             | In particular, restrictions on ML models will leave you
             | without access to extremely powerful resources that are
             | available to people in other countries, and to people in
             | your own country who don't mind operating outside the law.
             | Copyright maximalism is not, in fact, a good thing, and
             | neither is overbearing nanny-statism. Both will ultimately
             | disempower you.
        
         | dplavery92 wrote:
         | The title of Tencent's paper [0] as well as their homepage for
         | the model [1] each use the term "Open-Source" in the title, so
         | I think they are making the claim.
         | 
          | [0] https://arxiv.org/pdf/2411.02265
          | 
          | [1] https://llm.hunyuan.tencent.com/
        
       | 1R053 wrote:
        | The paper with details: https://arxiv.org/pdf/2411.02265
       | 
       | They use
       | 
       | - 16 experts, of which one is activated per token
       | 
       | - 1 shared expert that is always active
       | 
        | In summary, that makes around 52B active parameters per token,
        | instead of the 405B of Llama3.1.
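        | 
        | A quick consistency check on those numbers (a sketch; the
        | per-component split is my own inference, not from the paper):
        | 
        |     # Split the 389B total into routed experts vs always-active
        |     # parts, inferred from the reported 52B active figure.
        |     # Assumes equally sized experts and that everything
        |     # inactive is expert weight.
        |     TOTAL, ACTIVE, EXPERTS = 389e9, 52e9, 16
        |     
        |     inactive = TOTAL - ACTIVE              # 15 unrouted experts
        |     per_expert = inactive / (EXPERTS - 1)  # ~22.5B each
        |     always_on = ACTIVE - per_expert        # shared expert + attention
        |     print(f"per expert: ~{per_expert / 1e9:.1f}B, "
        |           f"always-on: ~{always_on / 1e9:.1f}B")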
        
       | the_duke wrote:
       | > Territory" shall mean the worldwide territory, excluding the
       | territory of the European Union.
       | 
       | Anyone have some background on this?
        
         | jmole wrote:
         | I believe the EU has (or is drafting) laws about LLMs of a
         | certain size which this release would not comply with.
        
           | troupo wrote:
           | Also existing privacy laws (GDPR) and AI Act (foundational
           | models have to disclose and document their training data)
        
           | mattlutze wrote:
           | https://artificialintelligenceact.eu/high-level-summary/
           | 
            | There are many places where the model might be used that
            | could count as high-risk scenarios and require lots of
            | controls. Also, we have:
            | 
            |     "GPAI models present systemic risks when the cumulative
            |     amount of compute used for its training is greater than
            |     10^25 floating point operations (FLOPs). Providers must
            |     notify the Commission if their model meets this
            |     criterion within 2 weeks. The provider may present
            |     arguments that, despite meeting the criteria, their
            |     model does not present systemic risks. The Commission
            |     may decide on its own, or via a qualified alert from
            |     the scientific panel of independent experts, that a
            |     model has high impact capabilities, rendering it
            |     systemic.
            | 
            |     In addition to the four obligations above, providers of
            |     GPAI models with systemic risk must also:
            | 
            |     - Perform model evaluations, including conducting and
            |       documenting adversarial testing to identify and
            |       mitigate systemic risk.
            |     - Assess and mitigate possible systemic risks,
            |       including their sources.
            |     - Track, document and report serious incidents and
            |       possible corrective measures to the AI Office and
            |       relevant national competent authorities without undue
            |       delay.
            |     - Ensure an adequate level of cybersecurity
            |       protection."
           | 
           | They may not want to meet these requirements.
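            | 
            | For scale, a common back-of-envelope estimate of training
            | compute is ~6 * N * D FLOPs (N = params updated per token,
            | D = training tokens). A sketch of the threshold check (the
            | token count below is a placeholder, not a figure from this
            | thread):
            | 
            |     # AI Act systemic-risk threshold via the rough 6*N*D
            |     # rule. 52e9 = active params per token; the 7e12 token
            |     # count is a PLACEHOLDER assumption.
            |     THRESHOLD_FLOPS = 1e25
            |     
            |     def training_flops(active_params, tokens):
            |         return 6 * active_params * tokens
            |     
            |     flops = training_flops(52e9, 7e12)
            |     side = "above" if flops > THRESHOLD_FLOPS else "below"
            |     print(f"~{flops:.1e} FLOPs ({side} the 10^25 threshold)")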
        
             | lcnPylGDnU4H9OF wrote:
             | > 10^25 floating point operations (FLOPs)
             | 
             | Is there a reason this number was chosen?
        
         | GaggiX wrote:
         | I imagine they trained on data that is protected by privacy
         | laws, similar to Meta.
        
       | 2OEH8eoCRo0 wrote:
       | Has anyone asked it about Tiananmen Square or Xi Jinping?
        
         | azinman2 wrote:
          | I just did, and it tells me it has no information on that
          | issue. It also responded in Chinese to that English query,
          | which suggests to me that either the censorship instruction
          | tuning is heavily weighted towards Chinese, or the model has a
          | hard time staying in English (which I believe has been the
          | case for other Chinese LLMs in the past).
        
           | the5avage wrote:
            | I once triggered the ChatGPT censorship (by trying to
            | manipulate an image of my face) and it also responded in
            | English to a German query.
        
       | a_wild_dandan wrote:
        | The model meets/beats Llama despite having nearly an order of
        | magnitude fewer active parameters (52B vs 405B). Absolutely
        | bonkers. AI is moving so fast with these breakthroughs --
        | synthetic data, distillation, alt. architectures (e.g. MoE/SSM),
        | LoRA, RAG, curriculum learning, etc.
        | 
        | We've come so astonishingly far in like _two years_. I have no
        | idea what AI will do in another year, and it's thrilling.
        
       | adt wrote:
       | https://lifearchitect.ai/models-table/
        
       | Tepix wrote:
       | I'm no expert on these MoE models with "a total of 389 billion
       | parameters and 52 billion active parameters". Do hobbyists stand
       | a chance of running this model (quantized) at home? For example
       | on something like a PC with 128GB (or 512GB) RAM and one or two
       | RTX 3090 24GB VRAM GPUs?
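        | 
        | A back-of-envelope sketch of the weights-only footprint at a
        | few quantization levels (KV cache and activations come on top):
        | 
        |     # Weights-only memory for 389B total params at different
        |     # quantization levels; runtime overhead is extra.
        |     TOTAL_PARAMS = 389e9
        |     for bits in (16, 8, 4):
        |         gb = TOTAL_PARAMS * bits / 8 / 1e9
        |         print(f"{bits:>2}-bit: ~{gb:.0f} GB")
        |     # Even at 4 bits you'd need the 512GB-RAM box; two 3090s
        |     # could hold only a small slice of the layers.
        | 
        | Only ~52B params are touched per token, but all 389B still have
        | to live somewhere.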
        
       ___________________________________________________________________
       (page generated 2024-11-05 23:00 UTC)