[HN Gopher] Tencent Hunyuan-Large
___________________________________________________________________
Tencent Hunyuan-Large
Author : helloericsf
Score : 91 points
Date : 2024-11-05 18:52 UTC (4 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| helloericsf wrote:
 | - 389 billion total parameters and 52 billion activated
 | parameters, capable of handling up to 256K tokens.
 | - Outperforms Llama 3.1-70B and exhibits comparable performance
 | to the significantly larger Llama 3.1-405B model.
| Etheryte wrote:
| It's a bit funny to call the 405B reference "significantly
| larger" than their 389B, while highlighting the fact that their
| 389B outperforms the 70B.
| klipt wrote:
| It's a whole 4% smaller!
| rose_ann_ wrote:
 | An MoE model with 52 billion activated parameters means it's
 | more comparable to a (dense) 70B model, not a dense 405B model.
| HPsquared wrote:
| Does this mean it runs faster or better on multiple GPUs?
| chessgecko wrote:
| For decode steps it depends on the number of inputs you
| run at a time. If your batch size is 1 then it runs in
| line with active params, then as you get to like batch
| size 8 it runs in line with all params, then as you
| increase to 128ish it runs like the active params again.
|
| For the context encode it's always close to as fast as a
| model with a similar number of active params.
|
| For running on your own the issue is going to be fitting
| all the params on your gpu. If you're loading off disk
| anyways this will be faster but if this forces you to put
| stuff on disk it will be much slower.
| phkahler wrote:
| >> MoE model with 52 billion activated parameters means its
| more comparable to a (dense) 70b model and not a dense 405b
| model
|
| Only when talking about how fast it can produce output.
| From a capability point of view it makes sense to compare
 | the larger number of parameters. I suppose there's also a
 | "total storage" comparison, since didn't they say this model
 | has 8-bit weights, where Llama's are 16-bit?
| eptcyka wrote:
| Definitely not trained on Nvidia or AMD GPUs.
| acchow wrote:
| How do you know this?
|
| Apparently 20% of Nvidia's quarterly revenue is booked in
| Singapore where shell companies divert product to China:
| https://news.ycombinator.com/item?id=42048065
| smnrg wrote:
| Sarcasm is a valid theory.
| azinman2 wrote:
| I assume it was missing /s
| rb2k_ wrote:
 | The readme mentions H20 GPUs, Nvidia's "China-compatible" card
 | (41% fewer cores & 28% lower performance versus the top Hopper
 | H100 config).
| 1R053 wrote:
| you can get a long way on something with 41% less performance
| than your favorite supercar...
| mrob wrote:
| Not open source. Even if we accept model weights as source code,
| which is highly dubious, this clearly violates clauses 5 and 6 of
| the Open Source Definition. It discriminates between users
| (clause 5) by refusing to grant any rights to users in the
| European Union, and it discriminates between uses (clause 6) by
| requiring agreement to an Acceptable Use Policy.
|
| EDIT: The HN title was changed, which previously made the claim.
| But as HN user swyx pointed out, Tencent is also claiming this is
| open source, e.g.: "The currently unveiled Hunyuan-Large
| (Hunyuan-MoE-A52B) model is the largest open-source Transformer-
| based MoE model in the industry".
| vanguardanon wrote:
| What is the reason for restrictions in the EU? Is it due to
| some EU regulations?
| ronsor wrote:
| Most likely yes. I don't think companies can be blamed for
| not wanting to subject themselves to EU regulations or
| uncertainty.
|
| Edit: Also, if you don't want to follow or deal with EU law,
| you don't do business in the EU. People here regularly say if
| you do business in a country, you have to follow its laws.
| The opposite also applies.
| troupo wrote:
| Stop uncritically parroting corporate marketing bullshit.
|
| There's no uncertainty. There are exactly two regulations
| applicable here, and it's blindingly obvious why these
| companies don't want to conform to them:
|
| 1. Do not use user data without users' explicit consent
|
| 2. For foundational models disclose and document your
| training data (because, you know, user data, copyrighted
| material etc.)
| ronsor wrote:
| I will address both points:
|
| 1. No one is training on users' bank details, but if
| you're training on the whole Internet, it's hard to be
| sure if you've filtered out all PII, or even who is in
| there.
|
| 2. This isn't happening because no one has time for more
| time-wasting lawsuits.
| troupo wrote:
| > No one is training on users' bank details, but if
| you're training on the whole Internet
|
| Tencent has access to more than just bank accounts.
|
 | In the West there's Meta, which this year opted everyone on
 | its platform into AI training.
|
| > This isn't happening because no one has time for more
| time-wasting lawsuits.
|
 | No, this isn't happening because a) their models are, without
 | fail, trained on material they shouldn't have willy-nilly
 | access to, and b) because they want to pretend to be open
 | source without being open source.
| bilbo0s wrote:
| ??
|
| Doesn't that mean if they used data created by, (or even
| the data _of_ ), anyone in the EU, that they would want
| to _not_ release that model in the EU?
|
| This sounds like "if an EU citizen created, or has data
| referenced, in any piece of the data you trained from
| then..."
|
| Which, I mean, I can kind of see why US and Chinese
| companies prefer to just not release their models in the
| EU. How could a company ever make a guarantee satisfying
| those requirements? It would take a massive filtering
| effort.
| troupo wrote:
| > that they would want to not release that model in the
| EU
|
| They don't release that model in the EU, that's correct
|
| > This sounds like "if an EU citizen created, or has data
| referenced, in any piece of the data you trained from
| then..."
|
| Yes, and that should be the default for any citizen of
| any country in the world.
|
 | Instead you have companies like Meta just opting everyone
 | into their AI training dataset.
|
| > I can kind of see why US and Chinese companies prefer
| to just not release their models in the EU.
|
 | Companies having unfettered, unrestricted access to any and
 | all data they want is not as good a thing as you make it out
 | to be.
| warkdarrior wrote:
| > > This sounds like "if an EU citizen created, or has
| data referenced, in any piece of the data you trained
| from then..."
|
| > Yes, and that should be the default for any citizen of
| any country in the world.
|
| This is a completely untenable policy. Each and every
| piece of data in the world can be traced to one or more
| citizens of some country. Actively getting permission for
| every item is not feasible for any company, no matter the
| scale of the company.
| andyferris wrote:
| I think that's kinda the point that is being made.
|
 | Technology-wise, it is clearly feasible to aggregate the data
 | to train an LLM and to release a product on that.
|
| It seems that some would argue that was never legally a
| feasible thing to do, based on the training data being
| impossible to use legally. So, it is the existence of
| many of these LLMs that is (legally) untenable.
|
 | Whether valid or not, the point may be moot because, like
 | Uber, if the laws actually do forbid this use, they will
 | change as necessary to accommodate the new technology.
| Too many "average voters" like using things such as
| ChatGPT and it's not a hill politicians will be willing
| to die on.
| em500 wrote:
| This seems to mirror the situation where US financial
| regulations (FATCA) are seen as such a hassle to deal
| with for foreign financial institutions that they'd
| prefer to just not accept US citizens as customers.
| GaggiX wrote:
| They probably trained on data protected by privacy laws,
| similar to Meta.
| blueblimp wrote:
| In Meta's case, the problem is that they had been given the
| go-ahead by the EU to train on certain data, and then after
| starting training, the EU changed its mind and told them to
| stop.
| ronsor wrote:
| I will again ask the obligatory question: are model weights
| even copyrightable? And if not, does the "license" still
| matter?
| parl_match wrote:
| I doubt there will be a satisfactory answer for a long time.
| killjoywashere wrote:
| How's that NYTimes vs OpenAI lawsuit going? Last I can find
| is things are hung up in discovery: OpenAI has requested
| potentially a century of NYTimes reporters' notes.
|
| https://news.bloomberglaw.com/ip-law/openais-aggressive-
| cour...
| bdowling wrote:
| Half a century worth of reporters' notes might be some
| valuable training data.
| warkdarrior wrote:
| (IANAL)
|
| Model weights could be treated the same way phone books,
| encyclopedias, and other collections of data are treated. The
| copyright is over the collection itself, even if the
| individual items are not copyrightable.
| TMWNN wrote:
| >phone books, encyclopedias, and other collections of data
| are treated
|
| Encyclopedias are copyrightable. Phone books are not.
| ronsor wrote:
| Encyclopedias may be collections of facts, but the
| writing is generally creative. Phone books are literally
| just facts. AI models are literally just facts.
| roywiggins wrote:
| What if I train an AI model on exactly one copyrighted
| work and all it does it spit that work back out?
|
 | e.g. if I upload Marvels_Avengers.mkv.onnx and it reliably
 | reproduces the original (after all, it's just a _fact_ that
 | the first byte of the original file is 0xF0, etc.)
| ronsor wrote:
| If the sole purpose of your model is to copy a work, then
| that's copyright infringement.
| roywiggins wrote:
| Oh, in this case, the model can either reproduce the work
| exactly, or it can play tic-tac-toe depending on how you
| prompt it.
| ronsor wrote:
| We can change "sole purpose" to "primary purpose", and
| I'd argue something that happens 50% of the time counts
| as a primary purpose.
| margalabargala wrote:
| > AI models are literally just facts.
|
 | Are they, or are they collections of probabilities? If
 | they are probabilities, and those probabilities change
 | from model to model, that seems like they might be
 | copyrightable.
|
| If Google, OpenAI, Facebook, and Anthropic each train a
| model from scratch on an identical training corpus, they
| would wind up with four different models that had four
| differing sets of weights, because they digest and
| process the same input corpus differently.
|
| That indicates to me that they are not a collection of
| facts.
| skissane wrote:
| > Encyclopedias are copyrightable. Phone books are not.
|
 | It depends on the jurisdiction. The US Supreme Court ruled
 | that phone books are not copyrightable in the 1991 case
 | _Feist Publications, Inc. v. Rural Telephone Service Co._
 | However, that is not the law in the UK, which
| generally follows the 1900 House of Lords decision
| _Walter v Lane_ that found that mere "sweat of the brow"
| is enough to establish copyright - that case upheld a
| publisher's copyright on a book of speeches by
| politicians, purely on the grounds of the human effort
| involved in transcribing them.
|
 | Furthermore, under its 1996 _Database Directive_, the EU
 | introduced the _sui generis database right_, which is a
| legally distinct form of intellectual property from
| copyright, but with many of the same features, protecting
| mere aggregations of information, including phone
| directories. The UK has retained this after Brexit.
| However, EU directives give member states discretion over
| the precise legal mechanism of their implementation, and
| the UK used that discretion to make database rights a
| subset of copyright - so, while in EU law they are a
| technically distinct type of IP from copyright, under UK
| law they are an application of copyright. EU law only
| requires database rights to have a term of 15 years.
|
 | Do not be surprised if in the next couple of years the EU
 | comes out with an "AI Model Weights Directive" establishing
 | a "_sui generis_ AI model weights right".
| And I'm sure US Congress will be interested in following
| suit. I expect OpenAI / Meta / Google / Microsoft / etc
| will be lobbying for them to do so.
| kaliqt wrote:
 | I agree; however, Meta is guilty of this as well.
| karaterobot wrote:
| Hmm, in fairness I don't see where Tencent is claiming this is
| open source (at least in this repo; I haven't checked
| elsewhere). The title of the HN post does make the claim, and
| that may be controversial or simply incorrect.
| swyx wrote:
| readme: https://github.com/Tencent/Tencent-Hunyuan-Large
|
| > "By open-sourcing the Hunyuan-Large model"
| DataDaemon wrote:
| Who cares about EU? They are destroying themselves.
| the5avage wrote:
 | Where would you go if you lived there (as a programmer
 | interested in AI)? Just asking for a friend.
| Mistletoe wrote:
| Ironically their policies are why I want to move there with
| my American dollars. I want to live somewhere that cares
| about my rights, not the rights of corporations.
| CamperBob2 wrote:
| That's fine, but don't complain when you lose access to
| products and services that are widely available elsewhere.
|
| In particular, restrictions on ML models will leave you
| without access to extremely powerful resources that are
| available to people in other countries, and to people in
| your own country who don't mind operating outside the law.
| Copyright maximalism is not, in fact, a good thing, and
| neither is overbearing nanny-statism. Both will ultimately
| disempower you.
| dplavery92 wrote:
| The title of Tencent's paper [0] as well as their homepage for
| the model [1] each use the term "Open-Source" in the title, so
| I think they are making the claim.
|
| [0] https://arxiv.org/pdf/2411.02265 [1]
| https://llm.hunyuan.tencent.com/
| 1R053 wrote:
| the paper with details: https://arxiv.org/pdf/2411.02265
|
| They use
|
| - 16 experts, of which one is activated per token
|
| - 1 shared expert that is always active
|
 | In summary, that makes around 52B active parameters per token
 | instead of Llama 3.1's 405B.
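[Editor's note: a back-of-envelope check of the figures above. The equal-expert-size assumption is an editorial simplification, not the paper's exact breakdown; it lumps attention and embedding parameters into a single "non-expert" term y.]

```python
# With 1 shared + 16 routed experts of equal size x, and non-expert
# parameters y, the reported totals give two linear equations:
#   y + 17*x = 389e9   (all parameters)
#   y +  2*x =  52e9   (shared expert + 1 routed expert active per token)
x = (389e9 - 52e9) / 15   # implied per-expert parameters
y = 52e9 - 2 * x          # implied non-expert parameters
print(f"per-expert ≈ {x / 1e9:.1f}B, non-expert ≈ {y / 1e9:.1f}B")
```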
| the_duke wrote:
| > Territory" shall mean the worldwide territory, excluding the
| territory of the European Union.
|
| Anyone have some background on this?
| jmole wrote:
| I believe the EU has (or is drafting) laws about LLMs of a
| certain size which this release would not comply with.
| troupo wrote:
| Also existing privacy laws (GDPR) and AI Act (foundational
| models have to disclose and document their training data)
| mattlutze wrote:
| https://artificialintelligenceact.eu/high-level-summary/
|
 | There are many places where the model might be used which
 | could count as high-risk scenarios and require lots of
 | controls. Also, we have:
 |
 | "GPAI models present systemic risks when the cumulative amount
 | of compute used for its training is greater than 10^25
 | floating point operations (FLOPs). Providers must notify the
 | Commission if their model meets this criterion within 2 weeks.
 | The provider may present arguments that, despite meeting the
 | criteria, their model does not present systemic risks. The
 | Commission may decide on its own, or via a qualified alert
 | from the scientific panel of independent experts, that a model
 | has high impact capabilities, rendering it systemic.
 |
 | In addition to the four obligations above, providers of GPAI
 | models with systemic risk must also:
 |
 | - Perform model evaluations, including conducting and
 | documenting adversarial testing to identify and mitigate
 | systemic risk.
 |
 | - Assess and mitigate possible systemic risks, including
 | their sources.
 |
 | - Track, document and report serious incidents and possible
 | corrective measures to the AI Office and relevant national
 | competent authorities without undue delay.
 |
 | - Ensure an adequate level of cybersecurity protection."
 |
 | They may not want to meet these requirements.
| lcnPylGDnU4H9OF wrote:
| > 10^25 floating point operations (FLOPs)
|
| Is there a reason this number was chosen?
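[Editor's note: for scale, the widely used 6·N·D training-compute approximation (N = parameters updated per token, D = training tokens) gives a rough sense of where the 10^25 FLOPs threshold sits. The 7T-token figure and the use of active rather than total parameters are assumptions for illustration, not figures from this thread.]

```python
# Rough training-compute estimate via the 6*N*D rule of thumb.
N = 52e9   # active parameters per token (MoE forward/backward cost)
D = 7e12   # assumed pretraining token count
flops = 6 * N * D
print(f"~{flops:.2e} FLOPs; crosses the 10^25 threshold? {flops > 1e25}")
```

By this crude estimate the run lands around 2×10^24 FLOPs, below the threshold, though the margin is within the uncertainty of the assumptions.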
| GaggiX wrote:
| I imagine they trained on data that is protected by privacy
| laws, similar to Meta.
| 2OEH8eoCRo0 wrote:
| Has anyone asked it about Tiananmen Square or Xi Jinping?
| azinman2 wrote:
 | I just did, and it tells me it has no information on that
 | issue. It also responded in Chinese to that English query,
 | which suggests to me either that the censorship instruction
 | tuning is heavily weighted towards Chinese, or that the model
 | has a hard time staying in English (which I believe has been
 | the case for other Chinese LLMs in the past).
| the5avage wrote:
 | I once triggered the ChatGPT censorship (by trying to
 | manipulate an image of my face) and it also responded in
 | English to a German query.
| a_wild_dandan wrote:
 | The model meets/beats Llama despite having nearly an order of
 | magnitude fewer active parameters (52B vs 405B). Absolutely
 | bonkers. AI is moving so fast with these breakthroughs --
 | synthetic data, distillation, alt. architectures (e.g.
 | MoE/SSM), LoRA, RAG, curriculum learning, etc.
 |
 | We've come so astonishingly far in like _two years_. I have no
 | idea what AI will do in another year, and it's thrilling.
| adt wrote:
| https://lifearchitect.ai/models-table/
| Tepix wrote:
| I'm no expert on these MoE models with "a total of 389 billion
| parameters and 52 billion active parameters". Do hobbyists stand
| a chance of running this model (quantized) at home? For example
| on something like a PC with 128GB (or 512GB) RAM and one or two
| RTX 3090 24GB VRAM GPUs?
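[Editor's note: a quick memory estimate for the question above. This sketch counts weight bytes only, ignoring KV cache, activations, and runtime overhead.]

```python
def weight_gb(params: float, bits: int) -> float:
    """Storage for `params` weights at the given quantization width."""
    return params * bits / 8 / 1e9

total, active = 389e9, 52e9  # Hunyuan-Large total / active parameters
for bits in (16, 8, 4):
    print(f"{bits:2d}-bit: all weights ≈ {weight_gb(total, bits):5.0f} GB, "
          f"active-only ≈ {weight_gb(active, bits):4.0f} GB")
```

At 4-bit the full weights come to roughly 195 GB, so a 128 GB box falls short while 512 GB of RAM could hold them (with experts streamed to GPU); even the 52B active set at 8-bit (~52 GB) exceeds two 24 GB 3090s.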
___________________________________________________________________
(page generated 2024-11-05 23:00 UTC)