[HN Gopher] AMD Unveils Its First Small Language Model AMD-135M
___________________________________________________________________
AMD Unveils Its First Small Language Model AMD-135M
Author : figomore
Score : 116 points
Date : 2024-09-27 19:05 UTC (3 hours ago)
(HTM) web link (community.amd.com)
(TXT) w3m dump (community.amd.com)
| loufe wrote:
| It's always encouraging to see wider hardware platform
| competition for AI inference and training. Consumer access to
| affordable, capable hardware will only benefit (I imagine) from
| increasing competition.
| diggan wrote:
| > The training code, dataset and weights for this model are open
| sourced so that developers can reproduce the model and help train
| other SLMs and LLMs.
|
| Wow, an actual open source language model (maybe even the first
| of its kind from a larger company?). It includes everything you
| need to recreate it from scratch. Thanks AMD!
|
| Available under this funky GitHub organization it seems:
| https://github.com/AMD-AIG-AIMA/AMD-LLM
| bubaumba wrote:
| No, it's not open source until someone can actually reproduce
| it. That's the hardest part. For now it's open weights and an
| open dataset, which is not the same thing.
| diggan wrote:
| That's... not how open source works? The "binary" (model
| weights) is open source and the "software" (training scripts
| + data used for training) is open source, so this release is a
| real open source release. Independent reproduction is not
| needed to call something open source.
|
| Can't believe this is the second time today I've ended up in
| the very same argument on HN about what open source is.
| dboreham wrote:
| But wouldn't failure to achieve independent reproduction
| falsify the open claim?
|
| Similar to publishing the source for Oracle (the database),
| but nobody can build a binary from it because it needs magic
| compilers or test suites that aren't open source?
|
| Heck when the browser was open-sourced, there was an
| explicit test where the source was given to some dude who
| didn't work for Netscape to verify that he could actually
| make a working binary. It's a scene in the movie "Code
| Rush".
| bubaumba wrote:
| You are missing key points here. "Reproduce" means producing
| the same model, not just training a similar one.
|
| I can simplify the task: can you convincingly explain how the
| same model can be produced from this dataset? We can start
| simple: how could you possibly get the same weights after even
| the first iteration, i.e. the same weights the original
| training run produced? Pay attention to randomness, data
| selection, and initial model state.
|
| Ok, if you can't do that, can you explain in a believable way
| how to prove that a given model was trained on a given dataset?
| I'm not asking you to actually do these things, which could be
| expensive, only to explain how it could be done.
|
| Strict 'open source' includes not only open weights and open
| data; it also includes the word "reproducible". Not
| "reproduced", only "reproducible". And even that is not the
| case here.
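|
| To make the point concrete, here's a minimal sketch (untested,
| assuming a plain PyTorch training loop) of just the RNG and
| kernel state you'd have to pin before even the first iteration
| matches:
|
|     import random
|     import numpy as np
|     import torch
|
|     def pin_determinism(seed: int = 0) -> None:
|         # Pin every RNG the training loop touches; miss one and
|         # the weights diverge after the very first optimizer step.
|         random.seed(seed)
|         np.random.seed(seed)
|         torch.manual_seed(seed)
|         torch.cuda.manual_seed_all(seed)
|         # Force deterministic kernels; ops without a deterministic
|         # implementation will raise instead of silently varying.
|         torch.use_deterministic_algorithms(True)
|         torch.backends.cudnn.benchmark = False
|
| And even with all of that pinned, a different GPU, driver stack
| or data-loading order can still change the result, which is
| exactly why "reproducible" is the hard part.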
| worewood wrote:
| How often do people expect to compile open-source code
| and get _exactly_ the same binary as the distributed one?
| I've seen this kind of restriction only on decompilation
| projects e.g. the SM64 decompilation -- where they
| deliberately compare the hashes of original vs. compiled
| binaries, as a way to verify the decompilation is
| correct.
|
| It's an unreasonable request for ordinary code, and even more
| so for ML, where very few people have access to the necessary
| hardware and where training is not deterministic in practice.
| e12e wrote:
| I expect that if I compile your 3D renderer and feed it the
| same scene file you did, I get the same image?
| Sayrus wrote:
| Reproducible builds are not a requirement for open source
| software, why is it one for open source models?
| wrs wrote:
| I would say that _functionally_ reproducible builds are
| sort of inherent in the concept of "source". When builds
| are "not reproducible" that typically just means they're
| not bit-for-bit identical, not that they don't produce
| the same output for a given input.
| wrs wrote:
| The interesting part of the product we're talking about
| (that is, the equivalent of the executable binary of an
| ordinary software product) is the weights. The "source" is
| not sufficient to "recompile" the product (i.e., recreate
| the weights). Therefore, while the source you got is open,
| you didn't get _all_ the source to the thing that was
| supposedly "open source".
|
| It's like if I said I open-sourced the Matrix trilogy and
| only gave you the DVD image and the source to the DVD
| decoder.
|
| (Edit: Sorry, I replied to the wrong comment. I'm talking
| primarily about the typical sort of release we see, not
| this one which is a lot closer to actually open.)
| littlestymaar wrote:
| > The "source" is not sufficient to "recompile" the
| product (i.e., recreate the weights). Therefore, while
| the source you got is open, you didn't get all the source
| to the thing that was supposedly "open source".
|
| What's missing?
| wrs wrote:
| Well, I'm not experienced in training full-sized LLMs,
| and it's conceivable that in this particular case the
| training process is simple enough that nothing is
| missing. That would be a rarity, though. But see my edit
| above -- I'm not actually reacting to this release when I
| say that.
| Jabrov wrote:
| What's the difference?
| avaldez_ wrote:
| Reproducibility? I mean, what's the point of an open
| technology if nobody knows whether it works or not?
| jerrygenser wrote:
| Here is another example of a fully open source model. Not from
| such a large company, but a good reference that includes code,
| data, weights, etc.
|
| https://allenai.org/olmo
| wrs wrote:
| We (developers and tech managers) really need to hold the line
| on this terminology. This is a full actual open source LLM. The
| usual "open inference" model is not.
| boulos wrote:
| I assume by "open inference" you mostly mean "weights
| available"?
| wrs wrote:
| Usually "open source" for an LLM means you get the weights
| and the inference code, which I've started calling "open
| inference". It's certainly good and useful, but it's not
| the actual source of the model.
|
| I find people get into silly arguments about the
| terminology because they're focused on whether the "source"
| is "open" and not on what the "source" is actually the
| source of.
|
| "Weights available" indicates even the weights aren't
| "open" in the usual software meaning of the term, as they
| typically come with restrictive licenses (more restrictive
| than copyleft or attribution).
| wmf wrote:
| You're not wrong, but if you come up with a definition that
| no one is willing to meet, you're just making that definition
| irrelevant.
| GeekyBear wrote:
| > Wow, an actual open source language model (first of its kind
|
| Apple research has previously released another example of a
| model with open training code, data, and weights, but their
| model was sized for running inference workloads on mobile
| devices.
|
| However, Apple has a mobile device line of business and AMD has
| an enterprise AI accelerator line of business, so they are both
| doing work relevant to their bottom line.
| kypro wrote:
| Smart move from AMD. It helps develop an ecosystem around
| their tech and their GPUs.
| benterix wrote:
| I'm happy to see a truly open source model.
|
| Actually, AMD has excellent reasons to pursue this kind of
| development, and I hope they continue.
| craftkiller wrote:
| I see multiple mentions of NPU on this page, but it's still not
| clear to me: is this something that can finally use the NPU on my
| processor?
| n_ary wrote:
| Now this is the beginning of real innovation in AI. With AMD
| coming in (albeit late and slowly) and Meta improving Llama, we
| will soon see some real adaptation and development over the
| next few thousand days. At this moment, I see OpenAI as the
| Yahoo of the pre-Google era.
| rsolva wrote:
| Can this model run on ollama?
| luyu_wu wrote:
| The section on speculative decoding is interesting. "This
| approach allows each forward pass to generate multiple tokens
| without compromising performance, thereby significantly reducing
| memory access consumption, and enabling several orders of
| magnitude speed improvements."
|
| Does anyone know if the "several orders of magnitude speed
| improvement" is accurate? I'm doubtful.
|
| Very interesting though! I'll be playing around with this on the
| weekend!
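|
| For anyone wanting to try it, speculative decoding is exposed
| in Hugging Face transformers as "assisted generation"; here is
| a rough, untested sketch (the model ids below are assumptions
| on my part, not confirmed from the article):
|
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     target_id = "meta-llama/Llama-2-7b-hf"  # assumed large target
|     draft_id = "amd/AMD-Llama-135m"         # assumed small draft
|
|     tok = AutoTokenizer.from_pretrained(target_id)
|     target = AutoModelForCausalLM.from_pretrained(target_id)
|     draft = AutoModelForCausalLM.from_pretrained(draft_id)
|
|     inputs = tok("def quicksort(arr):", return_tensors="pt")
|     # The draft proposes several tokens per step and the target
|     # verifies them in a single forward pass, so you pay far
|     # fewer large-model passes per generated token.
|     out = target.generate(**inputs, assistant_model=draft,
|                           max_new_tokens=64)
|     print(tok.decode(out[0], skip_special_tokens=True))
|
| Reported speedups for this technique are usually low
| single-digit multiples rather than orders of magnitude, so I
| share the doubt.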
| highfrequency wrote:
| Looks like they are using sixteen $13k GPUs [1] (around $210k
| hardware) for 6 days of training.
|
| Anyone know the recommended cloud provider and equivalent rental
| price?
|
| [1]
| https://www.wiredzone.com/shop/product/10025451-supermicro-g...
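|
| Back-of-the-envelope on the rental side (the hourly rate here
| is a pure assumption, not a quote from any provider):
|
|     gpus = 16    # accelerators, per the system linked above
|     days = 6
|     rate = 2.50  # assumed $/GPU-hour for rented MI250-class parts
|
|     gpu_hours = gpus * days * 24   # 2,304 GPU-hours
|     cost = gpu_hours * rate        # about $5,760 at that rate
|     print(gpu_hours, cost)
|
| So the rental bill depends almost entirely on what an AMD cloud
| actually charges per GPU-hour.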
| wmf wrote:
| Hot Aisle seems to be the (only?) place to rent AMD. (Ryan, please
| don't spam this thread. It's not a good look.)
| Decabytes wrote:
| Since most people can't run these LLMs locally, I wonder what a
| setup would look like where we have hyper-tuned models for
| specific purposes, i.e. a model for code, a model for prose,
| etc. You have a director model that decides which downstream
| model should be used and then runs it (something like the
| sketch below). That way you can run the models locally without
| needing beefy GPUs. It's a trade-off of using more disk space
| vs. needing more VRAM.
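|
| A toy sketch of the director idea (the model names are
| placeholders, not real checkpoints):
|
|     from transformers import pipeline
|
|     SPECIALISTS = {
|         "code": "local/code-slm",    # hypothetical code-tuned SLM
|         "prose": "local/prose-slm",  # hypothetical prose-tuned SLM
|     }
|
|     def route(prompt: str) -> str:
|         # Stand-in for the director model; a real one would be
|         # a small classifier, not keyword matching.
|         code_markers = ("def ", "class ", "#include", "import ")
|         if any(m in prompt for m in code_markers):
|             return "code"
|         return "prose"
|
|     def answer(prompt: str) -> str:
|         name = SPECIALISTS[route(prompt)]
|         gen = pipeline("text-generation", model=name)
|         return gen(prompt, max_new_tokens=64)[0]["generated_text"]
|
| Only one specialist is ever loaded at a time, which is where
| the disk-space-for-VRAM trade comes from.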
| wmf wrote:
| The whole point of _this_ model is that it's so tiny that even
| a weak RPi could run it. Apple has also done some interesting
| work with a common <4B base model that is customized with
| different LoRAs for different purposes.
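|
| The LoRA pattern is roughly this (a PEFT-style sketch; the
| model id and adapter paths are placeholders, not Apple's
| actual setup):
|
|     from transformers import AutoModelForCausalLM
|     from peft import PeftModel
|
|     base = AutoModelForCausalLM.from_pretrained("org/small-base")
|
|     # Each adapter is only a few MB of low-rank weights, so many
|     # task-specific "models" can share one frozen base.
|     model = PeftModel.from_pretrained(base, "adapters/summarize")
|     # Further adapters can be attached and swapped at runtime:
|     model.load_adapter("adapters/code", adapter_name="code")
|     model.set_adapter("code")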
| Philpax wrote:
| You're essentially describing Apple Intelligence :-)
|
| https://machinelearning.apple.com/research/introducing-apple...
| (see Model Adaptation)
___________________________________________________________________
(page generated 2024-09-27 23:00 UTC)