[HN Gopher] AMD Unveils Its First Small Language Model AMD-135M
       ___________________________________________________________________
        
       AMD Unveils Its First Small Language Model AMD-135M
        
       Author : figomore
       Score  : 116 points
       Date   : 2024-09-27 19:05 UTC (3 hours ago)
        
 (HTM) web link (community.amd.com)
 (TXT) w3m dump (community.amd.com)
        
       | loufe wrote:
       | It's always encouraging to see wider hardware platform
       | competition for AI inference and training. Access to affordable
       | and capable hardware for consumers will only benefit (I imagine)
       | from increasing competition.
        
       | diggan wrote:
       | > The training code, dataset and weights for this model are open
       | sourced so that developers can reproduce the model and help train
       | other SLMs and LLMs.
       | 
       | Wow, an actual open source language model (first of its kind
       | [from a larger company] maybe even?), includes all you need to be
       | able to recreate it from scratch. Thanks AMD!
       | 
       | Available under this funky GitHub organization it seems:
       | https://github.com/AMD-AIG-AIMA/AMD-LLM
        
         | bubaumba wrote:
          | No, it's not open source until someone can actually
          | reproduce it. That's the hardest part. For now it's open
          | weights and open dataset, which is not the same thing.
        
           | diggan wrote:
           | That's... Not how open source works? The "binary" (model
           | weights) is open source and the "software" (training scripts
           | + data used for training) is open source, this release is a
           | real open source release. Independent reproduction is not
           | needed to call something open source.
           | 
           | Can't believe it's the second time I end up with the very
           | same argument about what open source is today on HN.
        
             | dboreham wrote:
             | But wouldn't failure to achieve independent reproduction
             | falsify the open claim?
             | 
              | Similar to publishing the source for Oracle (the
              | database), but nobody can build a binary from it because
              | it needs magic compilers or test suites that aren't open
              | source?
             | 
             | Heck when the browser was open-sourced, there was an
             | explicit test where the source was given to some dude who
             | didn't work for Netscape to verify that he could actually
             | make a working binary. It's a scene in the movie "Code
             | Rush".
        
             | bubaumba wrote:
              | You are missing key points here. "Reproduce" means
              | producing the same model, not just training a similar
              | one.
              | 
              | I can simplify the task: can you convincingly explain
              | how the same model can be produced from this dataset? We
              | can start simple: how can you possibly get the same
              | weights after even the first iteration, i.e., the same
              | weights the original model got? Pay attention to
              | randomness, data selection, and initial model state.
              | 
              | OK, if you can't do that, can you explain in a
              | believable way how to prove that a given model was
              | trained on a given dataset? I'm not asking you to
              | actually do all these things, which could be expensive,
              | only to explain how it could be done.
              | 
              | Strict "open source" includes not only open weights and
              | open data; it also includes the word "reproducible". Not
              | "reproduced", only "reproducible". And even that is not
              | the case here.
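The reproducibility argument above turns on sources of nondeterminism: weight initialization and data ordering. A minimal sketch in plain Python (a toy "training run", not any real framework) showing that when every random source is seeded, two runs produce bit-identical weights, while a different seed diverges:

```python
import random

def train_toy_model(seed):
    """Toy 'training': random init + random data order, fully seeded."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1, 1) for _ in range(4)]  # random initialization
    data = list(range(10))
    rng.shuffle(data)                                 # data selection order
    for x in data:                                    # trivial 'updates'
        weights[x % 4] += 0.01 * x
    return weights

run_a = train_toy_model(seed=42)
run_b = train_toy_model(seed=42)
run_c = train_toy_model(seed=7)

print(run_a == run_b)  # True: same seed, identical weights
print(run_a == run_c)  # False: different seed, different weights
```

Real GPU training adds further nondeterminism (parallel reduction order, nondeterministic kernels), which is why bit-exact reproduction is much harder in practice than this sketch suggests.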
        
               | worewood wrote:
               | How often do people expect to compile open-source code
               | and get _exactly_ the same binary as the distributed one?
               | I've seen this kind of restriction only on decompilation
               | projects e.g. the SM64 decompilation -- where they
               | deliberately compare the hashes of original vs. compiled
               | binaries, as a way to verify the decompilation is
               | correct.
               | 
                | It's an unreasonable request for ordinary code, and
                | even more so for ML, where very few people have access
                | to the necessary hardware and where, in practice,
                | training is not deterministic.
        
               | e12e wrote:
               | I expect that if I compile your 3d renderer, and feed it
               | the same scene file you did - I get the same image?
        
               | Sayrus wrote:
               | Reproducible builds are not a requirement for open source
               | software, why is it one for open source models?
        
               | wrs wrote:
               | I would say that _functionally_ reproducible builds are
               | sort of inherent in the concept of "source". When builds
               | are "not reproducible" that typically just means they're
               | not bit-for-bit identical, not that they don't produce
               | the same output for a given input.
        
             | wrs wrote:
              | The interesting part of the product we're talking about
             | (that is, the equivalent of the executable binary of an
             | ordinary software product) is the weights. The "source" is
             | not sufficient to "recompile" the product (i.e., recreate
             | the weights). Therefore, while the source you got is open,
             | you didn't get _all_ the source to the thing that was
             | supposedly "open source".
             | 
             | It's like if I said I open-sourced the Matrix trilogy and
             | only gave you the DVD image and the source to the DVD
             | decoder.
             | 
             | (Edit: Sorry, I replied to the wrong comment. I'm talking
             | primarily about the typical sort of release we see, not
             | this one which is a lot closer to actually open.)
        
               | littlestymaar wrote:
               | > The "source" is not sufficient to "recompile" the
               | product (i.e., recreate the weights). Therefore, while
               | the source you got is open, you didn't get all the source
               | to the thing that was supposedly "open source".
               | 
               | What's missing?
        
               | wrs wrote:
               | Well, I'm not experienced in training full-sized LLMs,
               | and it's conceivable that in this particular case the
               | training process is simple enough that nothing is
               | missing. That would be a rarity, though. But see my edit
               | above -- I'm not actually reacting to this release when I
               | say that.
        
           | Jabrov wrote:
           | What's the difference?
        
             | avaldez_ wrote:
              | Reproducibility? I mean, what's the point of an open
              | technology if nobody knows whether it works or not.
        
         | jerrygenser wrote:
          | Here is another example of an open source model, not from
          | such a large company, but a good reference including code,
          | data, weights, etc.
         | 
         | https://allenai.org/olmo
        
         | wrs wrote:
         | We (developers and tech managers) really need to hold the line
         | on this terminology. This is a full actual open source LLM. The
         | usual "open inference" model is not.
        
           | boulos wrote:
           | I assume by "open inference" you mostly mean "weights
           | available"?
        
             | wrs wrote:
             | Usually "open source" for an LLM means you get the weights
             | and the inference code, which I've started calling "open
             | inference". It's certainly good and useful, but it's not
             | the actual source of the model.
             | 
             | I find people get into silly arguments about the
             | terminology because they're focused on whether the "source"
             | is "open" and not on what the "source" is actually the
             | source of.
             | 
             | "Weights available" indicates even the weights aren't
             | "open" in the usual software meaning of the term, as they
             | typically come with restrictive licenses (more restrictive
             | than copyleft or attribution).
        
           | wmf wrote:
           | You're not wrong, but if you come up with a definition that
           | no one is willing to meet you're just making that definition
           | irrelevant.
        
         | GeekyBear wrote:
         | > Wow, an actual open source language model (first of its kind
         | 
         | Apple research has previously released another example of a
         | model with open training code, data, and weights, but their
         | model was sized for running inference workloads on mobile
         | devices.
         | 
         | However, Apple has a mobile device line of business and AMD has
         | an enterprise AI accelerator line of business, so they are both
         | doing work relevant to their bottom line.
        
         | kypro wrote:
          | Smart move from AMD. It helps develop an ecosystem around
          | their tech and their GPUs.
        
       | benterix wrote:
       | I'm happy to see a truly open source model.
       | 
       | Actually, AMD has excellent reasons to make this kind of
       | development and I hope they continue.
        
       | craftkiller wrote:
        | I see multiple mentions of NPU on this page, but it's still
        | not clear to me: is this something that can finally use the
        | NPU on my processor?
        
       | n_ary wrote:
        | Now this is the beginning of real innovation in AI. With AMD
        | coming in (albeit late and slowly) and Meta improving Llama,
        | we will soon see real adoption and development in the next
        | few thousand days. At this moment, I see OAI as the Yahoo of
        | the pre-Google era.
        
       | rsolva wrote:
       | Can this model run on ollama?
        
       | luyu_wu wrote:
       | The section on speculative execution is interesting. "This
       | approach allows each forward pass to generate multiple tokens
       | without compromising performance, thereby significantly reducing
       | memory access consumption, and enabling several orders of
       | magnitude speed improvements."
       | 
       | Does anyone know if the "several orders of magnitude speed
       | improvement" is accurate? I'm doubtful.
       | 
       | Very interesting though! I'll be playing around with this on the
       | weekend!
        
       | highfrequency wrote:
       | Looks like they are using sixteen $13k GPUs [1] (around $210k
       | hardware) for 6 days of training.
       | 
       | Anyone know the recommended cloud provider and equivalent rental
       | price?
       | 
       | [1]
       | https://www.wiredzone.com/shop/product/10025451-supermicro-g...
        
         | wmf wrote:
          | Hot Aisle seems to be the (only?) place to rent AMD. (Ryan,
          | please don't spam this thread. It's not a good look.)
        
       | Decabytes wrote:
        | Since most people can't run these LLMs locally, I wonder what
        | a system would look like with hyper-tuned models for specific
        | purposes, i.e. a model for code, a model for prose, etc. You
        | have a director model that interprets which downstream model
        | should be used and then runs it. That way you can run the
        | models locally without needing beefy GPUs. It's a trade-off of
        | using more disk space vs. needing more VRAM.
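The "director model" idea above can be sketched as a simple router. In this hypothetical sketch a keyword check stands in for what would really be a small classifier model, and the downstream "models" are stubs; all names here are made up:

```python
# Hypothetical director that routes a prompt to a specialized local
# model. Only the chosen model would need to be resident in VRAM; the
# others stay on disk, trading disk space for VRAM.

def code_model(prompt):
    return f"[code model] handling: {prompt}"

def prose_model(prompt):
    return f"[prose model] handling: {prompt}"

CODE_HINTS = ("function", "bug", "compile", "python", "stack trace")

def director(prompt):
    # A real director would itself be a tiny classifier model; a
    # keyword check stands in for it here.
    if any(hint in prompt.lower() for hint in CODE_HINTS):
        return code_model(prompt)
    return prose_model(prompt)

print(director("Why does my Python function segfault?"))
print(director("Write a short story about autumn."))
```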
        
         | wmf wrote:
          | The whole point of _this_ model is that it's so tiny that even
         | a weak RPi could run it. Apple has also done some interesting
         | work with a common <4B base model that is customized with
         | different LoRAs for different purposes.
        
         | Philpax wrote:
         | You're essentially describing Apple Intelligence :-)
         | 
         | https://machinelearning.apple.com/research/introducing-apple...
         | (see Model Adaptation)
        
       ___________________________________________________________________
       (page generated 2024-09-27 23:00 UTC)