[HN Gopher] AI engineers claim new algorithm reduces AI power co...
___________________________________________________________________
AI engineers claim new algorithm reduces AI power consumption by
95%
Author : ferriswil
Score : 325 points
Date : 2024-10-19 18:03 UTC (1 day ago)
(HTM) web link (www.tomshardware.com)
(TXT) w3m dump (www.tomshardware.com)
| remexre wrote:
| Isn't this just taking advantage of "log(x) + log(y) = log(xy)"?
| The IEEE754 floating-point representation stores floats as sign,
| mantissa, and exponent -- ignore the first two (you quantitized
| anyway, right?), and the exponent is just an integer storing
| log() of the float.
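| A minimal illustration of that bit-level view (my sketch, not
| from the paper): for positive normal floats, adding the raw
| IEEE754 bit patterns as integers and subtracting one exponent
| bias approximates multiplication, precisely because the
| exponent field stores log2() of the value.
|
|       #include <stdint.h>
|       #include <string.h>
|
|       /* Mitchell-style approximate multiply for positive normal
|          floats; ignores zeros, NaNs, infinities, subnormals. */
|       static float approx_mul(float x, float y) {
|           uint32_t xb, yb, zb;
|           memcpy(&xb, &x, sizeof xb);
|           memcpy(&yb, &y, sizeof yb);
|           zb = xb + yb - 0x3F800000u; /* one bias: 127 << 23 */
|           float z;
|           memcpy(&z, &zb, sizeof zb);
|           return z;
|       }
|
|       /* approx_mul(1.75f, 2.5f) == 4.0f; the exact product is
|          4.375f. */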
| convolvatron wrote:
| yes. and the next question is 'ok, how do we add'
| dietr1ch wrote:
| I guess that if the bulk of the computation goes into the
| multiplications, you can work in log-space and simply sum,
| and when the time comes to actually add in the original
| space, you can convert back first.
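| For example, 8 * 4 becomes log2(8) + log2(4) = 3 + 2 = 5 in
| log-space, and 2^5 == 32 on the way back out.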
| a-loup-e wrote:
| Not sure how well that would work if you're often adding
| bias after every layer
| kps wrote:
| Yes. I haven't yet read this paper to see what exactly it
| says is new, but I've definitely seen log-based
| representations under development before now. ( _More_ log-
| based than the regular floating-point exponent, that is. I
| don't actually know the argument behind the exponent-and-
| mantissa form that's been pretty much universal even before
| IEEE754, other than that it mimics decimal scientific
| notation.)
| mota7 wrote:
| Not quite: It's taking advantage of (1+a)(1+b) = 1 + a + b +
| ab. And where a and b are both small-ish, ab is really small
| and can just be ignored.
|
| So it turns the (1+a)(1+b) into 1+a+b. Which is definitely not
| the same! But it turns out, machine guessing apparently doesn't
| care much about the difference.
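| For example, with a = 0.25 and b = 0.5: the exact product is
| (1.25)(1.5) = 1.875, while the approximation gives 1 + 0.25 +
| 0.5 = 1.75; the dropped ab term accounts for the 0.125 error.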
| amelius wrote:
| You might as well then replace the multiplication with
| addition in the original network. In that case you're not
| even approximating anything.
|
| Am I missing something?
| dotnet00 wrote:
| They're applying that simplification to the exponent bits
| of an 8 bit float. The range is so small that the
| approximation to multiplication is going to be pretty
| close.
| tommiegannert wrote:
| Plus the 2^-l(m) correction term.
|
| Feels like multiplication shouldn't be needed for
| convergence, just monotonicity? I wonder how well it would
| perform if the model was actually trained the same way.
| dsv3099i wrote:
| This trick is used a ton when doing hand calculation in
| engineering as well. It can save a lot of work.
|
| You're going to have tolerance on the result anyway, so
| what's a little more error. :)
| _aavaa_ wrote:
| Original discussion of the preprint:
| https://news.ycombinator.com/item?id=41784591
| codethief wrote:
| Ahh, there it is! I was sure we had discussed this paper
| before.
| djoldman wrote:
| https://arxiv.org/abs/2410.00907
|
| ABSTRACT
|
| Large neural networks spend most computation on floating point
| tensor multiplications. In this work, we find that a floating
| point multiplier can be approximated by one integer adder with
| high precision. We propose the linear-complexity multiplication
| (L-Mul) algorithm that approximates floating point number
| multiplication with integer addition operations. The new
| algorithm costs significantly less computation resource than
| 8-bit floating point multiplication but achieves higher
| precision. Compared to 8-bit floating point multiplications, the
| proposed method achieves higher precision but consumes
| significantly less bit-level computation. Since multiplying
| floating point numbers requires substantially higher energy
| compared to integer addition operations, applying the L-Mul
| operation in tensor processing hardware can potentially reduce
| 95% energy cost by elementwise floating point tensor
| multiplications and 80% energy cost of dot products. We
| calculated the theoretical error expectation of L-Mul, and
| evaluated the algorithm on a wide range of textual, visual, and
| symbolic tasks, including natural language understanding,
| structural reasoning, mathematics, and commonsense question
| answering. Our numerical analysis experiments agree with the
| theoretical error estimation, which indicates that L-Mul with
| 4-bit mantissa achieves comparable precision as float8 e4m3
| multiplications, and L-Mul with 3-bit mantissa outperforms float8
| e5m2. Evaluation results on popular benchmarks show that directly
| applying L-Mul to the attention mechanism is almost lossless. We
| further show that replacing all floating point multiplications
| with 3-bit mantissa L-Mul in a transformer model achieves
| equivalent precision as using float8 e4m3 as accumulation
| precision in both fine-tuning and inference.
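| A rough sketch of the idea in C (my illustration, not the
| paper's code; the paper targets reduced-mantissa formats, and
| the 2^-l(m) offset below assumes the paper's l(m) = 4 case
| applied to a full 23-bit fp32 mantissa):
|
|       #include <stdint.h>
|       #include <string.h>
|
|       /* L-Mul-style approximate multiply: one integer add of
|          the combined exponent+mantissa fields, one bias
|          subtraction, and a small mantissa offset. A mantissa
|          carry spills into the exponent, which is part of the
|          approximation. Ignores zeros, NaNs, infinities, and
|          subnormals. */
|       static float l_mul(float x, float y) {
|           uint32_t xb, yb;
|           memcpy(&xb, &x, sizeof xb);
|           memcpy(&yb, &y, sizeof yb);
|           uint32_t sign = (xb ^ yb) & 0x80000000u;
|           uint32_t body = (xb & 0x7FFFFFFFu)
|                         + (yb & 0x7FFFFFFFu)
|                         - 0x3F800000u  /* one exponent bias */
|                         + (1u << 19);  /* ~2^-4 of a mantissa */
|           uint32_t zb = sign | body;
|           float z;
|           memcpy(&z, &zb, sizeof zb);
|           return z;
|       }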
| onlyrealcuzzo wrote:
| Does this mean you can train efficiently without GPUs?
|
| Presumably there will be a lot of interest.
| crazygringo wrote:
| No. But it does potentially mean that either current or
| future-tweaked GPUs could run a lot more efficiently --
| meaning much faster or with much less energy consumption.
|
| You still need the GPU parallelism though.
| fuzzfactor wrote:
| I had a feeling it had to be something like massive waste
| due to a misguided feature of the algorithms that shouldn't
| have been there in the first place.
|
| Once the "math is done", quite likely it would have paid off
| better than most investments for the top people to have
| spent a few short years working with grossly underpowered
| hardware until they could come up with amazing results there
| before scaling up -- rather than using grossly overpowered
| hardware before there was even deep understanding of the
| underlying processes.
|
| When you think about it, what we have seen from the latest
| ultra-high-powered "thinking" machines is truly so
| impressive. But if you are trying to fool somebody into
| believing that it's a real person it's still not "quite"
| there.
|
| Maybe a good benchmark would be to take a regular PC, and
| without reliance on AI just pull out all the stops and put
| all the effort into fakery itself. No holds barred, any
| trick you can think of. See what the electronics is capable
| of this way. There are some smart engineers, this would
| only take a few years but looks like it would have been a
| lot more affordable.
|
| Then with the same hardware if an AI alternative is not as
| convincing, something has got to be wrong.
|
| It's good to find out this type of thing before you go
| overboard.
|
| Regardless of speed or power, I never could have gotten an
| 8-bit computer to match the output of a 32-bit floating-
| point algorithm by using floating-point myself. Integers
| all the way and place the decimal where it's supposed to be
| when you're done.
|
| Once it's really figured out, how do you think it would
| feel being the one paying the electric bills up until now?
| jimmaswell wrote:
| Faster progress was absolutely worth it. Spending years
| agonizing over theory to save a bit of electric would
| have been a massive disservice to the world.
| BolexNOLA wrote:
| "A bit"?
| bartread wrote:
| Yes, a large amount for - in the grand scheme of things -
| a short period of time (i.e., a quantity of energy usage
| in an intense spike that will be dwarfed by energy usage
| over time) can accurately be described as "a bit".
|
| Of course, the impact is that AI will continue to become
| cheaper to use, and induced demand will continue the
| feedback loop driving the market as a result.
| rossjudson wrote:
| You're sort of presuming that LLMs are going to be a
| massive _service_ to the world there, aren't you? I
| think the jury is still out on that one.
| jimmaswell wrote:
| They already have been. Even just in programming, even
| just Copilot has been a life changing productivity
| booster.
| wruza wrote:
| Are you sure it's a life changing productivity booster?
| Sometimes I look at my projects and wonder how I would
| explain to an LLM what this code should do if it didn't
| exist yet. It must be a shitton of boilerplate programming
| for copilot to be a life-changing experience.
| AYBABTME wrote:
| You haven't used them enough. Every time an LLM reduces my
| search from 1min to 5s, the LLM pays for itself.
|
| Just the summary features: saving me 20min of reading a
| transcript by turning it into 20s. That's a huge enabler.
| mecsred wrote:
| If 20 mins of information can legitimately be condensed
| into 20 seconds, it sounds like the original wasn't worth
| reading in the first place. Could have skipped the llm
| entirely.
| bostik wrote:
| I upvoted you, because I think you have a valid point.
| The tone is unnecessarily aggressive though.
|
| Effective and information-dense communication is _really_
| hard. That doesn't mean we should just accept the useless
| fluff surrounding the actual information and/or analysis.
| People could learn a lot from the Ig Nobel Prize ceremony's
| 24/7 presentation model.
|
| Sadly, it seems we are heading towards a future where you
| may need an LLM to distill the relevant information out
| of a sea of noise.
| mecsred wrote:
| Didn't intend for it to be aggressive, just concise.
| Spare me from the llm please :)
| AYBABTME wrote:
| Think of the summary of a zoom call. Or of a chapter that
| you're not sure if you care to read or not.
|
| Not all content is worth consuming, and not all content
| is dense.
| postalrat wrote:
| If I had a recording of the zoom call I could generate a
| summary on demand with better tools than were available
| at the time the zoom call was made.
| crazygringo wrote:
| > _it sounds like the original wasn 't worth reading in
| the first place_
|
| But if that's the only place that contained the
| information you needed, then you have no choice.
|
| There's a lot of material out there that is badly
| written, badly organized, badly presented. LLMs can be a
| godsend for extracting the information you actually need
| without wasting 20 minutes wading through the muck.
| wruza wrote:
| Overviews aren't code though. In code, for me, they don't
| pass 80/20 tests well enough, sometimes even on simple
| cases. (You get 50-80% of an existing function/block with
| some important context prepended and a comment, let it
| write the rest and check if it succeeds). It doesn't mean
| that LLMs are useless. Or that I am an anti-LLMist or a
| denier - I'm actually an enthusiast. But this specific
| claim I hear often and don't find true. Maybe true for
| repetitive code in boring environments where typing and
| remembering formats/params over and over is the main
| issue. Not in actual code.
|
| If I paste the actual non-trivial code, it starts
| deviating fast. And it isn't too complex, it's just less
| like "parallel sort two arrays" and more like "wait for
| an image on a screenshot by execing scrot (with no sound)
| repeatedly and passing the result to this detect-cv2.py
| script and use all matching options described in this ts
| type, get stdout json as in this ts type, and if there's
| a match, wait for the specified anim timeout and test
| again to get the settled match coords after an animation
| finishes; throw after a total timeout". Not rocket
| science, pretty dumb shit, but right there they fall flat
| and start imagining things, heavily.
|
| I guess it shines if you ask it to make an html form, but
| I couldn't call that life-changing unless I had to make
| these damn forms all day.
| andrei_says_ wrote:
| My experience with overviews is that they are often
| subtly or not so subtly inaccurate. LLMs not
| understanding meaning or intent carries a risk of
| misrepresentation.
| giraffe_lady wrote:
| "Even just in programming" the jury is still out. None of
| my coworkers using these are noticeably more productive
| than the ones who don't. Outside of programming no one
| gives a shit except scammers and hype chasers.
| JonChesterfield wrote:
| The people writing articles for journals that aggregate
| and approximate other sources are in mortal terror of
| LLMs. Likewise graphic designers and anyone working in
| (human language) translation.
|
| I don't fear that LLMs are going to take my job as a
| developer. I'm pretty sure they mark a further decrease
| in the quality and coherence of software, along with a
| rapid increase in the quantity of code out there, and
| that seems likely to provide me with reliable employment
| forever. I'm basically employed in fixing bugs that
| didn't need to exist in the first place and that seems to
| cover a lot of software dev.
| giraffe_lady wrote:
| They're not scared of LLMs because of anything about
| LLMs. It's just that everyone with power is publicly
| horny to delete the remaining middle class jobs and are
| happy to use LLMs as a justification whether it can
| functionally replace those workers or not. So it's not
| that everyone has evaluated chatgpt and cannily realized
| it can do their job, they're just reading the room.
| recursive wrote:
| I've been using copilot for several months. If I could
| figure out a way to measure its impact on my
| productivity, I'd probably see a single digit percentage
| boost in "productivity". This is not life-changing for
| me. And for some tasks, it's actually worse than nothing.
| As in, I spend time feeding it a task, and it just
| completely fails to do anything useful.
| jimmaswell wrote:
| I've been using it for over a year I think. I don't often
| feed it tasks with comments so much as go about things
| the same as usual and let it autocomplete. The time and
| cognitive load saved adds up massively. I've had to go
| without it for a bit while my workplace sorts out its
| license for the corporate version (the personal version has
| an issue with the proxy), and it's been agonizing going
| without it again. I almost forgot how
| much it sucks having to jump to google every other
| minute, and it was easy to start to take for granted how
| much context copilot was letting me not have to hold onto
| in my head. It really lets me work on the problem as
| opposed to being mired in immaterial details. It feels
| like I'm at least 2x slower overall without it.
| rockskon wrote:
| I don't know about you but LLMs spit out garbage nonsense
| frequently enough that I can't trust their output in _any_
| context I cannot personally verify the validity of.
| atq2119 wrote:
| > I almost forgot how much it sucks having to jump to
| google every other minute
|
| Even allowing for some hyperbole, your programming
| experience is extremely different from mine. Looking
| anything up outside the IDE, let alone via Google, is by
| far the exception for me rather than the rule.
|
| I've long suspected that this kind of difference explains
| a lot of the difference in how Copilot is perceived.
| namaria wrote:
| Claiming LLMs are a massive boost for coding productivity
| is becoming a red flag that the claimant has a tenuous
| grasp on the skills necessary. Yeah if you have to look
| up everything all the time and you can't tell the AI slop
| isn't very good, you can put out code quite fast.
| soulofmischief wrote:
| Comments like this are a great example of the Dunning-
| Kruger effect. Your comment is actually an indication
| that you don't have the mastery required to get useful,
| productive output from a high quality LLM.
|
| Maybe you don't push your boundaries as an engineer and
| thus rarely need to know new things or at least learn new
| API surfaces. Maybe you don't know how to effectively
| prompt an LLM. Maybe you lack the mastery to analyze and
| refine the results. Maybe you just like doing things the
| slow way. I too remember a time as an early programmer
| where I eschewed even Intellisense and basic auto
| complete...
|
| I'd recommend learning a bit more and practicing some
| humility and curiosity before condemning an entire class
| of engineers just because you don't understand their
| workflow. Just because you've had subpar experiences with
| a new tool doesn't mean it's not a useful tool in another
| engineer's toolkit.
| namaria wrote:
| Funny you should make claims about my skills when you
| have exactly zero data about my abilities or performance.
|
| Evaluating my skills based on how I evaluated someone
| else's skills when they tell me about their abilities
| with and without a crutch, and throwing big academic
| sounding expressions with 'effect' in them might be
| intimidating to some but to me it just transparently
| sounds pretentious and way off mark, since, like I said,
| you have zero data about my abilities or output.
|
| > I'd recommend learning a bit more and practicing some
| humility and curiosity before condemning an entire class
| of engineers
|
| You're clearly coming from an emotional place because you
| feel slighted. There is no 'class of engineers' in my
| evaluation. I recommend reading comments more closely,
| thinking about their content, and not getting offended
| when someone points out signs of lacking skills, because
| you might just be advertising your own limitations.
| soulofmischief wrote:
| > Funny you should make claims about my skills when you
| have exactly zero data about my abilities or performance.
|
| Didn't you just do that to an entire class of engineers:
|
| > Claiming LLMs are a massive boost for coding
| productivity is becoming a red flag that the claimant has
| a tenuous grasp on the skills necessary
|
| Anyway,
|
| > Evaluating my skills based on how I evaluated someone
| else's skills when they tell me about their abilities
| with and without a crutch
|
| Your argument rests on the assumption that LLMs are a
| "crutch", and you're going to have to prove that before
| the rest of your argument holds any water.
|
| It sucks getting generalized, doesn't it? Feels
| ostracizing? That's the exact experience someone who
| productively and effectively uses LLMs will have upon
| encountering your premature judgement.
|
| > You're clearly coming from an emotional place because
| you feel slighted.
|
| You start off your post upset that I'm "making claims"
| about your skills (I used the word "maybe" intentionally,
| multiple times), and then turn around and make a pretty
| intense claim about me. I'm not "clearly" coming from an
| emotional place, you did not "trigger" me, I took a
| moment to educate you about being overly judgemental
| before fully understanding something, and pointed out the
| inherent hypocrisy.
|
| > you might just be advertising your own limitations
|
| But apparently my approach was ineffective, and you are
| still perceiving a world where people who approach their
| work differently than you are inferior. Your toxic
| attitude is unproductive, and while you're busy imagining
| yourself as some masterful engineer, people are out there
| getting massive productivity boosts with careful
| application of cutting-edge generative technologies. LLMs
| have been nothing short of transcendental to a curious
| but skilled mind.
| Ygg2 wrote:
| > Didn't you just do that to an entire class of engineers
|
| Not really. He said "if you claim LLM's are next thing
| since sliced butter I am doubting your abilities". Which
| is fair. It's not really a class as much as a group.
|
| I've never been wowed over by LLMs. At best they are
| boilerplate enhancers. At worst they write plausibly
| looking bullshit that compiles but breaks everything.
| Give it something truly novel and/or fringe and it will
| fold like a deck of cards.
|
| Even the latest research calls LLMs' benefits into question:
| https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4945566
|
| That said, they are better at generating commit messages
| and docs than me.
| soulofmischief wrote:
| > Not really. He said "if you claim LLM's are next thing
| since sliced butter I am doubting your abilities". Which
| is fair.
|
| No, OP said:
|
| > Claiming LLMs are a massive boost for coding
| productivity is becoming a red flag that the claimant has
| a tenuous grasp on the skills necessary
|
| Quotation marks are usually reserved for direct quotes,
| not paraphrases or straw men.
|
| > I've never been wowed over by LLMs.
|
| Cool. I have, and many others have. I'm unsure why your
| experience justifies invalidating the experiences of
| others or supporting prejudice against people who have
| made good use of them.
|
| > Give it something truly novel and/or fringe and it will
| fold like a deck of cards.
|
| Few thoughts are truly novel, most are derivative or
| synergistic. Cutting edge LLMs, when paired with a
| capable human, are absolutely capable of productive work.
| I have long, highly technical and cross-cutting
| discussions with GPT 4o which I simply could not have
| with any human that I know. Humans like that exist, but I
| don't know them and so I'm making do with a very good
| approximation.
|
| Your and OP's lack of imagination at the capabilities of
| LLMs are more telling than you realize to those intimate
| with them, which is what makes this all quite ironic
| given that it started from OP making claims about how
| people who say LLMs massively boost productivity are
| giving tells that they're not skilled enough.
| Ygg2 wrote:
| > Quotation marks are usually reserved for direct quotes,
|
| Not on HN. The custom is to use > paragraph quotes like
| you did. However, I will keep that in mind.
|
| > Cool. I have, and many others have. I'm unsure why your
| experience justifies invalidating the experiences of
| others
|
| If we're both grading a single student (LLM) in the same
| field (programming), and you find it great and I find it
| disappointing, it means one of us is scoring it wrong.
|
| I gave papers that demonstrate its failings, where is
| your counter-proof?
|
| > Your and OP's lack of imagination at the capabilities
| of LLMs
|
| It's not a lack of imagination. It's the terribleness of
| the results. It can't consistently write good doc comments.
| It does not understand the code nor its purpose, but
| roughly guesses the shape. Which is fine for writing
| something that's not as formal as code.
|
| It can't read and understand specifications, or even
| generate something as simple as a useful API from them. The
| novel part doesn't have to be that novel, just something
| outside of its learned corpus.
|
| Like a YAML parser in Rust. Maybe Zig or something beyond
| its gobbled data repo.
|
| > Few thoughts are truly novel, most are derivative or
| synergistic.
|
| Sure, but you still need a mind to derive/synergize the
| noise of the everyday environment into something novel.
|
| It can't even do that, only remix data into plausible-
| looking forms. A stochastic parrot. Great for a DnD
| campaign. Shit for code.
| soulofmischief wrote:
| > Not on HN. Customary is to use > paragraph quotes like
| you did. However I will keep that in mind.
|
| Hacker News is not some strange place where the normal
| rules of discourse don't apply. I assume you are familiar
| with the function of quotation marks.
|
| > If we're both grading a single student (LLM) in same
| field (programming), and you find it great and I find it
| disappointing, it means one of us is scoring it wrong.
|
| No, it means we have different criteria and general
| capability for evaluating the LLM. There are plenty of
| standard criteria which LLMs are pitted against, and we
| have seen continued improvement since their inception.
|
| > It can't consistently write good doc comments. It does
| not understand the code nor its purpose, but roughly
| guesses the shape.
|
| Writing good documentation is certainly a challenging
| task. Experience has led me to understand where current
| LLMs typically do and don't succeed with writing tests
| and documentation. Generally, the more organized and
| straightforward the code, the better. The smaller each
| module is, the higher the likelihood of a good first
| pass. And then you can fix deficiencies in a second,
| manual pass. If done right, it's generally faster than
| not making use of LLMs for typical workflows. Accuracy
| also goes down for more niche subject material. All tools
| have limitations, and understanding them is crucial to
| using them effectively.
|
| > It can't read and understand specifications, and even
| generate something as simple as useful API for it.
|
| Actually, I do this all the time and it works great. Keep
| practicing!
|
| In general, the stochastic parrot argument is oft-
| repeated but fails to recognize the general capabilities
| of machine learning. We're not talking about basic Markov
| chains, here. There are literally academic benchmarks
| against which transformers have blown away all initial
| expectations, and they continue to incrementally improve.
| Getting caught up criticizing the crudeness of a new,
| revolutionary tool is definitely my idea of
| unimaginative.
| Ygg2 wrote:
| > Hacker News is not some strange place where the normal
| rules of discourse don't apply. I assume you are familiar
| with the function of quotation marks.
|
| Language is all about context. I wasn't trying to be
| deceitful. And on HN I've never seen anyone using
| quotation marks to quote people.
|
| > Writing good documentation is certainly a challenging
| task.
|
| Doctests aren't the same as writing documentation. Doctests
| are the simplest form of documentation. Given a function
| named so-and-so, write the API doc + example. It could not
| even write an example that passed a syntax check.
|
| > Actually, I do this all the time and it works great.
| Keep practicing!
|
| Then you haven't given it interesting/complex enough
| problems.
|
| Also this isn't about practice. It's about its
| capabilities.
|
| > In general, the stochastic parrot argument is oft-
| repeated but fails to recognize the general capabilities
| of machine learning.
|
| I gave it the task of writing a YAML parser given the
| yaml.org spec, and it wrote the following struct:
|
|       enum Yaml {
|           Scalar(String),
|           List(Vec<Box<Yaml>>),
|           Map(HashMap<String, Box<Yaml>>),
|       }
|
| This is the stochastic parrot in action. Why? Because it
| tried to pass off a JSON-like structure as YAML.
|
| Whatever LLMs are, they aren't intelligent. Or they have
| the attention span of a fruit fly and can't figure out
| basic differences.
| williamcotton wrote:
| That's not a good prompt, my friend!
| williamcotton wrote:
| I know plenty of fantastic engineers that use LLM tools
| as code assistants.
|
| I'm not sure when and why reading documentation and man
| pages became a sign of a lack of skill. Watch a
| presentation by someone like Brian Kernighan and you'll
| see him joke about looking up certain compiler flags for
| the thousandth time!
|
| Personally I work in C, C#, F#, Java, Kotlin, Swift, R,
| Ruby, Python, Postgres SQL, MySQL SQL, TypeScript, node,
| and whatever hundreds of libraries and DSLs are built on
| top. Yes, I have to look up documentation and with
| regularity.
| FpUser wrote:
| Same opinion here. I work with way too many things to
| keep everything in my head. I'd rather use my head for
| design than to remember every function and parameter of
| say STL
| specialist wrote:
| For me, thus far, LLMs help me forage docs. I know what I
| want and it helps me narrow my search faster. Watching
| adepts like Simon Willison wield LLMs is on my to do
| list.
| fragmede wrote:
| Add Golang and rust and JavaScript and next.js and react
| to the list for me. ;) If you live and work and breathe
| in the same kernel, operating system, and user space, and
| don't end up memorizing the various bits of minutiae, I'd
| judge you (and me) too, but it's not the 2000's, or the
| 90's or even the 80's anymore, and some of us don't have
| the luxury, or have chosen not to, live in one small
| niche for our entire career. At the end of the day, the
| client doesn't care what language you use, or the
| framework, or even the code quality, as long as it works.
| What they don't want to pay for is overage, and taking
| the previous developer's work and refactoring it and
| rewriting it in your preferred language isn't high value
| work, so you pick up whatever they used and run with it.
| Yeah that makes me less fluent in that one particular
| thing, not having done the same thing for 20+ years, but
| that's not where I deliver value. Some people do, and
| that's great for them and their employers, but my
| expertise lies elsewhere. I got real good at MFC, back in
| the day, and then WX and Qt and I'm working on getting
| good at react and such.
| jimmaswell wrote:
| At the risk of sounding like an inflated ego: I'm very
| good at what I do, the rest of my team frequently looks
| to me for guidance, my boss and boss's boss etc. have
| repeatedly said I'm among the most valuable people
| around, and I'm the one turned to in emergencies, for
| difficult architectural decisions, and to lead projects.
| I conceptually understand the ecosystem I work in very
| well at every layer.
|
| What I'm not good at is memorizing API's and libraries
| that all use different verbs and nouns for the same
| thing, and other such things that are immaterial to the
| actual work. How do you use a mutation observer again?
| Hell if I remember the syntax but I know the concept, and
| copilot will probably spit out what I want, and I'll
| easily verify the output. Or how do you copy an array in
| JS? Or print a stack trace? Or do a node walk? You can
| either wade through google and stackoverflow, or copilot
| can tell you instantly. And I can very quickly tell if
| the code copilot gave me is sensible or not.
| framapotari wrote:
| If you're already a competent developer, I think that's a
| reasonable expectation of impact on productivity. I think
| the 'life-changing' part comes in helping someone get to
| the point of building things with code where before they
| couldn't (or believed they couldn't). It does a lot
| better job of turning the enthusiasts and code-curious
| into amateurs vs. empowering professionals.
| jimmaswell wrote:
| > turning the enthusiasts and code-curious into amateurs
| vs. empowering professionals.
|
| I'm firmly in #2. My other comment goes over how.
|
| I'm intrigued to see how devs in #1 grow. One might be
| wary those devs would grow into bad habits and not
| thinking for themselves, but it might be a case of the
| ancient Greek rant against written books hindering
| memorization. Could be that they'll actually grow to be
| even better devs unburdened by time wasted on trivial
| details.
| haakonhr wrote:
| And here you're assuming that making software engineers
| more productive would be a service to the world. I think
| the jury is out on that one as well. At least for the
| majority of software engineering since 2010.
| pcl wrote:
| Isn't this paper pretty much about spending a few short
| years to improve the performance? Or are you arguing that
| the same people who made breakthroughs over the last few
| years should have also done the optimization work?
| fuzzfactor wrote:
| >the same people who made breakthroughs over the last few
| years should have also done the optimization work
|
| I never thought it would be ideal if it was otherwise, so
| I guess so.
|
| When I first considered neural nets from state-of-the art
| vendors to assist with some non-linguistic situations
| over 30 years ago, it wasn't quite ready for prime time
| and I could accept that.
|
| I just don't have generic situations all the time which
| would benefit me, so it's clearly my problems that have
| the deficiencies ;\
|
| What's being done now with all the resources being thrown
| at it is highly impressive, and gaining all the time, no
| doubt about it. It's nice to know there are people that
| can afford it.
|
| I truly look forward to more progress, and this may be
| the previously unreached milestone I have been detecting
| that might be a big one.
|
| Still not good enough for what I need yet so far though.
| And I can accept that as easily as ever.
|
| That's why I put up my estimation that not all of those
| 30+ years has been spent without agonizing over something
| ;)
| Scene_Cast2 wrote:
| This is a bit like recommending to skip vacuum tubes,
| think hard and invent transistors.
| fuzzfactor wrote:
| This is kind of thought-provoking.
|
| That is a good correlation when you think about how much
| more energy-efficient transistors are than vacuum tubes.
|
| Vacuum tube computers were a thing for a while, but it
| was more out of desperation than systematic intellectual
| progress.
|
| OTOH you could look at the present accomplishments like
| it was throwing more vacuum tubes at a problem that can
| not be adequately addressed that way.
|
| What turned out to be a solid-state solution was a
| completely different approach from the ground up.
|
| To the extent a more power-saving technique _using the
| same hardware_ is only a matter of different software
| approaches, that would be something that realistically
| could have been accomplished before so much energy was
| expended.
|
| Even though I've always thought application-specific
| circuits would be what really helps ML and AI a lot, and
| that would end up not being the exact same hardware at
| all.
|
| If power is truly being wasted enough to _start_ rearing
| its ugly head, somebody should be able to figure out how
| to fix it before it gets out-of-hand.
|
| Ironically enough with my experience using vacuum tubes,
| I've felt that there were some serious losses in
| technology when the research momentum involved was so
| rapidly abandoned in favor of "solid-state everything" at
| any cost.
|
| Maybe it is a good idea to abandon the energy-intensive
| approaches, as soon as anything completely different
| that's the least bit promising can barely be seen by a
| gifted visionary to have a glimmer of potential.
| michaelmrose wrote:
| This comment lives in a fictional world where there is a
| singular group that could have collectively acted
| counterfactually. In the real world any actor that
| individually went this route would have gone bankrupt
| while the others collected money by showing actual
| results even if inefficiently earned.
| newyankee wrote:
| Also it is likely that the rise of LLMs gave many
| researchers in allied fields the impetus to tackle the
| problems relevant to making it more efficient, and people
| stumbled upon a solution hiding there.
|
| The momentum behind LLMs and allied technology may last as
| long as it keeps improving even by a few percentage points
| and keeps shattering newly created benchmarks every few
| months.
| VagabundoP wrote:
| That's just not how progress works.
|
| It's iterative; there are plenty of cul-de-sacs and
| failures. You can't really optimise until you have
| something that works, and it's a messy, inefficient
| process.
|
| You're looking at this with hindsight.
| 3abiton wrote:
| This is still amazing work, imagine running chungus models
| on a single 3090.
| etcd wrote:
| I feel like I have seen this idea a few times but don't recall
| where - probably stuff posted via HN.
|
| Here https://news.ycombinator.com/item?id=41784591 but even
| before that. It is possibly one of those obvious ideas to
| people steeped in this.
|
| To me, intuitively, using floats to make ultimately boolean-like
| decisions seems wasteful, but that seemed like the way it had to
| be to have differentiable algorithms.
| robomartin wrote:
| I posted this about a week ago:
|
| https://news.ycombinator.com/item?id=41816598
|
| This has been done for decades in digital circuits, FPGA's,
| Digital Signal Processing, etc. Floating point is both resource
| and power intensive, and doing FP math without dedicated FP
| processing hardware is something that has been avoided for
| decades unless absolutely necessary.
| ujikoluk wrote:
| Explain more for the uninitiated please.
| ausbah wrote:
| a lot of things in the ML research space are rebranding an old
| concept with a new name as "novel"
| fidotron wrote:
| Right, the ML people are learning, slowly, about the importance
| of optimizing for silicon simplicity, not just reduction of
| symbols in linear algebra.
|
| Their rediscovery of fixed point was bad enough but the "omg if
| we represent poses as quaternions everything works better"
| makes any game engine dev for the last 30 years explode.
| kayo_20211030 wrote:
| Extraordinary claims require extraordinary evidence. Maybe it's
| possible, but consider that some really smart people, in many
| different groups, have been working diligently in this space for
| quite a while; so claims of 95% savings on energy costs _with
| equivalent performance_ is in the extraordinary category. Of
| course, we'll see when the tide goes out.
| vlovich123 wrote:
| They've been working on unrelated problems like structure of
| the network or how to build networks with better results. There
| have been people working on improving the efficiency of the
| low-level math operations, and this is the culmination of that
| work. Figuring this stuff out isn't super easy.
| throwawaymaths wrote:
| I don't think this claim is extraordinary. Nothing proposed is
| mathematically impossible or even unlikely, just a pain in the
| ass to test (lots of retraining, fine tuning etc, and those
| operations are expensive when you don't already have massively
| parallel hardware available, otherwise you're ASIC/FPGAing for
| something with a huge investment risk)
|
| If I could have a SWAG at it I would say a low resolution model
| like llama-2 would probably be just fine (llama-2 quantizes
| without too much headache) but a higher resolution model like
| llama-3 probably not so much, not without massive retraining
| anyways.
| Randor wrote:
| The energy claims up to ~70% can be verified. The inference
| implementation is here:
|
| https://github.com/microsoft/BitNet
| kayo_20211030 wrote:
| I'm not an AI person, in any technical sense. The savings
| being claimed, and I assume verified, are on ARM and x86
| chips. The piece doesn't mention swapping mult to add, and a
| 1-bit LLM is, well, a 1-bit LLM.
|
| Also,
|
| > Additionally, it reduces energy consumption by 55.4% to
| 70.0%
|
| With humility, I don't know what that means. It seems like
| some dubious math with percentages.
| Randor wrote:
| > I don't know what that means. It seems like some dubious
| math with percentages.
|
| I would start by downloading a 1.58 model such as:
| https://huggingface.co/HF1BitLLM/Llama3-8B-1.58-100B-tokens
|
| Run the non-quantized version of the model on your
| 3090/4090 gpu and observe the power draw. Then load the
| 1.58 model and observe the power usage. Sure, the numbers
| have a wide range because there are many gpu/npu to make
| the comparison.
| kayo_20211030 wrote:
| Good one!
| sroussey wrote:
| Not every instruction on a CPU or GPU uses the same amount
| of power. So if you could rewrite your algorithm to use
| more power efficient instructions (even if you technically
| use more of them), you can save overall power draw.
|
| That said, time to market has been more important than any
| cares of efficiency for some time. Now and in the future,
| there is more of a focus on it as the expenses in equipment
| and power have really grown.
| littlestymaar wrote:
| How does the linked article relate to BitNet at all? It's
| about the "addition is all you need" paper which AFAIK is
| unrelated.
| Randor wrote:
| Yeah, I get what you're saying but both are challenging the
| current MatMul methods. The L-Mul paper claims "a power
| savings of 95%" and that is the thread topic. Bitnet proves
| that at least 70% is possible by getting rid of MatMul.
| manquer wrote:
| It is a clickbait headline; the claim itself is not
| extraordinary. The preprint from arXiv was posted here some
| time back.
|
| The 95% gain is specifically for multiplication operations
| only; inference is compute-light and memory-heavy in the
| first place, so the actual gains would be far smaller.
|
| Tech journalism (all journalism really) can hardly be trusted
| to publish grounded news with the focus on clicks and revenue
| they need to survive.
| kayo_20211030 wrote:
| Thank you. That makes sense.
| rob_c wrote:
| Bingo,
|
| We have a winner. Glad that came from someone not in my
| lectures on ML network design
|
| Honestly, thanks for beating me to this comment
| ksec wrote:
| >Tech journalism (all journalism really) can hardly be
| trusted to publish grounded news with the focus on clicks and
| revenue they need to survive.
|
| Right now the only way to gain real knowledge is actually to
| read comments of those articles.
| kayo_20211030 wrote:
| re: all above/below comments. It's still an extraordinary
| claim.
|
| I'm not claiming it's not possible, nor am I claiming that it's
| not true, or, at least, honest.
|
| But there will need to be evidence that, using real machines
| and real energy, _equivalent performance_ is achievable. A
| defense that "there are no suitable chips" is a
| bit disingenuous. If the 95% savings actually has legs some
| smart chip manufacturer will do the math and make the chips. If
| it's correct, that chip making firm will make a fortune. If
| it's not, they won't.
| throwawaymaths wrote:
| > If the 95% savings actually has legs some smart chip
| manufacturer will do the math and make the chips
|
| Terrible logic. By a similar logic we wouldn't be using
| python for machine learning at all, for example (or x86 for
| compute). Yet here we are.
| kayo_20211030 wrote:
| What's wrong with the logic? A caveat in the paper is that
| the technique will save 95% energy _but_ that the technique
| will not run efficiently on current chips. I'm saying that
| if the new technique needs new chips and saves 95% of
| energy costs with the same performance, someone will make
| the chips. I say nothing about how and why we do ML as we
| do today - the 100% energy usage level.
| stefan_ wrote:
| I mean, all these smart people would rather pay NVIDIA all
| their money than make AMD viable. And yet they tell us it's all
| MatMul.
| kayo_20211030 wrote:
| Both companies are doing pretty well. Why don't you think AMD
| is viable?
| nelup20 wrote:
| AMD's ROCm just isn't there yet compared to Nvidia's CUDA.
| I tried it on Linux with my AMD GPU and couldn't get things
| working. AFAIK on Windows it's even worse.
| mattalex wrote:
| That entirely depends on what AMD device you look at:
| gaming GPUs are not well supported, but their Instinct
| line of accelerators works just as well as CUDA. Keep in
| mind that, in contrast to Nvidia, AMD uses different
| architectures for compute and gaming (though they are
| changing that in the next generation)
| redleader55 wrote:
| The litmus test would be if you read in the news that
| Amazon, Microsoft, Google or Meta just bought billions in
| GPUs from AMD.
|
| They are and have been buying AMD CPUs for a while now,
| which says something about AMD and Intel.
| JonChesterfield wrote:
| Microsoft and Meta are running customer facing LLM
| workloads on AMD's graphics cards. Oracle seems to like
| them too. Google is doing the TPU thing with Broadcom and
| Amazon seems to have decided to bet on Intel (in a
| presumably fatal move but time will tell). We'll find
| some more information on the order book in a couple of
| weeks at earnings.
|
| I like that the narrative has changed from "AI only runs
| on Cuda" to "sure it runs fine on AMD if you must"
| dotnet00 wrote:
| It's not their job to make AMD viable, it's AMD's job to make
| AMD viable. NVIDIA didn't get their position for free, they
| spent a decade refining CUDA and its tooling before GPU-based
| crypto and AI kicked off.
| syntaxing wrote:
| I'm looking forward to BitNet adoption. MS just released a tool
| for it similar to llama.cpp. Really hoping major models get
| retrained for it.
| andrewstuart wrote:
| The ultimate "you're doing it wrong".
|
| For the sake of the climate and environment it would be nice if
| it were true.
|
| Bad news for Nvidia. "Sell your stock" bad.
|
| Does it come with a demonstration?
| talldayo wrote:
| > Bad news for Nvidia. "Sell your stock" bad.
|
| People say this but then the fastest and most-used
| implementation of these optimizations is always written in
| CUDA. If this turns out to not be a hoax, I wouldn't be
| surprised to see Nvidia prices _jump_ in correlation.
| mouse_ wrote:
| Hypothetically, if this is true and simple as the headline
| implies -- AI using 95% less power doesn't mean AI will use 95%
| less power, it means we will do 20x more AI. As long as it's
| the current fad, we will throw as much power and resources at
| this as we can physically produce, because our economy depends
| on constant, accelerating growth.
| etcd wrote:
| True. A laptop power pack's wattage is probably pretty much
| unchanged over 30 years, for example.
| Dylan16807 wrote:
| Bad news for Nvidia how? Even ignoring that the power savings
| are only on one type of instruction, 20x less power doesn't
| mean it runs 20x faster. You still need big fat GPUs.
|
| If this increases integer demand and decreases floating point
| demand, that moderately changes future product design and
| doesn't do much else.
| Nasrudith wrote:
| Wouldn't reduced power consumption for an unfulfilled demand
| mean more demand for Nvidia, since more chips are now needed to
| use the available power capacity? (As concentration tends to be
| the more efficient way.)
| idiliv wrote:
| Duplicate, posted on October 9:
| https://news.ycombinator.com/item?id=41784591
| asicsarecool wrote:
| Don't assume this isn't already in place at the main AI companies
| DesiLurker wrote:
| Validity of the claim aside, why don't they say it reduces
| power by 20 times instead of 95%? A 95% reduction is 1/20th the
| energy, and a ratio gives much better perspective when the
| remaining fraction is tiny.
| hello_computer wrote:
| How does this differ from Cussen & Ullman?
|
| https://arxiv.org/abs/2307.01415
| selimthegrim wrote:
| Cussen is an HN poster incidentally.
| GistNoesis wrote:
| Does https://en.wikipedia.org/wiki/Jevons_paradox apply in this
| case ?
| narrator wrote:
| Of course. Jevons paradox always applies.
| gosub100 wrote:
| Not necessarily a bad thing: this might give the AI charlatans
| enough time to actually make something useful.
| mattxxx wrote:
| That's interesting.
|
| Obviously, energy cost creates a barrier to entry, so reduction
| of cost reduces the barrier to entry... which adds more
| players... which increases demand.
| panosv wrote:
| Lemurian Labs looks like it's doing something similar:
| https://www.lemurianlabs.com/technology They use the Logarithmic
| Number System (LNS)
| andrewstuart wrote:
| Here is the Microsoft implementation:
|
| https://github.com/microsoft/BitNet
| quantadev wrote:
| I wonder if someone has fed this entire "problem" into the
| latest ChatGPT o1 (the new model with reasoning capability), and
| just fed it all the code for a Multilayer Perceptron and then
| given it the task/prompt of finding ways to implement the same
| network using only integer operations.
|
| Surely even the OpenAI devs must have done this like the minute
| they got done training that model, right? I wonder if they'd even
| admit it was an AI that came up with the solution rather than
| just publishing it, and taking credit. haha.
| chx wrote:
| You are imagining LLMs are capable of much more than they
| actually are. Here's the _only_ thing they are good for.
|
| https://hachyderm.io/@inthehands/112006855076082650
|
| > You might be surprised to learn that I actually think LLMs
| have the potential to be not only fun but genuinely useful.
| "Show me some bullshit that would be typical in this context"
| can be a genuinely helpful question to have answered, in code
| and in natural language -- for brainstorming, for seeing common
| conventions in an unfamiliar context, for having something
| crappy to react to.
|
| > Alas, that does not remotely resemble how people are pitching
| this technology.
| quantadev wrote:
| No, I'm not imagining things. You are, however, imagining
| (incorrectly) that I'm not an expert with AI who's already
| seen superhuman performance out of LLM prompts in the vast
| majority of every software development question I've ever
| asked them, starting all the way back at GPT-3.5.
| dotnet00 wrote:
| So, where are all of your world changing innovations driven
| by these superhuman capabilities?
| quantadev wrote:
| You raise a great point. Maybe I should be asking the AI
| for more career advice or new product ideas, rather than
| just letting it merely solve each specific coding
| challenge.
| didgetmaster wrote:
| Maybe I am just a natural skeptic, but whenever I see a headline
| that says 'method x reduces y by z%' while the text instead says
| that optimizing some step 'could potentially reduce y by up to
| z%', I am suspicious.
|
| Why not publish some actual benchmarks that prove your claim in
| even a few special cases?
| TheRealPomax wrote:
| Because as disappointing as modern life is, you need clickbait
| headlines to drive traffic. You did the right thing by reading
| the article though, that's where the information is, not the
| title.
| phtrivier wrote:
| Fair enough, but then I want a way to penalize publishers for
| abusing clickbait. There is no "unread" button, and there is
| no way to unsubscribe to advertisement-based sites.
|
| Even on sites that have a "Like / Don't like" button, my
| understanding is that clicking "Don't like" is a form of
| "engagement", that the suggestion algorithm are going to
| reward.
|
| Give me a button that says "this article was a scam", and
| have the publisher give the advertisement money back. Of
| better yet, give the advertisement money to charity / public
| services / whatever.
|
| Take a cut of the money being transferred, charge the
| publishers for being able to get a "clickbait free" green
| mark if they implement the scheme.
|
| Track the kind of articles that generate the most clickbait-
| angry comments. Sell back the data.
|
| There might be a business model there.
| NineStarPoint wrote:
| I doubt there's a business model there because who is going
| to opt in to a scheme that loses them money?
|
| What could work is social media giving people an easy
| button to block links to specific websites from appearing
| in their feed, or something along those lines. It's a nice
| user feature, and having every clickbait article be a
| chance someone will choose to never see your website again
| could actually rein in some of the nonsense.
| phtrivier wrote:
| > I doubt there's a business model there because who is
| going to opt in to a scheme that loses them money?
|
| Agreed, of course.
|
| In a reasonable world, that could be considered part of
| the basic, law mandated requirements. It would be blurry
| and subject to interpretation to decide what is clickbait
| or not, just like libel or defamation - good thing we're
| only a few hundred years away from someone reinventing a
| device to handle that, called "independent judges".
|
| In the meantime, I suppose you would have to bring some
| "unreasonable" thing to it, like "brands like to have
| green logos on their sites to brag" ?
|
| > What could work is social media giving people an easy
| button to block links to specific websites from appearing
| in their feed, or something along those lines.
|
| I completely agree. They have had the technology to
| implement such a feature since forever, and they've decided
| against it since forever.
|
| However I wonder if that's something a browser extension
| could handle? A merge of AdBlock and "saved you a click"
| that displays the "boring" content of the link when you
| hover on a clickbaity link?
| keybored wrote:
| Headlines: what can they do, they need that for the traffic
|
| Reader: do the moral thing and read the article, not just the
| title
|
| How is that balanced.
| baq wrote:
| OTOH you have living proof that an amazingly huge neural
| network can work on 20W of power, so expecting multiple orders
| of magnitude of reduction in power consumption is not
| unreasonable.
| etcd wrote:
| Mitochondria are all you need.
|
| Should be able to go more efficient as the brain has other
| constraints such as working at 36.7 degrees C etc.
| dragonwriter wrote:
| Well, one, because the headline isn't from the researchers, it's
| from a popular press report (not even the one posted here,
| originally, this is secondary reporting of another popular
| press piece) and isn't what the paper claims so it would be odd
| for the paper's authors to conduct benchmarks to justify it.
| (And, no, even the "up to 95%" isn't from the paper, the cost
| savings are cited per operation depending on operation and the
| precision the operation is conducted at, are as high as 97.3%,
| are based on research already done establishing the energy cost
| of math operations on modern compute hardware, but no end-to-
| end cost savings claim is made.)
|
| And, two, because the actual energy cost savings claimed aren't
| even the experimental question -- the energy cost differences
| between various operations on modern hardware have been
| established in other research, the experimental issue here was
| whether the mathematical technique that enables using the lower
| energy cost operations performs competitively on output quality
| with existing implementations when substituted in for LLM
| inference.
| andrewstuart wrote:
| https://github.com/microsoft/BitNet
|
| "The first release of bitnet.cpp is to support inference on
| CPUs. bitnet.cpp achieves speedups of 1.37x to 5.07x on ARM
| CPUs, with larger models experiencing greater performance
| gains. Additionally, it reduces energy consumption by 55.4% to
| 70.0%, further boosting overall efficiency. On x86 CPUs,
| speedups range from 2.37x to 6.17x with energy reductions
| between 71.9% to 82.2%. Furthermore, bitnet.cpp can run a 100B
| BitNet b1.58 model on a single CPU, achieving speeds comparable
| to human reading (5-7 tokens per second), significantly
| enhancing the potential for running LLMs on local devices. More
| details will be provided soon."
| jdiez17 wrote:
| Damn. Seems almost too good to be true. Let's see where this
| goes in two weeks.
| andrewstuart wrote:
| Intel and AMD will be extremely happy.
|
| Nvidia will be very unhappy.
| l11r wrote:
| Their GPUs will still be needed to do training. As far as
| I understand, this will only improve inference
| performance and efficiency.
| littlestymaar wrote:
| Related: https://news.ycombinator.com/item?id=41784591 10 days
| ago
| greenthrow wrote:
| The trend of hyping up papers too early on is eroding people's
| faith in science due to poor journalism failing to explain that
| this is theoretical. The outlets that do this should pay the
| price but they don't, because almost every outlet does it.
| holoduke wrote:
| I don't think algorithms will change energy consumption. There is
| always max capacity needed in terms of computing. If tomorrow a
| new algorithm increases the performance 4 times, we will just
| have 4 times more computing.
| neuroelectron wrote:
| Nobody is interested in this because nobody wants less capex.
| ein0p wrote:
| As a rule, compute only takes less than 10% of all energy. 90% is
| data movement.
| tartakovsky wrote:
| original paper: https://news.ycombinator.com/item?id=41784591
| m3kw9 wrote:
| This sounds similar to someone saying a room-temperature
| superconductor was discovered.
| Art9681 wrote:
| In the end, the reduced power consumption means the current
| models that are "good enough" will fit a much smaller compute
| budget, such as edge devices. However, enthusiasts are still
| going to want the
| best hardware they can afford because inevitably, everyone will
| want to maximize the size and intelligence of a model they can
| run. So we're just going to scale. This might bring a GPT-4 level
| to edge devices, but we are still going to want to run what might
| resemble a GPT-5/6 model on the best hardware possible at the
| time. So don't throw away your GPU's yet. This will bring
| capabilities to mass market, but your high end GPU will still
| scale the solution n-fold and you'll be able to run models with
| disregard to the energy savings promoted in the headline.
|
| In other sensationalized words: "AI engineers can claim new
| algorithm allows them to fit GPT-5 in an RTX5090 running at 600
| watts."
| jart wrote:
| It's a very crude approximation, e.g. 1.75 * 2.5 == 3 (although
| it seems better as the numbers get closer to 0).
|
| I tried implementing this for AVX512 with tinyBLAS in llamafile:
|
|       inline __m512 lmul512(__m512 x, __m512 y) {
|           // field masks for fp32: sign, exponent, mantissa
|           __m512i sign_mask = _mm512_set1_epi32(0x80000000);
|           __m512i exp_mask = _mm512_set1_epi32(0x7F800000);
|           __m512i mant_mask = _mm512_set1_epi32(0x007FFFFF);
|           __m512i exp_bias = _mm512_set1_epi32(127);
|           // pull the operands apart into their fields
|           __m512i x_bits = _mm512_castps_si512(x);
|           __m512i y_bits = _mm512_castps_si512(y);
|           __m512i sign_x = _mm512_and_si512(x_bits, sign_mask);
|           __m512i sign_y = _mm512_and_si512(y_bits, sign_mask);
|           __m512i exp_x = _mm512_srli_epi32(
|               _mm512_and_si512(x_bits, exp_mask), 23);
|           __m512i exp_y = _mm512_srli_epi32(
|               _mm512_and_si512(y_bits, exp_mask), 23);
|           __m512i mant_x = _mm512_and_si512(x_bits, mant_mask);
|           __m512i mant_y = _mm512_and_si512(y_bits, mant_mask);
|           // xor signs, add exponents (minus one bias), and
|           // average the mantissas
|           __m512i sign_result = _mm512_xor_si512(sign_x, sign_y);
|           __m512i exp_result = _mm512_sub_epi32(
|               _mm512_add_epi32(exp_x, exp_y), exp_bias);
|           __m512i mant_result = _mm512_srli_epi32(
|               _mm512_add_epi32(mant_x, mant_y), 1);
|           // reassemble the fields into a float
|           __m512i result_bits = _mm512_or_si512(
|               _mm512_or_si512(sign_result,
|                               _mm512_slli_epi32(exp_result, 23)),
|               mant_result);
|           return _mm512_castsi512_ps(result_bits);
|       }
|
| Then I used it for Llama-3.2-3B-Instruct.F16.gguf and it
| outputted gibberish. So you would probably have to train and
| design your model specifically to use this multiplication
| approximation in order for it to work. Or maybe I'd have to tune
| the model so that only certain layers and/or operations use the
| approximation. However the speed was decent. Prefill only dropped
| from 850 tokens per second to 200 tok/sec on my threadripper.
| Prediction speed was totally unaffected, staying at 34 tok/sec. I
| like how the code above generates vpternlog ops. So if anyone
| ever designs an LLM architecture and releases weights on Hugging
| Face that use this algorithm, we'll be able to run them
| reasonably fast without special hardware.
| raluk wrote:
| Your kernel seems to be incorrect for 1.75 * 2.5. From the paper
| we have 1.75 == (1+0.75)*2^0 and 2.5 == (1+0.25)*2^1, so the
| result is (1+0.75+0.25+2^-4)*2^1 == 4.125 (the exact result is
| 4.375).
| raluk wrote:
| Extra: I am not sure if that is clear from the paper, but in
| the example of 1.75 * 2.5 we can also represent 1.75 as
| (1-0.125)*2. This gives good approximations for numbers that
| are close to but less than a power of 2. This way abs(a*b) in
| (1+a)*(1+b) is always small and strictly less than 0.25.
|
| Another example: if we have 1.9 * 1.9, then we need to account
| for overflow in (0.9 + 0.9), and this seems to induce similar
| overhead as expressing the numbers as (1-0.05)*2.
| nprateem wrote:
| Is it the one where you delete 95% of user accounts?
| m463 wrote:
| So couldn't you design a GPU that supports this algorithm and
| uses the same power, but runs bigger models, better models, or
| does more work?
| Wheatman wrote:
| Isn't 90% of the energy spent moving bytes around? Why would this
| have such a great effect?
| faragon wrote:
| Before reading the article I was expecting the use of 1-bit
| instead of bfloats, and logical operators instead of arithmetic.
| svilen_dobrev wrote:
| i am not well versed in the math involved, but IMO if the outcome
| depends mostly on the differences between the numbers, as a
| smaller-or-bigger distinction as well as their magnitudes, then
| exactness might not be needed. i mean, as long as the approximate
| "function" looks similar to the exact one, that might be good
| enough.
|
| Maybe even generate a table of the approximate results and use
| that, in various stages? Like the way sin/cos was done 30y ago
| before FP coprocessors arrived
| gcanyon wrote:
| This isn't really the optimization I'm thinking about, but: given
| the weird and abstract nature of the functioning of ML in general
| and LLMs in particular, it seems reasonable to think that there
| might be algorithms that achieve the same, or a similar, result
| in an orders-of-magnitude more efficient way.
| DennisL123 wrote:
| This is a result on 8 bit numbers, right? Why not precompute all
| 64k possible combinations and look up the results from the table?
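| A sketch of that idea, assuming an fp8 format and a
| hypothetical decode_fp8() helper (both placeholders, not from
| the paper):
|
|       #include <stdint.h>
|
|       static float products[256][256]; /* 64k entries */
|
|       /* Fill the table once; decode_fp8 maps an 8-bit pattern
|          to its float value. */
|       static void init_products(float (*decode_fp8)(uint8_t)) {
|           for (int i = 0; i < 256; i++)
|               for (int j = 0; j < 256; j++)
|                   products[i][j] = decode_fp8((uint8_t)i)
|                                  * decode_fp8((uint8_t)j);
|       }
|
|       /* Each multiplication is then a single table load. */
|       static inline float mul_fp8(uint8_t a, uint8_t b) {
|           return products[a][b];
|       }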
| jhj wrote:
| As someone who has worked in this space (approximate compute) on
| both GPUs and in silicon in my research, the power consumption
| claims are completely bogus, as are the accuracy claims:
|
| > In this section, we show that L-Mul is more precise than fp8
| e4m3 multiplications
|
| > To be concise, we do not consider the rounding to nearest even
| mode in both error analysis and complexity estimation for both
| Mul and L-Mul
|
| These two statements together are nonsensical. Sure, if you
| analyze accuracy while ignoring the part of the algorithm that
| gives the baseline its accuracy, you can derive whatever
| cherry-picked result you want.
|
| If you round to nearest even, the multiplication of two floating
| point values will be the correctly rounded result of multiplying
| the original values at infinite precision; this is how floating
| point rounding usually works and what IEEE 754 mandates for
| fundamental operations if you choose to follow those guidelines
| (e.g., multiplication here). But not rounding to nearest even
| will result in a lot more quantization noise, and biased noise at
| that too.
|
| > applying the L-Mul operation in tensor processing hardware can
| potentially reduce 95% energy cost by elementwise floating point
| tensor multiplications and 80% energy cost of dot products
|
| A good chunk of the energy cost is simply moving data between
| memories (especially external DRAM/HBM/whatever) and along wires,
| buffering values in SRAMs and flip-flops and the like.
| Combinational logic cost is usually not a big deal. While having
| a ton of fixed-function matrix multipliers does raise the cost of
| combinational logic quite a bit, at most what they have will
| probably cut the power of an overall accelerator by 10-20% or so.
|
| > In this section, we demonstrate that L-Mul can replace tensor
| multiplications in the attention mechanism without any loss of
| performance, whereas using fp8 multiplications for the same
| purpose degrades inference accuracy
|
| I may have missed it in the paper, but they have provided no
| details on (re)scaling and/or using higher precision accumulation
| for intermediate results as one would experience on an H100 for
| instance. Without this information, I don't trust these
| evaluation results either.
| DrNosferatu wrote:
| Why don't they implement the algorithm in an FPGA to compare
| against a classical baseline?
___________________________________________________________________
(page generated 2024-10-20 23:02 UTC)