[HN Gopher] AI engineers claim new algorithm reduces AI power co...
       ___________________________________________________________________
        
       AI engineers claim new algorithm reduces AI power consumption by
       95%
        
       Author : ferriswil
       Score  : 325 points
       Date   : 2024-10-19 18:03 UTC (1 day ago)
        
 (HTM) web link (www.tomshardware.com)
 (TXT) w3m dump (www.tomshardware.com)
        
       | remexre wrote:
       | Isn't this just taking advantage of "log(x) + log(y) = log(xy)"?
       | The IEEE754 floating-point representation stores floats as sign,
        | mantissa, and exponent -- ignore the first two (you quantized
       | anyway, right?), and the exponent is just an integer storing
       | log() of the float.
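        | 
        | A minimal sketch of that trick (the classic bit-hack, not
        | necessarily what the paper does): for positive normal floats,
        | adding the raw IEEE754 bit patterns and subtracting the bias
        | offset approximates multiplication, because the exponent
        | fields add like logs:
        | 
        |     import struct
        |     
        |     def approx_mul(a: float, b: float) -> float:
        |         # Reinterpret each float's bits as a 32-bit integer.
        |         ia = struct.unpack("<I", struct.pack("<f", a))[0]
        |         ib = struct.unpack("<I", struct.pack("<f", b))[0]
        |         # 0x3F800000 is the bit pattern of 1.0f; subtracting
        |         # it keeps the exponent bias from being counted twice.
        |         out = ia + ib - 0x3F800000
        |         return struct.unpack("<f", struct.pack("<I", out))[0]
        |     
        |     print(approx_mul(3.0, 5.0))  # ~14.0 vs the exact 15.0
        |     print(approx_mul(0.1, 0.2))  # ~0.01875 vs the exact 0.02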
        
         | convolvatron wrote:
         | yes. and the next question is 'ok, how do we add'
        
           | dietr1ch wrote:
           | I guess that if the bulk of the computation goes into the
           | multiplications, you can work in the log-space and simply
           | sum, and when the time comes to actually do a sum on the
           | original space you can go back and sum.
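            | 
            | A rough sketch of that scheme (illustrative only, not the
            | paper's method): keep magnitudes as logs so a multiply
            | becomes an add, and exp() back only when a real sum is
            | needed:
            | 
            |     import math
            |     
            |     # "Multiply" 2 * 3 * 5 in log-space with plain adds.
            |     log_product = sum(math.log(x) for x in (2.0, 3.0, 5.0))
            |     print(math.exp(log_product))  # back to linear: ~30.0
            |     
            |     # A dot product still needs real sums, so each term
            |     # is mapped back to linear space before accumulating.
            |     a, b = [2.0, 4.0], [0.5, 0.25]
            |     print(sum(math.exp(math.log(x) + math.log(y))
            |               for x, y in zip(a, b)))  # 1.0 + 1.0 = 2.0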
        
             | a-loup-e wrote:
             | Not sure how well that would work if you're often adding
             | bias after every layer
        
           | kps wrote:
           | Yes. I haven't yet read this paper to see what exactly it
           | says is new, but I've definitely seen log-based
           | representations under development before now. ( _More_ log-
           | based than the regular floating-point exponent, that is. I
            | don't actually know the argument behind the exponent-and-
           | mantissa form that's been pretty much universal even before
           | IEEE754, other than that it mimics decimal scientific
           | notation.)
        
         | mota7 wrote:
         | Not quite: It's taking advantage of (1+a)(1+b) = 1 + a + b +
         | ab. And where a and b are both small-ish, ab is really small
         | and can just be ignored.
         | 
         | So it turns the (1+a)(1+b) into 1+a+b. Which is definitely not
         | the same! But it turns out, machine guessing apparently doesn't
         | care much about the difference.
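          | 
          | To put rough numbers on it (relative error of dropping the
          | ab term, worst when a and b are both large):
          | 
          |     for a, b in [(0.1, 0.2), (0.5, 0.5)]:
          |         exact = (1 + a) * (1 + b)
          |         approx = 1 + a + b
          |         print((exact - approx) / exact)  # ~1.5%, then ~11%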
        
           | amelius wrote:
            | You might as well then replace the multiplication with
            | addition in the original network. In that case you're not
           | even approximating anything.
           | 
           | Am I missing something?
        
             | dotnet00 wrote:
             | They're applying that simplification to the exponent bits
             | of an 8 bit float. The range is so small that the
             | approximation to multiplication is going to be pretty
             | close.
        
           | tommiegannert wrote:
           | Plus the 2^-l(m) correction term.
           | 
           | Feels like multiplication shouldn't be needed for
           | convergence, just monotonicity? I wonder how well it would
           | perform if the model was actually trained the same way.
        
           | dsv3099i wrote:
           | This trick is used a ton when doing hand calculation in
           | engineering as well. It can save a lot of work.
           | 
           | You're going to have tolerance on the result anyway, so
           | what's a little more error. :)
        
       | _aavaa_ wrote:
       | Original discussion of the preprint:
       | https://news.ycombinator.com/item?id=41784591
        
         | codethief wrote:
         | Ahh, there it is! I was sure we had discussed this paper
         | before.
        
       | djoldman wrote:
       | https://arxiv.org/abs/2410.00907
       | 
       | ABSTRACT
       | 
       | Large neural networks spend most computation on floating point
       | tensor multiplications. In this work, we find that a floating
       | point multiplier can be approximated by one integer adder with
       | high precision. We propose the linear-complexity multiplication
       | (L-Mul) algorithm that approximates floating point number
       | multiplication with integer addition operations. The new
       | algorithm costs significantly less computation resource than
       | 8-bit floating point multiplication but achieves higher
       | precision. Compared to 8-bit floating point multiplications, the
       | proposed method achieves higher precision but consumes
       | significantly less bit-level computation. Since multiplying
       | floating point numbers requires substantially higher energy
       | compared to integer addition operations, applying the L-Mul
       | operation in tensor processing hardware can potentially reduce
       | 95% energy cost by elementwise floating point tensor
       | multiplications and 80% energy cost of dot products. We
       | calculated the theoretical error expectation of L-Mul, and
       | evaluated the algorithm on a wide range of textual, visual, and
       | symbolic tasks, including natural language understanding,
       | structural reasoning, mathematics, and commonsense question
       | answering. Our numerical analysis experiments agree with the
       | theoretical error estimation, which indicates that L-Mul with
       | 4-bit mantissa achieves comparable precision as float8 e4m3
       | multiplications, and L-Mul with 3-bit mantissa outperforms float8
       | e5m2. Evaluation results on popular benchmarks show that directly
       | applying L-Mul to the attention mechanism is almost lossless. We
       | further show that replacing all floating point multiplications
       | with 3-bit mantissa L-Mul in a transformer model achieves
       | equivalent precision as using float8 e4m3 as accumulation
       | precision in both fine-tuning and inference.
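        | 
        | A back-of-the-envelope sketch of L-Mul as I read the abstract
        | (not the authors' code, so treat the details as approximate;
        | the 2^-l(m) offset stands in for the dropped mantissa
        | product):
        | 
        |     import math
        |     
        |     def l_mul(x, y, mantissa_bits=4):
        |         # Decompose into sign, exponent, mantissa fraction.
        |         sign = math.copysign(1.0, x) * math.copysign(1.0, y)
        |         ex = math.frexp(abs(x))[1] - 1
        |         ey = math.frexp(abs(y))[1] - 1
        |         q = 2 ** mantissa_bits
        |         mx = round((abs(x) / 2.0 ** ex - 1.0) * q) / q
        |         my = round((abs(y) / 2.0 ** ey - 1.0) * q) / q
        |         # Add (never multiply) mantissas and exponents; l(m)
        |         # per the preprint is m, 3, or 4 by mantissa width.
        |         l = min(mantissa_bits, 3) if mantissa_bits <= 4 else 4
        |         return sign * (1 + mx + my + 2.0 ** -l) * 2.0 ** (ex + ey)
        |     
        |     print(l_mul(3.0, 5.0))  # 15.0 (exact here, by luck)
        |     print(l_mul(0.1, 0.2))  # ~0.0186 vs the exact 0.02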
        
         | onlyrealcuzzo wrote:
         | Does this mean you can train efficiently without GPUs?
         | 
         | Presumably there will be a lot of interest.
        
           | crazygringo wrote:
           | No. But it does potentially mean that either current or
           | future-tweaked GPUs could run a lot more efficiently --
           | meaning much faster or with much less energy consumption.
           | 
           | You still need the GPU parallelism though.
        
             | fuzzfactor wrote:
             | I had a feeling it had to be something like massive waste
             | due to a misguided feature of the algorithms that shouldn't
             | have been there in the first place.
             | 
             | Once the "math is done" quite likely it would have paid off
             | better than most investments for the top people to have
             | spent a few short years working with grossly underpowered
             | hardware until they could come up with amazing results
             | there before scaling up. Rather than grossly overpowered
             | hardware before there was even deep understanding of the
             | underlying processes.
             | 
             | When you think about it, what we have seen from the latest
             | ultra-high-powered "thinking" machines is truly so
             | impressive. But if you are trying to fool somebody into
             | believing that it's a real person it's still not "quite"
             | there.
             | 
             | Maybe a good benchmark would be to take a regular PC, and
             | without reliance on AI just pull out all the stops and put
             | all the effort into fakery itself. No holds barred, any
             | trick you can think of. See what the electronics is capable
             | of this way. There are some smart engineers, this would
             | only take a few years but looks like it would have been a
             | lot more affordable.
             | 
             | Then with the same hardware if an AI alternative is not as
             | convincing, something has got to be wrong.
             | 
             | It's good to find out this type of thing before you go
             | overboard.
             | 
             | Regardless of speed or power, I never could have gotten an
             | 8-bit computer to match the output of a 32-bit floating-
             | point algorithm by using floating-point myself. Integers
             | all the way and place the decimal where it's supposed to be
             | when you're done.
             | 
             | Once it's really figured out, how do you think it would
             | feel being the one paying the electric bills up until now?
        
               | jimmaswell wrote:
               | Faster progress was absolutely worth it. Spending years
               | agonizing over theory to save a bit of electric would
               | have been a massive disservice to the world.
        
               | BolexNOLA wrote:
               | "A bit"?
        
               | bartread wrote:
               | Yes, a large amount for - in the grand scheme of things -
               | a short period of time (i.e., a quantity of energy usage
               | in an intense spike that will be dwarfed by energy usage
               | over time) can accurately be described as "a bit".
               | 
               | Of course, the impact is that AI will continue to become
               | cheaper to use, and induced demand will continue the
               | feedback loop driving the market as a result.
        
               | rossjudson wrote:
               | You're sort of presuming that LLMs are going to be a
                | massive _service_ to the world there, aren't you? I
               | think the jury is still out on that one.
        
               | jimmaswell wrote:
               | They already have been. Even just in programming, even
               | just Copilot has been a life changing productivity
               | booster.
        
               | wruza wrote:
               | Are you sure it's a life changing productivity booster?
                | Sometimes I look at my projects and wonder how I would
                | explain to an LLM what this code should do if it
                | didn't exist yet. Must be a shitton of boilerplate
               | programming for copilot to be a life-changing experience.
        
               | AYBABTME wrote:
                | You haven't used them enough. Every time an LLM reduces
                | my search from 1 min to 5 s, the LLM pays off.
               | 
               | Just summary features: save me 20min of reading a
               | transcript, turn it into 20s. That's a huge enabler.
        
               | mecsred wrote:
                | If 20 mins of information can legitimately be condensed
               | into 20 seconds, it sounds like the original wasn't worth
               | reading in the first place. Could have skipped the llm
               | entirely.
        
               | bostik wrote:
               | I upvoted you, because I think you have a valid point.
               | The tone is unnecessarily aggressive though.
               | 
               | Effective and information-dense communication is _really_
                | hard. That doesn't mean we should just accept the
                | useless fluff surrounding the actual information and/or
                | analysis. People could learn a lot from the Ig Nobel
                | Prize ceremony's 24/7 presentation model.
               | 
               | Sadly, it seems we are heading towards a future where you
               | may need an LLM to distill the relevant information out
               | of a sea of noise.
        
               | mecsred wrote:
               | Didn't intend for it to be aggressive, just concise.
               | Spare me from the llm please :)
        
               | AYBABTME wrote:
               | Think of the summary of a zoom call. Or of a chapter that
               | you're not sure if you care to read or not.
               | 
               | Not all content is worth consuming, and not all content
               | is dense.
        
               | postalrat wrote:
               | If I had a recording of the zoom call I could generate a
               | summary on demand with better tools than were available
               | at the time the zoom call was made.
        
               | crazygringo wrote:
                | > _it sounds like the original wasn't worth reading in
               | the first place_
               | 
               | But if that's the only place that contained the
               | information you needed, then you have no choice.
               | 
               | There's a lot of material out there that is badly
               | written, badly organized, badly presented. LLM's can be a
               | godsend for extracting the information you actually need
               | without wasting 20 minutes wading through the muck.
        
               | wruza wrote:
               | Overviews aren't code though. In code, for me, they don't
               | pass 80/20 tests well enough, sometimes even on simple
               | cases. (You get 50-80% of an existing function/block with
               | some important context prepended and a comment, let it
               | write the rest and check if it succeeds). It doesn't mean
                | that LLMs are useless. Or that I am an anti-LLMist or a
               | denier - I'm actually an enthusiast. But this specific
               | claim I hear often and don't find true. Maybe true for
               | repetitive code in boring environments where typing and
               | remembering formats/params over and over is the main
               | issue. Not in actual code.
               | 
               | If I paste the actual non-trivial code, it starts
               | deviating fast. And it isn't too complex, it's just less
               | like "parallel sort two arrays" and more like "wait for
               | an image on a screenshot by execing scrot (with no sound)
               | repeatedly and passing the result to this detect-cv2.py
               | script and use all matching options described in this ts
               | type, get stdout json as in this ts type, and if there's
               | a match, wait for the specified anim timeout and test
               | again to get the settled match coords after an animation
                | finishes; throw after a total timeout". Not rocket
                | science, pretty dumb shit, but right there they fall flat
               | and start imagining things, heavily.
               | 
               | I guess it shines if you ask it to make an html form, but
               | I couldn't call that life-changing unless I had to make
               | these damn forms all day.
        
               | andrei_says_ wrote:
               | My experience with overviews is that they are often
               | subtly or not so subtly inaccurate. LLMs not
               | understanding meaning or intent carries risk of
               | misrepresentation.
        
               | giraffe_lady wrote:
               | "Even just in programming" the jury is still out. None of
               | my coworkers using these are noticeably more productive
               | than the ones who don't. Outside of programming no one
               | gives a shit except scammers and hype chasers.
        
               | JonChesterfield wrote:
               | The people writing articles for journals that aggregate
               | and approximate other sources are in mortal terror of
               | LLMs. Likewise graphic designers and anyone working in
               | (human language) translation.
               | 
               | I don't fear that LLMs are going to take my job as a
               | developer. I'm pretty sure they mark a further decrease
               | in the quality and coherence of software, along with a
               | rapid increase in the quantity of code out there, and
               | that seems likely to provide me with reliable employment
               | forever. I'm basically employed in fixing bugs that
               | didn't need to exist in the first place and that seems to
               | cover a lot of software dev.
        
               | giraffe_lady wrote:
               | They're not scared of LLMs because of anything about
               | LLMs. It's just that everyone with power is publicly
               | horny to delete the remaining middle class jobs and are
               | happy to use LLMs as a justification whether it can
               | functionally replace those workers or not. So it's not
               | that everyone has evaluated chatgpt and cannily realized
               | it can do their job, they're just reading the room.
        
               | recursive wrote:
               | I've been using copilot for several months. If I could
               | figure out a way to measure its impact on my
               | productivity, I'd probably see a single digit percentage
               | boost in "productivity". This is not life-changing for
               | me. And for some tasks, it's actually worse than nothing.
               | As in, I spend time feeding it a task, and it just
               | completely fails to do anything useful.
        
               | jimmaswell wrote:
               | I've been using it for over a year I think. I don't often
               | feed it tasks with comments so much as go about things
               | the same as usual and let it autocomplete. The time and
               | cognitive load saved adds up massively. I've had to go
               | without it for a bit while my workplace gets its license
               | in order for the corporate version and the personal
               | version has an issue with the proxy, and it's been
               | agonizing going without it again. I almost forgot how
               | much it sucks having to jump to google every other
               | minute, and it was easy to start to take for granted how
               | much context copilot was letting me not have to hold onto
               | in my head. It really lets me work on the problem as
               | opposed to being mired in immaterial details. It feels
               | like I'm at least 2x slower overall without it.
        
               | rockskon wrote:
               | I don't know about you but LLMs spit out garbage nonsense
                | frequently enough that I can't trust their output in _any_
               | context I cannot personally verify the validity of.
        
               | atq2119 wrote:
               | > I almost forgot how much it sucks having to jump to
               | google every other minute
               | 
               | Even allowing for some hyperbole, your programming
               | experience is extremely different from mine. Looking
               | anything up outside the IDE, let alone via Google, is by
               | far the exception for me rather than the rule.
               | 
               | I've long suspected that this kind of difference explains
               | a lot of the difference in how Copilot is perceived.
        
               | namaria wrote:
               | Claiming LLMs are a massive boost for coding productivity
               | is becoming a red flag that the claimant has a tenuous
               | grasp on the skills necessary. Yeah if you have to look
               | up everything all the time and you can't tell the AI slop
               | isn't very good, you can put out code quite fast.
        
               | soulofmischief wrote:
               | Comments like this are a great example of the Dunning-
               | Kruger effect. Your comment is actually an indication
               | that you don't have the mastery required to get useful,
               | productive output from a high quality LLM.
               | 
               | Maybe you don't push your boundaries as an engineer and
               | thus rarely need to know new things or at least learn new
               | API surfaces. Maybe you don't know how to effectively
               | prompt an LLM. Maybe you lack the mastery to analyze and
               | refine the results. Maybe you just like doing things the
               | slow way. I too remember a time as an early programmer
               | where I eschewed even Intellisense and basic auto
               | complete...
               | 
               | I'd recommend learning a bit more and practicing some
               | humility and curiosity before condemning an entire class
               | of engineers just because you don't understand their
               | workflow. Just because you've had subpar experiences with
               | a new tool doesn't mean it's not a useful tool in another
               | engineer's toolkit.
        
               | namaria wrote:
               | Funny you should make claims about my skills when you
               | have exactly zero data about my abilities or performance.
               | 
               | Evaluating my skills based on how I evaluated someone
               | else's skills when they tell me about their abilities
               | with and without a crutch, and throwing big academic
               | sounding expressions with 'effect' in them might be
               | intimidating to some but to me it just transparently
               | sounds pretentious and way off mark, since, like I said,
               | you have zero data about my abilities or output.
               | 
               | > I'd recommend learning a bit more and practicing some
               | humility and curiosity before condemning an entire class
               | of engineers
               | 
               | You're clearly coming from an emotional place because you
               | feel slighted. There is no 'class of engineers' in my
               | evaluation. I recommend reading comments more closely,
               | thinking about their content, and not getting offended
               | when someone points out signs of lacking skills, because
               | you might just be advertising your own limitations.
        
               | soulofmischief wrote:
               | > Funny you should make claims about my skills when you
               | have exactly zero data about my abilities or performance.
               | 
               | Didn't you just do that to an entire class of engineers:
               | 
               | > Claiming LLMs are a massive boost for coding
               | productivity is becoming a red flag that the claimant has
               | a tenuous grasp on the skills necessary
               | 
               | Anyway,
               | 
               | > Evaluating my skills based on how I evaluated someone
               | else's skills when they tell me about their abilities
               | with and without a crutch
               | 
               | Your argument rests on the assumption that LLMs are a
               | "crutch", and you're going to have to prove that before
               | the rest of your argument holds any water.
               | 
               | It sucks getting generalized, doesn't it? Feels
               | ostracizing? That's the exact experience someone who
               | productively and effectively uses LLMs will have upon
               | encountering your premature judgement.
               | 
               | > You're clearly coming from an emotional place because
               | you feel slighted.
               | 
               | You start off your post upset that I'm "making claims"
               | about your skills (I used the word "maybe" intentionally,
               | multiple times), and then turn around and make a pretty
               | intense claim about me. I'm not "clearly" coming from an
               | emotional place, you did not "trigger" me, I took a
               | moment to educate you about being overly judgemental
               | before fully understanding something, and pointed out the
               | inherent hypocrisy.
               | 
               | > you might just be advertising your own limitations
               | 
               | But apparently my approach was ineffective, and you are
               | still perceiving a world where people who approach their
               | work differently than you are inferior. Your toxic
               | attitude is unproductive, and while you're busy imagining
               | yourself as some masterful engineer, people are out there
               | getting massive productivity boosts with careful
               | application of cutting-edge generative technologies. LLMs
               | have been nothing short of transcendental to a curious
               | but skilled mind.
        
               | Ygg2 wrote:
               | > Didn't you just do that to an entire class of engineers
               | 
               | Not really. He said "if you claim LLM's are next thing
               | since sliced butter I am doubting your abilities". Which
               | is fair. It's not really a class as much as a group.
               | 
               | I've never been wowed over by LLMs. At best they are
               | boilerplate enhancers. At worst they write plausibly
               | looking bullshit that compiles but breaks everything.
               | Give it something truly novel and/or fringe and it will
               | fold like a deck of cards.
               | 
                | Even the latest research called LLMs' benefits into
                | question:
                | https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4945566
               | 
                | That said, they are better at generating commit messages
                | and docs than me.
        
               | soulofmischief wrote:
               | > Not really. He said "if you claim LLM's are next thing
               | since sliced butter I am doubting your abilities". Which
               | is fair.
               | 
               | No, OP said:
               | 
               | > Claiming LLMs are a massive boost for coding
               | productivity is becoming a red flag that the claimant has
               | a tenuous grasp on the skills necessary
               | 
               | Quotation marks are usually reserved for direct quotes,
               | not paraphrases or straw mans.
               | 
               | > I've never been wowed over by LLMs.
               | 
               | Cool. I have, and many others have. I'm unsure why your
               | experience justifies invalidating the experiences of
               | others or supporting prejudice against people who have
               | made good use of them.
               | 
               | > Give it something truly novel and/or fringe and it will
               | fold like a deck of cards.
               | 
               | Few thoughts are truly novel, most are derivative or
               | synergistic. Cutting edge LLMs, when paired with a
               | capable human, are absolutely capable of productive work.
               | I have long, highly technical and cross-cutting
               | discussions with GPT 4o which I simply could not have
               | with any human that I know. Humans like that exist, but I
                | don't know them and so I'm making do with a very good
               | approximation.
               | 
               | Your and OP's lack of imagination at the capabilities of
                | LLMs is more telling than you realize to those intimate
               | with them, which is what makes this all quite ironic
               | given that it started from OP making claims about how
               | people who say LLMs massively boost productivity are
               | giving tells that they're not skilled enough.
        
               | Ygg2 wrote:
               | > Quotation marks are usually reserved for direct quotes,
               | 
                | Not on HN. The custom is to use > paragraph quotes like
                | you did. However, I will keep that in mind.
               | 
               | > Cool. I have, and many others have. I'm unsure why your
               | experience justifies invalidating the experiences of
               | others
               | 
                | If we're both grading a single student (LLM) in the same
               | field (programming), and you find it great and I find it
               | disappointing, it means one of us is scoring it wrong.
               | 
                | I gave papers that demonstrate its failings; where is
                | your counter-proof?
               | 
               | > Your and OP's lack of imagination at the capabilities
               | of LLMs
               | 
                | It's not a lack of imagination. It's the terribleness of
                | the results. It can't consistently write good doc
                | comments. It does not understand the code nor its
                | purpose, but roughly guesses the shape. Which is fine
                | for writing something that's not as formal as code.
               | 
                | It can't read and understand specifications, or even
                | generate something as simple as a useful API for one.
                | The novel part doesn't have to be that novel, just
                | something outside its learned corpus.
               | 
                | Like a YAML parser in Rust. Maybe Zig or something
                | beyond its gobbled-up data repo.
               | 
               | > Few thoughts are truly novel, most are derivative or
               | synergistic.
               | 
               | Sure but you still need A mind to derive/synergize the
               | noise of everyday environment into something novel.
               | 
                | It can't even do that, only remix data into
                | plausible-looking forms. A stochastic parrot. Great for
                | a DnD campaign. Shit for code.
        
               | soulofmischief wrote:
               | > Not on HN. Customary is to use > paragraph quotes like
               | you did. However I will keep that in mind.
               | 
               | Hacker News is not some strange place where the normal
               | rules of discourse don't apply. I assume you are familiar
               | with the function of quotation marks.
               | 
               | > If we're both grading a single student (LLM) in same
               | field (programming), and you find it great and I find it
               | disappointing, it means one of us is scoring it wrong.
               | 
               | No, it means we have different criteria and general
               | capability for evaluating the LLM. There are plenty of
               | standard criteria which LLMs are pitted against, and we
               | have seen continued improvement since their inception.
               | 
                | > It can't consistently write good doc comments. It does
                | not understand the code nor its purpose, but roughly
               | guesses the shape.
               | 
               | Writing good documentation is certainly a challenging
               | task. Experience has led me to understand where current
               | LLMs typically do and don't succeed with writing tests
               | and documentation. Generally, the more organized and
               | straightforward the code, the better. The smaller each
               | module is, the higher the likelihood of a good first
               | pass. And then you can fix deficiencies in a second,
               | manual pass. If done right, it's generally faster than
               | not making use of LLMs for typical workflows. Accuracy
               | also goes down for more niche subject material. All tools
               | have limitations, and understanding them is crucial to
               | using them effectively.
               | 
               | > It can't read and understand specifications, and even
               | generate something as simple as useful API for it.
               | 
               | Actually, I do this all the time and it works great. Keep
               | practicing!
               | 
               | In general, the stochastic parrot argument is oft-
               | repeated but fails to recognize the general capabilities
               | of machine learning. We're not talking about basic Markov
               | chains, here. There are literally academic benchmarks
               | against which transformers have blown away all initial
               | expectations, and they continue to incrementally improve.
               | Getting caught up criticizing the crudeness of a new,
               | revolutionary tool is definitely my idea of
               | unimaginative.
        
               | Ygg2 wrote:
               | > Hacker News is not some strange place where the normal
               | rules of discourse don't apply. I assume you are familiar
               | with the function of quotation marks.
               | 
               | Language is all about context. I wasn't trying to be
               | deceitful. And on HN I've never seen anyone using
               | quotation marks to quote people.
               | 
               | > Writing good documentation is certainly a challenging
               | task.
               | 
                | Doctests aren't the same as writing documentation.
                | Doctests are the simplest form of documentation: given a
                | function named so-and-so, write the API doc + example.
                | It could not even write an example that passed a syntax
                | check.
               | 
               | > Actually, I do this all the time and it works great.
               | Keep practicing!
               | 
               | Then you haven't given it interesting/complex enough
               | problems.
               | 
               | Also this isn't about practice. It's about its
               | capabilities.
               | 
               | > In general, the stochastic parrot argument is oft-
               | repeated but fails to recognize the general capabilities
               | of machine learning.
               | 
                | I gave it "write a YAML parser" given the yaml.org spec,
                | and it wrote the following struct:
                | 
                |     enum Yaml {
                |         Scalar(String),
                |         List(Vec<Box<Yaml>>),
                |         Map(HashMap<String, Box<Yaml>>),
                |     }
               | 
                | This is the stochastic parrot in action. Why? Because it
                | tried to pass off a JSON-like structure as YAML.
               | 
                | Whatever LLMs are, they aren't intelligent. Or they have
                | the attention span of a fruit fly and can't figure out
                | basic differences.
        
               | williamcotton wrote:
               | That's not a good prompt, my friend!
        
               | williamcotton wrote:
               | I know plenty of fantastic engineers that use LLM tools
               | as code assistants.
               | 
               | I'm not sure when and why reading documentation and man
               | pages became a sign of a lack of skill. Watch a
               | presentation by someone like Brian Kernighan and you'll
               | see him joke about looking up certain compiler flags for
               | the thousandth time!
               | 
               | Personally I work in C, C#, F#, Java, Kotlin, Swift, R,
               | Ruby, Python, Postgres SQL, MySQL SQL, TypeScript, node,
               | and whatever hundreds of libraries and DSLs are built on
               | top. Yes, I have to look up documentation and with
               | regularity.
        
               | FpUser wrote:
               | Same opinion here. I work with way too many things to
               | keep everything in my head. I'd rather use my head for
               | design than to remember every function and parameter of
               | say STL
        
               | specialist wrote:
               | For me, thus far, LLMs help me forage docs. I know what I
               | want and it helps me narrow my search faster. Watching
               | adepts like Simon Willison wield LLMs is on my to do
               | list.
        
               | fragmede wrote:
               | Add Golang and rust and JavaScript and next.js and react
               | to the list for me. ;) If you live and work and breathe
               | in the same kernel, operating system, and user space, and
               | don't end up memorizing the various bits of minutiae, I'd
               | judge you (and me) too, but it's not the 2000's, or the
               | 90's or even the 80's anymore, and some of us don't have
               | the luxury, or have chosen not to, live in one small
               | niche for our entire career. At the end of the day, the
               | client doesn't care what language you use, or the
               | framework, or even the code quality, as long as it works.
               | What they don't want to pay for is overage, and taking
               | the previous developer's work and refactoring it and
               | rewriting it in your preferred language isn't high value
               | work, so you pick up whatever they used and run with it.
               | Yeah that makes me less fluent in that one particular
               | thing, not having done the same thing for 20+ years, but
               | that's not where I deliver value. Some people do, and
               | that's great for them and their employers, but my
               | expertise lies elsewhere. I got real good at MFC, back in
               | the day, and then WX and Qt and I'm working on getting
               | good at react and such.
        
               | jimmaswell wrote:
               | At the risk of sounding like an inflated ego: I'm very
               | good at what I do, the rest of my team frequently looks
               | to me for guidance, my boss and boss's boss etc. have
               | repeatedly said I'm among the most valuable people
               | around, and I'm the one turned to in emergencies, for
               | difficult architectural decisions, and to lead projects.
               | I conceptually understand the ecosystem I work in very
               | well at every layer.
               | 
               | What I'm not good at is memorizing API's and libraries
               | that all use different verbs and nouns for the same
               | thing, and other such things that are immaterial to the
               | actual work. How do you use a mutation observer again?
               | Hell if I remember the syntax but I know the concept, and
               | copilot will probably spit out what I want, and I'll
               | easily verify the output. Or how do you copy an array in
               | JS? Or print a stack trace? Or do a node walk? You can
               | either wade through google and stackoverflow, or copilot
               | can tell you instantly. And I can very quickly tell if
               | the code copilot gave me is sensible or not.
        
               | framapotari wrote:
               | If you're already a competent developer, I think that's a
               | reasonable expectation of impact on productivity. I think
               | the 'life-changing' part comes in helping someone get to
               | the point of building things with code where before they
               | couldn't (or believed they couldn't). It does a lot
               | better job of turning the enthusiasts and code-curious
               | into amateurs vs. empowering professionals.
        
               | jimmaswell wrote:
               | > turning the enthusiasts and code-curious into amateurs
               | vs. empowering professionals.
               | 
               | I'm firmly in #2. My other comment goes over how.
               | 
               | I'm intrigued to see how devs in #1 grow. One might be
                | wary that those devs would grow into bad habits and not
               | thinking for themselves, but it might be a case of the
               | ancient Greek rant against written books hindering
               | memorization. Could be that they'll actually grow to be
               | even better devs unburdened by time wasted on trivial
               | details.
        
               | haakonhr wrote:
               | And here you're assuming that making software engineers
               | more productive would be a service to the world. I think
               | the jury is out on that one as well. At least for the
               | majority of software engineering since 2010.
        
               | pcl wrote:
               | Isn't this paper pretty much about spending a few short
               | years to improve the performance? Or are you arguing that
               | the same people who made breakthroughs over the last few
               | years should have also done the optimization work?
        
               | fuzzfactor wrote:
               | >the same people who made breakthroughs over the last few
               | years should have also done the optimization work
               | 
               | I never thought it would be ideal if it was otherwise, so
               | I guess so.
               | 
                | When I first considered neural nets from state-of-the-art
               | vendors to assist with some non-linguistic situations
               | over 30 years ago, it wasn't quite ready for prime time
               | and I could accept that.
               | 
               | I just don't have generic situations all the time which
               | would benefit me, so it's clearly my problems that have
               | the deficiencies ;\
               | 
               | What's being done now with all the resources being thrown
               | at it is highly impressive, and gaining all the time, no
               | doubt about it. It's nice to know there are people that
               | can afford it.
               | 
               | I truly look forward to more progress, and this may be
               | the previously unreached milestone I have been detecting
               | that might be a big one.
               | 
               | Still not good enough for what I need yet so far though.
               | And I can accept that as easily as ever.
               | 
               | That's why I put up my estimation that not all of those
               | 30+ years has been spent without agonizing over something
               | ;)
        
               | Scene_Cast2 wrote:
               | This is a bit like recommending to skip vacuum tubes,
               | think hard and invent transistors.
        
               | fuzzfactor wrote:
               | This is kind of thought-provoking.
               | 
               | That is a good correlation when you think about how much
               | more energy-efficient transistors are than vacuum tubes.
               | 
               | Vacuum tube computers were a thing for a while, but it
               | was more out of desperation than systematic intellectual
               | progress.
               | 
                | OTOH you could look at the present accomplishments as if
                | they were throwing more vacuum tubes at a problem that
                | cannot be adequately addressed that way.
               | 
               | What turned out to be a solid-state solution was a
               | completely different approach from the ground up.
               | 
               | To the extent a more power-saving technique _using the
               | same hardware_ is only a matter of different software
               | approaches, that would be something that realistically
               | could have been accomplished before so much energy was
               | expended.
               | 
               | Even though I've always thought application-specific
               | circuits would be what really helps ML and AI a lot, and
               | that would end up not being the exact same hardware at
               | all.
               | 
               | If power is truly being wasted enough to _start_ rearing
               | its ugly head, somebody should be able to figure out how
               | to fix it before it gets out-of-hand.
               | 
               | Ironically enough with my experience using vacuum tubes,
               | I've felt that there were some serious losses in
               | technology when the research momentum involved was so
               | rapidly abandoned in favor of "solid-state everything" at
               | any cost.
               | 
               | Maybe it is a good idea to abandon the energy-intensive
               | approaches, as soon as anything completely different
               | that's the least bit promising can barely be seen by a
               | gifted visionary to have a glimmer of potential.
        
               | michaelmrose wrote:
               | This comment lives in a fictional world where there is a
               | singular group that could have collectively acted
               | counterfactually. In the real world any actor that
               | individually went this route would have gone bankrupt
               | while the others collected money by showing actual
                | results even if inefficiently earned.
        
               | newyankee wrote:
               | Also it is likely that the rise of LLMs gave many
                | researchers in allied fields the impetus to tackle
                | the problems relevant to making it more efficient,
                | and people stumbled upon a solution hiding there.
               | 
                | The momentum with LLMs and allied technology may last
                | as long as it keeps improving even by a few percentage
                | points and keeps shattering new human-created benchmarks
                | every few months.
        
               | VagabundoP wrote:
               | That's just not how progress works.
               | 
                | It's iterative; there are plenty of cul-de-sacs and
                | failures. You can't really optimise until you have
                | something that works, and it's a messy process that is
                | inefficient.
               | 
               | You're looking at this with hindsight.
        
             | 3abiton wrote:
             | This is still amazing work, imagine running chungus models
             | on a single 3090.
        
         | etcd wrote:
          | I feel like I have seen this idea a few times but don't recall
          | where - probably stuff posted via HN.
         | 
         | Here https://news.ycombinator.com/item?id=41784591 but even
         | before that. It is possibly one of those obvious ideas to
         | people steeped in this.
         | 
          | To me, intuitively, using floats to make ultimately
          | boolean-like decisions seems wasteful, but that seemed like
          | the way it had to be to have differentiable algorithms.
        
       | robomartin wrote:
       | I posted this about a week ago:
       | 
       | https://news.ycombinator.com/item?id=41816598
       | 
        | This has been done for decades in digital circuits, FPGAs,
        | Digital Signal Processing, etc. Floating point is both resource
        | and power intensive, and computing FP without dedicated FP
        | hardware has been avoided for decades unless absolutely
        | necessary.
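        | 
        | For anyone who hasn't seen the classic DSP workaround: keep
        | values as scaled integers (e.g. Q15, 15 fractional bits), so
        | a "fractional" multiply is one integer multiply plus a shift,
        | with no FPU involved. A minimal sketch:
        | 
        |     Q = 15  # Q15 fixed point: value = integer / 2**15
        |     
        |     def to_q15(x: float) -> int:
        |         return int(round(x * (1 << Q)))
        |     
        |     def q15_mul(a: int, b: int) -> int:
        |         # Wide product, then shift back down to Q15.
        |         return (a * b) >> Q
        |     
        |     x, y = to_q15(0.5), to_q15(0.25)
        |     print(q15_mul(x, y) / (1 << Q))  # 0.125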
        
         | ujikoluk wrote:
         | Explain more for the uninitiated please.
        
         | ausbah wrote:
         | a lot of things in the ML research space are rebranding an old
         | concept w a new name as "novel"
        
         | fidotron wrote:
         | Right, the ML people are learning, slowly, about the importance
         | of optimizing for silicon simplicity, not just reduction of
         | symbols in linear algebra.
         | 
         | Their rediscovery of fixed point was bad enough but the "omg if
         | we represent poses as quaternions everything works better"
         | makes any game engine dev for the last 30 years explode.
        
       | kayo_20211030 wrote:
       | Extraordinary claims require extraordinary evidence. Maybe it's
       | possible, but consider that some really smart people, in many
       | different groups, have been working diligently in this space for
        | quite a while; so a claim of 95% savings on energy costs _with
        | equivalent performance_ is in the extraordinary category. Of
       | course, we'll see when the tide goes out.
        
         | vlovich123 wrote:
         | They've been working on unrelated problems like structure of
         | the network or how to build networks with better results. There
         | have been people working on improving the efficiency of the
         | low-level math operations and this is the culmination of those
         | groups. Figuring this stuff out isn't super easy.
        
         | throwawaymaths wrote:
         | I don't think this claim is extraordinary. Nothing proposed is
         | mathematically impossible or even unlikely, just a pain in the
         | ass to test (lots of retraining, fine tuning etc, and those
          | operations are expensive when you don't already have massively
         | parallel hardware available, otherwise you're ASIC/FPGAing for
         | something with a huge investment risk)
         | 
         | If I could have a SWAG at it I would say a low resolution model
         | like llama-2 would probably be just fine (llama-2 quantizes
         | without too much headache) but a higher resolution model like
         | llama-3 probably not so much, not without massive retraining
         | anyways.
        
         | Randor wrote:
         | The energy claims up to ~70% can be verified. The inference
         | implementation is here:
         | 
         | https://github.com/microsoft/BitNet
        
           | kayo_20211030 wrote:
           | I'm not an AI person, in any technical sense. The savings
           | being claimed, and I assume verified, are on ARM and x86
           | chips. The piece doesn't mention swapping mult to add, and a
           | 1-bit LLM is, well, a 1-bit LLM.
           | 
           | Also,
           | 
           | > Additionally, it reduces energy consumption by 55.4% to
           | 70.0%
           | 
           | With humility, I don't know what that means. It seems like
           | some dubious math with percentages.
        
             | Randor wrote:
             | > I don't know what that means. It seems like some dubious
             | math with percentages.
             | 
             | I would start by downloading a 1.58 model such as:
             | https://huggingface.co/HF1BitLLM/Llama3-8B-1.58-100B-tokens
             | 
             | Run the non-quantized version of the model on your
             | 3090/4090 gpu and observe the power draw. Then load the
              | 1.58 model and observe the power usage. Sure, the numbers
              | have a wide range because there are many GPUs/NPUs to make
              | the comparison with.
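              | 
              | If you want numbers rather than eyeballing, a rough
              | sketch for logging the draw (assumes an NVIDIA card and
              | the pynvml bindings; run it while each model is doing
              | inference in another process):
              | 
              |     import time
              |     import pynvml
              |     
              |     pynvml.nvmlInit()
              |     gpu = pynvml.nvmlDeviceGetHandleByIndex(0)
              |     
              |     while True:
              |         # nvmlDeviceGetPowerUsage reports milliwatts.
              |         watts = pynvml.nvmlDeviceGetPowerUsage(gpu) / 1e3
              |         print(f"{watts:.1f} W")
              |         time.sleep(1)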
        
               | kayo_20211030 wrote:
               | Good one!
        
             | sroussey wrote:
             | Not every instruction on a CPU or GPU uses the same amount
             | of power. So if you could rewrite your algorithm to use
             | more power efficient instructions (even if you technically
             | use more of them), you can save overall power draw.
             | 
            | That said, time to market has been more important than any
            | care for efficiency for some time. Now and in the future,
             | there is more of a focus on it as the expenses in equipment
             | and power have really grown.
        
           | littlestymaar wrote:
            | How does the linked article relate to BitNet at all? It's
           | about the "addition is all you need" paper which AFAIK is
           | unrelated.
        
             | Randor wrote:
             | Yeah, I get what you're saying but both are challenging the
             | current MatMul methods. The L-Mul paper claims "a power
             | savings of 95%" and that is the thread topic. Bitnet proves
             | that at least 70% is possible by getting rid of MatMul.
        
         | manquer wrote:
          | It is a clickbait headline; the claim itself is not
          | extraordinary. The preprint from arXiv was posted here some
          | time back.
          | 
          | The 95% gain is only for multiplication operations;
          | inference is compute light and memory heavy in the first
          | place, so the actual gains would be far smaller.
         | 
         | Tech journalism (all journalism really) can hardly be trusted
         | to publish grounded news with the focus on clicks and revenue
         | they need to survive.
        
           | kayo_20211030 wrote:
           | Thank you. That makes sense.
        
           | rob_c wrote:
           | Bingo,
           | 
           | We have a winner. Glad that came from someone not in my
           | lectures on ML network design
           | 
            | Honestly, thanks for beating me to this comment.
        
           | ksec wrote:
           | >Tech journalism (all journalism really) can hardly be
           | trusted to publish grounded news with the focus on clicks and
           | revenue they need to survive.
           | 
           | Right now the only way to gain real knowledge is actually to
            | read the comments on those articles.
        
         | kayo_20211030 wrote:
         | re: all above/below comments. It's still an extraordinary
         | claim.
         | 
         | I'm not claiming it's not possible, nor am I claiming that it's
         | not true, or, at least, honest.
         | 
          | But, there will need to be evidence that, using real machines
          | and real energy, _equivalent performance_ is
          | achievable. A defense that "there are no suitable chips" is a
         | bit disingenuous. If the 95% savings actually has legs some
         | smart chip manufacturer will do the math and make the chips. If
         | it's correct, that chip making firm will make a fortune. If
         | it's not, they won't.
        
           | throwawaymaths wrote:
           | > If the 95% savings actually has legs some smart chip
           | manufacturer will do the math and make the chips
           | 
           | Terrible logic. By a similar logic we wouldn't be using
           | python for machine learning at all, for example (or x86 for
           | compute). Yet here we are.
        
             | kayo_20211030 wrote:
             | What's wrong with the logic? A caveat in the paper is that
             | the technique will save 95% energy _but_ that the technique
              | will not run efficiently on current chips. I'm saying that
             | if the new technique needs new chips and saves 95% of
             | energy costs with the same performance, someone will make
             | the chips. I say nothing about how and why we do ML as we
             | do today - the 100% energy usage level.
        
         | stefan_ wrote:
         | I mean, all these smart people would rather pay NVIDIA all
          | their money than make AMD viable. And yet they tell us it's all
         | MatMul.
        
           | kayo_20211030 wrote:
           | Both companies are doing pretty well. Why don't you think AMD
           | is viable?
        
             | nelup20 wrote:
             | AMD's ROCm just isn't there yet compared to Nvidia's CUDA.
             | I tried it on Linux with my AMD GPU and couldn't get things
             | working. AFAIK on Windows it's even worse.
        
               | mattalex wrote:
               | That entirely depends on what AMD device you look at:
               | gaming GPUs are not well supported, but their instinct
               | line of accelerators works just as well as cuda. keep in
               | mind that, in contrast to Nvidia, AMD uses different
               | architectures for compute and gaming (though they are
               | changing that in the next generation)
        
             | redleader55 wrote:
              | The litmus test would be if you read in the news that
              | Amazon, Microsoft, Google or Meta just bought billions
              | in GPUs from AMD.
             | 
             | They are and have been buying AMD CPUs for a while now,
             | which says something about AMD and Intel.
        
               | JonChesterfield wrote:
               | Microsoft and Meta are running customer facing LLM
               | workloads on AMD's graphics cards. Oracle seems to like
               | them too. Google is doing the TPU thing with Broadcom and
               | Amazon seems to have decided to bet on Intel (in a
               | presumably fatal move but time will tell). We'll find
               | some more information on the order book in a couple of
               | weeks at earnings.
               | 
                | I like that the narrative has changed from "AI only
                | runs on CUDA" to "sure, it runs fine on AMD if you
                | must".
        
           | dotnet00 wrote:
           | It's not their job to make AMD viable, it's AMD's job to make
           | AMD viable. NVIDIA didn't get their position for free, they
           | spent a decade refining CUDA and its tooling before GPU-based
           | crypto and AI kicked off.
        
       | syntaxing wrote:
       | I'm looking forward to Bitnet adaptation. MS just released a tool
       | for it similar to llamacpp. Really hoping major models get
       | retrained for it.
        
       | andrewstuart wrote:
       | The ultimate "you're doing it wrong".
       | 
       | For the sake of the climate and environment it would be nice
       | if it were true.
       | 
       | Bad news for Nvidia. "Sell your stock" bad.
       | 
       | Does it come with a demonstration?
        
         | talldayo wrote:
         | > Bad news for Nvidia. "Sell your stock" bad.
         | 
         | People say this but then the fastest and most-used
         | implementation of these optimizations is always written in
         | CUDA. If this turns out to not be a hoax, I wouldn't be
         | surprised to see Nvidia prices _jump_ in correlation.
        
         | mouse_ wrote:
          | Hypothetically, if this is true and as simple as the
          | headline implies -- AI using 95% less power doesn't mean AI
          | will use 95% less power; it means we will do 20x more AI.
          | As long as it's the current fad, we will throw as much
          | power and resources at this as we can physically produce,
          | because our economy depends on constant, accelerating
          | growth.
        
           | etcd wrote:
            | True. A laptop power pack's wattage, for example, has
            | stayed pretty much unchanged over 30 years.
        
         | Dylan16807 wrote:
         | Bad news for Nvidia how? Even ignoring that the power savings
         | are only on one type of instruction, 20x less power doesn't
         | mean it runs 20x faster. You still need big fat GPUs.
         | 
         | If this increases integer demand and decreases floating point
         | demand, that moderately changes future product design and
         | doesn't do much else.
        
         | Nasrudith wrote:
          | Wouldn't reduced power consumption for an unfulfilled
          | demand mean more demand for Nvidia, since more chips are
          | now needed to use the available power capacity? (As
          | concentration tends to be the more efficient way.)
        
       | idiliv wrote:
       | Duplicate, posted on October 9:
       | https://news.ycombinator.com/item?id=41784591
        
       | asicsarecool wrote:
       | Don't assume this isn't already in place at the main AI companies
        
       | DesiLurker wrote:
        | Validity of the claim aside, why don't they say it reduces
        | power by 20 times instead of 95%? A multiplier gives a much
        | better sense of the magnitude when the remaining fraction is
        | tiny.
        
       | hello_computer wrote:
       | How does this differ from Cussen & Ullman?
       | 
       | https://arxiv.org/abs/2307.01415
        
         | selimthegrim wrote:
         | Cussen is an HN poster incidentally.
        
       | GistNoesis wrote:
       | Does https://en.wikipedia.org/wiki/Jevons_paradox apply in this
       | case ?
        
         | narrator wrote:
         | Of course. Jevons paradox always applies.
        
         | gosub100 wrote:
         | Not necessarily a bad thing: this might give the AI charlatans
         | enough time to actually make something useful.
        
         | mattxxx wrote:
         | That's interesting.
         | 
         | Obviously, energy cost creates a barrier to entry, so reduction
         | of cost reduces the barrier to entry... which adds more
         | players... which increases demand.
        
       | panosv wrote:
       | Lemurian Labs looks like it's doing something similar:
       | https://www.lemurianlabs.com/technology They use the Logarithmic
       | Number System (LNS)
        
       | andrewstuart wrote:
       | Here is the Microsoft implementation:
       | 
       | https://github.com/microsoft/BitNet
        
       | quantadev wrote:
        | I wonder if someone has fed this entire "problem" into the
        | latest ChatGPT o1 (the new model with reasoning capability):
        | just fed it all the code for a multilayer perceptron and then
        | given it the task/prompt of finding ways to implement the
        | same network using only integer operations.
        | 
        | Surely even the OpenAI devs must have done this like the
        | minute they got done training that model, right? I wonder if
        | they'd even admit it was an AI that came up with the solution
        | rather than just publishing it and taking credit. haha.
        
         | chx wrote:
          | You are imagining LLMs are capable of much more than they
          | actually are. Here's the _only_ thing they are good for.
         | 
         | https://hachyderm.io/@inthehands/112006855076082650
         | 
         | > You might be surprised to learn that I actually think LLMs
         | have the potential to be not only fun but genuinely useful.
         | "Show me some bullshit that would be typical in this context"
         | can be a genuinely helpful question to have answered, in code
         | and in natural language -- for brainstorming, for seeing common
         | conventions in an unfamiliar context, for having something
         | crappy to react to.
         | 
         | > Alas, that does not remotely resemble how people are pitching
         | this technology.
        
           | quantadev wrote:
            | No, I'm not imagining things. You are, however, imagining
            | (incorrectly) that I'm not an expert with AI who's already
            | seen superhuman performance from LLM prompts on the vast
            | majority of software development questions I've ever
            | asked them, starting all the way back at GPT-3.5.
        
             | dotnet00 wrote:
             | So, where are all of your world changing innovations driven
             | by these superhuman capabilities?
        
               | quantadev wrote:
               | You raise a great point. Maybe I should be asking the AI
               | for more career advice or new product ideas, rather than
               | just letting it merely solve each specific coding
               | challenge.
        
       | didgetmaster wrote:
        | Maybe I am just a natural skeptic, but whenever I see a
        | headline that says 'method x reduces y by z%' while the text
        | instead says that optimizing some step 'could potentially
        | reduce y by up to z%', I am suspicious.
        | 
        | Why not publish some actual benchmarks that prove your claim
        | in even a few special cases?
        
         | TheRealPomax wrote:
         | Because as disappointing as modern life is, you need clickbait
         | headlines to drive traffic. You did the right thing by reading
         | the article though, that's where the information is, not the
         | title.
        
           | phtrivier wrote:
            | Fair enough, but then I want a way to penalize publishers
            | for abusing clickbait. There is no "unread" button, and
            | there is no way to unsubscribe from advertisement-based
            | sites.
            | 
            | Even on sites that have a "Like / Don't like" button, my
            | understanding is that clicking "Don't like" is a form of
            | "engagement" that the suggestion algorithms are going to
            | reward.
            | 
            | Give me a button that says "this article was a scam", and
            | have the publisher give the advertisement money back. Or
            | better yet, give the advertisement money to charity /
            | public services / whatever.
            | 
            | Take a cut of the money being transferred; charge the
            | publishers for being able to get a "clickbait free" green
            | mark if they implement the scheme.
            | 
            | Track the kinds of articles that generate the most
            | clickbait-angry comments. Sell back the data.
            | 
            | There might be a business model there.
        
             | NineStarPoint wrote:
             | I doubt there's a business model there because who is going
             | to opt in to a scheme that loses them money?
             | 
             | What could work is social media giving people an easy
             | button to block links to specific websites from appearing
             | in their feed, or something along those lines. It's a nice
             | user feature, and having every clickbait article be a
             | chance someone will choose to never see your website again
              | could actually rein in some of the nonsense.
        
               | phtrivier wrote:
                | > I doubt there's a business model there because who
                | is going to opt in to a scheme that loses them money?
                | 
                | Agreed, of course.
                | 
                | In a reasonable world, that could be considered part
                | of the basic, law-mandated requirements. It would be
                | blurry and subject to interpretation to decide what
                | is clickbait or not, just like libel or defamation -
                | good thing we're only a few hundred years away from
                | someone reinventing a device to handle that, called
                | "independent judges".
                | 
                | In the meantime, I suppose you would have to bring
                | some "unreasonable" thing to it, like "brands like to
                | have green logos on their sites to brag"?
                | 
                | > What could work is social media giving people an
                | easy button to block links to specific websites from
                | appearing in their feed, or something along those
                | lines.
                | 
                | I completely agree. They have had the technology to
                | implement such a thing since forever, and they've
                | decided against it since forever.
                | 
                | However, I wonder if that's something a browser
                | extension could handle? A merge of AdBlock and "saved
                | you a click" that displays the "boring" content of
                | the link when you hover on a clickbaity link?
        
           | keybored wrote:
            | Headlines: what can they do, they need that for the
            | traffic.
            | 
            | Reader: do the moral thing and read the article, not just
            | the title.
            | 
            | How is that balanced?
        
         | baq wrote:
          | OTOH you have living proof that an amazingly huge neural
          | network can work on 20W of power, so expecting multiple
          | orders of magnitude of power consumption reduction is not
          | unreasonable.
        
           | etcd wrote:
            | Mitochondria are all you need.
            | 
            | It should be possible to go even more efficient, as the
            | brain has other constraints, such as operating at 36.7
            | degrees C, etc.
        
         | dragonwriter wrote:
          | Well, one, because the headline isn't from the researchers;
          | it's from a popular press report (not even the one
          | originally posted here - this is secondary reporting of
          | another popular press piece) and isn't what the paper
          | claims, so it would be odd for the paper's authors to
          | conduct benchmarks to justify it. (And, no, even the "up to
          | 95%" isn't from the paper: the cost savings are cited per
          | operation, depending on the operation and the precision it
          | is conducted at; are as high as 97.3%; and are based on
          | existing research establishing the energy cost of math
          | operations on modern compute hardware. But no end-to-end
          | cost savings claim is made.)
         | 
         | And, two, because the actual energy cost savings claimed aren't
         | even the experimental question -- the energy cost differences
         | between various operations on modern hardware have been
         | established in other research, the experimental issue here was
         | whether the mathematical technique that enables using the lower
         | energy cost operations performs competitively on output quality
         | with existing implementations when substituted in for LLM
         | inference.
        
         | andrewstuart wrote:
         | https://github.com/microsoft/BitNet
         | 
         | "The first release of bitnet.cpp is to support inference on
         | CPUs. bitnet.cpp achieves speedups of 1.37x to 5.07x on ARM
         | CPUs, with larger models experiencing greater performance
         | gains. Additionally, it reduces energy consumption by 55.4% to
         | 70.0%, further boosting overall efficiency. On x86 CPUs,
         | speedups range from 2.37x to 6.17x with energy reductions
         | between 71.9% to 82.2%. Furthermore, bitnet.cpp can run a 100B
         | BitNet b1.58 model on a single CPU, achieving speeds comparable
         | to human reading (5-7 tokens per second), significantly
         | enhancing the potential for running LLMs on local devices. More
         | details will be provided soon."
        
           | jdiez17 wrote:
           | Damn. Seems almost too good to be true. Let's see where this
           | goes in two weeks.
        
             | andrewstuart wrote:
             | Intel and AMD will be extremely happy.
             | 
             | Nvidia will be very unhappy.
        
               | l11r wrote:
                | Their GPUs will still be needed for training. As far
                | as I understand, this will only improve inference
                | performance and efficiency.
        
       | littlestymaar wrote:
       | Related: https://news.ycombinator.com/item?id=41784591 10 days
       | ago
        
       | greenthrow wrote:
       | The trend of hyping up papers too early on is eroding people's
       | faith in science due to poor journalism failing to explain that
       | this is theoretical. The outlets that do this should pay the
       | price but they don't, because almost every outlet does it.
        
       | holoduke wrote:
       | I don't think algorithms will change energy consumption.
       | There is always a need for maximum computing capacity. If
       | tomorrow a new algorithm increases performance 4 times, we
       | will just do 4 times more computing.
        
       | neuroelectron wrote:
       | Nobody is interested in this because nobody wants less capex.
        
       | ein0p wrote:
       | As a rule, compute only takes less than 10% of all energy. 90% is
       | data movement.
        
       | tartakovsky wrote:
       | original paper: https://news.ycombinator.com/item?id=41784591
        
       | m3kw9 wrote:
       | This sounds similar to someone saying a room-temperature
       | superconductor was discovered.
        
       | Art9681 wrote:
       | In the end, the power savings mean the current models that
       | are "good enough" will fit a much smaller compute budget,
       | such as edge devices. However, enthusiasts are still going to
       | want the best hardware they can afford because, inevitably,
       | everyone will want to maximize the size and intelligence of
       | the model they can run. So we're just going to scale. This
       | might bring a GPT-4 level model to edge devices, but we are
       | still going to want to run what might resemble a GPT-5/6
       | model on the best hardware possible at the time. So don't
       | throw away your GPUs yet. This will bring capabilities to the
       | mass market, but your high-end GPU will still scale the
       | solution n-fold, and you'll be able to run models with
       | disregard for the energy savings promoted in the headline.
       | 
       | In other sensationalized words: "AI engineers can claim new
       | algorithm allows them to fit GPT-5 in an RTX 5090 running at
       | 600 watts."
        
       | jart wrote:
       | It's a very crude approximation, e.g. 1.75 * 2.5 == 3
       | (although it seems better as the numbers get closer to 0).
       | 
       | I tried implementing this for AVX512 with tinyBLAS in
       | llamafile.
       | 
       |     inline __m512 lmul512(__m512 x, __m512 y) {
       |         // masks for the sign, exponent, and mantissa fields
       |         __m512i sign_mask = _mm512_set1_epi32(0x80000000);
       |         __m512i exp_mask = _mm512_set1_epi32(0x7F800000);
       |         __m512i mant_mask = _mm512_set1_epi32(0x007FFFFF);
       |         __m512i exp_bias = _mm512_set1_epi32(127);
       |         // reinterpret the floats as raw bit patterns
       |         __m512i x_bits = _mm512_castps_si512(x);
       |         __m512i y_bits = _mm512_castps_si512(y);
       |         // extract the three fields of each operand
       |         __m512i sign_x = _mm512_and_si512(x_bits, sign_mask);
       |         __m512i sign_y = _mm512_and_si512(y_bits, sign_mask);
       |         __m512i exp_x =
       |             _mm512_srli_epi32(_mm512_and_si512(x_bits, exp_mask), 23);
       |         __m512i exp_y =
       |             _mm512_srli_epi32(_mm512_and_si512(y_bits, exp_mask), 23);
       |         __m512i mant_x = _mm512_and_si512(x_bits, mant_mask);
       |         __m512i mant_y = _mm512_and_si512(y_bits, mant_mask);
       |         // sign of the product is the xor of the signs
       |         __m512i sign_result = _mm512_xor_si512(sign_x, sign_y);
       |         // multiply in log space: add exponents, subtract one bias
       |         __m512i exp_result =
       |             _mm512_sub_epi32(_mm512_add_epi32(exp_x, exp_y), exp_bias);
       |         // approximate the mantissa product by averaging the mantissas
       |         __m512i mant_result =
       |             _mm512_srli_epi32(_mm512_add_epi32(mant_x, mant_y), 1);
       |         // reassemble sign | exponent | mantissa into a float
       |         __m512i result_bits = _mm512_or_si512(
       |             _mm512_or_si512(sign_result, _mm512_slli_epi32(exp_result, 23)),
       |             mant_result);
       |         return _mm512_castsi512_ps(result_bits);
       |     }
       | 
       | Then I used it for Llama-3.2-3B-Instruct.F16.gguf and it
       | outputted gibberish. So you would probably have to train and
       | design your model specifically to use this multiplication
       | approximation in order for it to work. Or maybe I'd have to
       | tune the model so that only certain layers and/or operations
       | use the approximation. However the speed was decent. Only
       | prefill dropped, from 850 tokens per second to 200 tok/sec on
       | my Threadripper; prediction speed was totally unaffected,
       | staying at 34 tok/sec. I like how the code above generates
       | vpternlog ops. So if anyone ever designs an LLM architecture
       | and releases weights on Hugging Face that use this algorithm,
       | we'll be able to run them reasonably fast without special
       | hardware.
        
         | raluk wrote:
         | Your kernel seems to be incorrect for 1.75 * 2.5. From the
         | paper we have 1.75 == (1+0.75)*2^0 and 2.5 == (1+0.25)*2^1,
         | so the result is (1+0.75+0.25+2^-4)*2^1 == 4.125 (the
         | correct result is 4.375).
        
           | raluk wrote:
            | Extra: I am not sure if this is clear from the paper, but
            | in the example of 1.75 * 2.5 we can also represent 1.75
            | as (1-0.125)*2. This gives good approximations for
            | numbers that are close to but less than a power of 2.
            | This way abs(a*b) in (1+a)*(1+b) is always small and
            | strictly less than 0.25.
            | 
            | Another example: if we have 1.9 * 1.9 then we need to
            | account for overflow in (0.9 + 0.9), and this seems to
            | induce similar overhead to expressing the numbers as
            | (1-0.05)*2.
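            | 
            | For concreteness, here is a minimal scalar sketch of the
            | L-Mul formula as described above, done on float32 for
            | readability (the paper targets low-bit floats; l(m) == 4
            | is just the value from this worked example, and signs,
            | zeros and inf/NaN are ignored):
            | 
            |     #include <cstdio>
            |     #include <cstdint>
            |     #include <cstring>
            |     #include <cmath>
            |     
            |     // L-Mul: (1+mx)*(1+my) ~= 1 + mx + my + 2^-l(m),
            |     // with the exponents simply added.
            |     float lmul(float x, float y) {
            |         uint32_t xb, yb;
            |         memcpy(&xb, &x, 4);
            |         memcpy(&yb, &y, 4);
            |         int ex = (int)((xb >> 23) & 0xFF) - 127;
            |         int ey = (int)((yb >> 23) & 0xFF) - 127;
            |         // fractional mantissas mx, my in [0, 1)
            |         double mx = (xb & 0x7FFFFF) / (double)(1 << 23);
            |         double my = (yb & 0x7FFFFF) / (double)(1 << 23);
            |         double mant = 1.0 + mx + my + exp2(-4);
            |         return (float)ldexp(mant, ex + ey);
            |     }
            |     
            |     int main() {
            |         printf("%g\n", lmul(1.75f, 2.5f)); // prints 4.125 (exact: 4.375)
            |         return 0;
            |     }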
        
       | nprateem wrote:
       | Is it the one where you delete 95% of user accounts?
        
       | m463 wrote:
       | So couldn't you design a GPU that uses or supports this algorithm
       | to use the same power, but use bigger models, better models, or
       | do more work?
        
       | Wheatman wrote:
       | Isn't 90% of the energy spent moving bytes around? Why would
       | this have such a great effect?
        
       | faragon wrote:
       | Before reading the article I was expecting 1-bit weights
       | instead of bfloats, and logical operators instead of
       | arithmetic.
        
       | svilen_dobrev wrote:
       | I am not well versed in the math involved, but IMO if the
       | outcome depends mostly on the differences between the numbers
       | - the smaller-or-bigger distinction as well as their
       | magnitudes - then exactness might not be needed. I mean, as
       | long as the approximate "function" looks similar to the exact
       | one, that might be good enough.
       | 
       | Maybe even generate a table of the approximate results and
       | use that, in various stages? Like the way sin/cos was done 30
       | years ago before FP coprocessors arrived.
        
       | gcanyon wrote:
       | This isn't really the optimization I'm thinking about, but:
       | given the weird and abstract nature of the functioning of ML
       | in general and LLMs in particular, it seems reasonable to
       | think that there might be algorithms that achieve the same,
       | or a similar, result in an orders-of-magnitude more efficient
       | way.
        
       | DennisL123 wrote:
       | This is a result on 8-bit numbers, right? Why not precompute
       | all 64K possible combinations and look the results up in a
       | table?
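       | 
       | A sketch of that idea (fp8_to_float / float_to_fp8 are assumed
       | e4m3 codec helpers, not something from the paper):
       | 
       |     #include <cstdint>
       |     
       |     extern float fp8_to_float(uint8_t);  // assumed e4m3 decoder
       |     extern uint8_t float_to_fp8(float);  // assumed e4m3 encoder
       |     
       |     // 64 KiB table holding every possible fp8 x fp8 product,
       |     // rounded back to fp8, computed once up front.
       |     static uint8_t mul_table[256][256];
       |     
       |     void init_table() {
       |         for (int a = 0; a < 256; a++)
       |             for (int b = 0; b < 256; b++)
       |                 mul_table[a][b] = float_to_fp8(
       |                     fp8_to_float((uint8_t)a) * fp8_to_float((uint8_t)b));
       |     }
       |     
       |     // a multiply becomes a single table load
       |     inline uint8_t fp8_mul(uint8_t a, uint8_t b) {
       |         return mul_table[a][b];
       |     }
       | 
       | The catch is that a 64 KiB gather per element doesn't
       | vectorize the way arithmetic does, and the energy win claimed
       | in the paper comes from avoiding multiplier circuits in
       | hardware, which a table buys only at the cost of extra memory
       | traffic.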
        
       | jhj wrote:
       | As someone who has worked in this space (approximate compute) on
       | both GPUs and in silicon in my research, the power consumption
       | claims are completely bogus, as are the accuracy claims:
       | 
       | > In this section, we show that L-Mul is more precise than fp8
       | e4m3 multiplications
       | 
       | > To be concise, we do not consider the rounding to nearest even
       | mode in both error analysis and complexity estimation for both
       | Mul and L-Mul
       | 
       | These two statements together are nonsensical. Sure, if you
       | analyze accuracy while ignoring the part of the algorithm
       | that gives you accuracy in the baseline, you can derive
       | whatever cherry-picked result you want.
       | 
       | With round-to-nearest-even, the multiplication of two
       | floating point values yields the correctly rounded result of
       | multiplying the original values at infinite precision; this
       | is how floating point rounding usually works and is what IEEE
       | 754 mandates for fundamental operations (e.g., multiplication
       | here) if you choose to follow those guidelines. But not
       | rounding to nearest even will result in a lot more
       | quantization noise, and biased noise at that.
       | 
       | > applying the L-Mul operation in tensor processing hardware can
       | potentially reduce 95% energy cost by elementwise floating point
       | tensor multiplications and 80% energy cost of dot products
       | 
       | A good chunk of the energy cost is simply moving data between
       | memories (especially external DRAM/HBM/whatever) and along wires,
       | buffering values in SRAMs and flip-flops and the like.
       | Combinational logic cost is usually not a big deal. While having
       | a ton of fixed-function matrix multipliers does raise the cost of
       | combinational logic quite a bit, at most what they have will
       | probably cut the power of an overall accelerator by 10-20% or so.
       | 
       | > In this section, we demonstrate that L-Mul can replace tensor
       | multiplications in the attention mechanism without any loss of
       | performance, whereas using fp8 multiplications for the same
       | purpose degrades inference accuracy
       | 
       | I may have missed it in the paper, but they have provided no
       | details on (re)scaling and/or using higher precision accumulation
       | for intermediate results as one would experience on an H100 for
       | instance. Without this information, I don't trust these
       | evaluation results either.
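       | 
       | To make the bias point concrete, here is a quick numerical
       | sketch of mine (not from the paper; round-half-up stands in
       | for true round-to-nearest-even): truncated products come out
       | systematically low, while rounded ones center near zero error.
       | 
       |     #include <cstdio>
       |     #include <cstdint>
       |     #include <cstring>
       |     #include <cstdlib>
       |     
       |     // Keep only `bits` mantissa bits of a positive float,
       |     // either truncating or rounding to nearest (half up).
       |     static float round_mantissa(float f, int bits, bool nearest) {
       |         uint32_t u;
       |         memcpy(&u, &f, 4);
       |         uint32_t drop = 23 - bits;           // low bits to discard
       |         if (nearest) u += 1u << (drop - 1);  // carry propagates naturally
       |         u &= ~((1u << drop) - 1);
       |         float r;
       |         memcpy(&r, &u, 4);
       |         return r;
       |     }
       |     
       |     int main() {
       |         srand(42);
       |         double bias_trunc = 0, bias_near = 0;
       |         const int n = 1000000;
       |         for (int i = 0; i < n; i++) {
       |             float x = 1.0f + (float)rand() / RAND_MAX;  // [1, 2)
       |             float y = 1.0f + (float)rand() / RAND_MAX;
       |             double exact = (double)x * (double)y;
       |             bias_trunc += (round_mantissa(x * y, 3, false) - exact) / exact;
       |             bias_near  += (round_mantissa(x * y, 3, true)  - exact) / exact;
       |         }
       |         // truncation shows a clear negative bias; rounding does not
       |         printf("mean relative error, truncate: %g\n", bias_trunc / n);
       |         printf("mean relative error, nearest:  %g\n", bias_near / n);
       |         return 0;
       |     }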
        
       | DrNosferatu wrote:
       | Why don't they implement the algorithm in an FPGA to compare
       | with a classical baseline?
        
       ___________________________________________________________________
       (page generated 2024-10-20 23:02 UTC)