[HN Gopher] Advances in semiconductors are feeding the AI boom
___________________________________________________________________
Advances in semiconductors are feeding the AI boom
Author : mfiguiere
Score : 126 points
Date : 2024-03-28 15:16 UTC (7 hours ago)
(HTM) web link (spectrum.ieee.org)
(TXT) w3m dump (spectrum.ieee.org)
| tzm wrote:
| Cerebras WSE-3 contains 4 trillion transistors, delivers 8
| exaflops, and has 20 PB/s of bandwidth. That's 62 times the
| cores of an H100: 900,000. I wonder if the WSE-3 can compete on
| price/performance, though. Interesting times!
| winwang wrote:
| How's the single core-to-core bandwidth?
| jsheard wrote:
| Is anyone actually using those WSEs in anger yet? They're on
| their third generation now, but as far as I can tell the
| discussion of each generation consists of "Cerebras announces
| new giant chip" and then radio silence until they announce the
| next giant chip.
| JonChesterfield wrote:
| They sold some. Not strictly speaking the same as anyone
| using them, but there's a decent chance some code is running
| on the machines.
| rthnbgrredf wrote:
| The problem is software. You can put out an XYZ-trillion-
| transistor monster chip that beats anything hardware-wise, but
| it's going nowhere if you don't have the tooling and a massive
| community (like Nvidia has) to actually do some real AI work.
| IshKebab wrote:
| Unlikely. They cost so much that nobody is going to do
| research on them - at best it's porting existing models. And
| they're so different to GPUs that the porting effort is going
| to be enormous.
|
| They also suffer from the global optimisation problem for
| laying out calculations, so compile times are going to be
| insane.
|
| Their WSE technology is also already obsolete - Tesla's chip
| does it in a much more logical and cost-effective way.
| shrubble wrote:
| The Cerebras-2 is at the Pittsburgh Supercomputing Center.
| Not sure if they ordered a 3.
| monocasa wrote:
| > 62 times the cores of an H100.. 900,000.
|
| More than that, arguably. CUDA cores are more like SIMD lanes
| than CPU cores in the sense that Cerebras uses "core". Since
| Cerebras has 4-wide tensor ops, there are arguably 3.6M CUDA-
| equivalent cores.
| AnimalMuppet wrote:
| 9 trillion flops per core? That's... mind-boggling. Is that
| real?
|
| And, 9 trillion flops per core in 4.4 million transistors per
| core. That sounds a bit too good to be true.
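|
| Back of the envelope, in Python, taking the quoted specs at
| face value (my arithmetic, not Cerebras's):
|
|   flops, transistors, cores = 8e18, 4e12, 900_000
|   print(flops / cores)        # ~8.9e12: ~9 teraflops per core
|   print(transistors / cores)  # ~4.4e6 transistors per core
|   print(cores * 4)            # 3.6M lanes, if tensor ops are 4 wide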
| ramshanker wrote:
| When the limits of digital reach the boundary of physics,
| analogue is going to make a comeback. The human brain feels
| nearer to analogue than digital. I will be surprised if we
| reach AGI without nearing the order of magnitude of brain
| processing.
|
| We need that ONE paper on analogue to end this quest of
| trillions (and counting) of transistors.
| Teever wrote:
| Won't that mean that we just change the quest from higher and
| higher density of digital elements to higher and higher density
| of analog elements?
|
| Like people weren't trying to make computers out of bigger and
| bigger tubes before the transistor, they were trying to make
| them out of smaller and smaller ones.
| bigyikes wrote:
| Computers are going to increase in size as they consume more
| and more power, requiring increasingly elaborate cooling
| systems.
|
| Like how the transistor made the big and hot vacuum tubes
| obsolete, maybe we'll see some analog breakthrough do the same
| thing to transistors, at least for AI.
|
| I doubt there is a world where we use analog for general
| purpose computing, but it seems perfect for messy,
| probabilistic processes like thinking.
| enlyth wrote:
| What's amazing is that the human brain does it all on the
| equivalent of like 20 watts of power. That's basically a
| couple of LED light bulbs.
| passion__desire wrote:
| Is there a comparison of the power efficiency of a human brain
| doing a 50-digit multiplication vs. a multiplier circuit doing
| it?
| gamacodre wrote:
| I think the problem here would be figuring out how much
| of the brain's power draw to attribute to the
| multiplication. A brain is more akin to a motherboard
| than a single CPU, with all kinds of I/O, internal
| regulation, and other ancillary stuff going on all the
| time.
| passion__desire wrote:
| Is the issue, then, that we haven't discovered the magical
| algorithm run by our brain? If we discover it, digital
| circuits will handsomely beat the brain.
| gamacodre wrote:
| We can surely build more efficient and capable hardware
| than our current evolved wetware, since all of the
| details of _how_ to build it are generally externalized.
| If the chips had to fab themselves, it would be a
| different story.
|
| The software _is_ a different story. Sure, the brain does
| all sorts of things that aren't necessary for $TASK, but
| we aren't necessarily going to be able to correctly
| identify which are which. Is your inner experience of
| your arm motion needed to fully parse the meaning in
| "raise a glass to toast the bride and groom", or respond
| meaningfully to someone who says that? Or perhaps it
| doesn't really matter - language is already a decent tool
| for bridging disjoint creature realities, maybe it'll
| stretch to synthetic consciousness too.
| passion__desire wrote:
| All of computation is realised by a very few arithmetic
| operations. So test the energy efficiency of wetware and
| hardware on those operations; any remaining difference can be
| attributed to algorithms.
| smallmancontrov wrote:
| What? Action potentials are extremely digital, in that small
| perturbations are suppressed so as to encode information in a
| higher level conceptual state (the "digit" / "bit" is the
| presence or absence of an AP) to obtain robustness and
| integration scale.
| jzombie wrote:
| I may have misinterpreted this comment by thinking you meant
| that as we squeeze more and more transistors into a small
| amount of space, the resulting waveforms would start to
| resemble analog more so than digital.
| knodi123 wrote:
| > I will be surprised if we reach AGI without nearing the
| order of magnitude of brain processing.
|
| I have some theories that this isn't necessary. 1.) Just
| because the brain is a general-purpose machine great at doing
| lots of things, doesn't mean it's great at each of those
| things. Like when two people are playing catch, and one of them
| sees the first fragments of a parabola and estimates where the
| ball is going to land- a computer can calculate that way more
| efficiently than a mind, despite the fact that both are quick
| enough to get the job done. 2.) While the brain is great at,
| say, putting names to faces... a good CV machine can do the job
| almost as well, and can annotate a video stream in real-time.
|
| Combining 1.) the fact that some problems are much simpler to
| solve with classical algorithms instead of neural networking,
| and 2.) that many brain tasks can be farmed out to a
| coprocessor/service, my hypothesis is that the number of
| neurons/resources required to do the "secret sauce" part of agi
| could be greatly reduced.
| Galaxeblaffer wrote:
| I'm also in the camp that believes we won't reach AGI without
| significantly more compute. I think consciousness is an
| emergent property, just like I think life itself is an
| emergent property; both need a certain set of elements/systems
| to work. The secret sauce may look simple once we recreate the
| right conditions, but it won't be possible without the right
| hardware, so to speak.
| knodi123 wrote:
| But, take for instance, sight. Sighted people use a large
| percentage of their brain to process visual input. People
| born congenitally blind are no smarter or dumber for their
| brains not having to process that input- so clearly that's
| not the secret sauce.
|
| I'm not convinced consciousness is emergent - I don't really
| have an opinion on _that_ - but I'm >50% convinced that
| consciousness itself doesn't _require_ a neural network as
| large as a human brain's.
| bee_rider wrote:
| Analogue circuits can be a real pain in the butt, imagine if
| integer overflows destroyed the whole ALU, haha.
|
| Transistors are already much smaller than neurons. And of
| course the brain doesn't have a clock. And neurons have more
| complex behavior than single transistors... The whole system is
| just very different. So, this doesn't seem like a strategy to
| get past a boundary, it is more like a suggestion that we give
| up on the current path and go in a radically different
| direction. It... isn't impossible but it seems like a wild
| change for the field.
|
| If we want something post-cmos, potentially radically more
| efficient, but still familiar in the sense that it produces
| digital logic, quantum dot cellular automata with Bennett
| clocking seems more promising IMO.
| mwbajor wrote:
| I work in analog.
|
| 1) Noise is an issue as the system gets complex. You can't get
| away with counting to 1 anymore; all those levels in between
| matter.
|
| 2) It's hard to make an analog computer reconfigurable.
|
| 3) Analog computers do exist commercially, believe it or not,
| but for niche applications and essentially as coprocessors.
| jameshart wrote:
| Quantization of parameters in neural networks is roughly
| analogous to introducing noise into analog signals. We've got
| good evidence that these architectures are robust to
| quantization - which implies they could be implemented over
| noisy analog signals.
|
| Not sure who's working on that but I can't believe it's not
| being examined.
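|
| A toy illustration of that robustness (nobody's production
| code, just numpy): quantize a weight matrix to int8, which is
| roughly like adding half a step of uniform noise to every
| weight, and the output barely moves.
|
|   import numpy as np
|
|   rng = np.random.default_rng(0)
|   w = rng.normal(0, 0.02, (256, 256))  # toy weight matrix
|   x = rng.normal(size=256)
|
|   scale = np.abs(w).max() / 127        # symmetric int8 quantization
|   w_q = np.round(w / scale).astype(np.int8)
|
|   y, y_q = w @ x, (w_q * scale) @ x
|   # relative output error: small, order 1e-2 or below
|   print(np.abs(y - y_q).max() / np.abs(y).max())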
| floxy wrote:
| >We need that ONE paper on analogue
|
| https://www.analog.com/en/resources/analog-dialogue/articles...
| marcosdumay wrote:
| I doubt it. Digital just scales better.
|
| Our brain has a pretty bounded need of scaling, but once we
| create some computer equivalent, it would be very
| counterproductive to make it useless for larger problems for a
| small gain on smaller ones.
| smallmancontrov wrote:
| > Digital just scales better.
|
| Yes!
|
| > Our brain has a pretty bounded need of scaling
|
| No!
|
| Over aeons our brains scaled from several neurons to 100
| billion neurons, each with 1000 synapses. They were able to
| do it _because our brains are digital_. They lean on their
| digital nature even more than computer chips do.
|
| Action potentials are so digital it hurts. They aren't just
| quantized in level, but in the entire shape of the waveform
| across several milliseconds. Just as in computer chips, this
| suppresses perturbations. As long as higher level computation
| only depends on presence/absence of action potentials and
| timing, it inherits this robustness and allows scale. Rather
| than errors accumulating and preventing integration beyond a
| certain threshold, error resilience scales alongside
| computation. Every neuron "refreshes the signal," allowing
| arbitrary integration complexity at any scale, even in the
| face of messy biology problems along the way. Just like every
| transistor (or at least logic gate) "refreshes the signal" so
| that you can stack billions on a chip and quadrillions in
| sequential computation, even though each transistor is
| imperfect.
|
| Digital computation is the way. Always has been, always will
| be.
| ViktorRay wrote:
| This article has two authors.
|
| One author is chairman of TSMC.
|
| The other author is Chief Scientist of TSMC.
|
| This is important to note because they clearly know some stuff
| and we should listen.
| bigbillheck wrote:
| Two credited authors.
| GaggiX wrote:
| Yeah, I also think that better hardware generates a visible
| lead in the quality of the models being released. For example,
| companies like OpenAI have had access to large quantities of
| H100s for a few months now, and Sora is being presented -
| something I would not have believed a year ago. I would also
| guess that the Claude 3 models were trained on H100s. DBRX was
| trained on 12T tokens, a big difference compared to the 300B
| for the original GPT-3. And the new NovelAI image generation
| was trained on H100s; compared to the previous model it's
| night and day. It seems to create a generational jump.
| declaredapple wrote:
| > companies like OpenAI have had access to large quantities of
| H100 for a few months now and Sora is being presented
|
| From what I could tell from Nvidia's recent presentation,
| Nvidia works directly with OpenAI to test their next gen
| hardware. IIRC they had some slides showing the throughput
| comparisons with Hopper and Blackwell, suggesting they used
| OpenAI's workload for testing.
|
| H100s have been generally available (without a long waitlist)
| for only several months, but all the big players already had
| them a year ago.
|
| I agree with you, but I think you might be 1 generation behind.
|
| > OpenAI used H100's predecessor -- NVIDIA A100 GPUs -- to
| train and run ChatGPT, an AI system optimized for dialogue,
| which has been used by hundreds of millions of people worldwide
| in record time. OpenAI will be using H100 on its Azure
| supercomputer to power its continuing AI research.
|
| March 21, 2023 https://nvidianews.nvidia.com/news/nvidia-
| hopper-gpus-expand...
| GaggiX wrote:
| Very interesting, I guess it does make sense that GPT-4 was
| also trained on the Hopper architecture.
| jameshart wrote:
| For some sense of how far out/close 1 trillion transistors in one
| GPU is:
|
| NVIDIA just announced Blackwell which gets to 208bn transistors
| on a chip by stitching two dies together into a single GPU.
| https://www.nvidia.com/en-us/data-center/technologies/blackw...
|
| They're sticking two of them on a board with a Grace CPU in
| between, then linking 36 of those boards together in racks
| with NVLink switches that offer "130TB/s of GPU bandwidth in
| one 72-GPU NVLink domain".
|
| In terms of marketing, NVidia calls one of those racks a GB200
| NVL72 "super GPU".
|
| So on one level NVIDIA would say they already have a GPU with
| 'trillions' of transistors.
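|
| (Back of the envelope: 72 GPUs x 208e9 transistors is about
| 1.5e13, so ~15 trillion transistors of GPU silicon per rack.)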
| sylware wrote:
| GPUs can still scale up by a LOT. The real limiting factors
| are power consumption (which includes cooling), bus speed, and
| foundry production capacity.
|
| Think super-[cross-fire/sli].
|
| Economics will probably forbid that. This is a virtual limit
| which factors in the previous physical limits... in theory.
| HarHarVeryFunny wrote:
| We already have a 4 trillion transistor "GPU" in the Cerebras
| WSE-3 (wafer-scale engine), used in Cerebras' data centers.
|
| https://www.youtube.com/watch?v=f4Dly8I8lMY
| DesiLurker wrote:
| How many regular-sized GPU (say, 4080 or Nvidia L4) dies can
| be cut out of a full-sized wafer? I suppose that's what OP
| means by reaching the integration density of a 1T GPU.
| denimnerd42 wrote:
| Here's some discussion about yields for the H100 per wafer.
| I'd assume a 4080 is smaller? Regardless, the calculator is
| supplied.
|
| https://news.ycombinator.com/item?id=38588876
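|
| The first-order estimate behind those calculators, as a
| sketch (it ignores scribe lines, edge exclusion, and defect
| yield):
|
|   import math
|
|   def dies_per_wafer(die_mm2, wafer_d_mm=300):
|       # wafer area over die area, minus a wafer-edge loss term
|       r = wafer_d_mm / 2
|       return (math.pi * r**2 / die_mm2
|               - math.pi * wafer_d_mm / math.sqrt(2 * die_mm2))
|
|   print(dies_per_wafer(814))  # ~63 candidates for an ~814 mm^2 H100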
| moralestapia wrote:
| I think the whole premise of TFA is flawed, as there are
| already chips with way more than a trillion transistors (as GP
| points out).
|
| Arguing about the size limit for considering something a GPU
| or not is a bit like bikeshedding.
|
| As to why a supercomputer wouldn't be considered for this:
| because it's not a single chip.
| IshKebab wrote:
| It's somewhere around 20 if you go as large as possible IIRC.
| IshKebab wrote:
| Yeah, that doesn't really count. It's the equivalent of like
| 20 GPUs and costs 200x as much.
| edward28 wrote:
| It's meant to compete with nvidia's DGX systems with 8 GPUs
| per node.
| highfrequency wrote:
| Wild that the human brain can squeeze in 100 trillion synapses (
| _very_ roughly analogous to model parameters / transistors) in a
| 3lb piece of meat that draws 20 Watts. The power efficiency
| difference may be explainable by the much slower frequency of
| brain computation (200 Hz vs. 2GHz).
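|
| Rough numbers behind that, using the figures above (very
| hand-wavy, and real average firing rates are far below 200
| Hz):
|
|   synapses, brain_hz, brain_w = 100e12, 200, 20
|   print(synapses * brain_hz / brain_w)  # ~1e15 synaptic ops/joule
|   print(1e15 / 700)  # a ~1 petaflop, ~700 W GPU: ~1.4e12 flops/joule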
|
| My impression is that the main obstacle to achieving a comparable
| volumetric density is that we haven't cracked 3d stacking of
| integrated circuits yet. Very exciting to see TSMC making inroads
| here:
|
| > Recent advances have shown HBM test structures with 12 layers
| of chips stacked using hybrid bonding, a copper-to-copper
| connection with a higher density than solder bumps can provide.
| Bonded at low temperature on top of a larger base logic chip,
| this memory system has a total thickness of just 600 um...We'll
| need to link all these chiplets together in a 3D stack, but
| fortunately, industry has been able to rapidly scale down the
| pitch of vertical interconnects, increasing the density of
| connections. And there is plenty of room for more. We see no
| reason why the interconnect density can't grow by an order of
| magnitude, and even beyond.
|
| It's hard to imagine _not_ getting unbelievable results when in
| 10-30 years we have GPUs with a comparable number of transistors
| to brain synapses that support computation speed 10,000x faster
| than the brain. What a thing to witness!
| user90131313 wrote:
| I think the biggest difference is millions of years of
| evolutionary development. That's a lot of time difference.
| londons_explore wrote:
| > is it just that we haven't cracked 3-d stacking of integrated
| circuits yet?
|
| Yes. If we could stack transistors in the Z dimension as
| closely as we do in X and Y, we'd easily exceed the brain's
| density.
| npalli wrote:
| No, neurons are the equivalent of transistors. Synapses are
| the equivalent of the connections between transistors. There
| are about 100 billion neurons in total, but 100 trillion
| connections. A neuron can have up to 10,000 synapses, while a
| transistor averages only about 3 connections. Impressive
| nonetheless on power efficiency.
| highfrequency wrote:
| # synapses should be analogous to # model parameters no? And
| # model parameters should be linear in # transistors.
| phkahler wrote:
| >> # synapses should be analogous to # model parameters no?
|
| I think they're equivalent to a parameter AND the
| multiplier. Or in analog terms they'd just be a resistor
| whose value can be changed. Digital stuff is not a good fit
| for this.
| 0xcde4c3db wrote:
| > Or in analog terms they'd just be a resistor whose
| value can be changed
|
| For what it's worth, that's actually a thing
| (ReRAM/memristors), but I think it got put on the back
| burner because it requires novel materials and nobody
| figured out how to cost-effectively scale up the
| fabrication versus scaling up flash memory. I saw some
| mention recently that advances in perovskite materials (a
| big deal lately due to potential solar applications)
| might revive the concept.
| retrofrost wrote:
| We can't even claim our current networks come close to
| synapses in performance or function, because architecturally
| we still use feedforward networks: no recursion, no timing
| elements, very static connections. Transistors will definitely
| have some advantages in terms of being able to synchronize
| information and steps to an infinitely better degree than
| biological neurons, but as long as we stick with transformers
| it's the equivalent of trying to get to space by stacking
| sand. Could you get there eventually? Yes, but there are
| better ways.
| layer8 wrote:
| Ignoring for the moment that transistors and synapses are very
| different in their function, the current in a CPU transistor is
| in the milliampere range, whereas in the ion channels of a
| synapse it is in the picoampere range. The voltage differs by
| roughly a factor of ten. So the wattage differs by a factor of
| 10^10.
|
| One important reason for the difference in current is that
| transistors need to reliably switch between two reliably
| distinguishable states, which requires a comparatively high
| current, whereas synapses are very analog in nature. It may not
| be possible to reach the brain's efficiency with deterministic
| binary logic.
| TRDRVR wrote:
| Thank you for this - the name _neural_ networks has made a
| whole generation of people forget that they have an endocrine
| system.
|
| We _know_ things like sleep, hunger, fear, and stress all
| impact how we think, yet people still want to build this
| mental model that synapses are just dot products that either
| reach an activation threshold or don't.
| __loam wrote:
| People will spout off about how machine learning is based
| on the brain while having no idea how the brain works.
| gary_0 wrote:
| It is based on the brain, but only in the loosest
| possible terms; ML is a cargo cult of biology. It's kind
| of surprising that it works at all.
| Borg3 wrote:
| It works because, well, it's actually pretty primitive at
| its core. The whole learning process is actually pretty
| brutal: doing millions of iterations with random (and
| semi-random) adjustments.
| cosmojg wrote:
| Fortunately for academics looking for a new start in
| industry, this widespread misunderstanding has made it all
| too easy to transition from a slow-paced career in
| computational neuroscience to an overwhelmingly lucrative
| one in machine learning!
| goatlover wrote:
| There have been people on HN arguing that the human brain
| is a biological LLM, because they can't think of any other
| way it could work - as if we evolved to generate the next
| token, instead of for fitness as organisms in the real
| world, where things like eating, sleeping, shelter, avoiding
| danger, social bonds, reproduction and child rearing are
| important. Things that require a body.
| kungito wrote:
| I'm one of those people. To me those things only sounded
| like a different prompt: priorities set for the LLM.
| TRDRVR wrote:
| Here's the hubris of thinking that way:
|
| I would imagine the baseline assumption of your thinking
| is that things like sleep and emotions are a "bug" in
| terms of cognition. Said differently, the assumption is
| that _with the right engineer_, you could reach human-
| parity cognition with a model that doesn't sleep or feel
| emotions (after all, what's the point of an LLM if it gets
| tired and doesn't want to answer your questions sometimes?
| Or, even worse, knowingly deceives you because it is mad at
| you or prejudiced against you?).
|
| The problem with that assumption is that as far as we can
| tell, every being with even the slightest amount of
| cognition sleeps in some form and has something akin to
| emotional states. As far as we can prove, sleep and
| emotions _are necessary preconditions_ to cognition.
|
| A worldview where the 'good' parts of the brain are
| replicated in LLM but the 'bad' parts are not is likely
| an incomplete one.
| philipkglass wrote:
| _Ignoring for the moment that transistors and synapses are
| very different in their function, the current in a CPU
| transistor is in the milliampere range..._
|
| That seems implausible. Apple's M2 has 20 billion transistors
| and draws 15 watts at full power [1]. Even assuming that 90%
| of those transistors are for cache and not logic, that would
| still be 2 billion logic transistors * 1 milliampere = 2
| million amperes at full power. That would imply a voltage of
| 7.5 microvolts, which is far too low for silicon transistors.
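|
| The same reductio in code form:
|
|   transistors, watts = 20e9, 15
|   logic = 0.1 * transistors  # assume 90% of transistors are cache
|   amps = logic * 1e-3        # 1 mA through each logic transistor
|   print(watts / amps)        # 7.5e-06 V: far too low for silicon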
|
| [1] https://www.anandtech.com/show/17431/apple-
| announces-m2-soc-...
| fsckboy wrote:
| > _the current in a CPU transistor is in the milliampere
| range_
|
| ? you sure about that? in a single transistor? over what time
| period, more than nanoseconds? milliamps is huge, and there
| are millions of transistors on a single chip these days, and
| with voltage drops of ... 3V? .7V? you're talking major
| power. FETs should be operating on field more than flow,
| though there is some capacitive charge/discharge.
| Ductapemaster wrote:
| Single transistors in modern processes switch currents
| orders of magnitude lower than milliamps. More like micro-
| to picoamps. There's leakage to account for too as features
| get smaller and smaller due to tunneling and other effects,
| but still in aggregate the current per transistor is tiny.
|
| Also the transistors are working at 1V or lower, but as you
| say they are FETs and don't have the same Vbe drop as a
| BJT.
| layer8 wrote:
| You are right, I mixed this up. If you take a CPU running
| at 100 W with 10 billion transistors (not quite
| realistically assumed to all be wired in parallel) at 1 V,
| you would get an average of 0.01 microamps. So the factor
| would reduce to roughly 10^5.
| nuancebydefault wrote:
| Wait a minute, a lot of those transistors are switching
| the same currents since they are in series. Also, FETs
| only draw most current while switching, so in between
| switches there's almost no flow of electrons. So in fact
| you cannot calculate things that way.
| layer8 wrote:
| Yes, as I said the parallel assumption is not quite
| realistic, and the number is an average, covering all
| states a transistor may be in. So it amounts to a rough
| lower bound for when a transistor is switching.
| jvanderbot wrote:
| A simple discretization of the various levels of signal at
| each input/output, a discretization to handle time-of-
| propagation (which is almost surely part of the computation
| just because it _can be_ and nature probably hijacks all
| mechanisms), and a further discretization to handle the
| various serum levels in the brain, which are either inputs,
| outputs, or probably both.
|
| Just add a factor 2^D transistors for each original "brain
| transistor" and re-run your hardware. Hope field effects
| don't count, and cross your fingers that neurons are
| idempotent!
|
| Easy! /s
|
| Modelling an analog system in digital will always have a
| combinatorial curse of dimensionality. Modelling a biological
| system is so insanely complex I can't even begin to think
| about it.
| omikun wrote:
| A single-precision flop is on the order of a pJ. [1] A single
| transistor switch would be much less.
|
| [1] https://arxiv.org/pdf/1809.09206.pdf
| whiplash451 wrote:
| How do you get to 10^10? I might be missing a fundamental of
| physics here (asking genuinely).
| CuriouslyC wrote:
| The more of our logic we can implement with addition, the
| more can be offloaded to noisy analog systems with
| approximate computing. It would be funny if model temperature
| stopped being metaphorical.
| marcosdumay wrote:
| On the power usage, the difference is that those synapses are
| almost always in stand-by. The equivalent would be a CMOS
| circuit with a clock of minutes.
|
| On the complexity, AFAIK a synapse is way more complex than a
| transistor. Larger too, if you include its share of the
| neuron's volume. And yes, the count difference is due to the 3D
| packing.
| sroussey wrote:
| The brain's 3d packing is via folding. Maybe something
| similar would be better than just stacking.
| mrtksn wrote:
| IMHO the breakthrough will come with an analog computer.
|
| The current systems simulate stuff through computation using
| switches.
|
| It's like real sand vs. a sand simulator in the browser. One
| spins your fans and drains your battery showing you 1000
| particles acting like sand; the other just obeys the laws of
| physics locally, per particle, and can do millions of
| particles much more accurately and with only a very slight
| increase in temperature.
|
| Of course, analog computations are much less controllable but
| in this case that's not a deal breaker.
| zer00eyz wrote:
| >> 100 trillion synapses (very roughly analogous to
| transistors?)
|
| Not even remotely comparable
|
| * It's unlikely synapses are binary. Candidly, they probably
| serve more than one purpose.
|
| * Transistor count is a bad proxy for other reasons. A
| pipeline that does floats is not going to be useful for
| fetches from memory. "Where" the density lies is important.
|
| * Power: on this front transistors are a joke.
|
| * The brain is clockless, and analog... frequency is an
| interesting metric.
|
| Binary systems are going to be bad at simulating complex
| processes. LLMs are a simulation of intelligence, like
| forecasting is a simulation of weather. Lorenz shows us why
| simulating weather has limits; there isn't some magical math
| that will change those rules for ML to make the leap to "AGI".
| luyu_wu wrote:
| Transistor power is really not a joke. Synapses would take
| far far FAR more power at close to similar frequencies.
| Biological neurons are incredibly inefficient.
| goatlover wrote:
| Biology is incredibly efficient at what it does well
| though. Thus only 20 watts of brain energy to coordinate
| everything we do. We didn't evolve to be mentats.
| vasco wrote:
| We also lose a lot when building computers due to the fact we
| have to convert the analog world into digital representations.
| A neural analog computer would be more efficient I think, and
| due to the non-deterministic nature of AI would probably suit
| the task as well.
| gricardo99 wrote:
| Interesting point.
|
| I know of at least one startup working on that concept [1].
|
| I'm sure there are others.
|
| 1 - https://www.extropic.ai/
| sroussey wrote:
| There are neuromorphic companies like https://rain.ai
| barelyauser wrote:
| Non-deterministic means random. AI or natural I is not
| random. Analog suffers immensely from noise and it is the
| reason the brain has such a large number of neurons, part to
| deal with noise and part to deal with losing some neurons
| along the way.
| doug_durham wrote:
| Non-deterministic doesn't mean random. Random means random.
| Non-deterministic means that specific inputs don't generate
| the same outputs. It says nothing about the distribution of
| the output values. A chaotic system isn't random, it's just
| non-deterministic.
| dragonwriter wrote:
| "Nondeternimism" gets used jn a variety of ways both from
| differing context and conflicting uses in the same
| context, but chaotic systems are fully deterministic in
| the most common relevant sense, but highly sensitive to
| inputs, resulting in even very small uncertainty in
| inputs to render them largely unpredictable.
| fiftyfifty wrote:
| By some accounts it takes a pretty sizable neural network to
| simulate a single neuron:
|
| https://www.sciencedirect.com/science/article/pii/S089662732...
|
| So we are going to need a lot of computational power to
| approximate what's going on in an entire human brain.
| pnjunction wrote:
| I wonder if this trajectory will only lead to reinventing the
| biological brain. It is hard to imagine the emergence of
| consciousness, as we know it, on a fundamentally deterministic
| system.
| haltIncomplete wrote:
| Why are static 3D cells going to get us there when other ideas
| have not? Is it necessary to replicate "arbitrary" academic
| ideas of consciousness (despite our best efforts, our models
| are always approximations) to make a useful machine?
|
| "Living things" are not static designs off the drafter's
| table. Machines will never be intelligent from their own
| curiosity, only from ours and the rules we embed - no matter
| how hard we push the puerile hallucinations embedded by Star
| Trek. It's still a computer, and human agency does not have
| to bend to it.
| HarHarVeryFunny wrote:
| > The power efficiency difference may be explainable by the
| much slower frequency of brain computation (200 Hz vs. 2GHz).
|
| Partly, but also because the brain has an asynchronous data-
| flow design, while the GPU is synchronous, and as you say
| clocked at a very high frequency.
|
| In a clocked design the clock signal needs to be routed to
| every element on the chip which requires a lot of power, the
| more so the higher the frequency is. It's a bit like the amount
| of energy used doing "battle ropes" at the gym. The heavier the
| ropes (cf more gates the clock is connected to), the more power
| it takes to move them, and the faster you want to move them (cf
| faster clock frequency) the more power it takes.
|
| In a data-flow design, like the brain, there is no clock. Each
| neuron fires, or not, independent of what other neurons are
| doing, based on their own individual inputs. If the inputs are
| changing (i.e. receiving signal spikes from attached neurons),
| then at some threshold of spike accumulation the neuron will
| fire (expending energy). If the inputs are not changing, or at
| a level below threshold, then the neuron will not fire.
|
| To consider the difference, imagine our visual cortex if we're
| looking at a seagull flying across a blue sky. The seagull
| represents a tiny part of the visual field, and is the only
| part that is moving/changing, so there are only a few neurons
| whose inputs are changing and which themselves will therefore
| fire and expend energy. The blue sky comprising the rest of the
| visual field is not changing and we therefore don't expend any
| energy reprocessing it over and over.
|
| In contrast, if you fed a video (frame by frame) of that same
| visual scene into a CNN being processed on a GPU, then it does
| not distinguish between what is changing or not, so 95% of the
| energy processing each frame will be wasted, and this will be
| repeated frame by frame as long as we're looking at that scene!
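|
| A toy version of that difference (a made-up frame, not any
| real SNN framework):
|
|   import numpy as np
|
|   rng = np.random.default_rng(0)
|   prev = rng.random((480, 640))    # blue sky: nothing changes
|   frame = prev.copy()
|   frame[200:210, 300:310] += 0.5   # the "seagull": a 10x10 patch
|
|   dense_work = frame.size          # clocked GPU touches every pixel
|   event_work = int((np.abs(frame - prev) > 1e-3).sum())
|   print(dense_work, event_work)    # 307200 vs 100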
| taktoa wrote:
| > In a clocked design the clock signal needs to be routed to
| every element on the chip which requires a lot of power, the
| more so the higher the frequency is.
|
| The clock only needs to be distributed to sequential
| components like flip-flops or SRAMs. The number of clock-
| distribution wire-millimeters in a typical chip is dwarfed by
| the number of data wire-millimeters, and if a neural network
| is well trained and quantized, activations should be random,
| so the number of transitions per clock should be 0.5 (as
| opposed to 1 for clock wires), meaning power can't be
| dominated by the clock. The flip-flops that prevent clock
| skew are a small % of area, so I don't think those can tip
| the scales either. On the other hand, in asynchronous digital
| logic you need a valid-bit calculation on every single piece
| of logic, which seems like a pretty huge overhead to me.
| HarHarVeryFunny wrote:
| There's obvious potential savings in not wasting FLOPs
| recalculating things unnecessarily, but I'm not sure how
| much of that could be realized by just building a data-flow
| digital GPU. The only attempt at a data-flow digital
| processor I'm aware of was AMULET (by ARM designer Steve
| Furber), which was not very successful.
|
| There's more promise in analog chip designs, such as here:
|
| https://spectrum.ieee.org/low-power-ai-spiking-neural-net
|
| Or otherwise smarter architectures (software only or
| S/W+H/W) that design out the unnecessary calculations.
|
| It's interesting to note how extraordinarily wasteful
| transformer-based LLMs are too. The transformer was designed
| partly inspired by linguistics and partly based on the
| parallel hardware (GPUs etc.) available to run it on.
| Language mostly has only local sentence structure
| dependencies, yet transformer's self-attention mechanism
| has every word in a sentence paying attention to every
| other word (to some learned degree)! Turns out it's better
| to be dumb and fast than smart, although I expect future
| architectures will be much more efficient.
| ordu wrote:
| _> The power efficiency difference may be explainable by the
| much slower frequency of brain computation (200 Hz vs. 2GHz)._
|
| Or by a more static design. A GPU can't do a thing without
| all the weights and shaders. There are benefits to this: you
| can easily swap one model for another. The human mind, on the
| other hand, is not reprogrammable. It can learn new tricks,
| but you cannot extract the firmware from one person and
| upload it to another.
|
| Just imagine if every logical neuron of an AI was a real
| thing, with physical connections to other neurons as inputs.
| No more need for high-throughput memory, no more need for
| compute units running at gigahertz frequencies.
| wslh wrote:
| Neurons are not just pieces of linear algebra, by the way.
| This is why there is a literature on the complexity of a
| single neuron. So comparing at the unit level is, at this
| point, apples vs. oranges.
|
| But yes, the brain continues to be a surprising machine, and
| ML's accomplishments are amazing for that machine.
| orbital-decay wrote:
| Apples to oranges. Gate count indicates nothing when the
| architectures are nothing alike.
|
| The brain is a spiking network with mutable connectivity,
| mostly asynchronous. Only the active path spends energy at
| any single moment in time, and "compute" is so tightly
| coupled with memory as to be indistinguishable. No need to
| move data anywhere.
|
| In contrast, GPUs/TPUs are clocked and run fully connected
| networks; they have to iterate over humongous data arrays
| every time. Memory is decoupled from compute due to the
| semiconductor process differences between the two. As a
| result, they waste a huge amount of energy just moving data
| back and forth.
|
| Fundamental advancements in SNNs are also required, it's not
| just about the transistors.
| Traubenfuchs wrote:
| Neurons and synapses have more than 100 different
| neurotransmitters and their many receptors; there is reuptake
| and destructive enzyme activity; and they are connected to up
| to many thousands of their peers. Every single neuron is a
| dizzyingly complex machine, employing countless sub-machines
| and processes.
|
| You cannot reasonably compare this to model parameters or
| transistors at all.
| ozim wrote:
| Nature was "building" this stuff for eons. I feel pretty good
| about our progress in still less than a 100 years.
| somewhereoutth wrote:
| and how we'll find a way to soak up all those transistors for a
| perhaps actually _worse_ user experience / societal outcome.
| phkahler wrote:
| If we are just after AI now, we should drop the GPU concept
| and call the hardware what it is: matrix multipliers. With
| that framing, we can move to in-memory compute so the data
| doesn't have to move around so much. Memory chips could have
| lines of MAC units at intervals, plus something to compute
| the nonlinear functions after summing. Fixed sizes could be
| implemented in hardware, and software would spread larger
| computations over a number of tiles. If this were
| standardized, we might see it end up as a modest price
| premium on memory chips. Nvidia, step aside: it's going to be
| Micron, Hynix, and friends revolutionizing AI.
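|
| A sketch of the software side of that (hypothetical tile
| size, plain numpy standing in for fixed-function MAC blocks
| embedded in the memory):
|
|   import numpy as np
|
|   TILE = 128  # hypothetical fixed MAC-array size baked into hardware
|
|   def tiled_matmul(a, b):
|       # spread a large matmul over fixed-size hardware tiles
|       (m, k), (_, n) = a.shape, b.shape
|       out = np.zeros((m, n))
|       for i in range(0, m, TILE):
|           for j in range(0, n, TILE):
|               for p in range(0, k, TILE):
|                   out[i:i+TILE, j:j+TILE] += (
|                       a[i:i+TILE, p:p+TILE] @ b[p:p+TILE, j:j+TILE])
|       return out
|
|   a, b = np.ones((300, 500)), np.ones((500, 400))
|   print(np.allclose(tiled_matmul(a, b), a @ b))  # True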
| kazinator wrote:
| Hardware, or at least purely computational hardware, will never
| get the same accolades as the software that actually makes it do
| something.
|
| Interface hardware, being perceptible to the senses, gets
| credit over software.
|
| E.g. when people experience a vivid, sharp high resolution
| display, they attribute all its good properties to the hardware,
| even if there is some software involved in improving the visuals,
| like making fonts look better and whatnot.
|
| If a mouse works nicely, people attribute it to the hardware, not
| the drivers.
|
| If you work in hardware, and crave the appreciation, make
| something that people look at, hear, or hold in their hands, not
| something that crunches away in a closet.
| anonymousDan wrote:
| I hear a lot about the energy efficiency of animal brains in
| comparison to e.g. GPUs. However, as far as I can tell most of
| the numbers reported are for adult brains, which effectively have
| been sparsified over time. Does anyone know how the picture
| changes if we consider baby animal brains, which as I understand
| it have much denser connectivity and higher energy consumption
| than adult brains?
___________________________________________________________________
(page generated 2024-03-28 23:01 UTC)