[HN Gopher] Advances in semiconductors are feeding the AI boom
       ___________________________________________________________________
        
       Advances in semiconductors are feeding the AI boom
        
       Author : mfiguiere
       Score  : 126 points
       Date   : 2024-03-28 15:16 UTC (7 hours ago)
        
 (HTM) web link (spectrum.ieee.org)
 (TXT) w3m dump (spectrum.ieee.org)
        
       | tzm wrote:
        | Cerebras WSE-3 contains 4 trillion transistors, delivers 8
        | exaflops, and has 20 PB/s of bandwidth. With 900,000 cores,
        | that's 62 times the cores of an H100. I wonder if the WSE-3 can
        | compete on price/performance though. Interesting times!
        
         | winwang wrote:
         | How's the single core-to-core bandwidth?
        
         | jsheard wrote:
         | Is anyone actually using those WSEs in anger yet? They're on
         | their third generation now, but as far as I can tell the
         | discussion of each generation consists of "Cerebras announces
         | new giant chip" and then radio silence until they announce the
         | next giant chip.
        
           | JonChesterfield wrote:
            | They sold some. That's not strictly the same as anyone using
            | them, but there's a decent chance some code is running on
            | the machines.
        
           | rthnbgrredf wrote:
            | The problem is software. You can put out an XYZ-trillion-
            | transistor monster chip that beats anything hardware-wise,
            | but it's going nowhere if you don't have the tooling and the
            | massive community (like Nvidia has) to actually do real AI
            | work.
        
           | IshKebab wrote:
           | Unlikely. They cost so much that nobody is going to do
           | research on them - at best it's porting existing models. And
           | they're so different to GPUs that the porting effort is going
           | to be enormous.
           | 
           | They also suffer from the global optimisation problem for
           | layout of calculations so compile time is going to be insane.
           | 
           | Their WSE technology is also already obsolete - Tesla's chip
           | does it in a much more logical and cost effective way.
        
           | shrubble wrote:
           | The Cerebras-2 is at the Pittsburgh Supercomputing Center.
           | Not sure if they ordered a 3.
        
         | monocasa wrote:
         | > 62 times the cores of an H100.. 900,000.
         | 
          | More than that, arguably. CUDA cores are more like SIMD lanes
          | than CPU cores, whereas Cerebras uses "core" in something
          | closer to the CPU sense. Since Cerebras has 4-wide tensor ops,
          | there are arguably 3.6M CUDA-equivalent cores.
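          | 
          | A quick back-of-the-envelope version of that comparison in
          | Python (figures are the ones quoted in this thread, and
          | "CUDA-equivalent" is a rough framing, not an official metric):
          | 
          |     wse3_cores = 900_000    # Cerebras WSE-3 cores, per above
          |     tensor_width = 4        # 4-wide tensor ops per core
          |     print(wse3_cores * tensor_width)  # 3,600,000 "CUDA" lanes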
        
         | AnimalMuppet wrote:
         | 9 trillion flops per core? That's... mind-boggling. Is that
         | real?
         | 
         | And, 9 trillion flops per core in 4.4 million transistors per
         | core. That sounds a bit too good to be true.
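          | 
          | The arithmetic behind those per-core figures, using the
          | numbers quoted at the top of the thread (rough, and taking the
          | 8-exaflop claim at face value):
          | 
          |     total_flops = 8e18           # 8 exaflops, as claimed
          |     total_transistors = 4e12     # 4 trillion transistors
          |     cores = 900_000
          |     print(total_flops / cores)        # ~8.9e12 FLOPS per core
          |     print(total_transistors / cores)  # ~4.4e6 transistors/core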
        
       | ramshanker wrote:
        | When the limits of digital reach the boundary of physics,
        | analogue is going to make a comeback. The human brain feels
        | nearer to analogue than digital. I will be surprised if we reach
        | AGI without getting within an order of magnitude of the brain's
        | processing.
        | 
        | We need that ONE paper on analogue to end this quest for
        | trillions and counting transistors.
        
         | Teever wrote:
         | Won't that mean that we just change the quest from higher and
         | higher density of digital elements to higher and higher density
         | of analog elements?
         | 
         | Like people weren't trying to make computers out of bigger and
         | bigger tubes before the transistor, they were trying to make
         | them out of smaller and smaller ones.
        
         | bigyikes wrote:
         | Computers are going to increase in size as they consume more
         | and more power, requiring increasingly elaborate cooling
         | systems.
         | 
         | Like how the transistor made the big and hot vacuum tubes
         | obsolete, maybe we'll see some analog breakthrough do the same
         | thing to transistors, at least for AI.
         | 
         | I doubt there is a world where we use analog for general
         | purpose computing, but it seems perfect for messy,
         | probabilistic processes like thinking.
        
           | enlyth wrote:
           | What's amazing is that the human brain does it all on the
           | equivalent of like 20 watts of power. That's basically a
           | couple of LED light bulbs.
        
             | passion__desire wrote:
              | Is there a comparison of the power efficiency of a human
              | brain doing a 50-digit multiplication vs. a multiplier
              | circuit doing it?
        
               | gamacodre wrote:
               | I think the problem here would be figuring out how much
               | of the brain's power draw to attribute to the
               | multiplication. A brain is more akin to a motherboard
               | than a single CPU, with all kinds of I/O, internal
               | regulation, and other ancillary stuff going on all the
               | time.
        
               | passion__desire wrote:
                | Is the issue then that we haven't discovered the magical
                | algorithm run by our brains? If we discover it, digital
                | circuits will handsomely beat the brain.
        
               | gamacodre wrote:
               | We can surely build more efficient and capable hardware
               | than our current evolved wetware, since all of the
               | details of _how_ to build it are generally externalized.
               | If the chips had to fab themselves, it would be a
               | different story.
               | 
               | The software _is_ a different story. Sure, the brain does
                | all sorts of things that aren't necessary for $TASK, but
               | we aren't necessarily going to be able to correctly
               | identify which are which. Is your inner experience of
               | your arm motion needed to fully parse the meaning in
               | "raise a glass to toast the bride and groom", or respond
               | meaningfully to someone who says that? Or perhaps it
               | doesn't really matter - language is already a decent tool
               | for bridging disjoint creature realities, maybe it'll
               | stretch to synthetic consciousness too.
        
               | passion__desire wrote:
                | All computation is realised by a very small set of
                | arithmetic operations. Test the energy efficiency of
                | wetware and hardware on those operations; any remaining
                | difference can then be attributed to algorithms.
        
         | smallmancontrov wrote:
         | What? Action potentials are extremely digital, in that small
         | perturbations are suppressed so as to encode information in a
         | higher level conceptual state (the "digit" / "bit" is the
         | presence or absence of an AP) to obtain robustness and
         | integration scale.
        
         | jzombie wrote:
         | I may have misinterpreted this comment by thinking you meant
         | that as we squeeze more and more transistors into a small
         | amount of space, the resulting waveforms would start to
         | resemble analog more so than digital.
        
         | knodi123 wrote:
         | > I will be surprised to reach AGI without nearing the Order-
         | of-Magnitude of brain processing.
         | 
         | I have some theories that this isn't necessary. 1.) Just
         | because the brain is a general-purpose machine great at doing
         | lots of things, doesn't mean it's great at each of those
         | things. Like when two people are playing catch, and one of them
         | sees the first fragments of a parabola and estimates where the
         | ball is going to land- a computer can calculate that way more
         | efficiently than a mind, despite the fact that both are quick
         | enough to get the job done. 2.) While the brain is great at,
         | say, putting names to faces... a good CV machine can do the job
         | almost as well, and can annotate a video stream in real-time.
         | 
         | Combining 1.) the fact that some problems are much simpler to
         | solve with classical algorithms instead of neural networking,
         | and 2.) that many brain tasks can be farmed out to a
         | coprocessor/service, my hypothesis is that the number of
         | neurons/resources required to do the "secret sauce" part of agi
         | could be greatly reduced.
        
           | Galaxeblaffer wrote:
            | I'm also in the camp that believes we won't reach AGI
            | without significantly more compute. I think that
            | consciousness is an emergent property, just as I think life
            | itself is an emergent property; both need a certain set of
            | elements/systems to work. The secret sauce may look simple
            | once we recreate the right conditions, but it's not going to
            | be possible without the right hardware, so to speak.
        
             | knodi123 wrote:
              | But take, for instance, sight. Sighted people use a large
              | percentage of their brain to process visual input. People
              | born congenitally blind are no smarter or dumber for their
              | brains not having to process that input, so clearly that's
              | not the secret sauce.
              | 
              | I'm not convinced consciousness is emergent, I don't
              | really have an opinion on _that_ - but I'm > 50% convinced
              | that consciousness itself doesn't _require_ a neural
              | network as large as a human brain's.
        
         | bee_rider wrote:
         | Analogue circuits can be a real pain in the butt, imagine if
         | integer overflows destroyed the whole ALU, haha.
         | 
         | Transistors are already much smaller than neurons. And of
         | course the brain doesn't have a clock. And neurons have more
         | complex behavior than single transistors... The whole system is
         | just very different. So, this doesn't seem like a strategy to
         | get past a boundary, it is more like a suggestion that we give
         | up on the current path and go in a radically different
         | direction. It... isn't impossible but it seems like a wild
         | change for the field.
         | 
         | If we want something post-cmos, potentially radically more
         | efficient, but still familiar in the sense that it produces
         | digital logic, quantum dot cellular automata with Bennett
         | clocking seems more promising IMO.
        
         | mwbajor wrote:
          | I work in analog.
          | 
          | 1) Noise is an issue as the system gets complex. You can't get
          | away with counting to 1 anymore; all those levels in between
          | matter.
          | 
          | 2) It's hard to make an analog computer reconfigurable.
          | 
          | 3) Analog computers exist commercially, believe it or not, but
          | for niche applications and essentially as coprocessors.
        
           | jameshart wrote:
           | Quantization of parameters in neural networks is roughly
           | analogous to introducing noise into analog signals. We've got
           | good evidence that these architectures are robust to
           | quantization - which implies they could be implemented over
           | noisy analog signals.
           | 
           | Not sure who's working on that but I can't believe it's not
           | being examined.
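            | 
            | A minimal sketch of that robustness in Python (a toy linear
            | layer with made-up sizes, not a claim about any real model):
            | perturbing the weights the way quantization or analog noise
            | would barely moves the output.
            | 
            |     import numpy as np
            | 
            |     rng = np.random.default_rng(0)
            |     W = rng.standard_normal((256, 256))  # toy weight matrix
            |     x = rng.standard_normal(256)         # toy activations
            | 
            |     noise = rng.standard_normal(W.shape) * 0.01 * W.std()
            |     y, y_noisy = W @ x, (W + noise) @ x  # ~1% "analog" noise
            | 
            |     err = np.linalg.norm(y - y_noisy) / np.linalg.norm(y)
            |     print(f"relative output error: {err:.2%}")  # ~1%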
        
         | floxy wrote:
         | >We need that ONE paper on analogue
         | 
         | https://www.analog.com/en/resources/analog-dialogue/articles...
        
         | marcosdumay wrote:
         | I doubt it. Digital just scales better.
         | 
         | Our brain has a pretty bounded need of scaling, but once we
         | create some computer equivalent, it would be very
         | counterproductive to make it useless for larger problems for a
         | small gain on smaller ones.
        
           | smallmancontrov wrote:
           | > Digital just scales better.
           | 
           | Yes!
           | 
           | > Our brain has a pretty bounded need of scaling
           | 
           | No!
           | 
           | Over aeons our brains scaled from several neurons to 100
           | billion neurons, each with 1000 synapses. They were able to
           | do it _because our brains are digital_. They lean on their
           | digital nature even more than computer chips do.
           | 
           | Action potentials are so digital it hurts. They aren't just
           | quantized in level, but in the entire shape of the waveform
           | across several milliseconds. Just as in computer chips, this
           | suppresses perturbations. As long as higher level computation
           | only depends on presence/absence of action potentials and
           | timing, it inherits this robustness and allows scale. Rather
           | than errors accumulating and preventing integration beyond a
           | certain threshold, error resilience scales alongside
           | computation. Every neuron "refreshes the signal," allowing
           | arbitrary integration complexity at any scale, even in the
           | face of messy biology problems along the way. Just like every
           | transistor (or at least logic gate) "refreshes the signal" so
           | that you can stack billions on a chip and quadrillions in
           | sequential computation, even though each transistor is
           | imperfect.
           | 
           | Digital computation is the way. Always has been, always will
           | be.
        
       | ViktorRay wrote:
       | This article has two authors.
       | 
       | One author is chairman of TSMC.
       | 
       | The other author is Chief Scientist of TSMC.
       | 
       | This is important to note because they clearly know some stuff
       | and we should listen.
        
         | bigbillheck wrote:
         | Two credited authors.
        
       | GaggiX wrote:
        | Yeah, I also think that better hardware produces a visible jump
        | in the quality of the models being released. For example,
        | companies like OpenAI have had access to large quantities of
        | H100s for a few months now, and Sora is being presented,
        | something I would not have believed a year ago. I would also
        | believe that the Claude 3 models were trained on H100s. DBRX was
        | trained on 12T tokens, a big difference compared to the 300B for
        | the original GPT-3, and the new NovelAI image generation model
        | was trained on H100s and is like night and day compared to the
        | previous model. It seems to create a generational jump.
        
         | declaredapple wrote:
         | > companies like OpenAI have had access to large quantities of
         | H100 for a few months now and Sora is being presented
         | 
         | From what I could tell from Nvidia's recent presentation,
         | Nvidia works directly with OpenAI to test their next gen
         | hardware. IIRC they had some slides showing the throughput
         | comparisons with Hopper and Blackwell, suggesting they used
         | OpenAI's workload for testing.
         | 
          | H100s have been generally available (without a long waitlist)
          | for only several months, but all the big players had them
          | already a year ago.
         | 
         | I agree with you, but I think you might be 1 generation behind.
         | 
         | > OpenAI used H100's predecessor -- NVIDIA A100 GPUs -- to
         | train and run ChatGPT, an AI system optimized for dialogue,
         | which has been used by hundreds of millions of people worldwide
         | in record time. OpenAI will be using H100 on its Azure
         | supercomputer to power its continuing AI research.
         | 
         | March 21, 2023 https://nvidianews.nvidia.com/news/nvidia-
         | hopper-gpus-expand...
        
           | GaggiX wrote:
           | Very interesting, I guess it does make sense that GPT-4 was
           | also trained on the Hopper architecture.
        
       | jameshart wrote:
       | For some sense of how far out/close 1 trillion transistors in one
       | GPU is:
       | 
       | NVIDIA just announced Blackwell which gets to 208bn transistors
       | on a chip by stitching two dies together into a single GPU.
       | https://www.nvidia.com/en-us/data-center/technologies/blackw...
       | 
        | They're sticking two of them on a board with a Grace CPU in
        | between, then linking 36 of those boards together in racks with
        | NVLink switches that offer "130TB/s of GPU bandwidth in one
        | 72-GPU NVLink domain".
       | 
       | In terms of marketing, NVidia calls one of those racks a GB200
       | NVL72 "super GPU".
       | 
       | So on one level NVIDIA would say they already have a GPU with
       | 'trillions' of transistors.
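        | 
        | The arithmetic behind that framing (numbers from above; whether
        | a rack counts as "a GPU" is the marketing question rather than
        | a technical one):
        | 
        |     blackwell_transistors = 208e9   # two dies stitched together
        |     gpus_per_nvl72_rack = 72
        |     print(blackwell_transistors * gpus_per_nvl72_rack)  # ~1.5e13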
        
         | sylware wrote:
          | GPUs can still scale up by a LOT. The real limiting factors
          | are power consumption (which includes cooling), bus speed, and
          | foundry production capacity.
          | 
          | Think super-[cross-fire/sli].
          | 
          | Economics will probably forbid that. It's a virtual limit
          | which factors in the previous physical limits... in theory.
        
       | HarHarVeryFunny wrote:
       | We already have a 4 trillion transistor "GPU" in the Cerebras
       | WSE-3 (wafer-scale engine), used in Cerebras' data centers.
       | 
       | https://www.youtube.com/watch?v=f4Dly8I8lMY
        
         | DesiLurker wrote:
          | How many regular-sized GPU dies (say a 4080 or an Nvidia L4)
          | can be cut out of a full-sized wafer? I suppose that's what OP
          | means by reaching the integration density of a 1T GPU.
        
           | denimnerd42 wrote:
            | Here's some discussion about yields for the H100 per wafer.
            | I'd assume a 4080 is smaller? Regardless, the calculator is
            | supplied.
           | 
           | https://news.ycombinator.com/item?id=38588876
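            | 
            | For a rough feel without the calculator, the standard die-
            | per-wafer approximation in Python (ignores yield and scribe
            | lines; the die areas are approximate public figures, so
            | treat the results as ballpark only):
            | 
            |     import math
            | 
            |     def dies_per_wafer(die_area_mm2, wafer_d_mm=300):
            |         d = wafer_d_mm
            |         return (math.pi * (d / 2) ** 2 / die_area_mm2
            |                 - math.pi * d / math.sqrt(2 * die_area_mm2))
            | 
            |     print(round(dies_per_wafer(814)))  # H100-class: ~63
            |     print(round(dies_per_wafer(379)))  # 4080-class: ~152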
        
           | moralestapia wrote:
            | I think the whole premise of TFA is flawed, as there are
            | already chips with way more than a trillion transistors (as
            | GP points out).
            | 
            | Arguing about what size limit counts as a GPU or not is a
            | bit like bikeshedding.
            | 
            | As to why a supercomputer wouldn't be considered for this:
            | because it's not a single chip.
        
           | IshKebab wrote:
           | It's somewhere around 20 if you go as large as possible IIRC.
        
         | IshKebab wrote:
          | Yeah, that doesn't really count. It's the equivalent of like
          | 20 GPUs and costs 200x as much.
        
           | edward28 wrote:
           | It's meant to compete with nvidia's DGX systems with 8 GPUs
           | per node.
        
       | highfrequency wrote:
       | Wild that the human brain can squeeze in 100 trillion synapses (
       | _very_ roughly analogous to model parameters  / transistors) in a
       | 3lb piece of meat that draws 20 Watts. The power efficiency
       | difference may be explainable by the much slower frequency of
       | brain computation (200 Hz vs. 2GHz).
       | 
       | My impression is that the main obstacle to achieving a comparable
       | volumetric density is that we haven't cracked 3d stacking of
       | integrated circuits yet. Very exciting to see TSMC making inroads
       | here:
       | 
       | > Recent advances have shown HBM test structures with 12 layers
       | of chips stacked using hybrid bonding, a copper-to-copper
       | connection with a higher density than solder bumps can provide.
       | Bonded at low temperature on top of a larger base logic chip,
       | this memory system has a total thickness of just 600 um...We'll
       | need to link all these chiplets together in a 3D stack, but
       | fortunately, industry has been able to rapidly scale down the
       | pitch of vertical interconnects, increasing the density of
       | connections. And there is plenty of room for more. We see no
       | reason why the interconnect density can't grow by an order of
       | magnitude, and even beyond.
       | 
       | It's hard to imagine _not_ getting unbelievable results when in
       | 10-30 years we have GPUs with a comparable number of transistors
       | to brain synapses that support computation speed 10,000x faster
       | than the brain. What a thing to witness!
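        | 
        | A rough energy-per-operation comparison implied by those
        | numbers (all coarse order-of-magnitude estimates; treating a
        | synaptic event and a FLOP as comparable is a loose analogy, and
        | the GPU figures are assumed rather than from the article):
        | 
        |     brain_synapses = 1e14    # ~100 trillion
        |     brain_rate_hz = 200      # upper-end firing rate, per above
        |     brain_power_w = 20
        |     gpu_flops = 1e15         # ~1 PFLOP/s class accelerator
        |     gpu_power_w = 700        # typical H100-class board power
        | 
        |     print(brain_power_w / (brain_synapses * brain_rate_hz))
        |     # ~1e-15 J per synaptic event
        |     print(gpu_power_w / gpu_flops)   # ~7e-13 J per FLOP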
        
         | user90131313 wrote:
          | I think the biggest difference is millions of years of
          | evolutionary development. That's a lot of time difference.
        
         | londons_explore wrote:
         | > is it just that we haven't cracked 3-d stacking of integrated
         | circuits yet?
         | 
         | Yes. If we could stack transistors in the Z dimension as
          | closely as we do in X and Y, we'd easily exceed the brain's
         | density.
        
         | npalli wrote:
          | No, neurons are the equivalent of transistors. Synapses are
          | the equivalent of connections between transistors. The total
          | neuron count is about 100 billion, but the connections number
          | about 100 trillion. A neuron can have up to 10,000 synapses,
          | while a transistor has on average only about 3 connections.
          | Impressive nonetheless on power efficiency.
        
           | highfrequency wrote:
           | # synapses should be analogous to # model parameters no? And
           | # model parameters should be linear in # transistors.
        
             | phkahler wrote:
             | >> # synapses should be analogous to # model parameters no?
             | 
             | I think they're equivalent to a parameter AND the
             | multiplier. Or in analog terms they'd just be a resistor
             | whose value can be changed. Digital stuff is not a good fit
             | for this.
        
               | 0xcde4c3db wrote:
               | > Or in analog terms they'd just be a resistor whose
               | value can be changed
               | 
               | For what it's worth, that's actually a thing
               | (ReRAM/memristors), but I think it got put on the back
               | burner because it requires novel materials and nobody
               | figured out how to cost-effectively scale up the
               | fabrication versus scaling up flash memory. I saw some
               | mention recently that advances in perovskite materials (a
               | big deal lately due to potential solar applications)
               | might revive the concept.
        
             | retrofrost wrote:
              | We can't even get close to saying our current networks
              | approach synapses in performance or function, because
              | architecturally we still use feedforward networks: no
              | recurrence, no timing elements, very static connections.
              | Transistors will definitely have some advantages in terms
              | of being able to synchronize information and steps to an
              | infinitely better degree than biological neurons, but as
              | long as we stick with transformers it's the equivalent of
              | trying to get to space by stacking sand. Could you get
              | there eventually? Yes, but there are better ways.
        
         | layer8 wrote:
         | Ignoring for the moment that transistors and synapses are very
         | different in their function, the current in a CPU transistor is
         | in the milliampere range, whereas in the ion channels of a
         | synapse it is in the picoampere range. The voltage differs by
         | roughly a factor of ten. So the wattage differs by a factor of
         | 10^10.
         | 
         | One important reason for the difference in current is that
         | transistors need to reliably switch between two reliably
         | distinguishable states, which requires a comparatively high
         | current, whereas synapses are very analog in nature. It may not
         | be possible to reach the brain's efficiency with deterministic
         | binary logic.
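          | 
          | The arithmetic behind the 10^10 figure, taking those current
          | and voltage ratios at face value (see the corrections further
          | down the thread for why the milliampere figure is too high):
          | 
          |     i_transistor = 1e-3   # ~1 mA, as stated above
          |     i_synapse = 1e-12     # ~1 pA ion-channel current
          |     voltage_ratio = 10
          |     print((i_transistor / i_synapse) * voltage_ratio)  # 1e10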
        
           | TRDRVR wrote:
           | Thank you for this - the name _neural_ networks has made a
           | whole generation of people forget that they have an endocrine
           | system.
           | 
           | We _know_ things like sleep, hunger, fear, and stress all
            | impact how we think, yet people still want to build this
            | mental model that synapses are just dot products that either
            | reach an activation threshold or don't.
        
             | __loam wrote:
             | People will spout off about how machine learning is based
             | on the brain while having no idea how the brain works.
        
               | gary_0 wrote:
               | It is based on the brain, but only in the loosest
               | possible terms; ML is a cargo cult of biology. It's kind
               | of surprising that it works at all.
        
               | Borg3 wrote:
                | It works because, well, it's actually pretty primitive
                | at its core. The whole learning process is actually
                | pretty brutal: millions of iterations with random (and
                | semi-random) adjustments.
        
             | cosmojg wrote:
              | Fortunately for academics looking for a new start in
              | industry, this widespread misunderstanding has made it all
              | too easy to transition from a slow-paced career in
              | computational neuroscience to an overwhelmingly lucrative
              | one in machine learning!
        
             | goatlover wrote:
             | There have been people on HN arguing that the human brain
             | is a biological LLM, because they can't think of any other
             | way it could work, as if we evolved to generate the next
              | token, instead of for fitness as organisms in the real
              | world, where things like eating, sleeping, shelter,
              | avoiding danger, social bonds, reproduction, and child
              | rearing are important. Things that require a body.
        
               | kungito wrote:
                | I'm one of those people. To me those things only sound
                | like a different prompt: priorities set for the LLM.
        
               | TRDRVR wrote:
               | Here's the hubris of thinking that way:
               | 
               | I would imagine the baseline assumption of your thinking
               | is that things like sleep and emotions are a 'bug' in
                | terms of cognition. Said differently, the assumption is
                | that _with the right engineer_ you could reach human-
                | parity cognition with a model that doesn't sleep or feel
                | emotions (after all, what's the point of an LLM if it
                | gets tired and doesn't want to answer your questions
                | sometimes? Or, even worse, knowingly deceives you
                | because it is mad at you or prejudiced against you).
               | 
               | The problem with that assumption is that as far as we can
               | tell, every being with even the slightest amount of
               | cognition sleeps in some form and has something akin to
               | emotional states. As far as we can prove, sleep and
               | emotions _are necessary preconditions_ to cognition.
               | 
               | A worldview where the 'good' parts of the brain are
               | replicated in LLM but the 'bad' parts are not is likely
               | an incomplete one.
        
           | philipkglass wrote:
           | _Ignoring for the moment that transistors and synapses are
           | very different in their function, the current in a CPU
           | transistor is in the milliampere range..._
           | 
           | That seems implausible. Apple's M2 has 20 billion transistors
           | and draws 15 watts at full power [1]. Even assuming that 90%
           | of those transistors are for cache and not logic, that would
           | still be 2 billion logic transistors * 1 milliampere = 2
           | million amperes at full power. That would imply a voltage of
           | 7.5 microvolts, which is far too low for silicon transistors.
           | 
           | [1] https://www.anandtech.com/show/17431/apple-
           | announces-m2-soc-...
        
           | fsckboy wrote:
           | > _the current in a CPU transistor is in the milliampere
           | range_
           | 
           | ? you sure about that? in a single transistor? over what time
           | period, more than nanoseconds? milliamps is huge, and there
           | are millions of transistors on a single chip these days, and
            | with voltage drops of ... 3V? .7V? you're talking major
           | power. FETs should be operating on field more than flow,
           | though there is some capacitive charge/discharge.
        
             | Ductapemaster wrote:
             | Single transistors in modern processes switch currents
             | orders of magnitude lower than milliamps. More like micro-
             | to picoamps. There's leakage to account for too as features
             | get smaller and smaller due to tunneling and other effects,
             | but still in aggregate the current per transistor is tiny.
             | 
             | Also the transistors are working at 1V or lower, but as you
             | say they are FETs and don't have the same Vbe drop as a
             | BJT.
        
             | layer8 wrote:
             | You are right, I mixed this up. If you take a CPU running
             | at 100 W with 10 billion transistors (not quite
             | realistically assumed to all be wired in parallel) at 1 V,
             | you would get an average of 0.01 microamps. So the factor
             | would reduce to roughly 10^5.
        
               | nuancebydefault wrote:
               | Wait a minute, a lot of those transistors are switching
               | the same currents since they are in series. Also, FETs
               | only draw most current while switching, so in between
               | switches there's almost no flow of electrons. So in fact
               | you cannot calculate things that way.
        
               | layer8 wrote:
               | Yes, as I said the parallel assumption is not quite
               | realistic, and the number is an average, covering all
               | states a transistor may be in. So it amounts to a rough
               | lower bound for when a transistor is switching.
        
           | jvanderbot wrote:
           | A simple discretization of the various levels of signal at
           | each input/output, a discretization to handle time-of-
           | propagation (which is almost surely part of the computation
           | just because it _can be_ and nature probably hijacks all
           | mechanisms), and a further discretization to handle the
           | various serum levels in the brain, which are either inputs,
           | outputs, or probably both.
           | 
           | Just add a factor 2^D transistors for each original "brain
           | transistor" and re-run your hardware. Hope field effects
           | don't count, and cross your fingers that neurons are
           | idempotent!
           | 
           | Easy! /s
           | 
           | Modelling an analog system in digital will always have a
           | combinatorial curse of dimensionality. Modelling a biological
           | system is so insanely complex I can't even begin to think
           | about it.
        
           | omikun wrote:
            | A single-precision flop is on the order of a pJ. [1] A
           | transistor would be much less.
           | 
           | [1] https://arxiv.org/pdf/1809.09206.pdf
        
           | whiplash451 wrote:
           | How do you get to 10^10? I might be missing a fundamental of
           | physics here (asking genuinely).
        
           | CuriouslyC wrote:
           | The more of our logic we can implement with addition, the
           | more can be offloaded to noisy analog systems with
           | approximate computing. It would be funny if model temperature
           | stopped being metaphorical.
        
         | marcosdumay wrote:
         | On the power usage, the difference is that those synapses are
         | almost always in stand-by. The equivalent would be a CMOS
         | circuit with a clock of minutes.
         | 
         | On the complexity, AFAIK a synapse is way more complex than a
         | transistor. Larger too, if you include its share of the
         | neuron's volume. And yes, the count difference is due to the 3D
         | packing.
        
           | sroussey wrote:
           | The brain's 3d packing is via folding. Maybe something
           | similar would be better than just stacking.
        
         | mrtksn wrote:
         | IMHO the breakthrough will come with an analog computer.
         | 
         | The current systems simulate stuff through computation using
         | switches.
         | 
          | It's like real sand vs. a sand simulator in the browser. One
          | spins your fans and drains your battery while showing you
          | 1,000 particles acting like sand; the other just obeys the
          | laws of physics locally per particle and can do millions of
          | particles much more accurately, with only a very slight
          | increase in temperature.
         | 
         | Of course, analog computations are much less controllable but
         | in this case that's not a deal breaker.
        
         | zer00eyz wrote:
         | >> 100 trillion synapses (very roughly analogous to
         | transistors?)
         | 
         | Not even remotely comparable
         | 
          | * It's unlikely synapses are binary. Candidly, they probably
          | serve more than one purpose.
          | 
          | * Transistor count is a bad proxy for other reasons. A
          | pipeline that does floats is not going to be useful for
          | fetching from memory. "Where" the density lies is important.
         | 
         | * Power: On this front transistors are a joke.
         | 
         | * The brain is clockless, and analog... frequency is an
         | interesting metric
         | 
          | Binary systems are going to be bad at simulating complex
          | processes. LLMs are a simulation of intelligence, like
          | forecasting is a simulation of weather. Lorenz shows us why
          | simulation of weather has limits; there isn't some magical
          | math that will change those rules and let ML make the leap to
          | "AGI".
        
           | luyu_wu wrote:
           | Transistor power is really not a joke. Synapses would take
           | far far FAR more power at close to similar frequencies.
           | Biological neurons are incredibly inefficient.
        
             | goatlover wrote:
             | Biology is incredibly efficient at what it does well
             | though. Thus only 20 watts of brain energy to coordinate
             | everything we do. We didn't evolve to be mentats.
        
         | vasco wrote:
         | We also lose a lot when building computers due to the fact we
         | have to convert the analog world into digital representations.
         | A neural analog computer would be more efficient I think, and
         | due to the non-deterministic nature of AI would probably suit
         | the task as well.
        
           | gricardo99 wrote:
            | Interesting point.
            | 
            | I know of at least one startup working with that concept
            | [1].
            | 
            | I'm sure there are others.
           | 
           | 1 - https://www.extropic.ai/
        
           | sroussey wrote:
            | There are neuromorphic companies like https://rain.ai
        
           | barelyauser wrote:
            | Non-deterministic means random. AI, or natural intelligence,
            | is not random. Analog suffers immensely from noise, and that
            | is the reason the brain has such a large number of neurons:
            | partly to deal with noise, and partly to deal with losing
            | some neurons along the way.
        
             | doug_durham wrote:
             | Non-deterministic doesn't mean random. Random means random.
             | Non-deterministic means that specific inputs don't generate
             | the same outputs. It says nothing about the distribution of
              | the output values. A chaotic system isn't random, it's
              | just non-deterministic.
        
                | dragonwriter wrote:
                | "Nondeterminism" gets used in a variety of ways, both
                | from differing contexts and from conflicting uses within
                | the same context, but chaotic systems are fully
                | deterministic in the most common relevant sense. They
                | are just highly sensitive to inputs, so even very small
                | uncertainty in the inputs renders them largely
                | unpredictable.
        
         | fiftyfifty wrote:
         | By some accounts it takes a pretty sizable neural network to
         | simulate a single neuron:
         | 
         | https://www.sciencedirect.com/science/article/pii/S089662732...
         | 
         | So we are going to need a lot of computational power to
         | approximate what's going on in an entire human brain.
        
         | pnjunction wrote:
         | I wonder if this trajectory will only lead to reinventing the
         | biological brain. It is hard to imagine the emergence of
         | consciousness, as we know it, on a fundamentally deterministic
         | system.
        
         | haltIncomplete wrote:
         | Why are static 3D cells going to get us there when other ideas
         | have not? Is it needed to replicate "arbitrary" academic ideas
          | of consciousness (despite our best efforts, our models are
          | always approximations) to make a useful machine?
          | 
          | "Living things" are not static designs off the drafter's
          | table. They'll never be intelligent from their own curiosity,
          | but from ours and the rules we embed, no matter how hard we
          | push puerile hallucinations embedded by Star Trek. It's still
          | a computer, and human agency does not have to bend to it.
        
         | HarHarVeryFunny wrote:
         | > The power efficiency difference may be explainable by the
         | much slower frequency of brain computation (200 Hz vs. 2GHz).
         | 
         | Partly, but also because the brain has an asynchronous data-
         | flow design, while the GPU is synchronous, and as you say
         | clocked at a very high frequency.
         | 
         | In a clocked design the clock signal needs to be routed to
         | every element on the chip which requires a lot of power, the
         | more so the higher the frequency is. It's a bit like the amount
         | of energy used doing "battle ropes" at the gym. The heavier the
         | ropes (cf more gates the clock is connected to), the more power
         | it takes to move them, and the faster you want to move them (cf
         | faster clock frequency) the more power it takes.
         | 
         | In a data-flow design, like the brain, there is no clock. Each
         | neuron fires, or not, independent of what other neurons are
         | doing, based on their own individual inputs. If the inputs are
         | changing (i.e. receiving signal spikes from attached neurons),
         | then at some threshold of spike accumulation the neuron will
         | fire (expending energy). If the inputs are not changing, or at
         | a level below threshold, then the neuron will not fire.
         | 
         | To consider the difference, imagine our visual cortex if we're
         | looking at a seagull flying across a blue sky. The seagull
         | represents a tiny part of the visual field, and is the only
         | part that is moving/changing, so there are only a few neurons
          | whose inputs are changing and which themselves will therefore
         | fire and expend energy. The blue sky comprising the rest of the
         | visual field is not changing and we therefore don't expend any
         | energy reprocessing it over and over.
         | 
         | In contrast, if you fed a video (frame by frame) of that same
         | visual scene into a CNN being processed on a GPU, then it does
         | not distinguish between what is changing or not, so 95% of the
         | energy processing each frame will be wasted, and this will be
         | repeated frame by frame as long as we're looking at that scene!
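          | 
          | A toy illustration of that last point in Python (made-up
          | frame size and "seagull" patch; real retinas and CNNs are of
          | course far more complicated than counting changed pixels):
          | 
          |     import numpy as np
          | 
          |     H, W = 480, 640
          |     prev = np.zeros((H, W))         # static "blue sky" frame
          |     curr = prev.copy()
          |     curr[200:210, 300:320] = 1.0    # small moving "seagull"
          | 
          |     frame_ops = H * W                           # clocked: all
          |     event_ops = np.count_nonzero(curr != prev)  # event-driven
          |     print(event_ops / frame_ops)                # ~0.00065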
        
           | taktoa wrote:
           | > In a clocked design the clock signal needs to be routed to
           | every element on the chip which requires a lot of power, the
           | more so the higher the frequency is.
           | 
           | Clock only needs to be distributed to sequential components
            | like flip-flops or SRAMs. The number of clock-distribution
            | wire-millimeters in a typical chip is dwarfed by the number
            | of data wire-millimeters, and if a neural network is well
            | trained and quantized, activations should be random, so the
            | number of transitions per clock should be 0.5 (as opposed to
            | 1 for clock wires), meaning that power can't be dominated by
            | the clock. The flip-flops that prevent clock skew are a
            | small % of area, so I
           | don't think those can tip the scales either. On the other
           | hand, in asynchronous digital logic you need to have valid
           | bit calculation on every single piece of logic, which seems
           | like a pretty huge overhead to me.
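            | 
            | A minimal sketch of the activity-factor point (capacitance,
            | voltage and frequency are made-up placeholders; the point is
            | only that clock vs. random data differs by ~2x in switching
            | activity, not by orders of magnitude):
            | 
            |     def dynamic_power(alpha, c_farads, vdd, freq_hz):
            |         return alpha * c_farads * vdd ** 2 * freq_hz
            | 
            |     c, v, f = 1e-9, 0.8, 2e9
            |     print(dynamic_power(1.0, c, v, f))  # clock net
            |     print(dynamic_power(0.5, c, v, f))  # random data net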
        
             | HarHarVeryFunny wrote:
             | There's obvious potential savings in not wasting FLOPs
             | recalculating things unnecessarily, but I'm not sure how
             | much of that could be realized by just building a data-flow
             | digital GPU. The only attempt at a data-flow digital
             | processor I'm aware of was AMULET (by ARM designer Steve
             | Furber), which was not very successful.
             | 
             | There's more promise in analog chip designs, such as here:
             | 
             | https://spectrum.ieee.org/low-power-ai-spiking-neural-net
             | 
             | Or otherwise smarter architectures (software only or
             | S/W+H/W) that design out the unnecessary calculations.
             | 
             | It's interesting to note how extraordinarily wasteful
             | transformer-based LLMs are too. The transformer was
             | designed part inspired by linguistics and part based on the
             | parallel hardware (GPU's etc) available to run it on.
             | Language mostly has only local sentence structure
             | dependencies, yet transformer's self-attention mechanism
             | has every word in a sentence paying attention to every
             | other word (to some learned degree)! Turns out it's better
             | to be dumb and fast than smart, although I expect future
             | architectures will be much more efficient.
        
         | ordu wrote:
         | _> The power efficiency difference may be explainable by the
         | much slower frequency of brain computation (200 Hz vs. 2GHz)._
         | 
          | Or by a more static design. A GPU can't do a thing without all
          | the weights and shaders. There are benefits to this: you can
          | easily swap one model for another. The human mind, on the
          | other hand, is not reprogrammable. It can learn new tricks,
          | but you cannot extract a firmware from one person and upload
          | it to another.
          | 
          | Just imagine if every logical neuron of an AI was a real
          | thing, with physical connections to other neurons as inputs.
          | No more need for high-throughput memory, no more need for
          | compute units running at gigahertz frequencies.
        
         | wslh wrote:
        | Neurons are not just linear-algebra pieces, by the way. This is
        | why there is literature on the complexity of a single neuron.
        | So comparing at the unit level at this point is apples vs.
        | oranges.
        | 
        | But yes, the brain continues to be a surprising machine, and ML
        | accomplishments are amazing for that machine.
        
         | orbital-decay wrote:
         | Apples to oranges. Gate count indicates nothing when the
         | architectures are nothing alike.
         | 
          | The brain is a spiking network with mutable connectivity,
          | mostly asynchronous. Only the active path spends energy at a
         | single moment in time, and "compute" is tightly coupled with
         | memory to the point of being indistinguishable. No need to move
         | data anywhere.
         | 
         | In contrast, GPUs/TPUs are clocked and run fully connected
         | networks, they have to iterate over humongous data arrays every
         | time. Memory is decoupled from compute due to the semiconductor
         | process differences between the two. As a result, they waste a
         | huge amount of energy just moving data back and forth.
         | 
         | Fundamental advancements in SNNs are also required, it's not
         | just about the transistors.
        
         | Traubenfuchs wrote:
          | Neurons and synapses have more than 100 different
          | neurotransmitters and their many receptors; there is reuptake
          | and destructive enzyme activity, and each neuron is connected
          | to up to many thousands of its peers. Every single neuron is a
          | dizzyingly complex machine, employing countless sub-machines
          | and processes.
         | 
         | You can not reasonably compare this to model parameters or
         | transistors at all.
        
         | ozim wrote:
        | Nature has been "building" this stuff for eons. I feel pretty
        | good about our progress in still less than 100 years.
        
       | somewhereoutth wrote:
       | and how we'll find a way to soak up all those transistors for a
       | perhaps actually _worse_ user experience  / societal outcome.
        
       | phkahler wrote:
        | If we are just after AI now, we should drop the GPU concept and
        | call it what it is: matrix multipliers. With that framing, we
        | can move to in-memory compute so the data doesn't have to be
        | moving around so much. Memory chips could have lines of MAC
        | units at some intervals and something to compute the nonlinear
        | functions after summing. Fixed sizes could be implemented in
        | hardware, and software would spread larger computations over a
        | number of tiles. If this were standardized we might see it end
        | up as a modest price premium on memory chips. Nvidia, step
        | aside: it's going to be Micron, Hynix and friends
        | revolutionizing AI.
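        | 
        | A toy sketch of the tiling idea in Python (sizes are
        | illustrative; the hypothetical tile_mac() stands in for whatever
        | fixed-size MAC array would sit next to the memory):
        | 
        |     import numpy as np
        | 
        |     TILE = 4
        | 
        |     def tile_mac(a_blk, b_blk, acc):   # fixed-size MAC array
        |         return acc + a_blk @ b_blk
        | 
        |     def tiled_matmul(A, B):            # software sweeps tiles
        |         n = A.shape[0]
        |         C = np.zeros((n, n))
        |         for i in range(0, n, TILE):
        |             for j in range(0, n, TILE):
        |                 for k in range(0, n, TILE):
        |                     C[i:i+TILE, j:j+TILE] = tile_mac(
        |                         A[i:i+TILE, k:k+TILE],
        |                         B[k:k+TILE, j:j+TILE],
        |                         C[i:i+TILE, j:j+TILE])
        |         return C
        | 
        |     A = np.random.rand(8, 8); B = np.random.rand(8, 8)
        |     assert np.allclose(tiled_matmul(A, B), A @ B)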
        
       | kazinator wrote:
       | Hardware, or at least purely computational hardware, will never
       | get the same accolades as the software that actually makes it do
       | something.
       | 
        | Interface hardware, being perceptible to the senses, gets
       | credit over software.
       | 
       | E.g. when people experience a vivid, sharp high resolution
       | display, they attribute all its good properties to the hardware,
       | even if there is some software involved in improving the visuals,
       | like making fonts look better and whatnot.
       | 
       | If a mouse works nicely, people attribute it to the hardware, not
       | the drivers.
       | 
       | If you work in hardware, and crave the appreciation, make
       | something that people look at, hear, or hold in their hands, not
       | something that crunches away in a closet.
        
       | anonymousDan wrote:
       | I hear a lot about the energy efficiency of animal brains in
       | comparison to e.g. GPUs. However, as far as I can tell most of
       | the numbers reported are for adult brains, which effectively have
        | been sparsified over time. Does anyone know how the picture
       | changes if we consider baby animal brains, which as I understand
       | it have much denser connectivity and higher energy consumption
       | than adult brains?
        
       ___________________________________________________________________
       (page generated 2024-03-28 23:01 UTC)