[HN Gopher] Stanford engineers present new chip that ramps up AI...
       ___________________________________________________________________
        
       Stanford engineers present new chip that ramps up AI computing
       efficiency
        
       Author : rbanffy
       Score  : 79 points
       Date   : 2022-08-21 12:20 UTC (10 hours ago)
        
 (HTM) web link (news.stanford.edu)
 (TXT) w3m dump (news.stanford.edu)
        
       | mikewarot wrote:
       | The assumption made in using a resistive network as a matrix
       | multiply is that D/A and A/D circuits are available at the speeds
       | and resolutions intended. For running a network a few times per
       | second, to utilize a trained network in an application, this is
        | entirely reasonable. However, if you're going to be running
        | samples at hundreds of MHz with 8 bits or more of resolution,
        | power goes up very quickly.
       | 
        | Typical values are 100 milliwatts per channel for 8 bits of
        | resolution at 100 MHz, in each direction.
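        | 
        | For a sense of scale, a back-of-the-envelope sketch in Python
        | using those figures (the array size is a made-up example, not
        | from the article):
        | 
        |   # ~100 mW per converter channel (8 bits @ 100 MHz), as above
        |   P_PER_CHANNEL_W = 0.100
        |   ROWS = COLS = 256        # hypothetical crossbar dimensions
        | 
        |   dac_power = ROWS * P_PER_CHANNEL_W  # DACs driving input rows
        |   adc_power = COLS * P_PER_CHANNEL_W  # ADCs reading outputs
        |   print(f"total: {dac_power + adc_power:.1f} W")  # 51.2 W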
        
       | nmstoker wrote:
        | I'd be interested to hear the take more experienced ML
        | practitioners have on this, but from skimming the Nature article
        | (https://www.nature.com/articles/s41586-022-04992-8) it seems
        | like there is some reconfigurability built into these chips, but
        | overall they're at a fairly early stage. It doesn't yet sound
        | like you could take a model from one of the well-known frameworks
        | and simply drop it in / automatically convert it for use on these
        | NeuRRAM chips, but presumably something along those lines might
        | be on the cards as things mature.
        
         | visarga wrote:
         | The article says:
         | 
         | > They found that it's 99% accurate in letter recognition from
         | the MNIST dataset, 85.7% accurate on image classification from
         | the CIFAR-10 dataset, 84.7% accurate on Google speech command
         | recognition and showed a 70% reduction in image-reconstruction
         | error on a Bayesian image recovery task.
         | 
         | These are all small scale models that run on very low power, on
         | edge devices. Not gonna load GPT-3 on it soon. It's for wake
         | word detection and security cameras.
        
       | aborsy wrote:
       | Can someone explain this a bit further?
       | 
        | I understand RAM-CPU transfer is avoided, so some type of memory
        | is brought next to the CPU. RRAM is non-volatile (unlike on-chip
        | SRAM/cache), so it's like persistent random-access flash memory.
        | 
        | Why can't flash or even some type of NVMe be brought very close
        | to the CPU? Is it due to space and driver voltage requirements?
       | 
       | Another question: a processor is a collection of transistors. How
       | do you organize these transistors to accelerate a particular type
        | of computation, such as matrix multiplication versus basic
       | arithmetic operations (addition and multiplication)?
        
       | WithinReason wrote:
       | The trade-off can be that you're not as free in designing the
       | architecture of your model, and you have to train it in a special
       | way. This is actually a huge obstacle to making neural networks
       | more efficient. You _could_ design HW that would run networks
       | 10x-100x more efficiently, but the networks would have to be
        | specially trained for the HW, and the investment to create the
        | SW infrastructure to go with your HW is 10x the effort of
        | designing the HW. Nvidia knows this in the area of GPUs, and they
        | have 10x as many SW engineers as HW engineers. Neural network
       | HW startups have yet to learn it.
        
         | m3kw9 wrote:
         | Can anyone explain further as a simple example?
        
           | ip26 wrote:
           | Neural networks are commonly evaluated with matrix
           | multiplication. A matrix multiply is orders of magnitude
           | faster if all three matrices fit in your caches. So you want
           | to tune the matrix size to the hardware.
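            | 
            | Roughly the idea, as a minimal numpy sketch (the tile size
            | stands in for whatever happens to fit the target hardware's
            | caches):
            | 
            |   import numpy as np
            | 
            |   def blocked_matmul(A, B, t=128):
            |       # Accumulate C tile by tile, so each working set of
            |       # A, B and C stays small enough to live in the caches.
            |       n, k = A.shape
            |       _, m = B.shape
            |       C = np.zeros((n, m), dtype=A.dtype)
            |       for i in range(0, n, t):
            |           for j in range(0, m, t):
            |               for p in range(0, k, t):
            |                   C[i:i+t, j:j+t] += (
            |                       A[i:i+t, p:p+t] @ B[p:p+t, j:j+t])
            |       return C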
        
             | WithinReason wrote:
              | In addition, once you've done that (tuned your hardware to a
              | matrix size), it can be inefficient to use smaller matrices.
              | So even if you can make a network that uses smaller
              | components, the HW will not gain the corresponding speedup.
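              | 
              | A toy illustration (sizes are made up): if the hardware
              | unit is a fixed 128-wide tile, a smaller problem has to
              | be padded up to it and most of the multiply-accumulates
              | get wasted.
              | 
              |   TILE = 128               # fixed hardware tile size
              |   n = 40                   # actual matrix dimension
              |   # an n^3 matmul padded up to TILE^3 wastes the rest
              |   useful = (n / TILE) ** 3
              |   print(f"useful work: {useful:.1%}")  # ~3.1%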
        
               | zasdffaa wrote:
               | MatMult = the dot product, effectively. I can't believe
               | that allowing smaller vectors is difficult or even non-
               | trivial in hardware.
        
               | svnt wrote:
               | You're assuming caches know anything about dot products.
        
               | zasdffaa wrote:
                | That seems like a non sequitur to me. Can you elaborate?
        
               | WithinReason wrote:
               | GPUs are a good example: SIMD means all 128 threads do
               | the same work. If only 1 thread has work to do the other
               | 127 threads still take up resources.
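                | 
                | A rough numpy analogy of that lockstep behaviour (lane
                | count picked to match the example above):
                | 
                |   import numpy as np
                | 
                |   lanes = 128
                |   mask = np.zeros(lanes, dtype=bool)
                |   mask[0] = True           # only 1 lane has real work
                |   x = np.arange(lanes, dtype=float)
                |   # both branches are computed for every lane; the mask
                |   # just selects which result is kept per lane
                |   y = np.where(mask, np.sqrt(x), x)
                |   print(f"useful lanes: {mask.mean():.2%}")  # 0.78%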
        
               | zasdffaa wrote:
               | Ah, right, gotcha
        
         | ComplexSystems wrote:
         | This is one reason why I wish there were more interest in
         | FPGAs.
        
           | rbanffy wrote:
            | DRAM with lots and lots of tiny FPGAs is an intriguing idea. I
            | wish I were a chip designer so I could run with it and see
            | where it took me.
        
           | mlazos wrote:
            | IMO FPGAs sound like they'd be good because of the extra
            | flexibility, but in practice ASICs are just better: higher
            | frequency, lower power, and there just isn't a way to make up
            | for the frequency gap. You can quantize and do fancier stuff,
            | but then you need to fine-tune or train on your actual
            | inference HW to ensure you're actually correct. You can
            | emulate your quantization scheme and train on GPUs, but then
            | your GPU training will be slow af. Building a truly model-
            | specific bitstream for an FPGA that actually utilizes all of
            | the resources of the FPGA is not simple, and even if you do,
            | overcoming the frequency gap is hard.
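            | 
            | For the "emulate your quantization scheme on GPUs" part,
            | the usual trick is fake quantization in the forward pass; a
            | minimal numpy sketch of the idea:
            | 
            |   import numpy as np
            | 
            |   def fake_quant(x, bits=8):
            |       # Round onto a bits-wide fixed-point grid but keep the
            |       # values in float, so training still runs on the GPU.
            |       qmax = 2 ** (bits - 1) - 1
            |       scale = max(float(np.abs(x).max()), 1e-8) / qmax
            |       q = np.clip(np.round(x / scale), -qmax - 1, qmax)
            |       return q * scale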
        
         | carschno wrote:
         | "The hardware lottery" describes roughly the same phenomenon:
         | https://hardwarelottery.github.io/
        
       | wpietri wrote:
       | I hadn't read about RRAM before, but from what Wikipedia has on
       | it [1], it's been a technology of the shining press-release
       | future for a while. E.g.: "In 2013, Crossbar introduced an ReRAM
       | prototype as a chip about the size of a postage stamp that could
        | store 1 TB of data. In August 2013, [Crossbar] claimed that
        | large-scale production of their ReRAM chips was scheduled for
        | 2015. [...]
       | Also in 2013, Hewlett-Packard demonstrated a memristor-based
       | ReRAM wafer, and predicted that 100 TB SSDs based on the
       | technology could be available in 2018 with 1.5 PB capacities
       | available in 2020 [...]"
       | 
       | [1] https://en.wikipedia.org/wiki/Resistive_random-access_memory
        
         | rbanffy wrote:
         | True, but HP was looking at different applications. I'm not
         | sure why they didn't pan out - at the time it looked massively
         | promising.
         | 
          | Here they are playing with the analog aspect of it, the same way
          | we store multiple bits in flash cells, but doing math with the
          | analog behaviour instead. Maybe the issue with using it for
          | storage was long-term stability, which is not really an issue
          | here.
        
       | jeffbee wrote:
       | Certificate expires on: Sunday, August 21, 2022 at 8:44:22 AM PDT
       | 
        | I don't know what it's going to take, and I had assumed that a
        | big dedicated organization like Let's Encrypt would have squared
        | this situation away by now, but you should always, always, always
        | make your certificates expire in the middle of a work day in your
        | local time, never, never, never exactly 90 days from the instant
        | of issuance, which appears to be what happened here.
        
         | femto113 wrote:
         | The real problem here is browsers treating recently "expired"
         | (but otherwise still perfectly "valid") certificates as
         | comparable in danger to visiting a known malware den. This is
         | absolutely the sort of thing that could harmlessly wait until
         | the next Monday to address if it weren't for the overwrought
         | interstitial warning page.
        
         | jillesvangurp wrote:
          | I've had the Let's Encrypt renew script fail silently once. I
          | found out when we started getting warnings like this. Turned
          | out to be a misconfigured server that had broken some months
          | before. Our fault of course. But it still happened.
         | 
         | Probably a good idea to monitor your certificate expiration.
         | You don't want to cut it too close.
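          | 
          | A minimal Python sketch of such a check (hostname just as an
          | example), suitable for running from cron or an alerting job:
          | 
          |   import datetime, socket, ssl
          | 
          |   def days_until_expiry(host, port=443):
          |       # Fetch the served certificate and compare its notAfter
          |       # timestamp against the current UTC time.
          |       ctx = ssl.create_default_context()
          |       with socket.create_connection((host, port)) as sock:
          |           with ctx.wrap_socket(
          |                   sock, server_hostname=host) as tls:
          |               cert = tls.getpeercert()
          |       expiry = datetime.datetime.utcfromtimestamp(
          |           ssl.cert_time_to_seconds(cert["notAfter"]))
          |       return (expiry - datetime.datetime.utcnow()).days
          | 
          |   print(days_until_expiry("news.stanford.edu"))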
        
         | CaliforniaKarl wrote:
         | I spent around 15 minutes looking around, and discovered:
         | 
          | In RFC 8555 (which defines the ACME protocol), when applying
          | for a certificate you can optionally specify a notAfter time:
          | https://www.rfc-editor.org/rfc/rfc8555.html#section-7.4
         | 
         | Per that section:
         | 
         | > The server MUST return an error if it cannot fulfill the
         | request as specified, and it MUST NOT issue a certificate with
         | contents other than those requested. If the server requires the
         | request to be modified in a certain way, it should indicate the
         | required changes using an appropriate error type and
         | description.
         | 
         | I don't think your beef should be with Let's Encrypt. The
         | timestamps they get are all UTC (or should be UTC), and I don't
         | want them inferring timezone from the IP address of the
         | requestor.
         | 
         | So, that leaves two things:
         | 
         | * Let's Encrypt might blanket-refuse any ACME orders that
         | specify a notAfter time, even if the time difference (from
         | notBefore to notAfter, or from $NOW to notAfter) is within
         | their policies
         | 
         | * ACME clients might not support specifying a particular
         | notAfter time.
         | 
          | I leave it to you to figure out which of the two points above
          | applies, and to decide whether your ire should go to Let's
          | Encrypt or to ACME clients.
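          | 
          | For reference, the relevant part of a new-order payload from
          | RFC 8555 section 7.4 (the client still has to wrap this in a
          | signed JWS POST, and the CA is free to reject it):
          | 
          |   import json
          | 
          |   new_order = {
          |       "identifiers": [
          |           {"type": "dns", "value": "news.stanford.edu"}],
          |       # RFC 3339 timestamp; e.g. aim the expiry at a
          |       # mid-workday moment in your local timezone
          |       "notAfter": "2022-11-15T20:00:00Z",
          |   }
          |   print(json.dumps(new_order, indent=2))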
        
       | JonChesterfield wrote:
       | I for one _love_ the wild variety of AI hardware that lives one
        | sufficiently smart compiler away from viable. It's great to move
       | complexity out of the hardware and into the compiler stack.
        
       | spyder wrote:
       | _"Having those calculations done on the chip instead of sending
       | information to and from the cloud could enable faster, more
       | secure, cheaper, and more scalable AI going into the future, and
       | give more people access to AI power," said H.-S Philip Wong,_
       | 
       | Huh... why is he comparing it to network (cloud) transfers and
       | not to the current CPU-RAM transfers? I thought the benefit of
        | compute-in-memory is the reduction of CPU-RAM transfers, which
        | is also what they say at the beginning of the article.
        
         | svnt wrote:
         | You sell much higher quantities of chips if they are not being
         | time-shared in the cloud.
         | 
         | This statement is a between-the-lines message to potential
         | partners and investors that they intend to sell to end users
         | and not data centers, and capture that increase in volume.
        
         | KKKKkkkk1 wrote:
         | They are targeting edge devices like your phone or watch.
         | Computations that use too much battery (e.g., because they do
         | too many cache misses) have to go to the cloud.
         | 
          | I am guessing the reason they're targeting edge is that the
          | technology is energy-efficient but slow. Can any experts chime
          | in?
        
       ___________________________________________________________________
       (page generated 2022-08-21 23:01 UTC)