[HN Gopher] Stanford engineers present new chip that ramps up AI...
___________________________________________________________________
Stanford engineers present new chip that ramps up AI computing
efficiency
Author : rbanffy
Score : 79 points
Date : 2022-08-21 12:20 UTC (10 hours ago)
(HTM) web link (news.stanford.edu)
(TXT) w3m dump (news.stanford.edu)
| mikewarot wrote:
| The assumption made in using a resistive network as a matrix
| multiplier is that D/A and A/D converters are available at the
| intended speeds and resolutions. For running a network a few
| times per second, to use a trained network in an application,
| this is entirely reasonable. However, if you're going to be
| running hundreds of megasamples per second at 8 bits or more of
| resolution, power goes up very quickly.
|
| Typical values are around 100 milliwatts per channel for 8 bits
| of resolution at 100 MHz, in each direction.
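|
| As a rough back-of-the-envelope sketch of what that figure
| implies for a whole crossbar (the 256x256 array size below is an
| assumed example, not a number from the article):
|
|     # Converter power for an analog crossbar, using the figure
|     # above of ~100 mW per channel at 8 bits / 100 MHz.
|     # The 256x256 array size is an assumed example.
|     P_PER_CHANNEL_W = 0.100      # ~100 mW per converter channel
|     ROWS, COLS = 256, 256        # assumed crossbar dimensions
|
|     dac_channels = ROWS          # one DAC drives each input row
|     adc_channels = COLS          # one ADC reads each output column
|     total_w = (dac_channels + adc_channels) * P_PER_CHANNEL_W
|     print(f"Converter power alone: {total_w:.1f} W")  # -> 51.2 W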
| nmstoker wrote:
| I'd be interested to hear the take more experienced ML
| practitioners have on this, but from skimming the Nature article
| (https://www.nature.com/articles/s41586-022-04992-8) it seems
| like there is some reconfigurability built into these chips but
| overall they're at a fairly early stage. It doesn't yet sound like
| it's something one could take a model from one of the well known
| frameworks and simply drop it in / automatically convert it for
| use in these NeuRRAM chips but presumably something along those
| lines might be on the cards as things mature.
| visarga wrote:
| The article says:
|
| > They found that it's 99% accurate in letter recognition from
| the MNIST dataset, 85.7% accurate on image classification from
| the CIFAR-10 dataset, 84.7% accurate on Google speech command
| recognition and showed a 70% reduction in image-reconstruction
| error on a Bayesian image recovery task.
|
| These are all small scale models that run on very low power, on
| edge devices. Not gonna load GPT-3 on it soon. It's for wake
| word detection and security cameras.
| aborsy wrote:
| Can someone explain this a bit further?
|
| I understand RAM-CPU transfer is avoided, so some type of memory
| is brought next to CPU. RRAM is non-volatile (unlike on-chip
| SRAM/cache), so it's like persistent random access flash memory.
|
| Why can't flash or even some type of NVMe be brought very close
| to the CPU? Is it due to space and driver voltage requirements?
|
| Another question: a processor is a collection of transistors. How
| do you organize these transistors to accelerate a particular type
| of computation, such as matrix multiplication versus basic
| arithmetic operations (addition and multiplication)?
| WithinReason wrote:
| The trade-off can be that you're not as free in designing the
| architecture of your model, and you have to train it in a special
| way. This is actually a huge obstacle to making neural networks
| more efficient. You _could_ design HW that would run networks
| 10x-100x more efficiently, but the networks would have to be
| specially trained for the HW, and the investment to create the
| SW infrastructure to go with your HW is 10x the effort of
| designing the HW. Nvidia knows this in the area of GPUs, and they
| have 10x as many SW engineers as HW engineers. Neural network
| HW startups have yet to learn it.
| m3kw9 wrote:
| Can anyone explain further as a simple example?
| ip26 wrote:
| Neural networks are commonly evaluated with matrix
| multiplication. A matrix multiply is orders of magnitude
| faster if all three matrices fit in your caches. So you want
| to tune the matrix size to the hardware.
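|
| A very rough sketch of that sizing constraint (the 32 MB cache
| figure below is an assumed example):
|
|     # Check whether all three matrices of an N x N float32 matmul
|     # fit in a given cache. The 32 MB cache size is an assumption.
|     CACHE_BYTES = 32 * 1024 * 1024
|     BYTES_PER_ELEMENT = 4  # fp32
|
|     def fits_in_cache(n: int) -> bool:
|         # A (n x n), B (n x n) and C (n x n) resident at once
|         return 3 * n * n * BYTES_PER_ELEMENT <= CACHE_BYTES
|
|     for n in (512, 1024, 2048, 4096):
|         print(n, fits_in_cache(n))  # True, True, False, False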
| WithinReason wrote:
| In addition, once you've done that (tuned your hardware to a
| matrix size), it can be inefficient to use smaller matrices.
| So even if you can make a network that uses smaller
| components the HW will not gain the corresponding speedup.
| zasdffaa wrote:
| MatMult = the dot product, effectively. I can't believe
| that allowing smaller vectors is difficult or even non-
| trivial in hardware.
| svnt wrote:
| You're assuming caches know anything about dot products.
| zasdffaa wrote:
| That seems like a non sequitur to me. Can you elaborate?
| WithinReason wrote:
| GPUs are a good example: SIMD means all 128 threads do
| the same work. If only 1 thread has work to do, the other
| 127 threads still take up resources.
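|
| A toy sketch of what that costs (128 is the lane width from the
| example above; real GPU warps are often 32):
|
|     import numpy as np
|
|     # Toy model of SIMT execution: every lane in the group steps
|     # through the same instruction under a mask, so inactive
|     # lanes still occupy execution slots.
|     GROUP_WIDTH = 128
|     mask = np.zeros(GROUP_WIDTH, dtype=bool)
|     mask[0] = True                       # only one lane has work
|
|     x = np.arange(GROUP_WIDTH, dtype=np.float32)
|     result = np.where(mask, x * 2.0, x)  # all lanes evaluated
|
|     print(f"utilization: {mask.sum() / GROUP_WIDTH:.3f}")  # ~0.008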
| zasdffaa wrote:
| Ah, right, gotcha
| ComplexSystems wrote:
| This is one reason why I wish there were more interest in
| FPGAs.
| rbanffy wrote:
| DRAM with lots and lots of tiny FPGAs is an intriguing idea. I
| wish I were a chip designer so I could run with it and see where
| it got me.
| mlazos wrote:
| IMO FPGAs sound like they'd be good thanks to the extra
| flexibility, but in practice ASICs are just better: higher
| frequency, lower power, and there just isn't a way to make up
| for the frequency gap. You can quantize and do fancier stuff, but
| then you need to fine tune or train on your actual inference
| HW to ensure you're actually correct. You can emulate your
| quantization scheme and train on GPUs, but then your GPU
| training will be slow af. Building a true model specific
| bitstream for an FPGA that is actually utilizing all of the
| resources of the FPGA is not simple, and even if you do,
| overcoming the frequency gap is hard.
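|
| For the "emulate your quantization scheme on GPUs" part, a
| minimal sketch of the usual quantize-dequantize ("fake quant")
| trick, assuming a symmetric 8-bit scheme:
|
|     import numpy as np
|
|     # Simulate an assumed symmetric 8-bit scheme in float, so
|     # the training hardware never has to leave fp32.
|     def fake_quantize(x, num_bits=8):
|         qmax = 2 ** (num_bits - 1) - 1        # 127 for 8 bits
|         scale = float(np.max(np.abs(x))) / qmax
|         if scale == 0.0:
|             scale = 1.0
|         q = np.clip(np.round(x / scale), -qmax, qmax)
|         return (q * scale).astype(x.dtype)    # back to float
|
|     w = np.random.randn(4, 4).astype(np.float32)
|     print(np.max(np.abs(w - fake_quantize(w))))  # small rounding error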
| carschno wrote:
| "The hardware lottery" describes roughly the same phenomenon:
| https://hardwarelottery.github.io/
| wpietri wrote:
| I hadn't read about RRAM before, but from what Wikipedia has on
| it [1], it's been a technology of the shining press-release
| future for a while. E.g.: "In 2013, Crossbar introduced an ReRAM
| prototype as a chip about the size of a postage stamp that could
| store 1 TB of data. In August 2013, [Crossbar] claimed that
| large-scale production of their ReRAM chips was scheduled for
| 2015. [...]
| Also in 2013, Hewlett-Packard demonstrated a memristor-based
| ReRAM wafer, and predicted that 100 TB SSDs based on the
| technology could be available in 2018 with 1.5 PB capacities
| available in 2020 [...]"
|
| [1] https://en.wikipedia.org/wiki/Resistive_random-access_memory
| rbanffy wrote:
| True, but HP was looking at different applications. I'm not
| sure why they didn't pan out - at the time it looked massively
| promising.
|
| Here they are playing with the analog aspect of it, the same way
| we store multiple bits in flash cells, but doing math with the
| cells' analog behaviour. Maybe the issue with using it for
| storage was long-term stability, which is not really an issue
| here.
| jeffbee wrote:
| Certificate expires on: Sunday, August 21, 2022 at 8:44:22 AM PDT
|
| I don't know what it's going to take, and I had assumed that a
| big dedicated organization like Let's Encrypt would have squared
| this situation away by now, but you should always, always,
| always make
| your certificates expire in the middle of a work day in your
| local time, never, never, never just exactly 90 days from this
| instant, which appears to be what happened here.
| femto113 wrote:
| The real problem here is browsers treating recently "expired"
| (but otherwise still perfectly "valid") certificates as
| comparable in danger to visiting a known malware den. This is
| absolutely the sort of thing that could harmlessly wait until
| the next Monday to address if it weren't for the overwrought
| interstitial warning page.
| jillesvangurp wrote:
| I've had the Let's Encrypt renew script fail silently once. I
| found out when we started getting warnings like this. It turned
| out to be a misconfigured server that had broken some months
| before. Our fault of course, but it still happened.
|
| Probably a good idea to monitor your certificate expiration.
| You don't want to cut it too close.
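|
| A minimal monitoring sketch along those lines, using only the
| Python standard library (the host and the 20-day threshold are
| assumed examples):
|
|     import socket
|     import ssl
|     from datetime import datetime, timezone
|
|     # Warn when a site's TLS certificate is close to expiring.
|     HOST, WARN_DAYS = "news.stanford.edu", 20
|
|     ctx = ssl.create_default_context()
|     with socket.create_connection((HOST, 443), timeout=10) as sock:
|         with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
|             cert = tls.getpeercert()
|
|     not_after = datetime.strptime(cert["notAfter"],
|                                   "%b %d %H:%M:%S %Y %Z")
|     days_left = (not_after.replace(tzinfo=timezone.utc)
|                  - datetime.now(timezone.utc)).days
|     if days_left < WARN_DAYS:
|         print(f"{HOST}: certificate expires in {days_left} days!")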
| CaliforniaKarl wrote:
| I spent around 15 minutes looking around, and discovered:
|
| In RFC 8555 (which defines the ACME protocol), when applying for
| a certificate you can optionally specify a notAfter time:
| https://www.rfc-editor.org/rfc/rfc8555.html#section-7.4
|
| Per that section:
|
| > The server MUST return an error if it cannot fulfill the
| request as specified, and it MUST NOT issue a certificate with
| contents other than those requested. If the server requires the
| request to be modified in a certain way, it should indicate the
| required changes using an appropriate error type and
| description.
|
| I don't think your beef should be with Let's Encrypt. The
| timestamps they get are all UTC (or should be UTC), and I don't
| want them inferring timezone from the IP address of the
| requestor.
|
| So, that leaves two things:
|
| * Let's Encrypt might blanket-refuse any ACME orders that
| specify a notAfter time, even if the time difference (from
| notBefore to notAfter, or from $NOW to notAfter) is within
| their policies
|
| * ACME clients might not support specifying a particular
| notAfter time.
|
| I leave it up to you to figure out the two points above, to
| decide if your ire should go to Let's Encrypt, or to ACME
| clients.
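|
| For reference, a sketch of the newOrder payload that would carry
| such a request (just the JSON body from RFC 8555 section 7.4; the
| domain is an assumed example and the JWS envelope that actually
| wraps this body is omitted):
|
|     import json
|     from datetime import datetime, timedelta, timezone
|
|     # notAfter is an optional RFC 3339 timestamp; per the RFC the
|     # server MUST error rather than issue something different.
|     not_after = datetime.now(timezone.utc) + timedelta(days=90)
|
|     payload = {
|         "identifiers": [{"type": "dns", "value": "example.org"}],
|         "notAfter": not_after.strftime("%Y-%m-%dT%H:%M:%SZ"),
|     }
|     print(json.dumps(payload, indent=2))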
| JonChesterfield wrote:
| I for one _love_ the wild variety of AI hardware that lives one
| sufficiently smart compiler away from viable. It's great to move
| complexity out of the hardware and into the compiler stack.
| spyder wrote:
| _"Having those calculations done on the chip instead of sending
| information to and from the cloud could enable faster, more
| secure, cheaper, and more scalable AI going into the future, and
| give more people access to AI power," said H.-S Philip Wong,_
|
| Huh... why is he comparing it to network (cloud) transfers and
| not to the current CPU-RAM transfers? I thought the benefit of
| compute-in-memory is reduction of CPU-RAM transfers and that's
| what they're also saying in the beginning of the article.
| svnt wrote:
| You sell much higher quantities of chips if they are not being
| time-shared in the cloud.
|
| This statement is a between-the-lines message to potential
| partners and investors that they intend to sell to end users
| and not data centers, and capture that increase in volume.
| KKKKkkkk1 wrote:
| They are targeting edge devices like your phone or watch.
| Computations that use too much battery (e.g., because they do
| too many cache misses) have to go to the cloud.
|
| I am guessing the reason they're targeting edge is because the
| technology is energy efficient but slow. Can any experts chime
| in?
___________________________________________________________________
(page generated 2022-08-21 23:01 UTC)