[HN Gopher] 100-GHz Single-Flux-Quantum Bit-Serial Adder Based o...
___________________________________________________________________
100-GHz Single-Flux-Quantum Bit-Serial Adder Based on 10-kA/cm2
Niobium Process
Author : peter_d_sherman
Score : 92 points
Date : 2021-02-08 14:46 UTC (8 hours ago)
(HTM) web link (ieeexplore.ieee.org)
(TXT) w3m dump (ieeexplore.ieee.org)
| peter_d_sherman wrote:
| Related:
|
| "Automatic Single-Flux-Quantum (SFQ) Logic Synthesis Method for
| Top-Down Circuit Design"
|
| https://iopscience.iop.org/article/10.1088/1742-6596/43/1/28...
|
| >"Abstract. Single-flux-quantum (SFQ) logic circuits provide
| faster operations with lower power consumption, using Josephson
| junctions as the switching devices. In the top-down flow of SFQ
| circuit design, we have already developed a place-and-route tool
| that covers backend circuit design. In this paper, we present an
| automatic SFQ logic synthesis method that covers front-end
| circuit design. The logic synthesis is a process that generates a
| gate-level logic circuit from a functional specification written
| in hardware description languages. In our SFQ synthesis method,
| after we generate an intermediate circuit with the help of a
| synthesis tool for semiconductor circuits, we convert it into a
| gate-level pipelined SFQ circuit. To do this, an automatic
| synthesis tool was implemented."
|
| PDS: Phrased another way, the idea here can basically be boiled
| down to:
|
| For the most time-critical parts of a conventional CPU, i.e., the
| Adder/ALU -- instead of using conventional circuitry for that,
| let's use circuits specifically designed for Quantum Computers
| because of their fast switching speed...
|
| The end result is that you still get a conventional digital CPU
| -- albeit, in theory, a much, much faster one...
|
| The "magic phrase" (for research) for all of this -- is _Single-
| flux-quantum (SFQ) logic circuits_...
| AI_WAIFU wrote:
| This paper is over 10 years old.
| hdjfjrbrbt wrote:
| At 100 GHz signals can only propagate 3 cm per clock.
|
| So to run at those speeds, you need extremely compact circuits,
| and you need to bring memory closer too.
|
| Maybe move to 3D, another way to increase reach.
|
| What is the current thinking regarding this problem?
| rbanffy wrote:
| I remember research on asynchronous (or clockless) CPUs.
| Philips had an asynchronous ARM demonstrator.
|
| Where did that go?
| ben_w wrote:
| 3mm rather than 3cm, but also note that full CPUs have to run
| slower than the upper limit of the transistors inside them,
| because most operations need to go through multiple
| transistors.
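|
| (A quick back-of-envelope check in Python -- the distance a
| signal covers in one 100 GHz clock period, assuming vacuum
| light speed; on-chip propagation is slower still:)
|
|     c = 299_792_458    # speed of light in vacuum, m/s
|     f = 100e9          # clock frequency, Hz
|     print(c / f)       # ~0.003 m, i.e. ~3 mm per cycle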
| jjoonathan wrote:
| Caches are already closer than 3cm and can take multiple cycles
| to fetch from. DRAM takes 100s of clock cycles to fetch from!
| It's bad -- but the badness was already here.
|
| Propagation delay problems are real, but for caches IIRC the
| biggest problems are transmission lines being lossy and having
| propagation speeds much slower than the speed of light in a
| vacuum, and for DRAM the biggest problems are waiting for
| precharge and waiting for the sense amplifiers to stabilize.
| Waiting for the precision required to read obscenely small
| capacitors, in other words.
|
| If DRAM is 1T1C and SRAM is 6T, I always wondered why we
| couldn't just have DIMMs of SRAM at 1/3rd the capacity or
| whatever. Overprovisioning DRAM by that factor is already
| common, wouldn't eliminating all the downtime be a way to put
| that slack to work? Like the HDD -> SSD transition? Ah well,
| I'm sure there's a constraint I'm just not thinking of.
| simcop2387 wrote:
| You'd need a different CPU these days, since the memory
| controller for SRAM would be a bit different. That said, you
| do see SRAM in smaller embedded systems because it's also
| easier to drive and work with signal-wise. I bet someone's
| done it as a research project with larger CPUs, but I bet the
| cost of the SRAM and the cost of a niche CPU design just made
| it all not viable in the marketplace.
| jjoonathan wrote:
| Right, memory controllers wouldn't be able to fully
| leverage SRAM DIMMs right out of the gate, but they would
| be able to partially leverage them right out of the gate
| and fully leverage them within a JEDEC cycle. It has been
| multiple JEDEC cycles since over-provisioning became
| common, so I suspect there must be something else
| preventing this.
|
| SRAM costs way more, but afaik that's an artifact of market
| size, not fundamental. Hence the 1T1C vs 6T comparison to
| tease out the fundamental cost difference, which looks to
| be no worse than 3x-6x more expensive, which would put
| SRAM DIMMs well within reach. I lean on heinously expensive
| SRAM in my own embedded designs and would really like to
| see some market scale drive down those costs!
| klyrs wrote:
| One thing to note is that this is a bit-serial adder. If you
| use this to perform, say, 32-bit arithmetic, then the effective
| clock rate is about 3GHz. Half that for 64-bit. Compare this to
| a PCIe4 bus, which is clocked at 16GT/s -- that is, each "lane"
| runs at 16GHz and is used to transmit data around serially
| instead of routing more wires around to make a wider bus.
| That's readily-available (damn the chip shortages) tech, but we
| can't buy 16GHz general-purpose processors.
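|
| (The arithmetic behind those effective rates, as a quick
| Python check:)
|
|     f_clock = 100e9              # serial adder clock, Hz
|     print(f_clock / 32 / 1e9)    # ~3.1 GHz effective, 32-bit
|     print(f_clock / 64 / 1e9)    # ~1.6 GHz effective, 64-bit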
| mlyle wrote:
| It's a bit-serial adder not for any deep technical reason;
| it's a convenient benchmark for characterizing the
| logic within. You could obviously use the process to make
| wide adders with faster clock speeds than stacking serial
| adders in front of each other.
| klyrs wrote:
| I'm sorry, but that's not obvious. Given that we've got PCI
| circuits running at 16GHz, why don't chipmakers give us
| full-scale processors with those faster clock speeds?
|
| Bit-serial adders are extremely simple circuits with a
| single carry register, which can run at essentially the
| full speed of the underlying switching devices. Ripple-
| carry adders have a circuit depth proportional to the
| number of bits in the adder -- they're only stable when the
| clock period is substantially longer than the switching
| time multiplied by the circuit depth, because the carry
| signal needs to propagate through all of the 1-bit adders.
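|
| (For concreteness, a minimal Python sketch of that circuit --
| one loop iteration standing in for one clock tick of the
| single carry register:)
|
|     def bit_serial_add(a, b, width):
|         carry, result = 0, 0
|         for i in range(width):      # one iteration per clock
|             ai = (a >> i) & 1
|             bi = (b >> i) & 1
|             result |= (ai ^ bi ^ carry) << i         # sum bit
|             carry = (ai & bi) | (carry & (ai ^ bi))  # next carry
|         return result
|
|     assert bit_serial_add(25, 17, 8) == 42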
| [deleted]
| deepnotderp wrote:
| But for CPUs no one uses ripple carry, they use high
| radix adders like Kogge-Stone
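|
| (For reference, a minimal Python sketch of the parallel-
| prefix trick such adders use -- carries for every bit
| position are combined in O(log2 n) levels instead of
| rippling:)
|
|     def prefix_add(a, b, width=8):
|         g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]
|         p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]
|         G, P = g[:], p[:]
|         dist = 1
|         while dist < width:          # log2(width) combining levels
|             for i in range(width - 1, dist - 1, -1):
|                 G[i] |= P[i] & G[i - dist]
|                 P[i] &= P[i - dist]
|             dist *= 2
|         carry_in = [0] + G[:-1]      # carry into bit i
|         return sum((p[i] ^ carry_in[i]) << i for i in range(width))
|
|     assert prefix_add(100, 55, 8) == 155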
| [deleted]
| mlyle wrote:
| > I'm sorry, but that's not obvious. Given that we've got
| PCI circuits running at 16GHz, why don't chipmakers give
| us full-scale processors with those faster clock speeds?
|
| That's not what I said. I'm just saying that it's obvious
| that if you wanted a fast 64 bit adder on the process,
| you'd build a carry lookahead adder or other fast adder,
| not stack a single bit adder 64 times. Saying "YOU NEED
| TO MULTIPLY PROP TIME BY 64" seems kinda dishonest.
|
| I said:
|
| > > You could obviously use the process to make wide
| adders with faster clock speeds than stacking serial
| adders in front of each other.
|
| Do you really think the optimum thing is going to be
| stacking 64 of these in a row?
| klyrs wrote:
| Ah, I misread you -- yes, there are obviously faster
| adder circuits. My point is that they won't have such
| impressive clock rates as this serial adder can achieve.
| avmich wrote:
| Asynchronous circuits don't need to propagate a clock through
| the whole design, so they don't have this limitation.
| fctorial wrote:
| Maybe 10,000-core processors, with each core in the micrometer
| range, and threads occupying cores permanently. Cores would be
| assigned to processes the way memory is. Probably similar
| to the transition from bytes of RAM to KBs of RAM.
| api wrote:
| There would definitely be applications for a tiny ultrafast
| serial CPU with on-board RAM. Not all algorithms are
| parallelizable.
|
| I'm imagining a small in-order 32-bit core with ~256KiB static
| RAM running at 100GHz. Call it the Serial Killer.
| baxtr wrote:
| Is this a real title or was this title generated by GPT-3?
| jjoonathan wrote:
| 170 GHz is the speed of a PLL published by a popular chip
| designer on YouTube a decade ago in 130nm SiGe BiCMOS.
|
| The "100GHz" headline tempts comparison to CPU clock speeds but
| those are absolutely not the right point of comparison. GHz doing
| one thing != GHz doing another. CPU clock cycles are very deep
| compared to a bit-serial adders and operate under heavy thermal
| constraints. To compare apples to apples, we need to know: what
| is the speed of a bit-serial adder in near-future CMOS? Silicon
| bipolar? 3-5 bipolar?
|
| My guess: you could implement a 100GHz bit serial adder in CMOS
| _today_ and get several hundred GHz if you dropped thermal
| constraints, went bipolar, etc. That doesn't invalidate the
| research -- we need more public cutting-edge device research, not
| less -- but the posts assuming this result translates to 100GHz
| CPUs are getting a bit ahead of themselves.
| mlyle wrote:
| As you say -- it depends upon what you're doing.
|
| Comparing a PLL to an adder is silly.
|
| > My guess: you could implement a 100GHz bit serial adder in
| CMOS today
|
| 7nm FinFET CMOS will get you to roughly the same performance
| point. It's also quite mature compared to the technique in the
| article.
| jjoonathan wrote:
| > Comparing a PLL to an adder is silly.
|
| Yeah, but comparing a bit adder to a 64 bit full adder
| integrated into a modern CPU clock cycle is _really_ silly :)
|
| Glad to hear my intuition about modern CMOS was in the
| ballpark though. I've been out of this scene for the better
| part of a decade.
| mlyle wrote:
| Ring oscillators and bit adders and 30 gate benchmark
| circuits are how we characterize processes. We don't try
| and fabricate a superscalar CPU on our first round trips
| through developing a new process, let alone new transistor
| types.
|
| This is doing quite well for early rounds on this
| technique.
| jjoonathan wrote:
| Are any of those figures public?
|
| I did a cursory search but couldn't find them so I went
| with slightly silly clickbait on the rationale that it
| was an order of magnitude less silly than the other
| comparisons being made in the thread. I stand by that: as
| "benchmarks," a bit adder is much closer to a PLL divider
| than to a 64 bit full adder in a CPU cycle.
|
| I fully agree that there's another order of magnitude
| before the comparison starts to become apples-to-apples,
| or bit-adders to bit-adders, but you're going to have to
| help with the search if you want to see it happen.
| mlyle wrote:
| What figures, exactly?
|
| Here, we're discussing a paper characterizing the current
| version of the process first with ring oscillators and
| measured delays through a single cell, and then with a
| serial bit adder, so... if you mean those, yes?
| jjoonathan wrote:
| I mean can you find the bit adder figures for, say, TSMC
| N7 or N5? I poked around google scholar and wikipedia for
| a few minutes before giving up and hoping that someone up
| with the times would know exactly where to look.
| mlyle wrote:
| No. Those are subject to NDA, but are broadly similar in
| fmax and prop time under reasonable voltages.
|
| I think it's exciting here that they're dicking around
| with new technologies in their infancy, and in a few
| coarse redesigns of the cells went from 60% of the CMOS
| leading edge to about the same numbers.
|
| Of course, total system power, etc, is atrocious,
| considering that this is cryogenic, etc. You need a lot
| of logic before this could come ahead in efficiency.
| jjoonathan wrote:
| > No. Those are subject to NDA
|
| Yeah, that's what I thought and why I settled for a less
| accurate comparison.
|
| > I think it's exciting
|
| I think it's exciting, too. The extent of my claims is
| "no, this does not mean 100GHz LN2 CPUs in a few years."
| I did not intend these remarks to be meaningful to chip
| designers, only to CPU consumers who see "GHz" and think
| "CPU clock speeds."
|
| The leverage of having a slightly better fundamental
| device is extraordinary. We should spare no expense
| looking for them and leave no stone unturned.
| jandrese wrote:
| Signal propagation time seems like it should be an issue at
| 100GHz. Light only travels about 3mm per clock cycle at that
| speed. A circuit built at that speed will need absurdly deep
| pipelines to do much more than single-bit adding.
| fctorial wrote:
| Hire minecraft computer builders. They have a lot of
| experience in that.
| marcodiego wrote:
| When I read about quantum computing, the impression I get is that
| quantum computers are more similar to analog computers than to
| classical computers. That is, there are problems where they are
| better/faster than classical computers and problems where
| classical computers are better/faster than quantum computers.
|
| Does having a "100-GHz Single-Flux-Quantum Bit-Serial Adder" mean
| we can finally have hope for classical computation at hundreds of
| gigahertz?
| Robotbeat wrote:
| To be clear, this is just a classical computer component but
| leveraging a quantum effect (Josephson junction, which can be
| sensitive to a single quantum of magnetic flux) for its
| operation.
|
| Quantum computers are very different in that they maintain a
| non-classical state throughout many components, allowing
| certain kinds of matrix calculations to be solved significantly
| faster than is possible for any classical machine, analog or
| digital. (Although quantum solutions have a stochastic nature
| and there are other caveats to quantumness that potentially
| stand in the way of actually exceeding classical computers).
| packetlost wrote:
| > quantum computers are more similar to analog computers
|
| I can't speak for all architectures, but the ones I'm aware of
| absolutely are analog computers.
| Robotbeat wrote:
| Quantum computers use qubits, which don't really have an
| analogue in analog computers.
| packetlost wrote:
| Not directly, but the problems and engineering that go into
| building one are very similar. Qubits _are_ analogous to
| regular bits, at least upon measurement. They're either in
| one state or another, and all the interesting stuff is in
| how you get to that state.
|
| Source: building a quantum computer.
| Robotbeat wrote:
| Yes, that's my point! The person I'm responding to is
| saying they're like _analog_ computers, which don't use
| bits. Quantum computers are a bit more like digital
| computers.
| packetlost wrote:
| I am the person who said that :) I guess what I really
| meant was I agree with your original statement.
|
| They're closer to analog computers than digital, but
| really they're neither and share similarities with
| both. In the sense that the qubits usually have only 2
| measurable states in computation, they're similar to
| digital computers, but that only holds when you
| measure. On the other hand, the 'programs' going in are
| not really digital information (though they can be
| represented as such), and because qubits are not limited
| to one of 2 states (unlike bits), the way you
| interact with/program them is definitely not digital; it
| is analog.
|
| If you're interested in learning about writing quantum
| programs, I recommend checking out IBM's Qiskit
| toolkit[0]. I found the tutorials helpful for grasping
| the fundamentals of quantum programming.
|
| [0]:
| https://qiskit.org/documentation/getting_started.html
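|
| (As a minimal taste of what that looks like -- a Bell-state
| circuit in Qiskit; the gates manipulate continuous
| amplitudes, but the final measurement yields plain bits:)
|
|     from qiskit import QuantumCircuit
|
|     qc = QuantumCircuit(2, 2)   # two qubits, two classical bits
|     qc.h(0)                     # superposition on qubit 0
|     qc.cx(0, 1)                 # entangle them (Bell state)
|     qc.measure([0, 1], [0, 1])  # collapse to digital outcomes
|     print(qc.draw())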
| M277 wrote:
| Off-topic, very sorry:
|
| Are there any resources you particularly recommend if I
| wanted to learn more about quantum computing and how
| quantum computers work on the hardware level?
| IlliOnato wrote:
| Perhaps not exactly what you're asking for, i.e., not that
| much on hardware/engineering, but rather on the physics and
| mathematics that quantum computing is built on:
| https://www.amazon.com/Quantum-Computing-since-
| Democritus-Aa...
| M277 wrote:
| It does look like an interesting read. Thank you :)
| thatcherc wrote:
| To be clear, despite the appearance of 'quantum' in the title,
| this is a development toward faster classical computers by using
| superconducting circuit elements (Josephson junctions). A
| computer using this type of logic would need to be cryo-cooled
| down to ~single-digit Kelvin temperatures, but as discussed in a
| thread a few weeks ago[0], a superconducting computer can still
| beat a conventional computer in terms of ops/Watt, even when the
| power consumption of the cooling equipment is included.
|
| [0] - https://news.ycombinator.com/item?id=25765128
| mdip wrote:
| Thanks for summing that up; based on the title alone, I was
| considering clicking because I so rarely encounter a sentence
| such as this: containing so many words I know/use regularly,
| yet, when put together in that precise manner, appearing as if
| it had been assembled by a garbage disposal.
| blacksmith_tb wrote:
| Free cooling to those temperatures is available in orbit...
| well, free once you get there...
| deepsun wrote:
| Nope, satellites in orbit have thermal protection from the
| sun's radiation. It's also harder to dump excess heat
| without matter around. IIRC space suits expel some finite
| amount of gas to get rid of extra heat.
| contravariant wrote:
| Theoretically keeping something at the same temperature is at
| most going to double the energy requirements, though there's
| likely to be some additional practical issues with maintaining
| such a low temperature.
| amluto wrote:
| Not even close. A perfect refrigerator will use vastly more
| than 1J to remove 1J from a 4.2K device. Look up Carnot's
| theorem.
| a1369209993 wrote:
| > A perfect refrigerator will use vastly more than 1J to
| remove 1J from a 4.2K device.
|
| While your actual point is true, your example is too
| loosely specified - the energy used by a perfect (or even
| imperfect) refrigerator depends on the temperature of the
| environment it's dumping the heat into. In the extreme (and
| also engineering-wise preferred) case, with a heat sink at
| cosmic microwave background temperatures of ~2.7K, a
| _perfect_ refrigerator (rather, heat exchanger) would
| actually _gain_ energy. (Of course, since the literally-
| glowing-hot sun takes up some portion of the sky, actually
| _getting_ a <4.2K environment would likely require siting
| your computer in the Oort Cloud, if not outside the galaxy
| entirely, hence why your actual point is true.)
| contravariant wrote:
| By Carnot's theorem an ideal engine will convert 1J of
| heat into less than 1J of work, and this process is
| invertible.
|
| So as a corollary you can transfer 1J of heat using less
| than 1J of work. Since the temperature difference is so
| large the theoretical limit is really close to 1J of work
| though.
| amluto wrote:
| Something went wrong in your calculation.
|
| An ideal heat engine operating between reservoirs at T_h
| and T_c has efficiency 1 - T_c/T_h. This means, when
| operating reversibly, it will send Q_c heat to the cold
| side, remove Q_h from the hot side, and do W work, where
| Q_c = Q_h - W and W = (1 - T_c/T_h)Q_h. If T_c = T_h/10,
| then 1 - T_c/T_h = 90% (very efficient!) and Q_c =
| 0.1*Q_h (not much waste heat!). Now turn this around,
| because it's reversible. A heat pump removes Q_c from the
| cold side and exhausts Q_h to the hot side. If you remove
| 1J from the cold side, you exhaust 10J to the hot side,
| and that 9J difference is the work done. That is, it
| costs 9J to remove 1J of heat from a 27.3K refrigerator
| if the exhaust is at 273K. It's worse at 4.2K.
|
| (You can do an equivalent calculation a little more
| tidily by considering the entropy change of each side. To
| remove a given amount of entropy from the cold side, you
| must add at least as much entropy to the hot side. Plug
| in the usual formula for isothermal entropy change, and
| you get the answer.)
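|
| (The same bound as a quick sanity check in Python -- the
| ideal heat-pump work to remove 1J from the cold side is
| (T_h - T_c)/T_c joules:)
|
|     def work_per_joule(t_cold, t_hot):
|         # Carnot limit: J of work per J of heat removed
|         return (t_hot - t_cold) / t_cold
|
|     print(work_per_joule(27.3, 273.0))  # 9.0, as above
|     print(work_per_joule(4.2, 300.0))   # ~70 at 4.2 K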
| KirillPanov wrote:
| > superconducting computer can still beat a conventional
| computer in terms of ops/Watt, even when the power consumption
| of the cooling equipment is included
|
| But likely not when the energy cost of manufacturing the
| cryocooler (which has a limited working life) is included.
|
| If you count dollars instead of joules the answer is a definite
| "no". The manufacturing cost per joule of the cryocooler,
| amortized over its usable life, far exceeds the value of the
| energy saved.
| HPsquared wrote:
| Very cool!
| wiz21c wrote:
| Cool as cold ?
| [deleted]
| ajb wrote:
| There have been attempts to make these SFQ circuits work for
| ages, eg see this from 1999:
| https://www.researchgate.net/publication/3310945_Rapid_Singl...
|
| What I want to know is: what were the problems that prevented
| them from succeeding before, and have they been overcome now?
| tyingq wrote:
| Dated (2005), but this report seems comprehensive:
| https://www.nitrd.gov/pubs/nsa/sta.pdf
| rbanffy wrote:
| I think I've read about Josephson junction superconducting
| transistors in the mid-'80s. And it wasn't in a cutting-edge
| journal.
| ThePhysicist wrote:
| I designed such superconducting chips for my MSc thesis and had
| them manufactured by Hypres, one of the few foundries that
| offered such a process. Designing superconducting circuits is
| great fun, I even wrote my own circuit simulator
| (https://github.com/adewes/superconductor/).
|
| RSFQ (Rapid Single Flux Quantum Logic, or as some would jokingly
| say Russian Single Flux Quantum Logic) was quite hyped up in the
| nineties, Prof. Likharev at Stony Brook had a collaboration with
| IBM working on RSFQ circuit elements to replace conventional
| semiconductor logic. At the time the achievable speed was
| fantastic compared to regular circuits; (un)fortunately,
| semiconductor processes kept evolving, and today RSFQ is only
| interesting for some niche applications like fast microwave
| circuits at cryogenic temperatures (and even there HEMT
| transistors are often a better solution nowadays).
|
| Also, getting circuits with more than 10,000 junctions to work
| was quite tricky, as the fabrication processes weren't very
| reliable and transferring flux quanta is a bit noisier than
| storing charges on an FET, so I'm doubtful whether we could even
| have large-scale RSFQ circuits without extensive error
| correction.
|
| Well, it's still an amazingly fun and fascinating field; I
| really hope we see a revival of it one day (maybe if we get
| room-temperature superconductors).
___________________________________________________________________
(page generated 2021-02-08 23:01 UTC)