[HN Gopher] 100-GHz Single-Flux-Quantum Bit-Serial Adder Based o...
       ___________________________________________________________________
        
       100-GHz Single-Flux-Quantum Bit-Serial Adder Based on 10-kA/cm2
       Niobium Process
        
       Author : peter_d_sherman
       Score  : 92 points
       Date   : 2021-02-08 14:46 UTC (8 hours ago)
        
 (HTM) web link (ieeexplore.ieee.org)
 (TXT) w3m dump (ieeexplore.ieee.org)
        
       | peter_d_sherman wrote:
       | Related:
       | 
       | "Automatic Single-Flux-Quantum (SFQ) Logic Synthesis Method for
       | Top-Down Circuit Design"
       | 
       | https://iopscience.iop.org/article/10.1088/1742-6596/43/1/28...
       | 
       | >"Abstract. Single-flux-quantum (SFQ) logic circuits provide
       | faster operations with lower power consumption, using Josephson
       | junctions as the switching devices. In the top-down flow of SFQ
       | circuit design, we have already developed a place-and-route tool
       | that covers backend circuit design. In this paper, we present an
       | automatic SFQ logic synthesis method that covers front-end
       | circuit design. The logic synthesis is a process that generates a
       | gate-level logic circuit from a functional specification written
       | in hardware description languages. In our SFQ synthesis method,
       | after we generate an intermediate circuit with the help of a
       | synthesis tool for semiconductor circuits, we convert it into a
       | gate-level pipelined SFQ circuit. To do this, an automatic
       | synthesis tool was implemented."
       | 
       | PDS: Phrased another way, the idea here can basically be boiled
       | down to:
       | 
       | For the most time-critical parts of a conventional CPU, i.e., the
       | adder/ALU -- instead of using conventional circuitry for that,
       | let's use superconducting Josephson-junction circuits (the same
       | device family that appears in quantum computers) because of their
       | fast switching speed...
       | 
       | The end result is that you still get a conventional digital CPU
       | -- albeit, in theory, a much, much faster one...
       | 
       | The "magic phrase" (for research) for all of this -- is _Single-
       | flux-quantum (SFQ) logic circuits_...
        
       | AI_WAIFU wrote:
       | This paper is over 10 years old.
        
       | hdjfjrbrbt wrote:
       | At 100 GHz signals can only propagate 3 cm per clock.
       | 
       | So to run at those speeds, you need extremely compact circuits,
       | and you need to bring memory closer too.
       | 
       | Maybe move to 3D, another way to increase reach.
       | 
       | What is the current thinking regarding this problem?
        
         | rbanffy wrote:
         | I remember research on asynchronous (or clockless) CPUs.
         | Philips had an asynchronous ARM demonstrator.
         | 
         | Where did that go?
        
         | ben_w wrote:
         | 3mm rather than 3cm, but also note that full CPUs have to run
         | slower than the upper limit of the transistors inside them,
         | because most operations need to go through multiple
         | transistors.
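         | 
         | Quick sanity check in Python (free-space speed of light;
         | real on-chip signals are slower still):
         | 
         |   c = 3.0e8           # m/s, vacuum
         |   f = 100e9           # Hz
         |   print(c / f * 1e3)  # -> 3.0 mm per clock period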
        
         | jjoonathan wrote:
         | Caches are already closer than 3cm and can take multiple cycles
         | to fetch from. DRAM takes 100s of clock cycles to fetch from!
         | It's bad -- but the badness was already here.
         | 
         | Propagation delay problems are real, but for caches IIRC the
         | biggest problems are transmission lines being lossy and having
         | propagation speeds much slower than the speed of light in a
         | vacuum, and for DRAM the biggest problems are waiting for
         | precharge and waiting for the sense amplifiers to stabilize.
         | Waiting for the precision required to read obscenely small
         | capacitors, in other words.
         | 
         | If DRAM is 1T1C and SRAM is 6T, I always wondered why we
         | couldn't just have DIMMs of SRAM at 1/3rd the capacity or
         | whatever. Overprovisioning DRAM by that factor is already
         | common, wouldn't eliminating all the downtime be a way to put
         | that slack to work? Like the HDD -> SSD transition? Ah well,
         | I'm sure there's a constraint I'm just not thinking of.
        
           | simcop2387 wrote:
           | You'd need a different CPU these days, since the memory
           | controller for SRAM would be a bit different. That said, you
           | do see SRAM in smaller embedded systems because it's also
           | easier to drive and work with signal-wise. I bet someone's
           | done it as a research project with larger CPUs, but I bet the
           | cost of the SRAM and the cost of a niche CPU design just made
           | it all not viable in the marketplace.
        
             | jjoonathan wrote:
             | Right, memory controllers wouldn't be able to fully
             | leverage SRAM DIMMs right out of the gate, but they would
             | be able to partially leverage them right out of the gate
             | and fully leverage them within a JEDEC cycle. It has been
             | multiple JEDEC cycles since over-provisioning became
             | common, so I suspect there must be something else
             | preventing this.
             | 
             | SRAM costs way more, but afaik that's an artifact of market
             | size, not something fundamental. Hence the 1T1C vs 6T
             | comparison to tease out the fundamental cost difference,
             | which looks to be no worse than 3x-6x -- that would put
             | SRAM DIMMs well within reach. I lean on heinously expensive
             | SRAM in my own embedded designs and would really like to
             | see some market scale drive down those costs!
        
         | klyrs wrote:
         | One thing to note is that this is a bit-serial adder. If you
         | use this to perform, say, 32-bit arithmetic, then the effective
         | clock rate is about 3GHz. Half that for 64-bit. Compare this to
         | a PCIe4 bus, which is clocked at 16GT/s -- that is, each "lane"
         | runs at 16GHz and is used to transmit data around serially
         | instead of routing more wires around to make a wider bus.
         | That's readily-available (damn the chip shortages) tech, but we
         | can't buy 16GHz general-purpose processors.
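         | 
         | Back-of-the-envelope, in Python:
         | 
         |   print(100e9 / 32 / 1e9)  # ~3.1 GHz effective, 32-bit
         |   print(100e9 / 64 / 1e9)  # ~1.6 GHz effective, 64-bit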
        
           | mlyle wrote:
           | It's a bit-serial adder not for any technical reason other
           | than it's a convenient benchmark for characterization of the
           | logic within. You could obviously use the process to make
           | wide adders with faster clock speeds than stacking serial
           | adders in front of each other.
        
             | klyrs wrote:
             | I'm sorry, but that's not obvious. Given that we've got PCI
             | circuits running at 16GHz, why don't chipmakers give us
             | full-scale processors with those faster clock speeds?
             | 
             | Bit-serial adders are extremely simple circuits with a
             | single carry register, which can run at essentially the
             | full speed of the underlying switching devices. Ripple-
             | carry adders have a circuit depth proportional to the
             | number of bits in the adder -- they're only stable when the
             | clock period is substantially longer than the switching
             | time multiplied by that depth, because the carry signal
             | needs to propagate through all of the 1-bit adders.
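             | 
             | A toy Python sketch of that loop (an illustration of the
             | idea, not the paper's circuit): one carry flip-flop, one
             | operand bit per clock tick, so an n-bit add takes n ticks.
             | 
             |   def bit_serial_add(a, b, width=32):
             |       carry, out = 0, 0
             |       for i in range(width):  # one tick per bit
             |           x, y = (a >> i) & 1, (b >> i) & 1
             |           out |= (x ^ y ^ carry) << i
             |           carry = (x & y) | (carry & (x ^ y))
             |       return out              # width ticks per word
             | 
             |   print(bit_serial_add(1234, 5678))  # 6912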
        
               | [deleted]
        
               | deepnotderp wrote:
               | But for CPUs no one uses ripple carry; they use fast
               | parallel-prefix adders like Kogge-Stone.
        
               | [deleted]
        
               | mlyle wrote:
               | > I'm sorry, but that's not obvious. Given that we've got
               | PCI circuits running at 16GHz, why don't chipmakers give
               | us full-scale processors with those faster clock speeds?
               | 
               | That's not what I said. I'm just saying that it's obvious
               | that if you wanted a fast 64 bit adder on the process,
               | you'd build a carry lookahead adder or other fast adder,
               | not stack a single bit adder 64 times. Saying "YOU NEED
               | TO MULTIPLY PROP TIME BY 64" seems kinda dishonest.
               | 
               | I said:
               | 
               | > > You could obviously use the process to make wide
               | adders with faster clock speeds than stacking serial
               | adders in front of each other.
               | 
               | Do you really think the optimum thing is going to be
               | stacking 64 of these in a row?
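               | 
               | For reference, a toy word-level Python sketch (an
               | illustration, not a gate-level design) of why a
               | parallel-prefix adder -- Kogge-Stone-style -- has
               | depth log2(width) rather than width:
               | 
               |   def prefix_add(a, b, width=64):
               |       mask = (1 << width) - 1
               |       g, p = a & b, a ^ b   # generate, propagate
               |       d = 1
               |       while d < width:      # log2(width) steps
               |           g |= p & (g << d)
               |           p &= p << d
               |           d <<= 1
               |       # carry into bit i is bit i-1 of g
               |       return (a ^ b ^ (g << 1)) & mask
               | 
               |   print(prefix_add(7, 1))   # 8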
        
               | klyrs wrote:
               | Ah, I misread you -- yes, there are obviously faster
               | adder circuits. My point is that they won't have such
               | impressive clock rates as this serial adder can achieve.
        
         | avmich wrote:
         | Asynchronous designs don't need to propagate a clock through
         | the whole circuit, so they don't have this limitation.
        
         | fctorial wrote:
         | Maybe 10,000-core processors, with each core in the micrometer
         | range, and threads occupying cores permanently. Cores would be
         | assigned to processes the way memory is. Probably similar to
         | the transition from bytes of RAM to KBs of RAM.
        
         | api wrote:
         | There would definitely be applications for a tiny ultrafast
         | serial CPU with on-board RAM. Not all algorithms are
         | parallelizable.
         | 
         | I'm imagining a small in-order 32-bit core with ~256KiB of
         | static RAM running at 100 GHz. Call it the Serial Killer.
        
       | baxtr wrote:
       | Is this a real title or was this title generated by GPT-3?
        
       | jjoonathan wrote:
       | 170 GHz is the speed of a PLL in 130nm SiGe BiCMOS published by
       | a popular chip designer on YouTube a decade ago.
       | 
       | The "100GHz" headline tempts comparison to CPU clock speeds but
       | those are absolutely not the right point of comparison. GHz doing
       | one thing != GHz doing another. CPU clock cycles are very deep
       | compared to a bit-serial adders and operate under heavy thermal
       | constraints. To compare apples to apples, we need to know: what
       | is the speed of a bit-serial adder in near-future CMOS? Silicon
       | bipolar? 3-5 bipolar?
       | 
       | My guess: you could implement a 100GHz bit-serial adder in CMOS
       | _today_ and get several hundred GHz if you dropped thermal
       | constraints, went bipolar, etc. That doesn't invalidate the
       | research -- we need more public cutting-edge device research, not
       | less -- but the posts assuming this result translates to 100GHz
       | CPUs are getting a bit ahead of themselves.
        
         | mlyle wrote:
         | As you say -- it depends upon what you're doing.
         | 
         | Comparing a PLL to an adder is silly.
         | 
         | > My guess: you could implement a 100GHz bit serial adder in
         | CMOS today
         | 
           | 7nm FinFET CMOS will get you to roughly the same performance
         | point. It's also quite mature compared to the technique in the
         | article.
        
           | jjoonathan wrote:
           | > Comparing a PLL to an adder is silly.
           | 
           | Yeah, but comparing a bit adder to a 64 bit full adder
           | integrated into a modern CPU clock cycle is _really_ silly :)
           | 
           | Glad to hear my intuition about modern CMOS was in the
           | ballpark though. I've been out of this scene for the better
           | part of a decade.
        
             | mlyle wrote:
             | Ring oscillators, bit adders, and 30-gate benchmark
             | circuits are how we characterize processes. We don't try
             | to fabricate a superscalar CPU on our first round trips
             | through developing a new process, let alone new transistor
             | types.
             | 
             | This is doing quite well for early rounds on this
             | technique.
        
               | jjoonathan wrote:
               | Are any of those figures public?
               | 
               | I did a cursory search but couldn't find them so I went
               | with slightly silly clickbait on the rationale that it
               | was an order of magnitude less silly than the other
               | comparisons being made in the thread. I stand by that: as
               | "benchmarks," a bit adder is much closer to a PLL divider
               | than to a 64-bit full adder in a CPU cycle.
               | 
               | I fully agree that there's another order of magnitude
               | before the comparison starts to become apples-to-apples,
               | or bit-adders to bit-adders, but you're going to have to
               | help with the search if you want to see it happen.
        
               | mlyle wrote:
               | What figures, exactly?
               | 
               | Here, we're discussing a paper characterizing the current
               | version of the process first with ring oscillators and
               | measured delays through a single cell, and then with a
               | bit-serial adder, so... if you mean those, yes?
        
               | jjoonathan wrote:
               | I mean can you find the bit adder figures for, say, TSMC
               | N7 or N5? I poked around Google Scholar and Wikipedia for
               | a few minutes before giving up and hoping that someone up
               | with the times would know exactly where to look.
        
               | mlyle wrote:
               | No. Those are subject to NDA, but are broadly similar in
               | fmax and prop time under reasonable voltages.
               | 
               | I think it's exciting here that they're dicking around
               | with new technologies in their infancy, and that in a few
               | coarse redesigns of the cells they went from 60% of the
               | CMOS leading edge to about the same numbers.
               | 
               | Of course, total system power is atrocious, considering
               | that this is cryogenic. You need a lot of logic before
               | this could come out ahead in efficiency.
        
               | jjoonathan wrote:
               | > No. Those are subject to NDA
               | 
               | Yeah, that's what I thought and why I settled for a less
               | accurate comparison.
               | 
               | > I think it's exciting
               | 
               | I think it's exciting, too. The extent of my claims is
               | "no, this does not mean 100GHz LN2 CPUs in a few years."
               | I did not intend these remarks to be meaningful to chip
               | designers, only to CPU consumers who see "GHz" and think
               | "CPU clock speeds."
               | 
               | The leverage of having a slightly better fundamental
               | device is extraordinary. We should spare no expense
               | looking for them and leave no stone unturned.
        
         | jandrese wrote:
         | Signal propagation time seems like it should be an issue at
         | 100 GHz. Light only travels about 3mm per clock cycle at that
         | speed. A circuit built at that speed will need absurdly deep
         | pipelines to do much more than single-bit adding.
        
           | fctorial wrote:
           | Hire minecraft computer builders. They have a lot of
           | experience in that.
        
       | marcodiego wrote:
       | When I read about quantum computing, the impression I get is that
       | quantum computers are more similar to analog computers than to
       | classical computers. That is, there are problems where they are
       | better/faster than classical computers and problems where
       | classical computers are better/faster than quantum computers.
       | 
       | Does having a "100-GHz Single-Flux-Quantum Bit-Serial Adder" mean
       | we can finally hope for classical computation at hundreds of
       | gigahertz?
        
         | Robotbeat wrote:
         | To be clear, this is just a classical computer component but
         | leveraging a quantum effect (Josephson junction, which can be
         | sensitive to a single quantum of magnetic flux) for its
         | operation.
         | 
         | Quantum computers are very different in that they maintain a
         | non-classical state throughout many components, allowing
         | certain kinds of matrix calculations to be solved significantly
         | faster than is possible for any classical machine, analog or
         | digital. (Although quantum solutions have a stochastic nature
         | and there are other caveats to quantumness that potentially
         | stand in the way of actually exceeding classical computers).
        
         | packetlost wrote:
         | > quantum computers are more similar to analog computers
         | 
         | I can't speak for all architectures, but the ones I'm aware of
         | absolutely are analog computers.
        
           | Robotbeat wrote:
           | Quantum computers use qubits, which don't really have an
           | analogue in analog computers.
        
             | packetlost wrote:
             | Not directly, but the problems and engineering that go into
             | building one are very similar. Qubits _are_ analogous to
             | regular bits, at least on measurement. They're either in
             | one state or another, and all the interesting stuff is in
             | how you get to that state.
             | 
             | Source: building a quantum computer.
        
               | Robotbeat wrote:
               | Yes, that's my point! The person I'm responding to is
               | saying they're like _analog_ computers, which don't use
               | bits. Quantum computers are a bit more like digital
               | computers.
        
               | packetlost wrote:
               | I am the person who said that :) I guess what I really
               | meant was I agree with your original statement.
               | 
               | They're closer to analog computers than to digital, but
               | really they're neither and share some similarities with
               | both. In the sense that qubits usually only have 2
               | measurable states in computation, they're similar to
               | digital computers, but that only holds when you measure.
               | On the other side, the 'programs' going in are not really
               | digital information (though they can be represented as
               | such), and because qubits are not restricted to one of 2
               | states (unlike bits), the way in which you interact
               | with/program them is definitely not digital; it is
               | analog.
               | 
               | If you're interested in learning about writing quantum
               | programs, I recommend checking out IBM's Qiskit
               | toolkit[0]. I found the tutorials helpful for grasping
               | the fundamentals of quantum programming.
               | 
               | [0]:
               | https://qiskit.org/documentation/getting_started.html
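               | 
               | As a teaser, a "hello world" in Qiskit looks roughly
               | like this (modulo API churn between versions):
               | 
               |   from qiskit import QuantumCircuit
               | 
               |   # Bell pair: superposition plus entanglement
               |   qc = QuantumCircuit(2, 2)
               |   qc.h(0)            # Hadamard on qubit 0
               |   qc.cx(0, 1)        # CNOT, control 0, target 1
               |   qc.measure([0, 1], [0, 1])
               |   print(qc.draw())
               | 
               | Run on a simulator (e.g. qiskit-aer), the counts come
               | out roughly 50/50 between '00' and '11'.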
        
               | M277 wrote:
               | Off-topic, very sorry:
               | 
               | Are there any resources you particularly recommend if I
               | wanted to learn more about quantum computing and how
               | quantum computers work on the hardware level?
        
               | IlliOnato wrote:
               | Perhaps not exactly what you ask for, i.e. not that much
               | on hardware/engineering, but rather on the physics and
               | mathematics that quantum computing is built on:
               | https://www.amazon.com/Quantum-Computing-since-
               | Democritus-Aa...
        
               | M277 wrote:
               | It does look like an interesting read. Thank you :)
        
       | thatcherc wrote:
       | To be clear, despite the appearance of 'quantum' in the title,
       | this is a development toward faster classical computers by using
       | superconducting circuit elements (Josephson junctions). A
       | computer using this type of logic would need to be cryo-cooled
       | down to ~single-digit Kelvin temperatures, but as discussed in a
       | thread a few weeks ago[0], a superconducting computer can still
       | beat a conventional computer in terms of ops/Watt, even when the
       | power consumption of the cooling equipment is included.
       | 
       | [0] - https://news.ycombinator.com/item?id=25765128
        
         | mdip wrote:
         | Thanks for summing that up; based on the title alone, I was
         | considering clicking because I so rarely encounter a sentence
         | like this: one containing so many words I know/use regularly,
         | yet which, put together in that precise manner, appears to
         | have been assembled by a garbage disposal.
        
         | blacksmith_tb wrote:
         | Free cooling to those temperatures is available in orbit...
         | well, free once you get there...
        
           | deepsun wrote:
           | Nope, satellites in orbit have thermal protection from the
           | sun's radiation. It's also harder to dump excess heat
           | without matter around. IIRC space suits expel a finite
           | amount of gas to get rid of extra heat.
        
         | contravariant wrote:
         | Theoretically, keeping something at the same temperature is at
         | most going to double the energy requirements, though there are
         | likely to be some additional practical issues with maintaining
         | such a low temperature.
        
           | amluto wrote:
           | Not even close. A perfect refrigerator will use vastly more
           | than 1J to remove 1J from a 4.2K device. Look up Carnot's
           | theorem.
        
             | a1369209993 wrote:
             | > A perfect refrigerator will use vastly more than 1J to
             | remove 1J from a 4.2K device.
             | 
             | While your actual point is true, your example is too
             | loosely specified - the energy used by a perfect (or even
             | imperfect) refrigerator depends on the temperature of the
             | environment it's dumping the heat into. In the extreme (and
             | also engineering-wise preferred) case, with a heat sink at
             | cosmic microwave background temperatures of ~2.7K, a
             | _perfect_ refrigerator (rather, heat exchanger) would
             | actually _gain_ energy. (Of course, since the literally-
             | glowing-hot sun takes up some portion of the sky, actually
             | _getting_ a  <4.2K environment would likely require siting
             | your computer in the Oort Cloud, if not outside the galaxy
             | entirely, hence why your actual point is true.)
        
             | contravariant wrote:
             | By Carnot's theorem an ideal engine will convert 1J of
             | heat into less than 1J of work, and this process is
             | invertible.
             | 
             | So as a corollary you can transfer 1J of heat using less
             | than 1J of work. Since the temperature difference is so
             | large the theoretical limit is really close to 1J of work
             | though.
        
               | amluto wrote:
               | Something went wrong in your calculation.
               | 
               | An ideal heat engine operating between reservoirs at T_h
               | and T_c has efficiency 1 - T_c/T_h. This means, when
               | operating reversibly, it will send Q_c heat to the cold
               | side, remove Q_h from the hot side, and do W work, where
               | Q_c = Q_h - W and W = (1 - T_c/T_h)Q_h. If T_c = T_h/10,
               | then 1 - T_c/T_h = 90% (very efficient!) and Q_c =
               | 0.1*Q_h (not much waste heat!). Now turn this around,
               | because it's reversible. A heat pump removes Q_c from the
               | cold side and exhausts Q_h to the hot side. If you remove
               | 1J from the cold side, you exhaust 10J to the hot side,
               | and that 9J difference is the work done. That is, it
               | costs 9J to remove 1J of heat from a 27.3K refrigerator
               | if the exhaust is at 273K. It's worse at 4.2K.
               | 
               | (You can do an equivalent calculation a little more
               | tidily by considering the entropy change of each side. To
               | remove a given amount of entropy from the cold side, you
               | must add at least as much entropy to the hot side. Plug
               | in the usual formula for isothermal entropy change, and
               | you get the answer.)
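               | 
               | A quick way to see the numbers (ideal Carnot
               | refrigerator, work per joule removed from the cold
               | side; 300 K assumed for the room-temperature exhaust):
               | 
               |   def work_per_joule(t_cold, t_hot):
               |       # W/Q_c for an ideal refrigerator
               |       return (t_hot - t_cold) / t_cold
               | 
               |   print(work_per_joule(27.3, 273.0))  # 9 J
               |   print(work_per_joule(4.2, 300.0))   # ~70 J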
        
         | KirillPanov wrote:
         | > superconducting computer can still beat a conventional
         | computer in terms of ops/Watt, even when the power consumption
         | of the cooling equipment is included
         | 
         | But likely not when the energy cost of manufacturing the
         | cryocooler (which has a limited working life) is included.
         | 
         | If you count dollars instead of joules the answer is a definite
         | "no". The manufacturing cost per joule of the cryocooler,
         | amortized over its usable life, far exceeds the value of the
         | energy saved.
        
         | HPsquared wrote:
         | Very cool!
        
           | wiz21c wrote:
           | Cool as cold ?
        
             | [deleted]
        
       | ajb wrote:
       | There have been attempts to make these SFQ circuits work for
       | ages; e.g., see this from 1999:
       | https://www.researchgate.net/publication/3310945_Rapid_Singl...
       | 
       | What I want to know is: what were the problems that prevented
       | them from succeeding before, and have they been overcome now?
        
         | tyingq wrote:
         | Dated (2005), but this report seems comprehensive:
         | https://www.nitrd.gov/pubs/nsa/sta.pdf
        
           | rbanffy wrote:
           | I think I read about Josephson-junction superconducting
           | transistors in the mid-'80s. And it wasn't in a cutting-edge
           | journal.
        
       | ThePhysicist wrote:
       | I designed such superconducting chips for my MSc thesis and had
       | them manufactured by Hypres, one of the few foundries that
       | offered such a process. Designing superconducting circuits is
       | great fun; I even wrote my own circuit simulator
       | (https://github.com/adewes/superconductor/).
       | 
       | RSFQ (Rapid Single Flux Quantum Logic, or as some would jokingly
       | say, Russian Single Flux Quantum Logic) was quite hyped up in the
       | nineties; Prof. Likharev at Stony Brook had a collaboration with
       | IBM working on RSFQ circuit elements to replace conventional
       | semiconductor logic. At the time the achievable speed was
       | fantastic compared to regular circuits; (un)fortunately,
       | semiconductor processes kept evolving, and today RSFQ is only
       | interesting for some niche applications like fast microwave
       | circuits at cryogenic temperatures (and even there, HEMT
       | transistors are often a better solution nowadays).
       | 
       | Also, getting circuits with more than 10,000 junctions to work
       | was quite tricky, as the fabrication processes weren't very
       | reliable and transferring flux quanta is a bit noisier than
       | storing charge on an FET, so I'm doubtful whether we could even
       | have large-scale RSFQ circuits without extensive error
       | correction.
       | 
       | Well, it's still an amazingly fun and fascinating field; I really
       | hope we see a revival of it one day (maybe if we get room-
       | temperature superconductors).
        
       ___________________________________________________________________
       (page generated 2021-02-08 23:01 UTC)