[HN Gopher] Cores That Don't Count [pdf] (2021)
       ___________________________________________________________________
        
       Cores That Don't Count [pdf] (2021)
        
       Author : signa11
       Score  : 101 points
       Date   : 2024-09-29 00:43 UTC (22 hours ago)
        
 (HTM) web link (sigops.org)
 (TXT) w3m dump (sigops.org)
        
       | bla3 wrote:
       | [2021]
        
       | mofosyne wrote:
       | This is about unstable cores that randomly output incorrect
       | calculations, and ways to mitigate that via better hardware
       | testing and duplicating the parts of the core that fail often.
       | 
       | I did, however, initially think from the title that it was about
       | 1-bit CPUs like the MC14500B Industrial Control Unit (ICU), a
       | CMOS one-bit microprocessor designed by Motorola in 1977 for
       | simple control applications. It completely lacks an ALU so
       | essentially cannot count, but is designed for PLCs.
        
         | winwang wrote:
         | Hey. It could count to 1, which is something.
        
       | freeqaz wrote:
       | Unrelated to the topic being discussed, but my mind immediately
       | went to "per core pricing", which is common for databases. Some
       | SQL servers are licensed by the number of CPU cores in a
       | system, and manufacturers would often offer an SKU with fewer,
       | faster cores to compensate for this.
       | 
       | Taking that thought and thinking about adding "silent" cores is
       | interesting to me. What if your CPU core is actually backed by
       | multiple cores instead to get the "fastest" speed possible? For
       | example imagine if you had say 2 CPU cores that appeared as one
       | and each core would guess the opposite branch of the other
       | (branch prediction) so that it was "right" more of the time.
       | 
       | An interesting thought that had never occurred to me. It's
       | horribly inefficient but for constrained cases where peak
       | performance is all that matters, I wonder if this style of
       | thought would help. ("Competitive Code Execution"?)
        
         | buildbot wrote:
         | People have thought about it, but it's so incredibly wasteful
         | that it's impractical. At 20% branching (one branch every five
         | instructions), you rapidly run out of resources while waiting
         | on the winning branch, and spend possibly 8 cores just to
         | cover three branches ahead, or roughly 15 instructions.
         | That's pretty rough!
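
A minimal sketch of the arithmetic behind that estimate, assuming "20% branching" means one branch every five instructions and that every unresolved branch forks into both of its paths. The function name and parameters are hypothetical, chosen only for illustration.

```python
# Sketch of the cost of executing both sides of every unresolved branch.
# Assumption: "20% branching" means one branch every 5 instructions.

def eager_execution_cost(instructions_per_branch: int, branches_ahead: int) -> tuple[int, int]:
    """Return (paths_in_flight, instructions_covered)."""
    # Each unresolved branch doubles the number of paths being executed.
    paths_in_flight = 2 ** branches_ahead
    # Useful lookahead grows only linearly with the number of branches.
    instructions_covered = instructions_per_branch * branches_ahead
    return paths_in_flight, instructions_covered


if __name__ == "__main__":
    for depth in range(1, 5):
        paths, covered = eager_execution_cost(5, depth)
        print(f"{depth} branches ahead: {paths} paths, ~{covered} instructions")
```

The number of paths grows exponentially while the lookahead grows only linearly, which is why three branches ahead already needs 8 paths for about 15 instructions.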
        
           | hinkley wrote:
           | I wonder if you could put more logic units per core and load
           | balance to prevent thermal throttling, or if you'd make the
           | communication pathways slower at a rate that exceeds the
           | gains.
        
             | buildbot wrote:
             | Yep, you can do that, and yep, it gets slower.
             | 
             | That's basically the tradeoff Apple made with their M
             | series chips vs AMD/Intel, which until recently have been
             | chasing fast and narrow designs. Apple, in contrast, has a
             | crazy "wide" core, i.e. it can issue and retire many more
             | instructions per clock than basically any other mainstream
             | CPU.
        
           | eep_social wrote:
           | In distributed computing, a few layers of abstraction up, an
           | analogous technique of sending two identical RPCs to distinct
           | backends can be used to reduce tail latency.
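
A rough sketch of that duplicated-request idea (often called hedged requests), using Python's asyncio. The backends here are simulated with `asyncio.sleep`; `fake_backend`, `hedged_call`, and the latency values are hypothetical stand-ins, not a real RPC API.

```python
import asyncio


async def fake_backend(name: str, latency: float) -> str:
    """Simulated backend: waits for `latency` seconds, then replies with its name."""
    await asyncio.sleep(latency)
    return name


async def hedged_call(latencies: dict[str, float]) -> str:
    """Send the same request to every backend and return the first reply."""
    tasks = [
        asyncio.create_task(fake_backend(name, latency))
        for name, latency in latencies.items()
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    # Cancel the slower duplicates so they stop consuming resources.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result()


if __name__ == "__main__":
    # One backend is having a slow moment; the caller only sees the fast one.
    print(asyncio.run(hedged_call({"backend-a": 0.2, "backend-b": 0.01})))
```

The cost is the same as in the branch case: duplicated work is thrown away, so in practice the second request is often sent only after a short delay rather than immediately.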
        
         | jcul wrote:
         | Not exactly the same thing, but I remember talking with a
         | co-worker before about strategies to use a core and a
         | hyperthreaded sibling core on the same workload, to get a
         | speedup.
         | 
         | However, in practice I think it would be really difficult to
         | prevent them just thrashing each other's cache and competing
         | for shared resources.
        
           | lallysingh wrote:
           | Yeah, your options are to spin on a few lines of cache (e.g.
           | an iterated function or processing a ring buffer) or to use
           | streaming cache ops.
        
         | userbinator wrote:
         | _For example imagine if you had say 2 CPU cores that appeared
         | as one and each core would guess the opposite branch of the
         | other (branch prediction) so that it was "right" more of the
         | time._
         | 
         | I believe some CPUs do speculate down both paths of a branch
         | when the branch predictor is really uncertain which one to take.
        
       | cryptonector wrote:
       | https://hn.algolia.com/?query=Cores%20That%20Don%27t%20Count...
        
         | dredmorbius wrote:
         | From which there's one significant prior discussion, 3 years
         | ago, 72 comments:
         | 
         | <https://news.ycombinator.com/item?id=27378624>
        
       ___________________________________________________________________
       (page generated 2024-09-29 23:02 UTC)