[HN Gopher] Pi 5 overclocking: Silicon Lottery
       ___________________________________________________________________
        
       Pi 5 overclocking: Silicon Lottery
        
       Author : goranmoomin
       Score  : 83 points
       Date   : 2024-03-13 16:02 UTC (6 hours ago)
        
 (HTM) web link (www.jeffgeerling.com)
 (TXT) w3m dump (www.jeffgeerling.com)
        
       | moffkalast wrote:
       | Speaking of Geerling testing Pi 5 things, is it just me or is it
       | super weird that the uPCity PCIe breakout board hasn't been put
       | on sale yet? I think it's almost half a year since he got it
       | working in a pre-release state.
        
         | tuetuopay wrote:
         | I recently went on the lookout for this board and it is still
         | unavailable. All I could find were M.2 adapters. I guess the
         | market for full-size PCIe is tiny compared to the M.2 one.
         | 
         | Anyways, I ended up making my own and it works great. No fuss,
         | no wait, no complications.
        
           | moffkalast wrote:
           | Ah you mean going the Pi Port -> M.2 hat -> M.2 to PCIe riser
           | route? I suppose that should work. Any bandwidth issues?
        
             | tuetuopay wrote:
             | No, I literally made my own hand-soldered Pi Port -> PCIe
             | adapter. That's to plug a 2x25Gbps programmable NIC, so no
             | bandwidth needed because the NIC does all the work. All I
             | needed was something to power up the nic :D
             | 
             | As for bandwidth, well, it's one lane of PCIe gen 2. This
             | won't win any races but can be useful to access exotic
             | hardware not available in usb or if you don't care about
             | bandwidth. (e.g. HBA with many drives for mass storage
             | without speed requirement).
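A rough sketch of the arithmetic behind "one lane of PCIe gen 2", for a sense of scale. The 5 GT/s line rate and 8b/10b encoding are the standard PCIe 2.0 figures. The helper name is made up for illustration, and packet and protocol overhead is ignored, so real throughput will be somewhat lower.

```python
# Rough usable bandwidth of a PCIe 2.0 link (illustrative helper, not from the thread).
GEN2_TRANSFERS_PER_SEC = 5e9      # PCIe 2.0 line rate: 5 GT/s per lane
ENCODING_EFFICIENCY = 8 / 10      # 8b/10b encoding: 8 data bits per 10 line bits


def pcie_gen2_bandwidth_gbps(lanes: int = 1) -> float:
    """Return payload bandwidth in Gbit/s, ignoring packet/protocol overhead."""
    return lanes * GEN2_TRANSFERS_PER_SEC * ENCODING_EFFICIENCY / 1e9


link_gbps = pcie_gen2_bandwidth_gbps(1)   # one lane, as on the Pi 5 connector
link_mbytes = link_gbps * 1000 / 8        # same figure in MB/s
nic_gbps = 2 * 25                         # the 2x25Gbps NIC mentioned above

print(f"PCIe gen 2 x1: {link_gbps:.1f} Gbit/s (~{link_mbytes:.0f} MB/s)")
print(f"NIC line rate is {nic_gbps / link_gbps:.1f}x what the link can carry")
```

This is why the setup only makes sense when the NIC does the packet processing itself and the host link carries little more than control traffic.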
        
               | stu2010 wrote:
               | How much power is the Pi port capable of delivering, or
               | are you sending additional power to the PCIe adapter from
               | somewhere else?
               | 
               | What SmartNIC are you using? Most SmartNICs that I'm
               | aware of suck a decent amount of power, many more require
               | significant external airflow. Are you using the Mikrotik
               | active cooled one?
               | https://mikrotik.com/product/ccr2004_1g_2xs_pcie
        
               | tuetuopay wrote:
               | The Pi port can deliver 5 or 10W max at 5V IIRC. So I'm
               | not using it :D
               | 
               | The 12V comes from an external power supply through a
               | barrel jack, from which I also derive the 3.3V rail. The
               | Pi provides no power whatsoever. I should publish the
               | design files somewhere.
               | 
               | As for the NIC, it's a Netronome Agilio-CX which is fully
               | programmable using eBPF and such.
        
         | geerlingguy wrote:
         | I keep pushing Pineberry Pi to release that--apparently they're
         | still working on it--and 52Pi pre-announced something too...
         | 
         | But so far there's no straight PCIe expansion board available
         | to purchase yet. It is a slimmer market than 'NVMe on top' or
         | other more standard use cases, but it's one I think could
         | expand as people do weird things with the Pi 5.
        
       | TehCorwiz wrote:
       | My Pi 5 is still back ordered. :(
        
         | natebc wrote:
         | I ordered one from pishop.us just last Friday that arrived via
         | USPS on Monday morning. Might look around some and see if you
         | can just order from a different vendor.
        
           | techwiz137 wrote:
           | For the time it took them to develop it, the performance is
           | really lackluster. I would've definitely wanted an OrangePi 5
           | equivalent performance, but with the mature software of the
           | RPI.
        
           | TehCorwiz wrote:
           | Yeah, I ordered from Sparkfun. I've had good experiences with
           | them in the past, but it's a little frustrating seeing it
           | available elsewhere. I ended up reaching out about it.
        
         | teamonkey wrote:
         | https://rpilocator.com/?cat=PI5
        
         | geerlingguy wrote:
         | They've been on the shelf (well, behind the counter) at Micro
         | Center for a month or so now, never seen them out of stock. It
         | seems like all the models are available at one or two retailers
         | at minimum now, via rpilocator.com.
         | 
         | Are you in the US, Canada, or EU? Outside of those places,
         | there may still be some delays in getting stock to meet demand.
        
         | duffyjp wrote:
         | Oof, $80 is encroaching on Aliexpress N100 Mini PCs and used
         | "Tiny/Mini/Micro" territory.
         | 
         | I have the original and updated Pi Zero W-- unbelievable
         | bargains at ~$15 but if I needed any horsepower I think I'd
         | rather have an x86_64 so I can run whatever.
        
           | bobim wrote:
           | Idle consumption made me lean toward a rk3588 vs a n100. Half
           | the single thread performance though. Supported by dietpi, so
           | no issue with the os.
        
       | MuffinFlavored wrote:
       | > The result of that 3.0 GHz overclock? A marginally-improved
       | Geekbench 6 score of 1662, versus 1507 with no OC. To achieve
       | that 10% speedup, it ate up about 20% more power, so efficiency-
       | wise, it's not worth it.
       | 
       | What effect would running an overclock like this permanently have
       | on longevity? Is it even worth thinking about "longevity" of
       | chips?
       | 
       | Obviously stability suffers but for example... how much? Author
       | was able to get a Geekbench 6 benchmark to pass. If they tried
       | 100 times, would it be expected that a non-zero amount would
       | fail?
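The efficiency claim in the quote can be checked with a few lines of arithmetic. This is only a sketch: the two scores are the ones quoted, while the power ratio is an assumption taken from "about 20% more power".

```python
# Numbers quoted from the article; "about 20% more power" is approximate.
score_stock = 1507        # Geekbench 6, no overclock
score_oc = 1662           # Geekbench 6 at 3.0 GHz
power_ratio = 1.20        # assumed: OC power / stock power

speedup = score_oc / score_stock              # roughly 1.10
perf_per_watt_ratio = speedup / power_ratio   # below 1 means less efficient

print(f"speedup: {speedup - 1:.1%}")
print(f"performance per watt vs stock: {perf_per_watt_ratio:.2f}x")
```

A ratio below 1 means each watt buys less benchmark score than at stock, which is the sense in which the overclock is "not worth it" efficiency-wise.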
        
         | techwiz137 wrote:
         | Of course! These are like BGA or something similar, the solder
         | will crack after some number of on/off/cool/heat cycles.
         | 
         | If voltage is raised we enter the realm of electromigration,
         | though not sure how relevant it is for such a minuscule OC.
         | 
         | As for stability, yes. If the voltage is not sufficient there
         | will be stability issues which will require further raising it,
         | thus raising temps and requiring more power which you can't be
         | sure if you could deliver. And then of course electromigration.
        
           | 0x457 wrote:
           | > Of course! These are like BGA or something similar, the
           | solder will crack after some number of on/off/cool/heat
           | cycles.
           | 
           | This mostly happened during the transition to lead-free
           | solder. Today, the component itself should fail before the
           | BGA joints do.
        
         | sweetjuly wrote:
         | Unfortunately, most of the useful data you want about wearout
         | is locked behind tech NDAs so nobody will be able to offer
         | specifics here.
         | 
         | But in general, the way you tend to end up with wearout is
         | through electromigration ("electron wind"), which is damage to
         | interconnects from electrons slamming into metal atoms over and
         | over and slowly ripping apart the wires. Modeling
         | electromigration correctly is really hard, but (from memory) a
         | general relationship is that voltage linearly increases failure
         | rate and temperature exponentially increases it. The constants
         | for these models are determined empirically and, of course, are
         | NDA'd.
         | 
         | In general, I wouldn't worry about it. The MTTF for
         | semiconductors is already very high even under awful
         | temperatures 100+ C, and so as long as you cool it properly
         | you'll be fine.
         | 
         | > would it be expected that a non-zero amount would fail?
         | 
         | The failures you describe here are going to be due to setup
         | time violations. These issues shouldn't be transient (assuming
         | identical temperature and voltage) since the performance
         | characteristics of an individual device don't really change
         | over time. Of course, the issues can seem transient as the
         | failures may not actually always cause noticeable corruption
         | (maybe you generate a wrong FPU result but that specific path
         | is only exercised under rare uarch conditions).
         | 
         | So, it's a non-answer, but: yes and no. Maybe your chip is
         | perfectly okay and never has any violations at the parameters
         | you selected. Maybe it does. Neither you (nor in fact the
         | manufacturer, though they do have a better chance since they
         | know the process/design) can ever really be sure--all that's
         | left is empirical burn in testing and hoping for the best :)
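One way to picture "empirical burn in testing" is to repeat a deterministic workload and check that every run gives the same answer. The sketch below is a toy under stated assumptions: the `workload` and `burn_in` names are invented here, the reference result is computed on the same (possibly marginal) machine, and a hard crash rather than a wrong answer is an equally likely failure. Dedicated stress tools exercise far more of the chip than this does.

```python
import hashlib


def workload(seed: int, rounds: int = 5_000) -> str:
    """Deterministic CPU-bound work: mix floating-point math, then hash it."""
    acc = float(seed)
    h = hashlib.sha256()
    for i in range(1, rounds + 1):
        acc = (acc * 1.000001 + i ** 0.5) % 1_000_003.0
        h.update(repr(acc).encode())
    return h.hexdigest()


def burn_in(iterations: int, seed: int = 42) -> int:
    """Run the workload repeatedly; return how many runs disagree with the first."""
    reference = workload(seed)
    mismatches = 0
    for _ in range(iterations):
        if workload(seed) != reference:
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    bad = burn_in(iterations=100)
    print(f"{bad} of 100 runs produced a different result")
```

Zero mismatches only shows that this particular code path behaved at this temperature and voltage; as noted above, a timing violation on a rarely exercised path can still go unnoticed.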
        
           | magicalhippo wrote:
           | I can't find the references right now, but I recall reading
           | that longevity of microcontrollers and similar could also be
           | somewhat accurately modeled by the Arrhenius equation[1],
           | meaning a 10C increase in operating temperature would result
           | in roughly half the expected lifetime.
           | 
           | [1]: https://en.wikipedia.org/wiki/Arrhenius_equation
        
             | sweetjuly wrote:
             | I believe you might be thinking of Black's equation? [1].
             | It's one equation which attempts to model the failure rate
             | due to electromigration. It isn't a physical model but it
             | seems to, with the right constants, fit reasonably well.
             | The 10C=>halving lifetime is going to depend on the
             | constants though.
             | 
             | [1] https://en.wikipedia.org/wiki/Black's_equation
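To see how much the "10C halves the lifetime" rule depends on the constants, here is a minimal sketch of the temperature term of Black's equation, MTTF = A / J^n * exp(Ea / (k*T)), with current density held fixed. The activation energies are illustrative guesses, not measured values for any real chip, and the function name is made up for this example.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def lifetime_ratio(t_cool_c: float, t_hot_c: float, ea_ev: float) -> float:
    """MTTF(cool) / MTTF(hot) from the exponential term of Black's equation,
    MTTF = A / J**n * exp(Ea / (k*T)), at unchanged current density J."""
    t_cool = t_cool_c + 273.15
    t_hot = t_hot_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV * (1 / t_cool - 1 / t_hot))


# Activation energies here are illustrative guesses, not measured values.
for ea in (0.5, 0.7, 0.9):
    ratio = lifetime_ratio(60, 70, ea)
    print(f"Ea={ea} eV: 60C -> 70C shortens expected life by {ratio:.2f}x")
```

With an assumed activation energy around 0.7 eV the 60C to 70C step comes out close to a factor of two, while lower or higher values move it noticeably away from that, which is the point about the constants.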
        
         | mafuyu wrote:
         | From what I remember from school, the extremely rough rule of
         | thumb, specifically for thermal effects on the silicon itself,
         | is that +10C will halve the lifespan of the chip. When you try
         | to push a chip with an OC, the power/perf gets highly
         | nonlinear, so you end up making tradeoffs here. Chip vendors
         | like Intel and AMD do a lot of testing and validation to pick
         | power curves that will meet the warranty specs of the chip, but
         | they do have some wiggle room.
         | 
         | There's a whole bunch of other failure modes that aren't
         | captured by the 10C rule. It's more for estimating chip failure
         | due to things like electromigration. You can observe this if
         | you run a desktop CPU overclocked for many years. I had a 2600k
         | that I had to keep bumping the OC down on, and it eventually
         | bit the dust after a decade.
        
           | rodgerd wrote:
           | I once attended a talk by one of the people at Weta Digital
           | about how they ran their datacentres; they worked out that
           | they could save six figure sums per month by running their
           | aircon lower and blades hotter; HP were prepared to keep the
           | blades in warranty for a three degree bump but no more.
        
         | latchkey wrote:
         | My experience with running GPUs is that overclocking tends to
         | go with undervolting and it has zero impact on longevity of the
         | chips themselves. Other components with consumable or hand-made
         | parts, like power supplies or hand-soldered components, are
         | what end up failing.
         | 
         | We had cards in the worst of the worst environments and they
         | ran fine for years on end.
        
           | bayindirh wrote:
           | I'd not be so sure, actually. Because we have seen other
           | processors on the systems, like RAID or Ethernet cards go
           | "insane" after some years. No overheat, no physical stress,
           | nothing. Normal if a bit too much (HPC) work.
           | 
           | Reboot the system, device just disappears, never to be seen
           | again. It generally starts after the ~6 year mark.
           | 
           | Sometimes device starts to corrupt things silently, but not
           | always. However they too disappear after some time.
           | 
           | Oh, sometimes GPUs do that, too.
        
       | kloch wrote:
       | I remember overclocking the 486DX2-66 in the early 90's. I got
       | the idea after reading my brother's Intel data book and noticing
       | that while the max clock speed was spec'd at 66 MHz, _all_ of the
       | timing diagrams implied it could run to 80. I borrowed a variable
       | speed clock generator and sure enough it was stable at 80, and
       | started to crash at around 82MHz.
       | 
       | When I started to help friends overclock theirs, I quickly
       | realized the "silicon lottery" variance. Some would only run
       | reliably at 78 or 76 MHz. I bought a bunch of fixed frequency
       | clock generators (that were drop-in replacements for the original
       | on the motherboard) in 2MHz increments due to the variance.
       | 
       | This was back before CPU's had heat sinks or fans, so we quickly
       | figured out that adding those gave better margins. We even made
       | some 10-LED bar temperature display that had a thermocouple glued
       | to the CPU case and indicated 10 degree C increments
       | (green=0-60c, yellow=70-80c, red=90-100c).
        
         | winslow wrote:
         | That sounds like a lot of fun. Do you happen to have any photos
         | of you tinkering especially with the 10-LED bar temp displays?
        
         | gorkish wrote:
         | I remember overclocking my calculator (TI-85, 1992, Z80 CPU)
         | ... its LO was a 2.7K/22pF RC oscillator which gave it an
         | approximately 2.5MHz clock. To get this type of oscillator to
         | speed up you'd normally lower the capacitance a bit.
         | 
         | The reason that this story is interesting is that in most cases
         | you could just yank C9 entirely and with nothing more than a
         | resistor between the clock pins, you'd get a roughly 300%
         | performance increase. I guess the parasitic capacitance was
         | enough to still oscillate a bit although mostly it would have
         | been random. Looking back, this was basically a CPU being
         | clocked with 50MHz noise and still running happily! Amazing!
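As a rough illustration of why lowering the capacitance speeds up an RC oscillator, the sketch below uses the idealized estimate f = 1 / (2*pi*R*C). Whether the calculator's oscillator follows exactly this relation is an assumption; real topologies differ by a constant factor, but frequency stays inversely proportional to R*C.

```python
import math


def rc_frequency_hz(r_ohms: float, c_farads: float) -> float:
    """Idealized estimate f = 1 / (2*pi*R*C); real oscillator topologies differ
    by a constant factor, so treat this as an order-of-magnitude figure."""
    return 1.0 / (2 * math.pi * r_ohms * c_farads)


stock = rc_frequency_hz(2.7e3, 22e-12)        # component values quoted above
print(f"2.7K / 22pF -> {stock / 1e6:.2f} MHz")

# Halving C doubles f, which is why lowering the capacitance speeds things up.
halved = rc_frequency_hz(2.7e3, 11e-12)
print(f"2.7K / 11pF -> {halved / 1e6:.2f} MHz")
```

Under this assumption the quoted parts land in the same ballpark as the roughly 2.5MHz stock clock. With the capacitor removed entirely, only parasitic capacitance remains and the estimate stops being meaningful.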
        
           | lloydatkinson wrote:
           | Not quite the same but I once was bored enough to keep trying
           | to see how low power a solar powered calculator could work
           | with.
           | 
           | I held my hand over the solar panel at various distances until
           | the screen cut out, and while doing this I just kept hitting
           | random keys as I lifted my hand up.
           | 
           | One day when I did this I must have hit the one in a million
           | chance. It started _rapidly_ counting up by itself!
           | 
           | I think I only got it to happen once more. I suspect the
           | fluctuating voltage and it trying to do calculations while I
           | was pressing keys was just enough to get some gates latched
           | into the wrong state, somehow.
        
           | bobim wrote:
           | Good memories... I ended up adding a switch under the cells
           | cover because the mod was just draining these way faster. But
           | curve plotting was finally snappy.
        
         | epakai wrote:
         | I have a ST486-DX2-66GS (1998), and I found it is unstable at
         | the stock 66MHz. I actually have to run it at 80MHz to prevent
         | random freezes.
        
         | hinkley wrote:
         | That was before binning really got to be a business model. Of
         | course once a production line was stable, they could generate
         | more high end chips than they actually needed, and so the chip
         | you bought from bin 3 might actually be a bin 2 chip. It always
         | seemed like AMD was really conservative that way, which is why
         | hobbyists loved them.
         | 
         | I have a recollection of a guy who got a 486 DX-33 up to 133
         | MHz by putting the entire computer in mineral oil and floating
         | chunks of dry ice in it. Watch out for asphyxiation.
        
       | smellf wrote:
       | > RK3588-based SBC
       | 
       | Anyone know which SBCs use this chip?
        
         | geerlingguy wrote:
         | Radxa Rock 5 model B, Turing Pi RK1, Orange Pi 5 (and Plus);
         | there are a few others but those are the models I have
         | purchased and tested. All are more efficient/faster... but also
         | more expensive and less supported. Though RK3399 and 3588 SoCs
         | have both been some of the most widely supported out of
         | Rockchip for Linux applications. They still lack compared to
         | Pi's support though.
        
           | mort96 wrote:
           | The NanoPi R6C/R6S as well.
           | 
           | The rk3588 is a nice chip, but support just isn't there yet
           | if you want to do anything with the GPU. The "Panthor" GPU
           | driver, which is the FOSS driver which supports its GPU, was
           | just merged into Linux and mesa this month[1] (yay!) which
           | means you're probably gonna have to build your own kernel if
           | you want it.
           | 
           | The old mali proprietary driver is borderline unusable on
           | anything remotely modern, only really working on Linux 5.10
           | and special X11 builds with legacy features re-enabled.
           | 
           | It's crazy that the rk3588 has been on the market for many
           | years at this point and is just now starting to be usable on
           | Linux, but it's exciting that things are taking shape.
           | 
           | [1] https://www.collabora.com/news-and-blog/news-and-
           | events/rele...
        
           | lenerdenator wrote:
           | > They still lack compared to Pi's support though.
           | 
           | This should be the central lesson learned from the Raspberry
           | Pi by open-source projects.
           | 
           | There will be faster, there will be smaller, there will be
           | cheaper. But if the user can go on the web and find the
           | _exact_ thing they're looking to do spelled out, they'll buy
           | that product, every time.
        
             | wmf wrote:
             | Now imagine if RPi applied their magic to slightly newer
             | hardware so there was no need to mess around with poorly-
             | supported Allwinner/Rockchip/Mediatek boards.
        
               | sitzkrieg wrote:
               | or if they actually open sourced anything... or if you
               | could buy the broadcom chips directly, or or or
        
       | bee_rider wrote:
       | > To achieve that 10% speedup, it ate up about 20% more power, so
       | efficiency-wise, it's not worth it.
       | 
       | There's a trade off between single threaded performance and
       | power, right? I'd expect the increase in power cost to be between
       | the performance increase squared and cubed. If you expect a one-
       | to-one trade it is never worth it to increase frequency, haha.
       | 
       | The universe will give you throughput at a fair rate, but it is
       | very stingy about latency, in general.
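The "between squared and cubed" expectation can be compared against the article's own figures. A common first-order model is that dynamic power scales with C*V^2*f, so power grows faster than linearly whenever voltage has to rise along with frequency. The sketch below solves for the exponent implied by the quoted numbers; the 1.20 power ratio is an assumption read from "about 20%", so the result is only indicative.

```python
import math

# Figures quoted from the article; the power number is "about 20%", so the
# result is only a rough indication.
perf_ratio = 1662 / 1507   # Geekbench 6 score, OC vs stock
power_ratio = 1.20         # assumed from "about 20% more power"

# Solve power_ratio = perf_ratio ** k for k.
k = math.log(power_ratio) / math.log(perf_ratio)
print(f"implied exponent: {k:.2f}")

# What a squared or cubed relationship would predict for the same speedup.
squared = perf_ratio ** 2 - 1
cubed = perf_ratio ** 3 - 1
print(f"squared: +{squared:.0%} power, cubed: +{cubed:.0%} power")
```

With these inputs the implied exponent sits just under two, so the measured cost is close to the squared end of that range.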
        
       | latchkey wrote:
       | I ran 150,000 GPUs that were individually tuned for maximum
       | performance.
       | 
       | Silicon lottery, where the chip was on the wafer (edges tend to
       | be less reliable), manufacturing batches, component batches,
       | heat, cooling, power supplies, etc... the list goes on and on...
       | 
       | How much all of this affects performance and stability is
       | understated.
        
       ___________________________________________________________________
       (page generated 2024-03-13 23:01 UTC)