[HN Gopher] Pi 5 overclocking: Silicon Lottery
___________________________________________________________________
Pi 5 overclocking: Silicon Lottery
Author : goranmoomin
Score : 83 points
Date : 2024-03-13 16:02 UTC (6 hours ago)
(HTM) web link (www.jeffgeerling.com)
(TXT) w3m dump (www.jeffgeerling.com)
| moffkalast wrote:
| Speaking of Geerling testing Pi 5 things, is it just me or is it
| super weird that the uPCity PCIe breakout board hasn't been put
| on sale yet? I think it's almost half a year since he got it
| working in a pre-release state.
| tuetuopay wrote:
| I recently went on the lookout for this board and it is still
| unavailable. All I could find were M.2 adapters. I guess the
| market for full-size PCIe is tiny compared to the M.2 one.
|
| Anyways, I ended up making my own and it works great. No fuss,
| no wait, no complications.
| moffkalast wrote:
| Ah you mean going the Pi Port -> M.2 hat -> M.2 to PCIe riser
| route? I suppose that should work. Any bandwidth issues?
| tuetuopay wrote:
| No, I literally made my own hand-soldered Pi Port -> PCIe
| adapter. That's to plug a 2x25Gbps programmable NIC, so no
| bandwidth needed because the NIC does all the work. All I
| needed was something to power up the nic :D
|
| As for bandwidth, well, it's one lane of PCIe gen 2. This
| won't win any races but can be useful to access exotic
| hardware not available in usb or if you don't care about
| bandwidth. (e.g. HBA with many drives for mass storage
| without speed requirement).
| stu2010 wrote:
| How much power is the Pi port capable of delivering, or
| are you sending additional power to the PCIe adapter from
| somewhere else?
|
| What SmartNIC are you using? Most SmartNICs that I'm
| aware of suck a decent amount of power, many more require
| significant external airflow. Are you using the Mikrotik
| active cooled one?
| https://mikrotik.com/product/ccr2004_1g_2xs_pcie
| tuetuopay wrote:
| The Pi port can deliver 5 or 10W max at 5V IIRC. So I'm
| not using it :D
|
| The 12V comes from an external power supply through a
| barrel jack, from which I also derive the 3.3V rail. The
| Pi provides no power whatsoever. I should publish the
| design files somewhere.
|
| As for the NIC, it's a Netronome Agilio-CX which is fully
| programmable using eBPF and such.
| geerlingguy wrote:
| I keep pushing Pineberry Pi to release that--apparently they're
| still working on it--and 52Pi pre-announced something too...
|
| But so far there's no straight PCIe expansion board available
| to purchase yet. It is a slimmer market than 'NVMe on top' or
| other more standard use cases, but it's one I think could
| expand as people do weird things with the Pi 5.
| TehCorwiz wrote:
| My Pi 5 is still back ordered. :(
| natebc wrote:
| I ordered one from pishop.us just last Friday that arrived via
| USPS on Monday morning. Might look around some and see if you
| can just order from a different vendor.
| techwiz137 wrote:
| For the time it took them to develop it, the performance is
| really lackluster. I would've definitely wanted an OrangePi 5
| equivalent performance, but with the mature software of the
| RPI.
| TehCorwiz wrote:
| Yeah, I ordered from Sparkfun. I've had good experiences with
| them in the past, but it's a little frustrating seeing it
| available elsewhere. I ended up reaching out about it.
| teamonkey wrote:
| https://rpilocator.com/?cat=PI5
| geerlingguy wrote:
| They've been on the shelf (well, behind the counter) at Micro
| Center for a month or so now, never seen them out of stock. It
| seems like all the models are available at one or two retailers
| at minimum now, via rpilocator.com.
|
| Are you in the US, Canada, or EU? Outside of those places,
| there may still be some delays in getting stock to meet demand.
| duffyjp wrote:
| Oof, $80 is encroaching on Aliexpress N100 Mini PCs and used
| "Tiny/Mini/Micro" territory.
|
| I have the original and updated Pi Zero W-- unbelievable
| bargains at ~$15 but if I needed any horsepower I think I'd
| rather have an x86_64 so I can run whatever.
| bobim wrote:
| Idle consumption made me lean toward a rk3588 vs a n100. Half
| the single thread performance though. Supported by dietpi, so
| no issue with the os.
| MuffinFlavored wrote:
| > The result of that 3.0 GHz overclock? A marginally-improved
| Geekbench 6 score of 1662, versus 1507 with no OC. To achieve
| that 10% speedup, it ate up about 20% more power, so efficiency-
| wise, it's not worth it.
|
| What effect would running an overclock like this permanently have
| on longevity? Is it even worth thinking about "longevity" of
| chips?
|
| Obviously stability suffers but for example... how much? Author
| was able to get a Geekbench 6 benchmark to pass. If they tried
| 100 times, would it be expected that a non-zero amount would
| fail?
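The efficiency claim in the quoted passage can be checked with a little arithmetic. This is a minimal sketch using only the two Geekbench 6 scores and the "about 20% more power" figure from the quote; the function and variable names are made up for illustration, and the power ratio is approximate.

```python
# Figures taken from the quoted article text.
STOCK_SCORE = 1507   # Geekbench 6, no overclock
OC_SCORE = 1662      # Geekbench 6, 3.0 GHz overclock
POWER_RATIO = 1.20   # "about 20% more power" (approximate)


def relative_efficiency(stock_score, oc_score, power_ratio):
    """Return (speedup, performance-per-watt relative to stock)."""
    speedup = oc_score / stock_score
    return speedup, speedup / power_ratio


speedup, efficiency = relative_efficiency(STOCK_SCORE, OC_SCORE, POWER_RATIO)
print(f"speedup: {speedup:.3f}x, perf/W vs stock: {efficiency:.3f}x")
```

Under these numbers the overclocked board does roughly 10% more work per second but less work per joule than at stock, which is the point the article makes.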
| techwiz137 wrote:
| Of course! These are like BGA or something similar, the solder
| will crack after some number of on/off/cool/heat cycles.
|
| If voltage is raised we enter the realm of electromigration,
| though not sure how relevant it is for such a minuscule OC.
|
| As for stability, yes. If the voltage is not sufficient there
| will be stability issues which will require further raising it,
| thus raising temps and requiring more power which you can't be
| sure if you could deliver. And then of course electromigration.
| 0x457 wrote:
| > Of course! These are like BGA or something similar, the
| solder will crack after some number of on/off/cool/heat
| cycles.
|
| This was mostly happening during the transition to lead-free
| solder. Today, the component itself should fail before the BGA
| joints do.
| sweetjuly wrote:
| Unfortunately, most of the useful data you want about wearout
| is locked behind tech NDAs so nobody will be able to offer
| specifics here.
|
| But in general, the way you tend to end up with wearout is
| through electromigration ("electron wind"), which is damage to
| interconnects from electrons slamming into metal atoms over and
| over and slowly ripping apart the wires. Modeling
| electromigration correctly is really hard, but (from memory) a general
| relationship is that voltage linearly increases failure rate
| and temperature exponentially increases it. The constants for
| these models are determined empirically and, of course, are
| NDA'd.
|
| In general, I wouldn't worry about it. The MTTF for
| semiconductors is already very high even under awful
| temperatures 100+ C, and so as long as you cool it properly
| you'll be fine.
|
| > would it be expected that a non-zero amount would fail?
|
| The failures you describe here are going to be due to setup
| time violations. These issues shouldn't be transient (assuming
| identical temperature and voltage) since the performance
| characteristics of an individual device don't really change
| over time. Of course, the issues can seem transient as the
| failures may not actually always cause noticeable corruption
| (maybe you generate a wrong FPU result but that specific path
| is only exercised under rare uarch conditions).
|
| So, it's a non-answer, but: yes and no. Maybe your chip is
| perfectly okay and never has any violations at the parameters
| you selected. Maybe it does. Neither you (nor in fact the
| manufacturer, though they do have a better chance since they
| know the process/design) can ever really be sure--all that's
| left is empirical burn in testing and hoping for the best :)
| magicalhippo wrote:
| I can't find the references right now, but I recall reading
| that longevity of microcontrollers and similar could also be
| somewhat accurately modeled by the Arrhenius equation[1],
| meaning a 10C increase in operating temperature would result
| in roughly half the expected lifetime.
|
| [1]: https://en.wikipedia.org/wiki/Arrhenius_equation
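The "10C halves the lifetime" rule of thumb can be sanity-checked against the Arrhenius relation. This is only a sketch: the activation energy of 0.7 eV and the 60 C to 70 C range are placeholder assumptions chosen for illustration, not values from the thread, and real figures depend on the failure mechanism and process.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(t_low_c, t_high_c, activation_energy_ev):
    """Factor by which a thermally activated failure mechanism speeds up
    when the temperature rises from t_low_c to t_high_c (degrees Celsius)."""
    t_low_k = t_low_c + 273.15
    t_high_k = t_high_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_low_k - 1.0 / t_high_k
    )
    return math.exp(exponent)


ASSUMED_EA_EV = 0.7  # assumption for illustration only
factor = arrhenius_acceleration(60.0, 70.0, ASSUMED_EA_EV)
print(f"acceleration factor for +10 C: {factor:.2f}")
```

With this assumed activation energy the factor lands close to 2, consistent with the rule of thumb; a different activation energy or temperature range moves it noticeably.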
| sweetjuly wrote:
| I believe you might be thinking of Black's equation? [1].
| It's one equation which attempts to model the failure rate
| due to electromigration. It isn't a physical model but it
| seems to, with the right constants, fit reasonably well.
| The 10C=>halving life time is going to depend on the
| constants though.
|
| [1] https://en.wikipedia.org/wiki/Black's_equation
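Black's equation is usually written as MTTF = A * J^(-n) * exp(Ea / (k * T)), where J is current density and T is absolute temperature. A rough sketch of how the constants drive the result follows; the exponent n = 2, the activation energy of 0.7 eV, and the example operating points are all assumptions picked for illustration, since the fitted constants are not public.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def black_mttf_ratio(j_ratio, t_base_c, t_new_c, n, activation_energy_ev):
    """MTTF(new) / MTTF(base) under Black's equation.

    j_ratio is the new current density divided by the base current density.
    The empirical prefactor A cancels out of the ratio.
    """
    t_base_k = t_base_c + 273.15
    t_new_k = t_new_c + 273.15
    current_term = j_ratio ** (-n)
    thermal_term = math.exp(
        (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_new_k - 1.0 / t_base_k)
    )
    return current_term * thermal_term


# Hypothetical scenario: 10% more current density and 10 C hotter.
ratio = black_mttf_ratio(1.10, 60.0, 70.0, n=2.0, activation_energy_ev=0.7)
print(f"relative MTTF: {ratio:.2f}")
```

Changing n or the activation energy shifts the ratio substantially, which is why the 10C halving only holds for particular constants.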
| mafuyu wrote:
| From what I remember from school, the extremely rough rule of
| thumb, specifically for thermal effects on the silicon itself,
| is that +10C will halve the lifespan of the chip. When you try
| to push a chip with an OC, the power/perf gets highly
| nonlinear, so you end up making tradeoffs here. Chip vendors
| like Intel and AMD do a lot of testing and validation to pick
| power curves that will meet the warranty specs of the chip, but
| they do have some wiggle room.
|
| There's a whole bunch of other failure modes that aren't
| captured by the 10C rule. It's more for estimating chip failure
| due to things like electromigration. You can observe this if
| you run a desktop CPU overclocked for many years. I had a 2600k
| that I had to keep bumping the OC down on, and it eventually
| bit the dust after a decade.
| rodgerd wrote:
| I once attended a talk by one of the people at Weta Digital
| about how they ran their datacentres; they worked out that
| they could save six figure sums per month by running their
| aircon lower and blades hotter; HP were prepared to keep the
| blades in warranty for a three degree bump but no more.
| latchkey wrote:
| My experience with running GPUs is that overclocking tends to
| go with undervolting and it has zero impact on longevity of the
| chips themselves. Other components like power supplies, with
| consumable or hand made things, like hand soldered components,
| are what end up failing.
|
| We had cards in the worst of the worst environments and they
| ran fine for years on end.
| bayindirh wrote:
| I'd not be so sure, actually. Because we have seen other
| processors on the systems, like RAID or Ethernet cards go
| "insane" after some years. No overheat, no physical stress,
| nothing. Normal if a bit too much (HPC) work.
|
| Reboot the system, device just disappears, never to be seen
| again. It generally starts after the ~6 year mark.
|
| Sometimes device starts to corrupt things silently, but not
| always. However they too disappear after some time.
|
| Oh, sometimes GPUs do that, too.
| kloch wrote:
| I remember overclocking the 486DX2-66 in the early 90's. I got
| the idea after reading my brother's Intel data book and noticed
| that while the max clock speed was specced at 66 MHz, _all_ of the
| timing diagrams implied it could run to 80. I borrowed a variable
| speed clock generator and sure enough it was stable at 80, and
| started to crash at around 82MHz.
|
| When I started to help friends overclock theirs, I quickly
| realized the "silicon lottery" variance. Some would only run
| reliably at 78 or 76 MHz. I bought a bunch of fixed frequency
| clock generators (that were drop-in replacements for the original
| on the motherboard) in 2MHz increments due to the variance.
|
| This was back before CPU's had heat sinks or fans, so we quickly
| figured out that adding those gave better margins. We even made
| some 10-LED bar temperature display that had a thermocouple glued
| to the CPU case and indicated 10 degree C increments
| (green=0-60c, yellow=70-80c, red=90-100c).
| winslow wrote:
| That sounds like a lot of fun. Do you happen to have any photos
| of you tinkering especially with the 10-LED bar temp displays?
| gorkish wrote:
| I remember overclocking my calculator (TI-85, 1992, Z80 CPU)
| ... its LO was a 2.7K/22pF RC oscillator which gave it an
| approximately 2.5MHz clock. To get this type of oscillator to
| speed up you'd normally lower the capacitance a bit.
|
| The reason that this story is interesting is that in most cases
| you could just yank C9 entirely and with nothing more than a
| resistor between the clock pins, you'd get a roughly 300%
| performance increase. I guess the parasitic capacitance was
| enough to still oscillate a bit although mostly it would have
| been random. Looking back, this was basically a CPU being
| clocked with 50 MHz noise and still running happily! Amazing!
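As a back-of-the-envelope illustration of why lowering the capacitance speeds up an RC oscillator, the sketch below assumes the idealized relation f = 1 / (K * R * C) and fits the constant K to the stock values quoted above (2.7K, 22pF, roughly 2.5MHz). The model, the constant, and the reading of "300% increase" as four times the stock clock are assumptions; the comment itself suggests the real clock with the capacitor removed was closer to noise than to a clean oscillation.

```python
# Values quoted in the comment above.
R_OHMS = 2.7e3
C_STOCK_F = 22e-12
F_STOCK_HZ = 2.5e6

# Assumed model: f = 1 / (K * R * C), with K fitted to the stock numbers.
K = 1.0 / (F_STOCK_HZ * R_OHMS * C_STOCK_F)


def rc_frequency_hz(r_ohms, c_farads):
    """Oscillation frequency under the assumed 1 / (K * R * C) model."""
    return 1.0 / (K * r_ohms * c_farads)


def capacitance_for_hz(r_ohms, f_hz):
    """Capacitance that would give f_hz under the same model."""
    return 1.0 / (K * r_ohms * f_hz)


# Interpreting a "300% performance increase" as 4x the stock clock.
target_hz = 4 * F_STOCK_HZ
effective_c = capacitance_for_hz(R_OHMS, target_hz)
print(f"effective capacitance: {effective_c * 1e12:.2f} pF")
```

Under this simple model, a few picofarads of stray capacitance would be enough to account for the reported speedup.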
| lloydatkinson wrote:
| Not quite the same but I once was bored enough to keep trying
| to see how low power a solar powered calculator could work
| with.
|
| I held my hand over the solar panel at various heights until
| the screen cut out, and while doing this I just kept hitting
| random keys while lifting my hand up.
|
| One day when I did this I must have hit the one in a million
| chance. It started _rapidly_ counting up by itself!
|
| I think I only got it to happen once more. I suspect the
| fluctuating voltage and it trying to do calculations while I
| was pressing keys was just enough to get some gates latched
| into the wrong state, somehow.
| bobim wrote:
| Good memories... I ended up adding a switch under the cells
| cover because the mod was just draining these way faster. But
| curve plotting was finally snappy.
| epakai wrote:
| I have a ST486-DX2-66GS (1998), and I found it is unstable at
| the stock 66MHz. I actually have to run it at 80MHz to prevent
| random freezes.
| hinkley wrote:
| That was before binning really got to be a business model. Of
| course once a production line was stable, they could generate
| more high end chips than they actually needed, and so the chip
| you bought from bin 3 might actually be a bin 2 chip. It always
| seemed like AMD was really conservative that way, which is why
| hobbyists loved them.
|
| I have a recollection of a guy who got a 486 DX-33 up to 133
| MHz by putting the entire computer in mineral oil and floating
| chunks of dry ice in it. Watch out for asphyxiation.
| smellf wrote:
| > RK3588-based SBC
|
| Anyone know which SBCs use this chip?
| geerlingguy wrote:
| Radxa Rock 5 model B, Turing Pi RK1, Orange Pi 5 (and Plus);
| there are a few others but those are the models I have
| purchased and tested. All are more efficient/faster... but also
| more expensive and less supported. Though RK3399 and 3588 SoCs
| have both been some of the most widely supported out of
| Rockchip for Linux applications. They still lack compared to
| Pi's support though.
| mort96 wrote:
| The NanoPi R6C/R6S as well.
|
| The rk3588 is a nice chip, but support just isn't there yet
| if you want to do anything with the GPU. The "Panthor" GPU
| driver, which is the FOSS driver which supports its GPU, was
| just merged in to Linux and mesa this month[1] (yay!) which
| means you're probably gonna have to build your own kernel if
| you want it.
|
| The old mali proprietary driver is borderline unusable on
| anything remotely modern, only really working on Linux 5.10
| and special X11 builds with legacy features re-enabled.
|
| It's crazy that the rk3588 has been on the market for many
| years at this point and is just now starting to be usable on
| Linux, but it's exciting that things are taking shape.
|
| [1] https://www.collabora.com/news-and-blog/news-and-
| events/rele...
| lenerdenator wrote:
| > They still lack compared to Pi's support though.
|
| This should be the central lesson learned from the Raspberry
| Pi by open-source projects.
|
| There will be faster, there will be smaller, there will be
| cheaper. But if the user can go on the web and find the
| _exact_ thing they're looking to do spelled out, they'll buy
| that product, every time.
| wmf wrote:
| Now imagine if RPi applied their magic to slightly newer
| hardware so there was no need to mess around with poorly-
| supported Allwinner/Rockchip/Mediatek boards.
| sitzkrieg wrote:
| or if they actually open sourced anything... or if you
| could buy the broadcom chips directly, or or or
| bee_rider wrote:
| > To achieve that 10% speedup, it ate up about 20% more power, so
| efficiency-wise, it's not worth it.
|
| There's a trade off between single threaded performance and
| power, right? I'd expect the increase in power cost to be between
| the performance increase squared or cubed. If you expect a one-
| to-one trade it is never worth it to increase frequency, haha.
|
| The universe will give you throughput at a fair rate, but it is
| very stingy about latency, in general.
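The "squared or cubed" expectation can be compared against the article's numbers. A common first-order model puts dynamic power at roughly C * V^2 * f, so if voltage has to rise about in step with frequency, power grows roughly as the cube of the clock. The sketch below treats the Geekbench speedup as a stand-in for the frequency increase, which is itself an assumption, and the function name is made up for illustration.

```python
def power_ratio(speedup, exponent):
    """Predicted power increase if power scales as speedup ** exponent."""
    return speedup ** exponent


# Geekbench 6 scores quoted from the article: 1662 overclocked, 1507 stock.
SPEEDUP = 1662 / 1507

for exponent in (1, 2, 3):
    predicted = power_ratio(SPEEDUP, exponent)
    print(f"exponent {exponent}: {100 * (predicted - 1):.1f}% more power")
```

On these figures, the reported "about 20% more power" sits close to the squared case and well below the cubed one.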
| latchkey wrote:
| I ran 150,000 GPUs that were individually tuned for maximum
| performance.
|
| Silicon lottery, where the chip was on the wafer (edges tend to
| be less reliable), manufacturing batches, component batches,
| heat, cooling, power supplies, etc... the list goes on and on...
|
| It is understated how much all of this impacts performance and
| stability.
___________________________________________________________________
(page generated 2024-03-13 23:01 UTC)