[HN Gopher] PCIe trouble with 4TB Crucial T500 NVMe SSD for >1 p...
___________________________________________________________________
PCIe trouble with 4TB Crucial T500 NVMe SSD for >1 power cycle on
MSI PRO X670-P
Author : transpute
Score : 159 points
Date : 2024-12-28 03:04 UTC (1 days ago)
(HTM) web link (forum.level1techs.com)
(TXT) w3m dump (forum.level1techs.com)
| tfwnopmt wrote:
| HDMI provides power - that's how old Chromecasts can work without
| a separate power plug.
|
| The comment about NPNs and PNPs is garbage, but there is a design
| fault with the board: it shouldn't allow HDMI power to flow
| backwards into the motherboard when the motherboard shuts off.
| That would likely cause a power rail sequencing issue on the
| board or SSD, leading to latch-up of various ICs and non-detection
| of the SSD on the following bootup.
| LeifCarrotson wrote:
| And by "the board" I trust you mean the MSI PRO X670-P WIFI
| motherboard.
|
| There's nothing incorrect about the behavior of the SSD when
| it's being operated outside the prescribed voltage and power
| thresholds.
|
| If there's a trickle (and to be clear, the 5V at 300 mA
| available from an HDMI cable is a trickle for a full
| motherboard) of current into the 3V3 bus on the ATX connector,
| _something_ will be the PMIC with the lowest turn-on threshold. It's
| just that on this system, the SSD was the first thing. If anything,
| the SSD will probably be highly tolerant of brownouts because
| its LDO will run at around 1.9V.
| hulitu wrote:
| > There's nothing incorrect about the behavior of the SSD
| when it's being operated outside the prescribed voltage and
| power thresholds.
|
| It should put itself into a reset state.
| LeifCarrotson wrote:
| That would be nice, but in practice, the SSD requires its power
| rails to start up in a particular sequence and with very
| particular voltages.
| shadowpho wrote:
| Only a few devices are actually able to do that. The vast
| majority require proper voltage sequencing, because
| to do otherwise is to add cost to your IC.
| Dylan16807 wrote:
| > There's nothing incorrect about the behavior of the SSD
| when it's being operated outside the prescribed voltage and
| power thresholds.
|
| I'd put some more emphasis on "when", though. If it never
| comes back when power comes back that's not particularly
| correct.
| crest wrote:
| That's because, if this theory is correct, from the point of
| view of the SSD there was no reboot yet, because there was
| never any total power loss.
| Dylan16807 wrote:
| It handles warm reboots without power loss just fine, so
| its deciding now that it needs to wait for power loss seems
| like a flaw.
| wtallis wrote:
| If the SSD reacts to the start of a brown-out with supply
| voltage dropping way below spec as a signal that an
| unplanned power loss is happening, then it may do an
| emergency flush and shutdown that leaves it simply
| waiting for power to finish dropping to zero. It makes at
| least some sense for the drive to not try to wake up from
| that state without a clean power cycle.
| Dylan16807 wrote:
| I think "makes at least some sense" and "not particularly
| correct" can be true at the same time.
| smileybarry wrote:
| It should still handle PCIe probing and (logical)
| reconnection without a reboot, though, e.g.: PCIe
| redirection for a VM.
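| A minimal C sketch of the latching behaviour wtallis describes,
| under stated assumptions: the drive watches its supply rail, treats
| a sag below spec as an unplanned power loss, and then ignores the
| rail returning to spec until it has fallen close to zero. The
| thresholds, state names and monitor_step() are hypothetical
| illustrations, not anything taken from Crucial/Micron firmware.

```c
/* Hypothetical thresholds for a 3.3 V rail, in millivolts. */
#define RAIL_OK_MV    3000  /* at or above this the rail counts as in spec   */
#define BROWNOUT_MV   2700  /* below this an unplanned power loss is assumed */
#define POWER_OFF_MV   300  /* below this the rail counts as fully off       */

enum drive_state { DRIVE_OFF, DRIVE_RUNNING, DRIVE_EMERGENCY_HALT };

struct supply_monitor {
    enum drive_state state;
};

/* Feed one rail voltage sample; returns the resulting state. */
enum drive_state monitor_step(struct supply_monitor *m, int rail_mv)
{
    switch (m->state) {
    case DRIVE_OFF:
        /* Cold start: come up once the rail is in spec. */
        if (rail_mv >= RAIL_OK_MV)
            m->state = DRIVE_RUNNING;
        break;
    case DRIVE_RUNNING:
        /* Sag out of spec: assume power loss, flush, shut down and latch. */
        if (rail_mv < BROWNOUT_MV)
            m->state = DRIVE_EMERGENCY_HALT;
        break;
    case DRIVE_EMERGENCY_HALT:
        /* The rail returning to spec is ignored; only a near-zero rail re-arms. */
        if (rail_mv < POWER_OFF_MV)
            m->state = DRIVE_OFF;
        break;
    }
    return m->state;
}
```

| Fed the samples 3300, 1500, 3300, 0, 3300 mV (boot, rail held
| partway up by back-fed power while the PC is off, PC switched back
| on, full power removal, boot again), it goes RUNNING,
| EMERGENCY_HALT, EMERGENCY_HALT, OFF, RUNNING. The third sample is
| the "not detected until a real power cycle" symptom.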
| magic_smoke_ee wrote:
| The reality is retail PC electronics, like much consumer
| electronics with short lifespans, are designed/engineered and
| manufactured more-or-less like disposable e-waste garbage.
| EEVblog's Dave or BigClive might be able to get to the bottom of
| the circuit or manufacturing design error, albeit with some
| help if it turns out to be a digital-or-up-the-stack issue.
| KeplerBoy wrote:
| meh, I rarely have electronics fail these days. Whatever
| corners designers are cutting seem perfectly adequate to be
| cut to make stuff affordable.
| lazide wrote:
| The rise of mass-produced cheap ICs with somewhat
| reasonable behavior is the cause. It's cheaper to add some
| logic to something when you're making a million or more of
| them than when it's an additional couple of discrete
| components and an additional circuit you need to add
| yourself.
| gbil wrote:
| >HDMI provides power - that's how old chromecasts can work
| without a separate power plug.
|
| I still have the first Chromecast released; it doesn't operate
| without external power plugged in, so I'm not sure about the
| validity of your comment, at least for the Chromecast part.
| bradfitz wrote:
| https://www.hdmi.org/spec21sub/cablepower
| rzzzt wrote:
| > Connection is the same as attaching an ordinary, "wired"
| > HDMI Cable, except that active cables can only be
| > attached in one direction: One end of the cable is
| > specifically labeled for attachment to the HDMI Source
| > (transmitting) device, and the other end of the cable must
| > be attached to the HDMI Sink (receiving) device. If the
| > cable is attached in reverse, no damage will occur, but the
| > connection will not work.
| >
| > HDMI Cables with HDMI Cable Power include a separate power
| > connector for use with source devices that do not support
| > the HDMI Cable Power feature.
|
| This is not your run-of-the-mill HDMI cable for sure.
| numpad0 wrote:
| No, not that feature. HDMI has supported 5V/55mA power out for
| years. It's meant for EDID ROM chips and maybe HDMI selectors
| too, not Linux-based computers, but some TVs could tolerate that
| load, in gross violation of the specification and its spirit.
| nosrepa wrote:
| And the serial number of that power plug is MST3K-US
| kuschku wrote:
| The first chromecast actually operated without external
| power, but it only worked with some TVs.
|
| It's possible yours didn't provide enough power via HDMI, but
| at least ours worked just fine.
| ssl-3 wrote:
| It is possible that your memory of a device from a decade
| ago is faulty. No Chromecast has ever been able to be
| powered by HDMI alone. That has never been a thing.
|
| You may instead be remembering the fact that only some TVs back
| then were successful at powering the Chromecast without an
| external power brick, using a USB port on the TV itself to
| power up the Chromecast.
|
| In applications where this worked (and it often did work,
| although it also often did not work), it could provide a
| solution that existed entirely on the back of the TV with
| nothing additional plugged into the wall.
|
| But it was still [micro] USB that provided the power to the
| OG streaming stick, not HDMI.
| kuschku wrote:
| > It is possible that your memory of a device from a
| decade ago is faulty. No Chromecast has ever been able to
| be powered by HDMI alone. That has never been a thing.
|
| It is not - I still use my 11yo Chromecast Gen1 today.
| And it still works fine without USB power (as long as you
| don't try to play YouTube videos).
| altcognito wrote:
| I also had this device and would concur it was supposed
| to work without USB power, but in my experience worked
| extremely poorly.
| lightedman wrote:
| "You may instead by remembering the fact only some TVs
| back then were successful at powering the Chromecast
| without an external power brick, using a USB port on the
| TV itself to power up the Chromecast."
|
| I'm looking at my first gen plugged into the ARC HDMI
| port on my Vizio TV. It is ONLY attached to the HDMI port
| and nothing else.
| 486sx33 wrote:
| +1, my Vizio powers this as well. It also powers lots of
| stuff via USB.
|
| Maybe that's because it's NOT a smart TV and doesn't have some
| crazy Android SoC to constantly power. I mean,
| obviously you can make a power supply that could do both,
| or neither. But it likely comes down to price for the
| manufacturer of the TV.
| smileybarry wrote:
| Right, but I think it wasn't a real intended use case and
| that some TVs provided amperage over the spec (maybe by
| accident? simpler circuit bridging the same power pin for
| USB and HDMI?).
|
| I had the same first gen Chromecast (may even have it lying
| around somewhere) but it came with explicit directions to
| use the included power cable, so maybe they updated the
| included guide some time after release.
| photon_rancher wrote:
| They probably just provide extra power over the port. It
| costs extra to design a separate supply for a specific port,
| so it's probably shared, and it likewise costs extra to
| current-limit each port. So it's more than likely a
| cost-saving measure.
| ssl-3 wrote:
| HDMI does provide power, but this is not how Chromecast (or
| similar) devices have ever been powered.
|
| It supplies 5v at up to 50mA from a sink device like a TV.
|
| That's only a quarter of a Watt, which is perhaps enough for
| something like an EDID ROM, or maybe a switch or perhaps an
| extender. It is not enough power to run a Chromecast.
|
| HDMI 2.1b Amendment 1 [0] can supply up to 300mA at 5v, but
| that specification is only a year or so old. It requires a
| special cable. And 1.5 Watts maximum isn't enough to run a
| Chromecast, either. (The intent is to be able to use it to run
| a somewhat thirstier extender than the earlier specifications
| would permit.)
|
| 0: https://www.hdmi.org/spec21sub/cablepower
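| The wattage figures above are just P = V x I. A trivial C helper,
| working in integer milli-units (mV x mA / 1000 = mW), reproduces
| them; the function name is hypothetical.

```c
/* P = V * I in integer milli-units: millivolts * milliamps / 1000 = milliwatts. */
long power_mw(long millivolts, long milliamps)
{
    return millivolts * milliamps / 1000;
}
```

| power_mw(5000, 50) gives 250 mW (the quarter watt above) and
| power_mw(5000, 300) gives 1500 mW; the 55 mA figure quoted
| elsewhere in the thread works out to 275 mW.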
| kalleboo wrote:
| > _It supplies 5v at up to 50mA from a sink device like a
| TV._
|
| And USB is also only supposed to supply 100 mA until the
| device negotiates for more.
|
| But literally every device in the real-world just wires the
| port to the 5V rail with 2 A overcurrent protection and your
| "dumb" USB-powered fan gadget can draw as much as it wants
| without any negotiation.
|
| I can totally see TVs doing the same
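| To make the gap concrete, a C sketch contrasting the budget the
| USB 2.0 and 3.x specs give a bus-powered device (one unit load
| before configuration; up to 5 x 100 mA for USB 2.0 or 6 x 150 mA
| for USB 3.x after configuration, ignoring Battery Charging and USB
| PD) with the fixed-trip "dumb" port described above. The 2 A trip
| point and all names are hypothetical.

```c
#include <stdbool.h>

enum usb_gen { USB2, USB3 };

/* Spec budget in mA for a bus-powered device on a plain (non-BC, non-PD) port:
 * one unit load before configuration, the maximum number of unit loads after. */
int usb_spec_limit_ma(enum usb_gen gen, bool configured)
{
    int unit_load_ma = (gen == USB2) ? 100 : 150;
    int max_unit_loads = (gen == USB2) ? 5 : 6;
    return configured ? unit_load_ma * max_unit_loads : unit_load_ma;
}

/* Hypothetical fixed trip point for a port wired straight to the 5 V rail. */
#define DUMB_PORT_TRIP_MA 2000

/* "Dumb" port: enumeration state is irrelevant, only the overcurrent trip matters. */
bool dumb_port_allows(int draw_ma)
{
    return draw_ma <= DUMB_PORT_TRIP_MA;
}

/* Whether a draw stays within what the spec budgets for the device. */
bool within_spec(enum usb_gen gen, bool configured, int draw_ma)
{
    return draw_ma <= usb_spec_limit_ma(gen, configured);
}
```

| A 1.7 A resistor load with nothing on the data pins is far outside
| the 100 mA unconfigured budget, yet a dumb port happily supplies it.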
| mschuster91 wrote:
| > But literally every device in the real-world just wires
| the port to the 5V rail with 2 A overcurrent protection
|
| Except Macs, MacBooks, iMacs, and I _think_ also at least the
| Thunderbolt Display from <very many years ago>. They all
| have a software overcurrent protection that is _very_
| trigger-happy. With no negotiation, it will whine and shut the
| offending device off, and the same happens if the negotiated
| current draw is exceeded.
|
| Might be worth a try sometime when I'm rich enough to
| afford a dynamic resistor bank to verify all the
| characteristics...
| userbinator wrote:
| I've looked at MacBook (pre-M1) schematics; they do the
| same as any other PC laptop. The USB power switches do
| not have adjustable current limits.
| kalleboo wrote:
| I've never had any issues running dumb USB loads off any
| of my MacBooks. Just tested it, no problem running 1.7 A
| of dumb resistors just soldered to the power pins with
| nothing on the data pins at all (not even the passive
| "apple charging" resistors)
| https://kalleboo.com/linked/usb-dummy-load.jpg
|
| Macs _will_ shut down a port if it goes over 2.4 A (IIRC)
| without USB-PD negotiation (mainly with the cable rather
| than the device).
|
| But the USB standard says they should limit to 100 mA
| without USB 1.x negotiation, and it's not doing that.
| indrora wrote:
| > But literally every device in the real-world just wires
| the port to the 5V rail with 2 A overcurrent protection
|
| Not quite. To be USB compliant, you have to do some work
| here and there. There are about six different options. The
| most common _is_ overcurrent detection, such as is seen in
| [1]. There is a whole specification built by USB-IF on how
| to handle higher current ("battery charge") situations,
| spurred by Apple [2], with all sorts of weird corner cases
| [3].
|
| Now, USB-C changes that and specifically calls out that a
| "compliant" downstream device has to negotiate USB PD or
| declare itself a USB 2.0 Type-C device. [4] It's not
| uncommon for newer devices that conform strictly to the
| USB4 specification to not even power a port that hasn't
| negotiated USB PD or Legacy PD. If you encounter devices
| that get weird when powered via a USB-C to USB-C cable but
| work fine on a USB A-to-C cable, you've seen someone skimp
| on $0.00001 in resistors.
|
| [1] https://www.microchip.com/en-us/development-tool/EVB-USB2514...
| [2] https://www.usb.org/document-library/battery-charging-v12-sp...
| [3] https://www.graniteriverlabs.com/en-us/technical-blog/usb-ba...
| [4] https://community.infineon.com/t5/Knowledge-Base-Articles/Te...
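| The resistors in question are the CC pull-downs. A C sketch of the
| Type-C detection, assuming a source that advertises current with a
| pull-up (Rp) to 5 V: 56 kOhm for default USB power, 22 kOhm for
| 1.5 A, 10 kOhm for 3.0 A. The sink is supposed to pull CC down with
| Rd = 5.1 kOhm, and the source only switches VBUS on once it sees
| Rd. A device that leaves Rd off never gets VBUS from a C-to-C
| cable, whereas a Type-A port has VBUS always on (and a legacy
| A-to-C cable carries its own 56 kOhm Rp in the plug). The voltage
| bands are the sink-side vRd ranges from the Type-C spec; function
| names are hypothetical.

```c
#include <stdbool.h>

#define RD_OHMS 5100  /* sink pull-down on CC (5.1 kOhm) */

enum usbc_power { USBC_NO_VBUS, USBC_DEFAULT, USBC_1A5, USBC_3A0, USBC_UNKNOWN };

/* Divider voltage on CC in mV: Rp from 5 V on the source, Rd to ground on the sink. */
int cc_millivolts(int rp_ohms)
{
    return (int)(5000LL * RD_OHMS / (rp_ohms + RD_OHMS));
}

/* What a sink ends up with on a Type-C source advertising via rp_ohms. */
enum usbc_power sink_power(int rp_ohms, bool sink_has_rd)
{
    /* The source only enables VBUS after it sees Rd pulling CC down. */
    if (!sink_has_rd)
        return USBC_NO_VBUS;

    int mv = cc_millivolts(rp_ohms);
    /* Sink-side vRd bands (mV) for the three advertisements. */
    if (mv >= 250 && mv <= 610)
        return USBC_DEFAULT;
    if (mv >= 700 && mv <= 1160)
        return USBC_1A5;
    if (mv >= 1310 && mv <= 2040)
        return USBC_3A0;
    return USBC_UNKNOWN;
}
```

| With Rd present the three pull-ups land at about 0.42 V, 0.94 V and
| 1.69 V respectively; without Rd the answer is simply "no VBUS",
| whatever the source could have supplied.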
| 0xTJ wrote:
| The HDMI source, not the HDMI sink, provides the power at 5 V.
| As far as I know, every Chromecast required an external power
| connection.
| globnomulous wrote:
| My office stereo has physical connections between the following
| devices (simplifying a bit):
|
| - Speakers connect via speaker wire to a Monoprice 7x200 amp
|
| - Monoprice amp connects via RCA to a Denon X3800H
|
| - X3800H receives HDMI from the desktop computer and sends HDMI to
| a monitor.
|
| - Same computer connects via Displayport to the same monitor
|
| I used to hear an infuriating buzz when my 2080TI started to
| work hard. It changed depending on the screen output, GPU
| strain, and mouse activity but was constant. It acted like a
| combination ground loop cum coil whine.
|
| The first fix I discovered was to ground my monoprice amp to
| the 2080 TI PCB by wrapping one end of the exposed-copper (12
| awg, I think) grounding wire through and around one of the
| holes in the board and attaching the other end to the Monoprice
| amp's grounding pin.
|
| This fixed the issue completely.
|
| Then I realized I could fix the issue more elegantly and
| eliminate the need for grounding: I removed the grounding wire
| and replaced my normal HDMI and DisplayPort cables with fiber
| optic HDMI and DisplayPort cables. The buzz has never recurred.
|
| I've never delved further into the problem, but my conclusion
| is the same as yours: there's a design fault somewhere on the
| board, which is causing electricity to flow in ways it
| shouldn't. I'm using an MSI z690 ddr4 edge wifi board. Same
| brand, same generation, as the board where this guy is having
| his SSD power issue.
|
| I still hear a weird, loud buzz through the stereo (including a
| separate amp and separate pair of speakers) when my partner
| runs her hair dryer upstairs, even though my stereo runs on its
| own separate circuit, so regardless of the design issues in the
| board, there's definitely also an issue in my electrical
| system.
| transpute wrote:
| A power conditioner can improve AC isolation:
|
| https://www.amazon.com/Furman-AC-215A-Conditioner-Auto-
| Reset...
|
| https://surgestop.com/surge-products/m-474.html
| globnomulous wrote:
| Thanks, this is great advice. I'm using two SurgeX SX
| 2120-SEQ power conditioner+sequencers -- one for the
| desktop devices and one for the stereo.
|
| I'm baffled that, even with the conditioners and even
| though I'm on a separate circuit in my office, the hairdryer
| is still able to do _something_ to affect the electricity
| in my office.
| alduin32 wrote:
| > the hairdryer is still able to do something to affect
| the electricity in my office.
|
| This may indicate that your neutral line is undersized
| and/or damaged.
| globnomulous wrote:
| How could I test this?
| alduin32 wrote:
| A first thing to test would be that your voltages are
| nominal, but the exact details depend on how many phases
| are coming from the transformer, how they are wired, and
| whether you are on a TT, TN-C-S or other kind of
| grounding system, which depends mostly on where you live.
| Also, you need to take your voltage readings both at low
| impedance (simulates a load) and at high impedance
| (negligible load; "classical" meters are generally high
| impedance).
|
| Generally, you want to measure the voltage difference
| between live and neutral as a function of the load. However,
| depending on the tools you have access to, taking this
| reading properly can be a bit tricky, both because simple
| high-impedance multimeters can easily be tricked by
| ghost voltages caused by bad connections and induction
| from other cables, and because understanding what to
| measure requires knowing how the electrical system is
| wired.
|
| If you know you are in a TT system with 240V between
| Live/Neutral, I can describe my procedure for inspecting
| neutrals. In a two-pole TN-C-S system with 120V between
| L1/Neutral and 240V between L1/L2, I suppose it would be
| similar, except that we'd have to do more tests (both L1
| and L2 to neutral, and I imagine also L1 to L2).
|
| EDIT: a simple first check is to see, using any
| multimeter, whether there is a voltage drop in your office
| when the hairdryer is in use.
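| Once you have an unloaded and a loaded reading, a rough figure for
| the impedance shared with the hairdryer's circuit is the voltage
| drop divided by the load current. A C sketch, assuming a mostly
| resistive load, a clamp-meter reading of the hairdryer's current,
| and a supply that doesn't wander between the two readings; the
| function names and the numbers in the usage note are hypothetical.

```c
/* Rough impedance (ohms) shared between the measured outlet and the load:
 * Z ~= (V_unloaded - V_loaded) / I_load. Returns -1.0 if the current is not positive. */
double shared_impedance_ohms(double v_unloaded, double v_loaded, double i_load)
{
    if (i_load <= 0.0)
        return -1.0;
    return (v_unloaded - v_loaded) / i_load;
}

/* Voltage drop as a percentage of the unloaded reading. */
double percent_drop(double v_unloaded, double v_loaded)
{
    return 100.0 * (v_unloaded - v_loaded) / v_unloaded;
}
```

| For example, 120.0 V unloaded and 116.0 V with a 1500 W dryer
| drawing about 12.5 A gives 4.0 / 12.5 = 0.32 Ohm of shared
| impedance and a drop of about 3.3%.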
| tinfever wrote:
| Interestingly, the PCIe 8-pin power cable into a GPU doesn't
| carry all of the return current. If you put a current clamp
| meter around the +12V wires and then the ground wires, you'll
| measure more amps on the +12V wires than the ground wires.
| This means some of the return current goes through the PCIe
| slot into motherboard and makes its way back to the PSU. This
| lets the GPU create audio noise because GPUs draw high
| current pulses at the frame rate of your monitor, which means
| the return current through the motherboard has high current
| pulses, which can create ground bounce on the motherboard
| where the ground voltage level moves up and down and that can
| affect other devices in the system.
|
| I don't totally know how that noise would travel over the
| ground shield of the HDMI cable into the analog section of
| the Denon receiver, though. Maybe some of that GPU return
| current is going through the HDMI cable, through the Denon
| receiver to mains earth, and then through your building
| wiring back to the ATX PSU? Grounding is freaking weird.
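| The clamp-meter observation is Kirchhoff's current law. A C sketch
| with hypothetical names and numbers: the shortfall on the cable's
| ground wires is a lower bound on the current returning through
| other paths (slot ground pins, display cable shields), since any
| current the card also draws through the slot's own supply pins adds
| to it. The ground-shift estimate is purely resistive (V = I x R);
| real frame-rate current pulses also see path inductance, which
| this ignores.

```c
/* KCL at the card: +12 V cable current that does not come back on the
 * cable's ground wires must return by some other path. This is a lower
 * bound, because slot-supplied current also has to return somewhere. */
double min_other_return_amps(double i_12v_wires, double i_gnd_wires)
{
    return i_12v_wires - i_gnd_wires;
}

/* Resistive ground-shift estimate: amps * milliohms = millivolts. */
double ground_shift_mv(double i_amps, double path_milliohms)
{
    return i_amps * path_milliohms;
}
```

| With 20 A measured on the +12 V wires and 17 A on the grounds, at
| least 3 A returns elsewhere; across a hypothetical 2 mOhm of board
| copper that is about 6 mV of ground shift, before any inductive
| contribution.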
| globnomulous wrote:
| Oh, wow, yeah, that's really interesting. I don't
| understand electricity or know nearly enough about
| electrical engineering to be sure I understand the effect
| or flow you're describing, but if I (dimly) grasp what
| you're saying, it would explain exactly the behavior I
| observed.
|
| Grounding really is incredibly weird (and, again, I say
| this as someone who is shamefully ignorant of electrical
| principles). It's no surprise that some 'audiophiles'
| become so superstitious about electricity. Its behavior in
| a stereo can be mysterious. Just looking at an amp funny
| seems like enough to cause a ground loop.
| jauntywundrkind wrote:
| I can't get my Crucial P3+ to wake from sleep.
|
| I'd like to dig in more but I haven't had this issue with any
| other SSD in this system. Pretty close to saying I'm done with
| Crucial.
| NewJazz wrote:
| I've had a similar experience with a crucial nvme drive, but a
| kernel update seems to have introduced a quirk-based fix. Not
| sure how much of a kludge that fix is, though.
| wtallis wrote:
| The quirks tables in the Linux NVMe drivers are impressive
| and depressing:
|
| https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...
|
| https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...
|
| https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...
|
| And they're not even close to being comprehensive.
| fulafel wrote:
| Interesting that there are also some anti-quirk special
| cases in the vendor combo function (second link above), so
| a certain platform is excepted from the quirk workaround:
|     /*
|      * Exclude some Kingston NV1 and A2000 devices from
|      * NVME_QUIRK_SIMPLE_SUSPEND. Do a full suspend to save a
|      * lot fo energy with s2idle sleep on some TUXEDO platforms.
|      */
|     if (dmi_match(DMI_BOARD_NAME, "NS5X_NS7XAU") ||
|         dmi_match(DMI_BOARD_NAME, "NS5x_7xAU") ||
|         dmi_match(DMI_BOARD_NAME, "NS5x_7xPU") ||
|         dmi_match(DMI_BOARD_NAME, "PH4PRX1_PH6PRX1"))
|             return NVME_QUIRK_FORCE_NO_SIMPLE_SUSPEND;
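| The general shape of this "quirk plus anti-quirk" logic, as a
| hypothetical C sketch: a table keyed on PCI vendor/device ID ORs in
| workaround flags, and a list of board names (standing in for the
| DMI match) swaps one workaround for an explicit opt-out on specific
| platforms. All IDs, board names and flag values are invented for
| illustration; this is not the kernel's actual data structure.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values; not the kernel's. */
#define QUIRK_SIMPLE_SUSPEND          (1u << 0)
#define QUIRK_FORCE_NO_SIMPLE_SUSPEND (1u << 1)

struct quirk_entry {
    unsigned short vendor;
    unsigned short device;
    unsigned int quirks;
};

/* Hypothetical vendor/device IDs. */
static const struct quirk_entry quirk_table[] = {
    { 0xABCD, 0x0001, QUIRK_SIMPLE_SUSPEND },
    { 0xABCD, 0x0002, QUIRK_SIMPLE_SUSPEND },
};

/* Hypothetical board names standing in for the DMI_BOARD_NAME matches. */
static const char *const exempt_boards[] = { "BOARD_X", "BOARD_Y" };

unsigned int lookup_quirks(unsigned short vendor, unsigned short device,
                           const char *board)
{
    unsigned int q = 0;

    /* Device-keyed workarounds. */
    for (size_t i = 0; i < sizeof quirk_table / sizeof quirk_table[0]; i++)
        if (quirk_table[i].vendor == vendor && quirk_table[i].device == device)
            q |= quirk_table[i].quirks;

    /* Platform-keyed "anti-quirk": on listed boards, replace the
     * simple-suspend workaround with an explicit opt-out. */
    if ((q & QUIRK_SIMPLE_SUSPEND) && board != NULL) {
        for (size_t i = 0; i < sizeof exempt_boards / sizeof exempt_boards[0]; i++) {
            if (strcmp(board, exempt_boards[i]) == 0) {
                q &= ~QUIRK_SIMPLE_SUSPEND;
                q |= QUIRK_FORCE_NO_SIMPLE_SUSPEND;
                break;
            }
        }
    }
    return q;
}
```

| The same drive gets the workaround on an unlisted board and the
| opt-out on a listed one, which is why every new platform or drive
| revision can mean another table entry.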
| wtallis wrote:
| I think some of those issues probably stem from the fact
| that there's not really any alignment between the NVMe
| spec and the PCIe spec with respect to power management
| capabilities. I've encountered drives that have implicit
| dependencies where certain NVMe power management features
| only work as intended when certain PCIe power management
| features are available, but there's no way for the drive
| to express those requirements to the host system, and no
| standard compliance test suite that will reveal the
| broken behavior that can occur in the wild.
|
| Sometimes figuring out who to blame for misbehaving
| hardware requires custom kernel patches, a hardware
| protocol analyzer at the M.2 slot, and reverse-
| engineering the motherboard firmware. Most of the entries
| in the quirks tables are based on a lot of guess-work and
| inferences because the kernel developers don't have the
| resources to fully investigate and reproduce these kinds
| of issues (and the hardware vendors simply don't care
| about thoroughly ironing out these bugs). It really sucks
| when you have to look at power flow out of the laptop
| battery and try to figure out from that whether your SSD
| is pulling more power than it should.
| fulafel wrote:
| Wow. I guess this also explains some of the s2idle
| troubles: with S3 sleep, there are vendor-tested
| motherboard+peripheral combos that are known to work with
| the power states attempted by suspend, and any hw/fw bugs
| get troubleshot before they make it out of the vendor's
| lab.
| jandrese wrote:
| Oh yeah, and in some cases if the system attempts to go
| into S2 sleep it simply bricks the SSD forever. I lost a
| whole lab worth of drives once before I figured it out.
| The vendor was the opposite of helpful, refusing to
| acknowledge the problem and then wiping their hands of it
| and walking away. The only solution I've found requires a
| hardware modification of the drive, downloading a rip of
| the vendor's internal repos from a sketchy Russian
| website, building a new firmware from scratch, and then
| flashing it with some custom hardware.
| hulitu wrote:
| That would explain why, sometimes, my Linux install will not
| find the NVMe SSD when booting. (MSI mobo with Kingston SSD.)
| chupasaurus wrote:
| Model, or at least year, of that SSD? Early on, Kingston
| used faulty controllers that randomly fail to initialize
| and degrade with power cycles.
| hulitu wrote:
| Since last year.
| doubled112 wrote:
| I have a pair of ASUS Vivobook laptops with Kingston
| NVMe drives.
|
| While running the factory install of Windows, those drives
| would cause a BSOD every third or so boot. A clean install
| didn't help either, nor did any driver or firmware update.
|
| No Linux install has shown any signs of problems.
| wtallis wrote:
| Is this on a Linux system? NVMe power management has always
| been hit or miss for consumer SSDs under Linux because the SSD
| vendors don't write their firmware against the NVMe spec, they
| write it to work with the Microsoft Windows NVMe driver and any
| feature Windows doesn't use is liable to be broken. This
| applies to basically every SSD brand, by the way.
| jauntywundrkind wrote:
| Yes, it's an NVMe.
|
| Western Digital & OCZ nvme drives have both worked fine in
| this system, so I'm feeling a bit salty about this. Would
| like to try some Samsung drives at some point.
|
| (Running Linux 6.11.7 atm.)
| Astronaut3315 wrote:
| I returned a Crucial P3+ after I discovered a massive
| performance degradation with BitLocker. It was slower than
| spinning rust. It seems these drives have some unresolved
| firmware issues.
| zamadatix wrote:
| On the topic of odd failure modes involving Crucial SSDs and MSI
| motherboards (though one that seems to actually be the drive's
| fault): I have a T705 which at some point started coming up only
| as x2 instead of x4, no matter which board I put it into
| (with no visible damage or indication as to why, though I did try
| to wipe down the contact side with some rubbing alcohol anyway).
|
| The particularly interesting part is that I have a new X870
| motherboard which lets M.2 slot 2 have x0, x2, or x4 CPU-direct
| lanes, depending on whether you want x4, x2, or x0 to go to the
| USB4 ports, respectively. At first that sounds like a good combo:
| put the drive that only wants to run at x2 in the extra slot,
| where x2-only mode is a reasonable tradeoff, and still get great
| bandwidth because those lanes are PCIe 5.0 and not routed through
| the chipset. For whatever reason, though, that drive only ever
| comes up in an x4 slot (at x2 speed), not in any x2 slot I've
| tried. I don't know enough about PCIe to say for sure why that
| is, but it seemed odd to me that it behaves any way other than
| "something is wrong with the 3rd or 4th lane, so setting the slot
| to x2 lets the first 2 lanes work at x2, the same as when the slot
| is set to x4 and the drive only comes up as x2".
| magicalhippo wrote:
| PCIe devices are required to boot up using a single (x1) lane,
| and then negotiate further lanes with the upstream port.
|
| AFAIK it shouldn't matter if they're direct to CPU or not, at
| least not logically.
|
| I note the drive is Gen5 capable, does it negotiate x2 5.0
| lanes or something else?
| zamadatix wrote:
| Negotiates x2 at 5.0 so long as the board it's plugged into
| supports it; x2 at 4.0 or 3.0 otherwise. Haven't tested even
| lower.
| tfwnopmt wrote:
| I came across this in a manual/datasheet:
|
| > 16. Link Width Negotiation in the Presence of Bad Lanes
|
| > In an effort to maximize the link width when one or more lanes
| > of a multi-lane link are not functioning correctly (i.e.,
| > reliable communication of training sets across the lane is not
| > possible), PES64H16G2 down-stream switch ports automatically
| > attempt a lane reversed configuration when doing so has the
| > potential to enhance the achievable link width. For example, if
| > lane 1 of a x4 link is not operating correctly, the device's
| > downstream switch port attached to the link attempts a lane
| > reversed configuration to form a x2 link using lanes 2 and 3
| > (Figure 7.4(d)). If the link partner accepts the lane reversed
| > configuration, the optimal x2 link will be formed using lanes 2
| > and 3. If the link partner does not accept the lane reversed
| > configuration, but instead requests a lane configuration
| > supported by the PES64H16G2 (e.g., x1 link using lane 0), the
| > device accepts the configuration and forms the reduced width
| > link. Otherwise, if the lane numbering agreement fails, the
| > device automatically re-trains the link from the Detect state.
| > During this re-training, the PES64H16G2 port does not
| > re-attempt a lane reversed configuration, but rather tries to
| > form the link without reversing the lanes. As a result, a x1
| > link is formed using lane 0 (Figure 7.4 (e)).
|
| My guess is it's likely a bad BGA solder ball on Lane1, or
| possibly ESD damage if you took the SSD out and molested it or
| rubbed it on a cat right before it broke. Does it indicate it's
| using reversed lanes?
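| A simplified C model of the negotiation order the datasheet
| describes for a x4 port: try x4, then x2 on lanes 0 and 1, then x2
| on lanes 2 and 3 with lane reversal, then x1 on lane 0. The bitmask
| encoding and names are hypothetical, and real link training
| (per-lane Detect/Polling/Configuration states, partner acceptance,
| the no-reversal retrain) is far more involved.

```c
/* Bit i of good_lanes is set if physical lane i of a x4 port trains reliably. */
struct link_result {
    int width;     /* 4, 2, 1, or 0 for no link */
    int reversed;  /* 1 if the link uses lane reversal */
};

static int lanes_ok(unsigned good_lanes, unsigned needed)
{
    return (good_lanes & needed) == needed;
}

struct link_result negotiate_x4(unsigned good_lanes)
{
    struct link_result r = { 0, 0 };

    if (lanes_ok(good_lanes, 0xFu)) {          /* lanes 0-3 */
        r.width = 4;
    } else if (lanes_ok(good_lanes, 0x3u)) {   /* lanes 0-1, normal order */
        r.width = 2;
    } else if (lanes_ok(good_lanes, 0xCu)) {   /* lanes 2-3, reversed */
        r.width = 2;
        r.reversed = 1;
    } else if (lanes_ok(good_lanes, 0x1u)) {   /* lane 0 only */
        r.width = 1;
    }
    return r;
}
```

| With lane 1 bad (good_lanes = 0xD) this returns a reversed x2 link,
| matching the datasheet example and the "x2 in an x4 slot" symptom;
| with lane 3 bad (0x7) it returns a normal x2 link.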
| zamadatix wrote:
| Nice digging, that lines up perfectly with the observed
| behavior! I'll have to poke around and see if anything
| indicates that's the operational mode to be sure.
|
| The failure mode was that one day I just noticed it was
| copying sequential data from another drive slower than it
| normally did. Don't recall it ever having been touched after
| install (it is the heatsinkless variant of the T705 4TB
| mounted on the motherboard M.2 heatsink for that slot). Temps
| were always reported as quite reasonable, even when under stress
| bench load (which was rare, the drive was just a secondary
| drive for loading games). Since then it's been popped between
| about 10 boards in confusion though haha. No cat yet!
| lizknope wrote:
| I just got a Crucial T700 last month, which is a PCIe Gen5 x4
| NVMe M.2 drive.
|
| I put it in an ASUS PRIME Z890M-PLUS motherboard with an Intel
| Core Ultra 7 265K
|
| Started to install Fedora Linux version 41. The drive would
| just completely disappear from the OS and the kernel would
| report I/O errors on a missing device. Sometimes this happened
| during the initial install. Sometimes 5 minutes after the
| install when starting a terminal. I couldn't even type "ls"
| because the "ls" command is on the drive that went away.
|
| Saw reports of PCIe Gen5 incompatibilities, so I moved it to a
| Gen4 slot, and then it worked.
|
| But the machine had so many other random crashes and errors
| reported in system logs saying "This is a hardware error not
| software" and stuff like that. Returned it all.
|
| Just got an AMD Ryzen 9 9950X and Gigabyte X870E AORUS PRO
|
| The Gen5 drive seems to be working at Gen5 speeds.
|
| lspci -vv shows:
|
|     02:00.0 Non-Volatile memory controller: Micron/Crucial Technology T700 NVMe PCIe SSD (prog-if 02 [NVM Express])
|         LnkSta: Speed 32GT/s, Width x4
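| For context on what that LnkSta line means in throughput, a C
| sketch that pulls the speed and width out of the line and converts
| them to usable one-direction bandwidth: 8b/10b encoding at 2.5 and
| 5 GT/s, 128b/130b at 8, 16 and 32 GT/s (Gen6's FLIT mode is not
| modelled). It ignores packet and protocol overhead, so real drive
| throughput will be lower. The helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Parse "Speed <n>GT/s" and "Width x<n>" out of an lspci LnkSta line.
 * Tolerates suffixes such as "(ok)". Returns 0 on success, -1 otherwise. */
int parse_lnksta(const char *line, double *gts, int *width)
{
    const char *s = strstr(line, "Speed ");
    const char *w = strstr(line, "Width x");
    if (s == NULL || w == NULL)
        return -1;
    if (sscanf(s, "Speed %lf", gts) != 1 || sscanf(w, "Width x%d", width) != 1)
        return -1;
    return 0;
}

/* Usable one-direction bandwidth in GB/s after line encoding. */
double link_gbytes_per_s(double gts, int width)
{
    double efficiency = (gts <= 5.0) ? 8.0 / 10.0 : 128.0 / 130.0;
    return gts * efficiency / 8.0 * width;
}
```

| For the line above this gives 32 x 128/130 / 8 x 4, about
| 15.75 GB/s per direction; the same drive at Gen4 x4 would top out
| around 7.88 GB/s.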
| geor9e wrote:
| Why's a random tech support forum post from yesterday with 2
| people replying getting reposted to HN?
| aprilnya wrote:
| I personally found it interesting.
| frantathefranta wrote:
| Slow week but people probably enjoy the methodical
| troubleshooting.
| ejiblabahaba wrote:
| For what it's worth, this post just helped me explain several
| years of failure to wake from sleep state, across several
| different MSI-based machines, when I've connected them to an
| HDMI port in my TV. I think this debug is interesting in its
| own right, and unlike 99% of the content on this website, it
| was directly and immediately useful to me. I doubt I'm the only
| one, too.
| transpute wrote:
| This post described a rare interoperability failure with
| unexpected root cause, of possible interest to:
|
| - Motherboard designers
| - People upgrading PCs/laptops
| - SSD firmware developers
| - BIOS developers attempting PCIe device boot
| - OS/hypervisor developers attempting PCIe device reset
|
| If you don't like this HN story, you could contribute your
| first story to HN.
| sebazzz wrote:
| I have something similar with my webcam, which is connected to my
| Samsung monitor usb hub, which is connected to a usb-c dongle,
| which is connected to my work laptop.
|
| If my laptop crashes during a Microsoft Teams call, possibly due
| to the webcam, the webcam will not show up in Windows again until
| it is physically disconnected from the USB hub in my Samsung
| monitor. I can disconnect the USB-C dongle or the monitor from
| USB, change ports, or power off the laptop; none of that helps.
| Only physically disconnecting and reconnecting the webcam makes
| it show up in Device Manager again.
| qingcharles wrote:
| I hate faults like that.
|
| Used to work in PC repair. Man brings in PC, mouse right click
| doesn't work. Everything else operates perfectly.
|
| Replaced in this order: mouse, IO card, hard drive with fresh OS,
| RAM, CPU, graphics card, motherboard. Still no right-click.
|
| Replaced the PSU last. Right-click works. FML.
| Frenchgeek wrote:
| You didn't have to replace the house's wiring at least
| (Happened to an aunt of mine: Gave her a computer, it worked
| perfectly outside of her home. The electrician was a tad
| horrified. She still scoffed when I suggested the computer
| wasn't the problem first.)
| Moru wrote:
| I plugged my old Atari into an outlet in the old basement in
| a different building. The HDD-cable started burning.
|
| Electric company plugged in some device to measure power over
| time. Turns out the power was slightly below normal but
| within tolerances. The OEM power supply that was powering my
| Atari wasn't up to standards. If I remember right, badly
| designed PSUs can feed too much current if the voltage is
| too low. Or something like that; it was a very long time ago...
| ajb wrote:
| Many switch-mode power supplies will increase their current
| draw if the voltage drops; that's why many of them will
| work on both 120 and 240V, while old-school power supplies
| need a manual switch. I had a brownout once and thought my
| washing machine was broken because that was the only thing
| that stopped working (until evening, when I switched on the
| lights. That was back in the days of incandescents; oddly,
| though, LED lights still dim with lower voltage, and I don't
| know how they do voltage conversion).
|
| We have so many cheap power supplies in our houses that it
| would not surprise me if at least some become unsafe if the
| source voltage drops too low. Being unsafe with only a
| slight drop is weird though.
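ajb's point about switch-mode supplies comes down to I = P / V. Below is a minimal sketch. It assumes an idealized supply that delivers a fixed output power at constant efficiency. Both are simplifications, and real units also have undervoltage lockout. The sketch contrasts that supply with a purely resistive load such as an incandescent bulb. The 60 W load, the 85% efficiency and the 240 ohm resistor are hypothetical figures.

```python
def smps_input_current(load_watts, line_volts, efficiency=0.85):
    """Input current of an idealized constant-power switch-mode supply.

    Input power is load / efficiency regardless of line voltage, so the
    current I = P / V rises as the line voltage sags.
    """
    if line_volts <= 0:
        raise ValueError("line voltage must be positive")
    return (load_watts / efficiency) / line_volts


def resistive_current(resistance_ohms, line_volts):
    """Current of a purely resistive load, I = V / R.

    Ignores a filament's temperature-dependent resistance. The current
    falls as the line sags instead of rising.
    """
    return line_volts / resistance_ohms


if __name__ == "__main__":
    # Hypothetical 60 W loads on a nominal 120 V line:
    # a 240 ohm resistor also draws 60 W at 120 V.
    for volts in (120, 110, 100, 90):
        print(f"{volts:>3} V: SMPS {smps_input_current(60, volts):.2f} A, "
              f"resistive {resistive_current(240, volts):.2f} A")
```

As the line sags, the constant-power supply pulls more current through the wiring, while the resistive load pulls less and simply dims.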
| ksec wrote:
| >Replaced the PSU last. Right-click works. FML.
|
| In my experience, always replace the DRAM first, then the PSU,
| and then swap the motherboard.
|
| I don't think people realise how many faults there are with
| DRAM, PSUs and motherboards. DRAM quality has gotten a lot
| better in the past 10 years, so that is less of an issue. The
| PSU, however, is where the cost cutting happens, and more
| often than not it causes problems.
| donalhunt wrote:
| Reminds me of an old hwops story where one machine just
| constantly failed despite replacing every part on the tray
| multiple times. The conclusion was that the tray was bad.
|
| Google's definition of a server was (and still is, afaik)
| based on the tray (chassis), so there was no way to replace
| it. IIRC it was "retired" with a vengeance, leaving a gap in
| the cabinet -- a warning to other trays to behave.
| userbinator wrote:
| This is a good cautionary story of why random parts-swapping can
| be a waste of time and money. Getting out the DMM and measuring
| voltages is something fewer and fewer people know how to do when
| troubleshooting electronics, but it certainly saved the OP here;
| I'd go a little further and figure out why the monitor seems to
| be leaking power into its HDMI input when switched off ---
| possibly an ESD-damaged MOSFET or similar?
|
| _The issue does not occur when the monitor is connected via
| DisplayPort._
|
| https://en.wikipedia.org/wiki/DisplayPort#DP_PWR_(pin_20)
|
| _Standard DisplayPort cable connections do not use the DP_PWR
| pin._
|
| There's also an interesting paragraph there, about some
| nonstandard cables connecting that pin through.
| Arcanum-XIII wrote:
| Not all DMMs have probes small enough to reach an individual
| lane, if that's even possible. What's more, you need to know
| where to put them, which can be daunting without the proper
| knowledge. Swapping hardware is easier, faster and often the
| best solution in those cases.
|
| Finding a hardware fault is hard. Tracing it is even harder.
| userbinator wrote:
| I think there's something wrong with your DMM probes if you
| can't measure the ATX power connector with them.
| hamandcheese wrote:
| On the other hand, I recently fried a motherboard while trying
| to probe it with a multimeter. My fat fingers shorted out two
| adjacent pins, causing a loud spark and magic smoke.
| bunnie wrote:
| Reading the thread, it looks like the issue is leakage power on
| the internal 3.3v line. When the system is off, 1.9v is still
| present. This is not uncommon, although 1.9v is a bit high. A lot
| of laptops have explicit active pull-downs on power supplies to
| clamp them to zero when power is off, to ensure peripherals are
| not accidentally powered on by stray leakage (because laptops are
| extremely low power by design and there is not enough stray
| leakage to bring the power lines down in a sleep state). My guess
| is that mainboards might not have this feature because normally
| there is enough off-state loading that it takes care of itself.
| However, maybe in this case the loading is not enough.
|
| A dirty fix could be to just put a static load from the 3.3v line
| to ground. I'd start with a 1/4w resistor of around 100 ohms and
| just stick it from 3.3v to ground to see if that soaks up the
| stray current. If it works, just leave it; it's about 0.1
| watts of static power and no big deal for a non-portable setup.
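The "about 0.1 watts" figure follows from Ohm's law and P = V² / R. Below is a minimal sketch of that arithmetic for a resistor placed across a rail. The function name is only illustrative.

```python
def bleeder_load(rail_volts, ohms):
    """Current (A) and dissipation (W) of a resistor placed across a rail."""
    amps = rail_volts / ohms          # Ohm's law: I = V / R
    watts = rail_volts ** 2 / ohms    # P = V^2 / R
    return amps, watts


if __name__ == "__main__":
    amps, watts = bleeder_load(3.3, 100)  # the 100 ohm suggestion above
    print(f"{amps * 1000:.0f} mA, {watts:.3f} W")  # 33 mA, 0.109 W
```

At roughly 0.11 W, a 1/4 W part has a bit more than 2x headroom while the machine is running.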
|
| The larger picture is that the controller on the NVMe drive might
| not hit its power-on reset condition because it may be rated to
| run at 1.8v (just a guess), so the 3.3v rail is not going low
| enough for the controller to perceive that the system has been
| power cycled. Usually a supplemental power monitor is needed in
| those cases to ensure a reset is generated in case of leakage
| problems like this.
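bunnie's power-on-reset argument can be shown with a toy model. Below is a minimal sketch of a hysteretic reset detector. Both thresholds are made-up placeholders, since nothing in the thread says what the T500's controller actually uses. A reset only latches if the rail falls below the assert threshold, so a rail parked at 1.9 V between power cycles never produces one. The trace starts with the controller already running.

```python
# Toy power-on-reset (POR) detector with hysteresis. Both thresholds are
# hypothetical placeholders, not values for any real SSD controller.
POR_ASSERT_V = 1.0   # rail must fall below this to latch reset
POR_RELEASE_V = 2.9  # reset is released once the rail rises above this


def count_resets(rail_samples, assert_v=POR_ASSERT_V, release_v=POR_RELEASE_V):
    """Count complete reset cycles (assert, then release) in a rail trace."""
    in_reset = False
    resets = 0
    for volts in rail_samples:
        if not in_reset and volts < assert_v:
            in_reset = True          # rail dropped low enough to reset logic
        elif in_reset and volts > release_v:
            in_reset = False         # rail back up: controller starts fresh
            resets += 1
    return resets


if __name__ == "__main__":
    clean_cycle = [3.3, 0.0, 3.3]   # rail fully discharged while "off"
    leaky_cycle = [3.3, 1.9, 3.3]   # rail held at 1.9 V by the leak
    print(count_resets(clean_cycle), count_resets(leaky_cycle))  # 1 0
```

A supply supervisor of the kind bunnie mentions would, in effect, force the reset path to run even when the rail never gets low enough on its own.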
| starslab wrote:
| Hi! I'm the OP from the Level1Techs thread.
|
| That HDMI power has some grunt behind it. In the power-off
| state, with that 1.90v phantom voltage, I put a 48 ohm resistor
| between 3V3 and ground; the phantom voltage only dropped to
| 1.80v, and the SSD still didn't work when I powered the machine
| back on.
| oneplane wrote:
| Depending on the PMIC and the SSD DC conditioning, even 1.2v
| might be enough for it to brownout/latchup without self-
| resetting. (or it might power up the PHY partially or in a
| bad state and never link up)
|
| Try more resistors in series? (or just a bigger one if you
| have any -- scratch that we needed smaller ;-) ).
| starslab wrote:
| 12 ohms brings the rail down to 1.47 volts, still no SSD. 6
| ohms is enough to finally break/trip whatever circuit is
| allowing this situation, bringing the rail down to 0v in
| power-off. Of course, that's almost 2 watts of constant
| draw during the power-on state, so not a long-term
| solution.
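starslab's measurements allow a rough estimate of how stiff the leak source is. Below is a minimal sketch. It assumes the leak behaves like a linear Thévenin source (an ideal voltage behind a fixed resistance), which is an assumption and not something the thread establishes.

```python
def thevenin_from_load_test(v_open, v_loaded, r_load):
    """Estimate the series resistance behind a leaky rail.

    Models the leak as an ideal source v_open behind a resistance r_th
    (a linear Thevenin equivalent; this is an assumption).
    """
    i_load = v_loaded / r_load
    return (v_open - v_loaded) / i_load


def predicted_rail(v_open, r_th, r_load):
    """Rail voltage the linear model predicts with a given load resistor."""
    return v_open * r_load / (r_load + r_th)


if __name__ == "__main__":
    # Reported in the thread: 1.90 V unloaded, 1.80 V across 48 ohms.
    r_th = thevenin_from_load_test(1.90, 1.80, 48)
    print(f"estimated source resistance: {r_th:.2f} ohm")
    # Measured later: 1.47 V at 12 ohms, 0 V at 6 ohms.
    for r_load in (12, 6):
        print(f"{r_load} ohm: model predicts "
              f"{predicted_rail(1.90, r_th, r_load):.2f} V")
    # Cost of leaving 6 ohms across 3.3 V while the machine is on.
    print(f"6 ohm at 3.3 V: {3.3 ** 2 / 6:.2f} W")
```

The model gives roughly 2.7 ohms of source resistance. It predicts about 1.55 V at 12 ohms and 1.32 V at 6 ohms, against the measured 1.47 V and 0 V. The mismatch, especially the collapse to 0 V at 6 ohms, hints that the leak path is nonlinear: something in it drops out under heavy load. That said, with only 0.1 V between the first two readings, the estimate is very sensitive to meter resolution. The last line reproduces the "almost 2 watts" figure: 3.3² / 6 is about 1.8 W.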
| oneplane wrote:
| Oof, that is a giant leak somewhere. It's really sad we
| have to go to some shady websites to find schematics for
| mainboards, otherwise we could just get to the cause of
| this pretty quickly.
| numpad0 wrote:
| 6 Ohms! Might as well just jumper it (don't).
|
| Could it be reverse current through an SBD (Schottky barrier
| diode)? They have higher reverse current and a leakier I-V
| curve. 3.3V of drop must mean something is inline.
| starslab wrote:
| > scratch that we needed smaller ;-)
|
| Well... Needed smaller in terms of resistance, but needed
| bigger in terms of power rating, in the interests of not
| catching fire.
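The power-rating point follows from how parallel resistors share the load: 1/R = Σ 1/Rᵢ, and each resistor dissipates V² / Rᵢ on its own. Below is a minimal sketch assuming a hypothetical bank of identical 48 ohm resistors. Four of them in parallel give 12 ohms, and eight give the 6 ohms that finally cleared the rail.

```python
def parallel(*resistances):
    """Equivalent resistance of resistors in parallel: 1/R = sum(1/R_i)."""
    if not resistances or any(r <= 0 for r in resistances):
        raise ValueError("need one or more positive resistances")
    return 1.0 / sum(1.0 / r for r in resistances)


def per_resistor_watts(rail_volts, r_each):
    """Dissipation in each resistor of a parallel bank across the rail."""
    return rail_volts ** 2 / r_each


if __name__ == "__main__":
    bank = [48] * 8  # hypothetical: eight 48 ohm resistors
    r_eq = parallel(*bank)
    print(f"equivalent: {r_eq:.1f} ohm")
    print(f"bank total at 3.3 V: {3.3 ** 2 / r_eq:.2f} W")
    print(f"each resistor: {per_resistor_watts(3.3, 48):.3f} W")
```

At 3.3 V the bank as a whole burns about 1.8 W, but each resistor sees only about 0.23 W. That fits within a 1/4 W rating, though with little headroom.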
| okanat wrote:
| I bought the same model SSD for my Thinkpad P1 last month and saw
| the exact issue. I had to return it because it was breaking the
| NVMe detection completely. So it wasn't a broken unit but a
| design issue after all?
| BearOso wrote:
| Since we're talking SSDs, I wonder if we could get some attention
| to the Phison E18 degradation issue [1]. Only one manufacturer,
| Kingston, has put out firmware containing Phison's fix, while the
| others just ignore it.
|
| A bunch of drives with this controller were on sale during
| Black Friday, so a lot more people are going to have problems in
| a month or so.
|
| 1.
| https://www.reddit.com/r/pcmasterrace/comments/1f1piwf/psa_p...
| userbinator wrote:
| That sounds like NAND degradation (retention failures), which
| can only be partially worked around in firmware (at the cost of
| more write cycles on already-marginal QLC). Unfortunately the
| real solution is "use better NAND", which is unlikely to happen
| unless enough people demand it.
| ciupicri wrote:
| Kingston KC3000 supposedly uses Micron 176L TLC memory [1].
|
| The Seagate Firecuda 530 datasheet clearly says "Built with a
| Seagate-validated E18 controller and the latest 3D TLC NAND".
| A review is more precise: "Phison PS5018-E18" & "Micron B47R
| 176-layer 3D TLC NAND" [2].
|
| [1]: https://www.tomshardware.com/reviews/kingston-kc3000-m2-ssd-...
|
| [2]: https://www.kitguru.net/components/ssd-drives/simon-crisp/se...
| userbinator wrote:
| B47R is indeed TLC, rated for only 1000 cycles (and 35k in
| SLC mode, at 1/3 the capacity). There's also the question
| of whether this is "true" Micron NAND, or SpecTek, which is
| basically Micron's rejects (and rated for even fewer
| cycles; only 300 in the case of their B16A).
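These cycle ratings translate into rough write-endurance figures: capacity × P/E cycles ÷ write amplification. Below is a minimal sketch. It assumes a hypothetical 2 TB drive and ignores over-provisioning and wear-leveling overhead. The `waf` (write amplification factor) parameter is where a firmware workaround that rewrites data, of the kind described above, would show up as extra wear.

```python
def write_endurance_tb(capacity_tb, pe_cycles, waf=1.0):
    """Rough total host writes (TB) before the rated P/E cycles are used up.

    waf is the write amplification factor: NAND writes per host write.
    Ignores over-provisioning, wear-leveling efficiency and ECC margins.
    """
    return capacity_tb * pe_cycles / waf


if __name__ == "__main__":
    capacity = 2.0  # hypothetical 2 TB drive
    print(f"TLC, 1000 cycles:      {write_endurance_tb(capacity, 1000):>8,.0f} TB")
    print(f"SLC mode, 35k cycles:  {write_endurance_tb(capacity / 3, 35_000):>8,.0f} TB")
    print(f"300-cycle NAND (B16A): {write_endurance_tb(capacity, 300):>8,.0f} TB")
    print(f"TLC with WAF of 3:     {write_endurance_tb(capacity, 1000, waf=3):>8,.0f} TB")
```

Running the whole drive in SLC mode trades two thirds of the capacity for roughly 11.7x the raw write budget (35,000 / 3 vs 1,000).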
| ciupicri wrote:
| Kingston doesn't seem to offer any support for Linux, so their
| new firmware is virtually non-existent to me. Why I can't just
| download the firmware and use the standard nvme-cli tools to
| update the SSD beats me. If Seagate (which, by the way, uses the
| Phison E18 too) can do it, so can Kingston, Samsung, Crucial,
| Western Digital and many others.
|
| Even better would be to use the Linux Vendor Firmware Service
| (https://fwupd.org/).
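Here is what the vendor-neutral path could look like if a vendor published a raw firmware image. Below is a minimal sketch with a hypothetical device path and file name. `fw-log`, `fw-download` and `fw-commit` are real nvme-cli subcommands, but the right slot and commit action depend on the drive. Check `nvme fw-log` first, because some drives make slot 1 read-only. The script only prints the commands unless `dry_run` is turned off, since a bad flash can brick a drive.

```python
import shlex
import subprocess


def nvme_fw_update_cmds(device, image, slot=1, action=1):
    """Build nvme-cli commands to stage and commit a firmware image.

    Commit action 1 (per the NVMe Firmware Commit command) replaces the
    image in `slot` and activates it at the next controller reset.
    """
    return [
        ["nvme", "fw-log", device],                        # list slots/revisions
        ["nvme", "fw-download", device, f"--fw={image}"],  # transfer the image
        ["nvme", "fw-commit", device, f"--slot={slot}", f"--action={action}"],
    ]


def run(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))  # shlex.join needs Python 3.8+
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical device node and firmware file name.
    run(nvme_fw_update_cmds("/dev/nvme0", "vendor-firmware.bin"))
```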
| amelius wrote:
| I have a similar problem with a Jetson board. If I turn off the
| power long enough (one night) and then turn it on, the only PCI
| card is not recognized and I have to power-cycle it to get it
| running.
| structural wrote:
| Mind sharing what board/Jetson module you've seen this on? I've
| seen this exact symptom very intermittently on a custom board
| and we've wondered for a long time if it was an issue with a
| specific type of module (or manufacturing lot of modules).
| amelius wrote:
| This one: https://www.avermedia.com/professional/product-detail/D315%2...
|
| My startup logic now power-cycles it until the PCI board is
| recognized; it works, but it's not a great solution.
| structural wrote:
| Interesting, we're using a completely different module
| (Xavier NX). And the same, disgustingly hacky, fix, of
| forcing a reset until it works.
| amelius wrote:
| I also run these commands:
|
|     echo 1 > /sys/bus/pci/rescan
|     sleep 1
|
| Sometimes it brings the PCI card back, so I just run this
| as part of my boot sequence.
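This workaround can be made more deliberate by checking whether the card actually enumerated before and after each rescan. Below is a minimal sketch. It assumes root access and hypothetical vendor/device IDs. `/sys/bus/pci/rescan` and the per-device `vendor`/`device` files are standard Linux sysfs entries. Whether a rescan alone can revive this particular card is exactly what's uncertain here.

```python
import time
from pathlib import Path

# Standard Linux sysfs location; pci_root is a parameter so the logic can be
# exercised against a fake directory tree.
SYSFS_PCI = Path("/sys/bus/pci")


def device_present(vendor, device, pci_root=SYSFS_PCI):
    """True if any enumerated PCI function matches the vendor/device IDs.

    sysfs reports IDs as lowercase hex strings such as "0x10de".
    """
    for dev in (pci_root / "devices").glob("*"):
        try:
            ven = (dev / "vendor").read_text().strip()
            did = (dev / "device").read_text().strip()
        except OSError:
            continue  # entry vanished or is unreadable; skip it
        if ven == vendor and did == device:
            return True
    return False


def rescan_until_present(vendor, device, attempts=5, delay_s=1.0,
                         pci_root=SYSFS_PCI):
    """Trigger PCI rescans (requires root) until the device shows up.

    Returns False if it never appears, so the caller can fall back to
    something heavier, such as the power cycle described above.
    """
    for _ in range(attempts):
        if device_present(vendor, device, pci_root):
            return True
        (pci_root / "rescan").write_text("1")  # same as `echo 1 > .../rescan`
        time.sleep(delay_s)                     # same role as `sleep 1`
    return device_present(vendor, device, pci_root)
```

Call `rescan_until_present` early in boot with the card's IDs as reported by `lspci -nn`, for example `rescan_until_present("0x1234", "0x5678")` (hypothetical IDs). It replaces a one-shot rescan with a bounded retry loop.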
| undertaken wrote:
| anecdotal/weird computer experience:
|
| I have a rebadged Tongfang laptop (NB02 GMxRGxx w/ Ryzen 9) and
| upgraded it shortly after purchase.
|
| The machine arrived with lower capacity Samsung SODIMMs. Swapped
| in 64GB of Crucial DDR5.
|
| Shortly afterwards the machine became unstable to the point of
| RMA. Kernel logs were clogged with all sorts of panics related to
| NVMe, PCIe, and the filesystem. Freezing. Reboots.
|
| Spent hours diagnosing it. Many permutations of kernel
| command-line arguments: pcie, acpi tables, iommu. All for naught.
|
| The machine passed memtest86 / memtest86+ with flying colors.
|
| bonnie++ absolutely trashed it. reliably.
|
| Occasionally the NVMe drives fell off the PCI bus and it wouldn't
| boot until I disabled the slot in the BIOS, power cycled, then
| re-enabled the slot.
|
| Fast forward: fed up with a dysfunctional system, I attempted
| an RMA and gave them the rundown of all the weird, seemingly
| chipset-related failures.
|
| They pushed back with "Try our RAM again."
|
| I nearly had an aneurysm when everything was stable again.
|
| After thanking the support staff profusely I bought larger
| capacity Samsung DIMMs in the same chip family. Still running
| flawlessly after almost a year.
|
| Maybe try new RAM for yucks? ;)
| bb88 wrote:
| So these guys [1] mention something similar, where HDMI from a
| TV is backfeeding 40-50 volts into a cable box. This could be
| caused by many things: outlet wiring, power supply issues in
| the monitor, a bad component in the monitor producing a high
| voltage, a badly grounded monitor, etc.
|
| I read the original thread but it doesn't look like you've
| measured the voltage at the HDMI port wrt motherboard ground. I
| think we're assuming it's 5 volts, but it could be higher, and it
| could have shorted (or weakened) a component on your motherboard.
| And that would explain why a 100 ohm resistor didn't give a
| meaningful voltage drop.
|
| If you need an isolation solution, Amazon sells a 50ft fiber
| optic one-way HDMI cable [2]. What I don't know is whether
| there's any actual copper to provide power over the link. There
| are other options which transmit the HDMI signal over pure
| multimode fiber as well [3].
|
| Or you can go with a DP KVM; since you're on L1T, you know they
| sell a few DP models. I have one I purchased from L1T, and I like
| it a lot.
|
| Definitely, though, I would check the outlets to make sure
| they were wired correctly. Outlets wired incorrectly because
| someone tried to DIY it are absolutely a problem in the US.
|
| [1] https://www.avsforum.com/threads/hdmi-cable-backfeeding-volt...
|
| [2] https://www.amazon.com/HDMI-FURUI-HDCP2-2-18Gbps-Subsampling...
|
| [3] https://fibercommand.com/products/8k-fiber-plugs?gQT=1
| starslab wrote:
| I already own one of those fiber-hdmi cables. Brilliant, but
| sometimes doesn't interoperate with DVI devices using passive
| DVI -> HDMI adapters. I've no idea if it has any copper
| conductors for HDMI power, though one end is labelled for the
| source and one for the display, suggesting that however it's
| designed it's not bi-directional.
|
| I'd love a DisplayPort KVM, but not every device that comes
| across my bench has a DisplayPort output, and those few that
| have DisplayPort but no HDMI can be accommodated with one of
| those commodity DisplayPort -> HDMI adapters. This situation is
| actually getting worse over time, not better, as many modern
| devices and laptops are skipping DisplayPort in favor of USB-C
| alt-mode.
|
| This issue has actually been going on through a monitor change
| on my testbench. It has happened with a Samsung SyncMaster 204T
| through my KVM switch, an HP ZR24w through my KVM switch, and
| the ZR24w directly connected. I don't think this is an issue
| with the rest of my equipment.
|
| This electrical work was done about 15 years ago, by a ticketed
| electrician. One of those $5 plug testers indicates all is
| well, and I have no reason to believe there's any issue here.
|
| By almost pure coincidence, I have an MSI PRO X870-P
| motherboard on order. I'm looking forward to seeing if this
| same 3V3 leakage issue is present on this board too.
| bdavbdav wrote:
| I had the same on an AORUS X570. DisplayPort cables with a line
| tied at both ends (it shouldn't be, but on many it is) would
| cause BIOS resets, corruption and memory retraining.
| blagie wrote:
| I was an exclusive user of Crucial for memory and storage until
| about a year ago. My general thought was that:
|
| - It would give me a trusted supply chain, since the company
| makes the silicon; and
|
| - I would have a credible company standing behind it, one not
| likely to want to tarnish its reputation by cutting corners.
|
| The thinking was very much along the lines of "No one got fired
| buying IBM." And I think it was pretty correct for most of the
| past quarter-century. Historically, storage had a lot of
| counterfeits and shenanigans, and a credible vendor was nice.
| Price/performance for memory was adequate; there was a modest
| premium.
|
| However, post-2020, I bought a defective Crucial DIMM (and didn't
| find out it was defective until I was past the return window).
| The RMA experience was strange. Crucial said they could either:
|
| - Replace it with an inferior part with different, slower
| timings, which may or may not have worked in my system
|
| - Give me a quickly-expiring store credit for "fair market value"
| (never disclosing what that was, and stopped responding to emails
| when I asked)
|
| Neither of these was helpful at all.
|
| Reading online, there were many similar stories, unfortunately.
| They seem to be going the same direction as SanDisk / Western
| Digital. I replaced it with a cheap TeamGroup DIMM, which worked
| without problems.
|
| I'm not quite sure what to do about the continued
| enshittification. There seem to be almost no credible brands left.
| sciencesama wrote:
| I have a similar issue with an NVMe drive in a WLAN slot on the
| Lenovo ThinkPad Gen 8!!
___________________________________________________________________
(page generated 2024-12-29 23:02 UTC)