[HN Gopher] I booted Linux 293k times in 21 hours
___________________________________________________________________
I booted Linux 293k times in 21 hours
Author : jandeboevrie
Score : 560 points
Date : 2023-06-14 13:54 UTC (9 hours ago)
(HTM) web link (rwmj.wordpress.com)
(TXT) w3m dump (rwmj.wordpress.com)
| ineedasername wrote:
| If there is a platonic ideal of 'uptime' then this has got to be
| its opposite.
| w-m wrote:
| 292,612 is not an interesting number; it's not contained in any
| known integer sequence. A search in OEIS only brings up
| sequence A292612
| (https://oeis.org/search?q=292612&fmt=data&sort=number).
| akira2501 wrote:
| 2 * 2 * 191 * 383
|
| Which is mildly interesting.
| w-m wrote:
| Neat indeed.
|
| 3 * 2 ^ {0, 0, 6, 7} - 1
|
| And all of them are palindromes.
| jwilk wrote:
| For people confused by the above notation:
|
| 2 = 3 x 2^0 - 1
|
| 191 = 3 x 2^6 - 1
|
| 383 = 3 x 2^7 - 1
| high_pathetic wrote:
| > it's not contained in any known integer sequence
|
| I think this makes it interesting!
| w-m wrote:
| Ah yes, the good old
| https://en.wikipedia.org/wiki/Interesting_number_paradox
| Dylan16807 wrote:
| Then your standard is too low.
|
| And I mean that objectively. That standard would not allow an
| uninteresting number.
| adverbly wrote:
| Make sure you add it to the integration test suite so it doesn't
| get re-introduced later ;)
| vintagedave wrote:
| > I found the culprit, a regression in the printk time feature:
| https://lkml.org/lkml/2023/6/13/733
|
| The issue hasn't been fixed yet, but if it affects you the
| proximate cause is known and can be reverted locally.
| efitz wrote:
| I told him not to turn on Windows Update.
| Laremere wrote:
| Here they mention that each bisect step was run a large number
| of times to try to catch the rare failure. Reminds me of a
| previous experience:
|
| We had a large integration test suite. It made calls to an
| external service, and took ~45 minutes to fully run. Since it
| needed an exclusive lock on an external account, it could only
| run a few tests at a time. We started getting random failures, so
| we were in a tough spot: bisecting didn't work because the
| failure wasn't consistent, and you couldn't run a single version
| of a test enough times to verify that a given version definitely
| did or didn't have the failure in any practical way. I ended up
| triggering a spread of runs overnight, and then used Bayesian
| statistics to home in on where the failure was introduced. I felt
| mighty proud about figuring that out.
|
| Unfortunately, it turns out the tests were more likely to pass at
| night when the systems were under less strain, so my prior for
| the failure rate was off and all the math afterwards pointed to
| the wrong range of commits.
|
| Ultimately, the breakage got worse and I just read through a
| large number of changes trying to find a likely culprit. After
| finally finding the change, I went to fix it only to see that the
| breakage had been fixed by a different team an hour or so before.
| It turned out to be one of our dependencies turning on a feature
| by slowly increasing the probability it was used. So when the
| feature was on it broke our tests.
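|
| A minimal sketch of the approach in Python (all numbers made up;
| the assumed per-run failure probability P_FAIL plays the role of
| the prior that turned out to be wrong):
|
|     # Hypothesis i: "the breakage landed at commit i". A failed run
|     # at commit idx is evidence for every i <= idx; a passing run
|     # is weak evidence against them.
|     P_FAIL = 0.3  # assumed chance a broken version fails one run
|
|     def posterior(n_commits, observations):
|         """observations: list of (commit_index, failed) test runs."""
|         post = [1.0 / n_commits] * n_commits
|         for idx, failed in observations:
|             for i in range(n_commits):
|                 broken = idx >= i  # run at idx is broken iff bug <= idx
|                 if broken:
|                     like = P_FAIL if failed else 1 - P_FAIL
|                 else:
|                     like = 0.0 if failed else 1.0  # clean runs never fail
|                 post[i] *= like
|             total = sum(post)
|             post = [p / total for p in post]
|         return post
|
|     runs = [(10, False), (40, True), (25, True), (25, False)]
|     p = posterior(50, runs)
|     print(max(range(50), key=lambda i: p[i]))  # most likely culprit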
| ambicapter wrote:
| > Ultimately, it turned out to be one of our dependencies
| turning on a feature by slowly increasing the probability it
| was used.
|
| Wow. I feel like this dependency should be named and shamed.
| thehappypm wrote:
| Isn't this how multi-armed bandits work?
| rootsudo wrote:
| algo 101, but I can see how it can be nifty for
| $internalapp.
| Laremere wrote:
| Big company internal dependency. So nothing for the public to
| care about.
| vamega wrote:
| What company? I've seen this being done (and my team does
| it a lot at Amazon) but curious to know if others are doing
| it at build time too.
|
| If done in a company with a monorepo I'd be especially
| interested in hearing more
| aeyes wrote:
| > If done in a company with a monorepo I'd be especially
| interested in hearing more
|
| Are there any big companies left which haven't adopted a
| monorepo?
| [deleted]
| PartiallyTyped wrote:
| AWS. We probably have the worst build systems :(
| n49o7 wrote:
| Probabilistic feature flags! Love it.
| thehappypm wrote:
| Multi-armed bandits utilize this
| Thorrez wrote:
| Always base the probability on something stable, such as hash
| of the username.
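|
| A minimal sketch of that trick (hypothetical names, not any
| particular flag library's API): hash the flag name together with
| a stable user ID, so each user's answer stays fixed while the
| rollout percentage ramps.
|
|     import hashlib
|
|     def rollout_enabled(flag: str, user_id: str, pct: float) -> bool:
|         # Map flag+user to a stable bucket in [0, 1).
|         h = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
|         bucket = int.from_bytes(h[:8], "big") / 2**64
|         return bucket < pct
|
|     # The same user keeps the same answer as pct ramps 0.01 -> 1.0.
|     print(rollout_enabled("new-checkout", "user-12345", 0.10))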
| IshKebab wrote:
| Bug report: changing my username breaks $product.
|
| Yeah no thanks. It's probably better than completely random
| but software should be predictable and unsurprising.
| btilly wrote:
| I've used the hash of username+string trick before for a
| flag. I used it to replace a home-grown heavyweight A/B
| testing framework which had turned into a performance
| bottleneck.
|
| It worked quite well.
| burnished wrote:
| The important part is the stability - if your usernames
| can change then they aren't stable so you don't select
| it.
|
| I think it is a good reminder that most things you think
| of as unchanging that are also directly related to a
| person... aren't unchanging. Or at least any conceivable
| attribute probably has some compelling reason why someone
| will need to change it.
| dietr1ch wrote:
| That's why you have internal user ids instead of using
| data directly provided by users.
|
| Will it cost an extra lookup? It's cheap, and if you
| really need to, you could embed the lookup in some
| encrypted cookie so you can verify you approved some
| name->id mapping recently without doing a lookup.
| robocat wrote:
| > changing my username breaks $product.
|
| https://m.youtube.com/watch?v=r-TLSBdHe1A&t=14m10s
|
| Discussing a performance regression caused by a longer
| username stored in an environment variable, which changes
| the memory layout of the process.
| [deleted]
| painted-now wrote:
| Man, this story sounds like you could be on my team :-) Pretty
| much experienced the same stuff working at BigCo!
|
| In the end, I think the real problem is that you can't test all
| combinations of experiments. I don't trust "all off" or "all
| on" testing. In my book, you should indeed sample from the true
| distribution of experiments that real users see. Yes, you get
| flaky tests, but you also actually test what matters most, i.e.
| what users will - statistically - see.
| joosters wrote:
| This sounds like a situation that would benefit from using an
| approach like all-pairs testing -
| https://en.wikipedia.org/wiki/All-pairs_testing
|
| Basically, if you have N different features (let's assume
| they are all on/off switches, but it works for multi-values
| too), in theory you'd need to run 2^N tests to cover them
| all, which would become completely impractical. But, you can
| generate a far, far smaller set of test setups that guarantee
| that every pair of features gets tested together. Run those
| tests and you'll probably encounter most feature-interaction
| bugs in a much quicker time.
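|
| A greedy sketch of generating such a set (not a production
| pairwise tool; the candidate count is arbitrary):
|
|     import itertools, random
|
|     def pairwise_configs(n_features, seed=0):
|         """Build a small set of on/off configs covering every pair
|         of features in all four combinations of their settings."""
|         rng = random.Random(seed)
|         def pairs(cfg):
|             return {(i, j, cfg[i], cfg[j]) for i, j in
|                     itertools.combinations(range(n_features), 2)}
|         uncovered = {(i, j, a, b)
|                      for i, j in itertools.combinations(range(n_features), 2)
|                      for a in (0, 1) for b in (0, 1)}
|         configs = []
|         while uncovered:
|             # Seed a candidate from one uncovered pair (guarantees
|             # progress), then keep the best of a few random completions.
|             i, j, a, b = next(iter(uncovered))
|             best, best_cov = None, -1
|             for _ in range(50):
|                 cand = [rng.randint(0, 1) for _ in range(n_features)]
|                 cand[i], cand[j] = a, b
|                 cov = len(pairs(cand) & uncovered)
|                 if cov > best_cov:
|                     best, best_cov = tuple(cand), cov
|             configs.append(best)
|             uncovered -= pairs(best)
|         return configs
|
|     print(len(pairwise_configs(10)))  # far fewer than 2**10 = 1024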
| cscheid wrote:
| All-pairs is for _pairs of features_. For subsets you're in
| much deeper trouble because of the exponential dependence
| on N. For a fixed polynomial dependence, you can get clever
| and let tail bounds eventually work for you, but for
| exponentially growing hypothesis sets, that won't work.
| yojo wrote:
| Yikes!
|
| FWIW, I think best practice here is to hardcode all feature
| flags to off in the integration test suite, unless explicitly
| overwritten in a test. Otherwise you risk exactly these sorts
| of heisenbugs.
|
| At a BigCo that's probably going to require coordinating with
| an internal tools team, but worth getting it on their backlog.
| All tests should be as deterministic as possible, and this goes
| double for integration tests that can flake for reasons outside
| of the code.
| btilly wrote:
| No, the best practice is that on each test run, every feature
| flag used implicitly or explicitly needs to be captured AND
| it must be possible to re-run the test with the same set of
| feature flags.
|
| That way when you get a failure, you can reproduce it. And
| then one of the easy things to do is test which features may
| have contributed to it.
| nosefrog wrote:
| But then you won't catch the bug before it hits production :)
| dmoy wrote:
| Also you end up with some strange long-term test behavior.
| Because people will often leave feature flags in place long
| after full release ( _years_ sometimes), default-off-in-tests
| means you only ever test the behavior with every flag added
| since the last feature-flag cleanup turned off.
|
| Yes it's kinda fractal of bad practices that have to align
| for this problem to occur, but that's the nature of tech
| debt.
| linuxdude314 wrote:
| You are both misunderstanding the post.
|
| He's not saying to alter any of the feature flags used
| for the test, but simply to record which were used during
| the test.
|
| Simply logging doesn't introduce any of the issues you
| are describing.
| ASinclair wrote:
| This is my daily life at BigCo. These bugs are the worst.
| anotherhue wrote:
| Excellent 'obsessed detective' story
| hinkley wrote:
| I used to think I was amazing at performance tuning and
| debugging but after working with a few hundred different people
| it turns out I'm just really fucking stubborn. I am not going
| to shrug at this bug again. You are going down. I do have a
| better way of processing concurrency information in my head,
| but the rest is just elbow grease.
|
| I had a friend in college who was dumb as a post but could
| study like nobody's business. Some of us skated through, some
| of us earned our degree, but he really _earned_ his. We became
| friends over computer games and for a long time I wondered if
| games and fiction were the only things we had in common. Turns
| out there's maybe more to that story than I thought at the
| time.
| allenrb wrote:
| I think you're absolutely right. Some of the things I've been
| most proud of have been products of stubbornly refusing to
| give up. On the other hand, some vast oceans of wasted time
| have been another result. It's tricky to know _when_ to be
| tenacious!
| hinkley wrote:
| In my defense, I am a strong proponent of refactoring to
| make all problems shallow. So there are classes of bug that
| I will see before anyone else because I move the related
| bits around and it becomes obvious that there are missing
| modes in the decision tree.
|
| I tend to believe that discipline and tenacity are separate
| traits. Often appearing in the same people, but different
| skills with different exercises.
| allenrb wrote:
| Bingo, that is very well put. Discipline is where I'll
| tend to fall short. :-)
| 7ewis wrote:
| Reminds me of how cosmic rays have been noted to cause
| computer glitches. [0]
|
| Impressive that they managed to discover this bug.
|
| [0] - https://www.bbc.com/future/article/20221011-how-space-
| weathe...
| Musky wrote:
| In the speedrunning community there is a pretty famous clip
| [0], where a glitch caused a Super Mario 64 speedrunner to
| suddenly teleport to the platform above him, saving him some
| valuable time.
|
| Of course people tried to find ways to reproduce the bug
| reliably, as saving even milliseconds can mean everything in a
| speed run. They went as far as replicating the state of the
| game from the original occurrence 1:1, but AFAIK no one has
| been able to reproduce the glitch without messing with the
| game's memory.
|
| For that reason it is speculated that a cosmic ray caused a
| bit-flip in the byte that stores the player's y coordinate,
| shooting him up into the air and onto the next platform.
|
| [0] - https://youtu.be/o3Cx2wmFyQQ?t=16
| DerekBickerton wrote:
| Before clicking I thought someone had kept count of how many
| times they booted Linux in their everyday computing, not that
| they were testing software. For me, I boot roughly 3 times a
| day into different machines, do my work, shut down, then rinse
| & repeat.
|
| Then you have those types who put their machine into
| hibernate/sleep with 100+ Chrome tabs open and never do a full
| boot ritual. Boggles my mind that people do that.
| Tubru3dhb22 wrote:
| > Boggles my mind that people do that.
|
| Why? I only restart my (linux) laptop every 3-4 months when I
| update software.
|
| I can't think of any downside that I've experienced from this
| practice. I do a lot of work with data loaded in a REPL, so
| it's certainly saved me time having everything restored to as I
| left it.
| bbarn wrote:
| I had a developer that I inherited from a previous manager some
| years ago. Made tons of excuses about his machine, the
| complexity of the problem, etc. I offered to check his machine
| out and he refused because it had "private stuff" on it. He had
| the same machine as the rest of the team, so since he hadn't
| made a commit in two weeks on a relatively simple problem,
| refused help from anyone, etc., we ultimately let him go.
|
| When we looked at his PC to see if there was anything useful
| from the project, his browser had around a thousand tabs open.
| Probably 80% of them were duplicates of other tabs, linking to
| the same couple stack overflow and C# sites for really basic
| stuff. The other 20% were... definitely "private stuff".
| hinkley wrote:
| I'm at the other extreme of "private stuff". Nothing work
| related should live on my work machine. It should all be
| pushed to git or dumped in the wiki (personal pages if
| nothing else).
|
| On one of my largest projects the IT dept made bulk orders
| for hardware and doled them out to new hires. 18 months into
| our new project someone's hard drive died.
|
| Everyone acted like his dog died. I said no problem let's go
| through the onboarding docs. The longest step by far was that
| the company mandated Whole Disk Encryption but IT hadn't put
| it in their old inventory yet. So that was 2/3 of setup time.
| We found some issues with the docs and fixed them.
|
| Every two to four weeks that summer, someone else's drive
| would go. You see, we got all of these machines from the same
| production run. So the hard drives came from the same
| production run, which was apparently faulty. The process got
| a little faster as we went. By the end of the summer it was
| my turn, and people still looked at me like I needed
| condolences. I got a faster machine for a few hours' worth of
| work. I'm not sad. All my stuff was on the network already. I
| lost a couple hours of work, tops.
| opello wrote:
| This is the best way to reduce bus factor and not fall
| behind documenting key details!
| teachrdan wrote:
| > Nothing work related should live on my work machine.
|
| I thought this was a typo at first. Love this as an
| engineering koan.
| noSyncCloud wrote:
| And the corollary - nothing personal should be on your
| work machine, either
| canucker2016 wrote:
| "Nothing work related should live ONLY on my work
| machine." is the intent.
| sureglymop wrote:
| He was let go after two weeks? No confrontation nothing?
|
| Sounds very american. In European working culture if you
| don't show up for two weeks people will be worried that
| something happened to you and try to work it out with you.
| This type of all-or-nothing reaction is a bit drastic imo.
| mikestew wrote:
| _Sounds very american._
|
| Yeah, it's not like that part of the story was condensed
| and might have left out a bunch of details that weren't
| important to the story. So let's give OP a hard time and
| make judgements about a situation for which we have not
| even the slightest bit of context.
| sureglymop wrote:
| Oh absolutely, you're right. I am saying that despite
| whatever may have happened, two weeks is very short. I
| feel like it would be at least a month here regardless.
| RandallBrown wrote:
| He was let go after two weeks of not doing any work,
| despite the manager offering to help him.
| JohnFen wrote:
| > he refused because it had "private stuff" on it.
|
| There's a huge red flag. "Private stuff" (embarrassing or
| otherwise) shouldn't be on company machines in the first
| place.
| dijit wrote:
| I agree completely.
|
| However if anyone touches my computer: don't you dare
| f*%king touch my private key.
|
| (ditto for my browser's sessions database, google cloud
| credentials directory etc;)
|
| I'm paranoid about it, but not enough to buy a yubikey,
| apparently.
| lostlogin wrote:
| > However if anyone touches my computer: don't you dare
| f*%king touch my private key.
|
| Touch the computer, sure, but please don't touch the
| screen with your filthy grease fingers.
| mdpye wrote:
| My work laptop has a touchscreen. I've never used it, but
| other people use it by accident fairly often. Usually
| only once each though, the look of shock is sometimes
| even worth the fingerprint :D
| JohnFen wrote:
| I'm unusually strict about maintaining a separation
| between work and personal (for instance, I would never
| allow my personal smartphone to connect to my employer's
| WiFi), so I wouldn't use personal keys on a work machine
| at all.
|
| But if those keys (or passwords, etc.) are generated for
| work purposes, I consider them to be as much company
| property as the machine itself, so I'm no more protective
| of them than I am of any other sensitive company data.
| dijit wrote:
| Interesting thought.
|
| How do you feel about giving your colleague your
| password?
|
| My personal opinion is that I can hold someone legally
| culpable if _their account_ does something like leak
| financial information; you have a professional
| responsibility to secure your account from absolutely
| everyone.
|
| Administrators acting on your account must of course be
| heavily logged and audited, which is the case.
| JohnFen wrote:
| > How do you feel about giving your colleague your
| password?
|
| I usually don't, mostly just out of good security habits,
| but also because most employers specifically prohibit
| doing that.
|
| Almost always, your colleague can be given his own access
| to whatever the password is for anyway. If that's not
| possible, then I'll share the password and change it
| immediately after my colleague doesn't need access
| anymore.
|
| > you have a professional responsibility to secure your
| account from absolutely everyone.
|
| I agree -- that's part of treating credentials the same
| way as all other sensitive company data. But it's still
| my employer's data, not mine.
|
| If I quit the company or if my supervisor wants to see
| the contents of my machine, I'm fine with that. The
| machine and everything on it belongs to the company
| anyway.
| chucksmash wrote:
| > If I quit the company or if my supervisor wants to see
| the contents of my machine, I'm fine with that. The
| machine and everything on it belongs to the company
| anyway.
|
| I'm fine with that, but I still will not share my
| passwords. I'd be happy to reset the passwords for them
| if they can't access the data by other means, but as
| another commenter pointed out, the fact that anything
| needs to be recovered from my^H^H _not my_ laptop
| indicates mistakes were made.
| StillBored wrote:
| Isn't this largely the point of company directory
| services? The machines/routers/applications/etc are all
| doing their authentication against the directory service,
| and permissions are granted and revoked there. It's a
| large part of running a company with more than a couple
| employees because when someone leaves you don't need to
| run around changing passwords and wondering if they still
| have access to the AWS account to spin stuff up, or punch
| through the VPN. The account in the directory service is
| just deactivated and with it all access.
|
| By default this should be what is happening on all but
| the most ephemeral of machines/testing platforms/etc. And
| even then if it's a formal testing system it should
| probably be integrated too.
|
| Directory service integration BTW is the one feature that
| clearly delineates enterprise products from the rest.
| dijit wrote:
| Ok, but your private key, session tokens and CLI access
| tokens (kube configs, gcloud etc;) _are_ your password in
| those situations.
|
| They tie to your identity, thus you must not treat them
| the same as company secrets, they are professional
| _personal_ secrets which should not be disclosed or
| allowed to fall into anyone else's hands (lest they be
| revoked and cycled).
|
| It's not just good security posture it could affect your
| career quite badly or lead to legal issues.
| JohnFen wrote:
| I agree. I don't think I've said anything counter to that
| (or perhaps I wasn't being clear?)
|
| > thus you must not treat them the same as company
| secrets, they are professional personal secrets
|
| They are company secrets that are tied to my identity.
| The company owns those secrets, not me. Just like my
| keycard to get into the building.
| dijit wrote:
| > I agree. I don't think I've said anything counter to
| that (or perhaps I wasn't being clear?)
|
| I think given the context of the thread (don't touch my
| secrets), saying that you don't have anything you would
| consider confidential towards your employer or colleagues
| is a direct contradiction to what I stated.
|
| That's why I'm "arguing" because my employer/colleagues
| should not have access to my private key, ever.
| JohnFen wrote:
| Ah, OK. Then we do disagree to an extent.
|
| There are several very legitimate times when my employer
| needs to have access to my keys. If I'm leaving the
| company, for an obvious instance.
|
| But my core point is that such keys/passwords aren't
| really mine, they're the company's and in the end, the
| company gets to decide what I'm to do with them.
|
| I think the building access keycard is a perfect analogy.
| I'd never let anyone borrow mine on my own volition, but
| if the company wants to retrieve it from me, that's their
| prerogative. It's theirs, after all.
| brazzledazzle wrote:
| If an employer needs someone's particular keys something
| probably went wrong or there's bad processes in place.
| But that aside I think the default course of action
| should be to aggressively guard your secrets and tokens
| since they represent you. Not as personal or private
| property but to keep someone (be it a fellow employee or
| a 3rd party attacker) from impersonating you without
| authorization.
|
| There are exceptions but the circumstances where an
| employer would need to retrieve my keys without my
| assistance are extremely rare and in those instances it's
| unlikely I'd still be an employee anyway.
| dijit wrote:
| We disagree.
|
| The handing over of the keycard is necessary to ensure it's
| destroyed and can't be used as a "proof" you work
| somewhere (most access cards these days have your name,
| face and the company logo printed on the front).
|
| The keycard will be removed from the access list to the
| building even when it's destroyed, they're not considered
| reusable by most companies.
|
| Your private key is not reusable, it should be destroyed
| and revoked from all systems when you leave a company.
| lmm wrote:
| We could destroy the keycard with both parties present,
| that seems safest. I don't mind turning in a private key
| permanently and getting a receipt at the time, but it
| needs to be very clear that it's no longer my
| responsibility.
| JohnFen wrote:
| > but to keep someone (be it a fellow employee or a 3rd
| party attacker) from impersonating you without
| authorization.
|
| Aside from a third party attacker (which is well-covered
| by my normal practices), that's a threat model that I'm
| personally not worried about at all, really. In part
| because I've never seen or heard of that happening and in
| part because if it did, I am confident that there are
| enough records to be able to prove it.
| ryanjshaw wrote:
| I used to shutdown regularly, then the power situation here in
| South Africa got so bad that we'd regularly have about 3 hours
| of power between interruptions.
|
| Restoring all my work every couple of hours was becoming a
| pain, so I decided to re-enable hibernation support on Windows
| for the first time in 10 years... And surprisingly it works
| absolutely flawlessly.
|
| Even on my 12yr old hardware, even if I'm running a few virtual
| machines. I honestly haven't seen any reason to reboot other
| than updates.
| lelanthran wrote:
| > I used to shutdown regularly, then the power situation here
| in South Africa got so bad that we'd regularly have about 3
| hours of power between interruptions.
|
| I'm in SA too, and I used to have 100s of days uptime (one
| even over a year and a half) ... until the regular blackouts.
|
| Had to stop using a desktop, I've resigned myself to using a
| laptop, purely so that I don't have to boot the thing all the
| time and lose my context.
| pessimizer wrote:
| This thread is like reading that someone is shocked that
| other people don't burn their beds every morning after they
| wake up.
| rmbyrro wrote:
| I get anxious just to think that restoring from
| sleep/hibernation may fail and I lose all my workspace state...
|
| If there was no boot failure, nor the need to reboot after some
| upgrade, I'd never, ever reboot my system.
| eertami wrote:
| Sleep uses almost 0 power and works flawlessly. I'm never going
| to waste my time, however short, waiting for a machine to boot.
| vbezhenar wrote:
| I think that there are two types of people. One set of people
| (I guess, relatively small) don't trust software and prefer to
| reboot OS and even periodically reinstall it to keep it
| "uncluttered". Another set of people prefer to run and repair
| it forever.
|
| I'm from the first set of people and the only reason I stopped
| shutting down my macbook is because I'm now keeping its lid
| closed (connected to display) and there's no way to turn it on
| without opening a lid which is very inconvenient. I still
| reboot it every few days, just in case.
| ComputerGuru wrote:
| I'm in the second group (avoid reboots like the plague) but
| for the reason you attribute to the first: I never trust that
| my Windows machine - currently working - will reboot
| successfully and into the same working condition between OS
| update regressions, driver issues, etc.
| coldtea wrote:
| > _Then you have those types who put their machine into
| hibernate /sleep with 100+ Chrome tabs open and never do a full
| boot ritual. Boggles my mind that people do that._
|
| If the OS and hardware drivers properly support sleep, you
| almost never need to do otherwise (except to install a new
| kernel driver or similar).
|
| In macOS for example it hasn't been the case that you need to
| reboot in regular OS use for over 10 years.
|
| The "100+ Chrome tabs" or whatever mean nothing. They're paged
| out when not directly viewed anyway, and if you close just
| Chrome (not reboot the OS) the memory will be freed in any
| case...
| [deleted]
| moron4hire wrote:
| > If the OS and hardware drivers properly support sleep...
|
| That's like the biggest of big IFs.
| tom_ wrote:
| I've found sleep very reliable on macOS, and both sleep and
| hibernate reliable on Windows.
|
| I once had my work PC unhibernate and not pop up the login
| box. The computer appeared to be running normally
| otherwise; I just couldn't log in, and I had to tap the
| power button to shut it down. This stuck in my mind due to
| its rarity.
|
| Can't remember ever having a serious issue on macOS. A
| couple of my programs sometimes don't survive the
| sleep/wake cycle, but it's intermittent, and I'm always in
| the middle of something else when it happens. I've never
| lost any meaningful work.
| andrekandre wrote:
| > Can't remember ever having a serious issue on macOS.
|
| macos is fine for the most part, but there are some edge
| cases, such as some sketchy corporate required "security
| software" that eats up kernel memory or cpu for some
| unknown reason, a reboot can fix performance issues there
|
| also if you are a dev and apps (like xcode, android
| studio etc) fill your drive with cache files or have
| weird background daemons that eat up cpu, at the least a
| logout/login (or a reboot) can fix some of those weird
| things
|
| you could manually delete them without a reboot but ymmv
| tasuki wrote:
| > Boggles my mind that people do that.
|
| Why?
|
| It boggles my mind that you'd reboot needlessly. My uptime is
| usually in the hundreds of days.
|
| Sleep is good: I just close the lid. Next time I open the lid
| it immediately picks up where I left off. _Why_ on earth would
| you want any other behaviour?
| 2b3a51 wrote:
| Full drive encryption on Linux.
|
| I close down my laptop when I'm moving around or when I leave
| it somewhere while I'm in another part of the building.
| tom_ wrote:
| I reboot most weeks, just to make sure the right stuff
| happens when I do. (I try to do it in the middle of the day,
| so there's time to sort out any matters arising.)
|
| A couple of times I've discovered I've forgotten to set stuff
| to auto-run on login, or things turn out to have lost their
| settings, or stuff doesn't work for whatever reason - I'd
| much rather discover this at a time of my own choosing!
| rolandog wrote:
| Security-wise: encryption at rest? In high security scenarios
| you may be required to shutdown so you're forcing "attackers"
| to go through several layers: motherboard password, disk
| password, encryption password, OS user password + 2FA, etc.
| JohnFen wrote:
| On my personal machines? I don't shut them down or reboot
| very often.
|
| At work, however, I have to use Windows. In that case, I shut
| it down at the end of every workday, in part because that
| prevents weird issues Windows tends to develop when running
| too long.
|
| Mostly, though, it's because of those damned forced updates.
| Since I can't trust Windows to not reboot itself at any
| random point in time, having the habit of shutting down at
| the end of the day at least ensures that I won't accidentally
| lose my state overnight or over the weekend.
| tom_ wrote:
| How to stop Windows installing updates behind your back:
| https://news.ycombinator.com/item?id=18157968
|
| If you don't/won't/can't use the group policy editor, I got
| a lot of mileage out of hibernating the PC and powering it
| off at the mains. You can't leave it running something
| overnight, but you can at least quickly get back to exactly
| where you left things the previous day.
|
| (Powering it off at the mains ensures that even if you have
| a device connected that could wake the PC up - thus putting
| your computer in a state where WIndows Update can reboot it
| - it can't. You can turn this feature off on a per-device
| basis with powercfg, but then one day you'll plug something
| new in and leave it plugged in and it'll wake the PC up
| while you're away and Windows Update will do its thing.)
| jameson71 wrote:
| Security patching?
| pessimizer wrote:
| What do you need to reboot to patch other than the kernel?
| I just restart things.
| cannonpalms wrote:
| Can all be done online, no?
| mcculley wrote:
| A long time ago, I had desktops with huge uptimes. The world
| has changed. I will no longer go that long without a security
| update. Too much is now passing through my machine.
| sieabahlpark wrote:
| I just have it running 24/7 and never restart for weeks. I
| don't even have the 100 tab problem, I just like having the
| immediate availability without waiting for startup.
| 5e92cb50239222b wrote:
| Unless you're on solar, does wasting electricity not bother
| you? I used to seed a lot of stuff for years (with typical
| uptime measured in months), but the CO2 impact, however tiny
| it is in the grand scheme of things, does not seem worth
| it anymore.
| sieabahlpark wrote:
| [dead]
| pessimizer wrote:
| If you're shutdown or hibernating, is the power draw
| anything compared to a lightbulb?
| Hikikomori wrote:
| My desktop uses 2w in sleep mode. Likely less if i disable
| the motherboard RGB.
| aeyes wrote:
| > Boggles my mind that people do that.
|
| :( I only reboot when my machine freezes or when updates
| require a reboot. I did a lot of on-call in my life and I saved
| tons of time by leaving everything open exactly as I left it
| during the day.
|
|     ~> w
|     11:19  up 18 days, 17:03, 9 users, load averages: 3.87 2.96 2.39
| ComputerGuru wrote:
| You haven't properly kept a machine alive until the clock
| rolls over.
|
| I logged into a firewalled Windows VM on EC2 that's been
| running an internal microservice that was acting up. It caught
| my eye that task manager showed an uptime of 6 days, making me
| immediately think the bug might have been caused by the recent
| reboot, or perhaps by the update that triggered it.
|
| It turns out no reboot had taken place and in fact, the
| uptime counter had merely rolled over - and not for the first
| time! Bug was unrelated to the machine and it's still (afaik)
| ticking merrily away.
|
| (Our `uptime` tool for Windows [0] reported the actual time
| the machine was up correctly.)
|
| [0]: https://neosmart.net/uptime/
| exikyut wrote:
| Okay, what was the actual uptime? :) (:E)
| andrewaylett wrote:
| Conversely, it boggles _my_ mind that people think 100+ tabs is
| a lot. I've got >500 open in Firefox at the moment; they won't
| go away just because I reboot or upgrade. I'll probably not
| look at most of them again, but they're not doing any harm just
| sitting there waiting to be cleaned up.
| db48x wrote:
| That's because in Firefox an open tab that you haven't
| recently viewed uses no memory.
| drbawb wrote:
| >Then you have those types who put their machine into
| hibernate/sleep with 100+ Chrome tabs open and never do a full
| boot ritual.
|
| I would never suspend to RAM or disk, far too error-prone in my
| experience. (Plus serializing out 128GiB of RAM is not great.)
| I just leave my machine running "all the time." My most
| recently retired disks (WD Black 6TB) have 309 power cycles
| with ~57,382 power-on hours. Seems like that works out to
| rebooting a little less than once per week. That tracks: I
| usually do kernel updates on the weekend, just in case the
| system doesn't want to reboot unattended.
| trashburger wrote:
| > Then you have those types who put their machine into
| hibernate with 100+ Chrome tabs open and never do a full boot
| ritual. Boggles my mind that people do that.
|
| Hey, I'm that guy (although I put it to sleep instead)! It
| honestly works really well and is in stark contrast to how
| Linux and sleep mode interacted just ~10 years ago. It's
| amazing for keeping your workspace intact.
|
| (FWIW, I also don't reboot or shutdown my desktop where it acts
| as a mainframe for my "dumb" laptop.)
| bregma wrote:
| > Boggles my mind that people do that.
|
|     $ uptime
|     15:39:13 up 359 days, 2:02, 16 users, load average: 0.09, 0.08, 0.15
|
| 16 users is 16 tmux sessions, all me doing different tasks.
| exikyut wrote:
| _[Cries in outdated kernel]_
|
| One of the fascinating curiosities you're missing out on is
| Pressure Stall Information
| (https://docs.kernel.org/accounting/psi.html). Here's what
| the PSI gauges look like in htop when kernel support is
| available:
|
|     PSI some CPU:    0.37% 0.78% 1.50%
|     PSI some IO:     0.38% 0.33% 0.25%
|     PSI full IO:     0.38% 0.31% 0.23%
|     PSI some memory: 0.02% 0.04% 0.00%
|     PSI full memory: 0.02% 0.04% 0.00%
| jchw wrote:
| I have found that my MicroPC fails on some newer kernels: when
| GDM starts up, the machine locks up and the LCD goes wonky. I'm
| not particularly looking forward to the bisect, but at least it
| won't take 292,612 reboots.
| StillBored wrote:
| In some ways an early-boot kernel-only failure is easier. Late
| boot failures like that could just as well have been something
| changing in wayland/X/gdm/mesa/dbus/whatever at the same time.
| And then if it turns out everything but the kernel is constant,
| it's easy to take a wild guess and look for something in, say,
| the DRM/GPU driver in use rather than the entire kernel.
| Although the last time I did that, it turned out the problem
| wasn't even in the GPU-specific code but in a refactoring of
| the generic display management code. Still ended up doing a
| bisect across like 5 kernel revisions after everything else
| failed. Which points to the fact that if linux had a less
| monolithic tree it would be possible to a/b test just the
| kernel modules and then bisect their individual trees, rather
| than adjusting each bisect point to the closest related commit
| if you're sure it's a driver-specific problem. There is a very
| good chance that if, say, a particular monitor config + GPU
| stops working on my x86, the problem is in /drivers/gpu rather
| than in all the commits in arch/riscv that are also mixed into
| the bisect. Ideally the core kernel, arch-specific code, and
| driver subsystems would all be independent trees with
| fixed/versioned ABIs of their own. That way one could upgrade
| the GPU driver to fix a bug without having to pull forward
| btrfs/whatever and risk breaking it.
| jchw wrote:
| Since I'm in NixOS, I can at least emphatically confirm it is
| JUST the kernel.
|
| Though, given the way the LCD panel wonks out, I'm actually
| concerned it's power management related. It looks like what
| happens to an LCD panel when the voltage goes too low. (Or at
| least, I think that's what that effect is, based on what I've
| seen with other weird devices with low battery.) Since
| MicroPC is x86, though, I doubt the kernel is driving any of
| the voltages too directly, so who knows.
| rjmunro wrote:
| I wonder if bisect is the optimal algorithm for this kind of
| case. Checking that the error still exists takes an average of
| ~500 iterations before a failure; checking that it doesn't
| exist takes 10,000 iterations, 20 times longer. So maybe
| biasing the bisect to skip only 1/20th of the remaining
| commits, rather than half of them, would be more efficient.
| pacaro wrote:
| Biasing a binary search would only be beneficial if you know
| something about the distribution of the search space
| bgirard wrote:
| If the factor in one direction is large enough then a linear
| search becomes more efficient. Say, to make it easier to
| picture, you have 20 commits remaining and one direction is
| 1,000x more costly. You're better off doing a linear search
| which guarantees you'll spend less than 2,000x searching the
| space.
|
| That suggests that for a larger search space with a large
| enough difference, the optimal bisection point is probably
| not always the midpoint even if you know nothing about the
| distribution.
|
| Perhaps someone can find the exact formula for selecting the
| next revision to search?
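|
| A brute-force sketch of that (the 500/10,000 boot costs and a
| uniform prior are assumptions): dynamic programming over the
| remaining range, choosing the probe point that minimizes the
| expected number of boots.
|
|     from functools import lru_cache
|
|     C_FAIL, C_PASS = 500, 10_000  # expected boots to see a hang vs.
|                                   # to trust a clean run
|
|     def expected_cost(n):
|         """Optimal expected boots to find the first bad commit among
|         n candidates, assuming a uniform prior over commits."""
|         @lru_cache(maxsize=None)
|         def cost(lo, hi):  # first bad commit lies in [lo, hi)
|             if hi - lo <= 1:
|                 return 0.0
|             best = float("inf")
|             for m in range(lo, hi - 1):  # probe commit m
|                 p_bad = (m - lo + 1) / (hi - lo)  # P(probe shows the bug)
|                 c = (p_bad * (C_FAIL + cost(lo, m + 1))
|                      + (1 - p_bad) * (C_PASS + cost(m + 1, hi)))
|                 best = min(best, c)
|             return best
|         return cost(0, n)
|
|     print(expected_cost(100))  # compare against always probing the midpoint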
| jwilk wrote:
| > You're better off doing a linear search which guarantees
| you'll spend less than 2,000x searching the space.
|
| _Almost_. If only the last commit is slow, binary search
| is still faster.
| bgirard wrote:
| > better off
|
| Better off as in expected/average case. Good point, but
| only marginally better in the worst case.
| electroly wrote:
| There's an additional stopping problem here that isn't
| present in a normal binary search. Binary search assumes you
| can do a test and know for sure whether you've found the
| target item, a lower item, or a higher item. If the test
| itself is stochastic and you don't know how long you have to
| run it to get the hang, I'd think you'd get results faster by
| running commits randomly and excluding them from
| consideration when they hang. Effectively, you're running all
| the commits at the same time instead of working on one commit
| and not moving on until you've made a decision on it. Then at
| any time you will have a list of commits that have hanged and
| a list of commits that have not hanged yet, and you can keep
| the entire experiment running arbitrarily long to catch the
| long-tail effects rather than having to choose when to stop
| testing a single non-hanging commit and move onto the next
| one.
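|
| One way to sketch that (hypothetical `run_once` boots a given
| commit's kernel once and reports a hang; uses the bisect-style
| assumption that commits after the culprit also fail):
|
|     import random
|
|     def stochastic_search(run_once, candidates, budget=100_000):
|         """Probe random commits; each observed hang proves that
|         commit bad and prunes everything after it, while commits
|         that haven't hanged yet simply stay in the running."""
|         hanged = set()
|         pending = set(candidates)
|         for _ in range(budget):
|             if not pending:
|                 break
|             c = random.choice(list(pending))
|             if run_once(c):            # a hang: c is definitely bad
|                 hanged.add(c)
|                 pending = {x for x in pending if x < c}
|         # Lowest commit ever seen hanging bounds the culprit from above.
|         return min(hanged) if hanged else None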
| pacaro wrote:
| I can see some interesting approaches here. Given n
| threads/workers you could divide the search space into n
| sample points (for simplicity let's divide it evenly) and
| run the repeated test on each point. When a point hangs,
| that establishes a new upper limit, all higher search
| points are eliminated, the workers reassigned in the
| remaining search space.
|
| Given the uncertainty I can see how this might be more
| efficient, especially if the variance of the heisenbug is
| high.
| mortehu wrote:
| Each boot updates your empirical distribution. As a trivial
| example, if you have booted a version 9999 times with no
| hanging, a later version will likely give you more
| information per boot.
| coldtea wrote:
| Still, why would they need to reboot 292,612 times?
|
| Is that supposed to be the log of the commit space?
| remram wrote:
| If they boot it 10,000 times for revisions that don't fail,
| and ~1,000 times for revisions that do fail, you can reach
| this number with log2(revisions) about 30.
| x86x87 wrote:
| read the article. they booted so many times to show that it
| was not reproducing. it's overkill - you don't need to boot
| 200k times
| rwmj wrote:
| I didn't mention it in the blog, but Paolo Bonzini was
| helping me and suggested I run the bootbootboot test for 24
| hours, to make sure the bug wasn't latent in the older
| kernel. I got bored after 21 hours, which happened to be
| 292,612 boots.
|
| Maybe it would have failed on the 292,613th boot ...
| quickthrower2 wrote:
| I think your p value is pretty good here
| opello wrote:
| I've been on a similar quest for hard-to-reproduce
| timing/hardware/... bugs, and if you're facing any kind
| of skepticism (your own or otherwise) it can be very
| comforting to have run 10x or even 100x longer than needed
| with no failure occurring.
|
| It's particularly comforting when the reason for the
| failure/fix/change in behavior isn't completely
| understood.
| bsilvereagle wrote:
| If the bug occurs reasonably often, say usually once
| every 10 minutes, you can model an exponential
| distribution of the intervals between the bug triggering
| and then use the distribution to "prove" the bug is fixed
| in cases where the root cause isn't clear:
| https://frdmtoplay.com/statistically-squashing-bugs/
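|
| A tiny sketch of that arithmetic (the once-per-10-minutes rate
| is an assumed prior, not measured):
|
|     import math
|
|     lam = 6.0  # assumed failure rate: 6 per hour (once per 10 min)
|     T = 21.0   # hours observed with zero failures
|     # Under an exponential/Poisson model, P(no failures in T hours
|     # if the bug is still present) = exp(-lam * T).
|     print(f"P(clean {T}h run while still broken) = {math.exp(-lam * T):.3g}")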
| ajb wrote:
| There is actually a bayesian version which I wrote:
| https://github.com/ealdwulf/bbchop
|
| Basically it calculates the commit to test at each step which
| gains the most information, under some trivial assumptions. The
| calculation is O(N) in the number of commits if you have a
| linear history, but it requires prefix-sum which is not O(N) on
| a DAG so it could be expensive if your history is complex.
|
| Never got round to integrating it into git though.
| muxator wrote:
| Hidden gem! Thanks!
| defen wrote:
| That's a cool idea. Would also be interesting to consider the
| size of the commit - a single 100-line change is probably
| more likely to introduce a bug than 10 10-line changes.
| phist_mcgee wrote:
| You haven't met the developers at my last company.
| [deleted]
| dumbaccount123 wrote:
| [flagged]
| TechBro8615 wrote:
| This reminded me of another story [0] (discussed on HN [1]) about
| debugging hanging U-Boot when booting from 1.8 volt SD cards, but
| not from 3.0 volt SD cards, where the solution involved a kernel
| patch that actually _introduced_ a delay during boot, by
| "hardcoding a delay in the regulator setup code
| (set_machine_constraints)." (In fact it sounded so similar that I
| actually checked if that patch caused the bug in the OP, but they
| seem unrelated.)
|
| The story is a wild one, and begins with what looks like a patch
| with a hacky workaround:
|
| > The patch works around the U-Boot bug by setting the signal
| voltage back to 3.0V at an opportune moment in the Linux kernel
| upon reboot, before control is relinquished back to U-Boot.
|
| But wait... it was "the weirdest placebo ever!" Turns out the
| only reason this worked was because:
|
| > all this setting did was to write a warning to the kernel
| log... the regulator was being turned off and on again by
| regulator code, and that writing that line took long enough to be
| a proper delay to have the regulator reach its target voltage.
|
| The full story is well worth a read.
|
| [0]
| https://kohlschuetter.github.io/blog/posts/2022/10/28/linux-...
|
| [1] https://news.ycombinator.com/item?id=33370882
| headline wrote:
| Very interesting, I wonder the _why_
| mgsouth wrote:
| Disclaimer: not a kernel dev, opinion based upon very cursory
| inspection.
|
| The patch references the "scheduler clock," which is a high-
| speed, high-resolution monotonic clock used to schedule future
| events. For example, a network card driver might need to reset
| a chip, wait 2 milliseconds, and then do another initialization
| step. It can use the scheduler to cause the second step to be
| executed 2 milliseconds in the future; the "scheduler clock" is
| the alarm clock for this purpose.
|
| Measuring the "current time" is pretty complicated when you're
| dealing with multiple-core variable-frequency processors, need
| a precise measurement, and can't afford to slow things down.
| The "scheduler clock" code fuses together time sources and
| elapsed-time indicators to provide an estimated current time
| which has certain guarantees (such as code running on a
| particular core will never see time go backwards, it will be
| accurate
| within particular limits, and it won't need global locks). The
| sources and elapsed-time indicators it has available varies by
| computer architecture, vendor, and chip family; therefore the
| exact behavior on an Intel Core i5 will differ from that of an
| Arm M7.
|
| The patch in question changes the behavior of local_clock();
| this is the function used by code which wants to know what the
| current time is on its particular core. The patch tries to make
| local_clock() return a sane value if the scheduler clock hasn't
| been fully initialized but is at least running.
|
| As you can imagine, there a lot of things that can go wrong
| with that. I _think_ the problem is that
| sched_clock_init_late() is marking the clock as "running"
| before it should. I could very well be wrong. Regardless, it's
| pretty clear that there's some kind of architecture-dependent
| clock initialization race condition that once in a while gets
| triggered.
| cryptonector wrote:
| Great thinking. I'll also note that `sched_clock_register()`
| uses `pr_debug()`, which can be an alias of `printk()`,
| though I don't think that's it.
| rwmj wrote:
| If anyone would like to try reproducing the bug, I have a fairly
| solid reproducer here:
|
| https://lore.kernel.org/lkml/20230614173430.GB10301@redhat.c...
|
| You will need a vmlinux or vmlinuz file from Linux 6.4 RC.
|
| If these are the last two lines of output then congratulations,
| you reproduced the bug:
|
|     [    0.074993] Freeing SMP alternatives memory: 48K
|     *** ERROR OR HANG ***
|
| You could also try reverting f31dcb152a3 and rerunning the test
| to see if you get through 10,000 iterations.
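|
| (For the curious, a rough Python sketch of the shape of such a
| boot-loop harness; this is not the actual reproducer script -
| see the lore link above - and the qemu flags and timeout here
| are illustrative:)
|
|     import subprocess
|
|     for i in range(1, 10_001):
|         print(f"{i}...", flush=True)
|         try:
|             # panic=-1 plus -no-reboot makes qemu exit once the guest
|             # panics after a successful early boot; a hang trips the
|             # timeout instead.
|             subprocess.run(
|                 ["qemu-system-x86_64", "-machine", "accel=kvm",
|                  "-display", "none", "-serial", "stdio",
|                  "-kernel", "vmlinuz", "-append",
|                  "console=ttyS0 panic=-1", "-no-reboot"],
|                 capture_output=True, timeout=60)
|         except subprocess.TimeoutExpired:
|             print("*** ERROR OR HANG ***")
|             break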
| Twirrim wrote:
| I've been having flashbacks to troubleshooting some
| particularly thorny unreliable boot stuff several years ago. In
| the end I tracked that one down to the fact that device order was
| changing somewhat randomly between commits (deterministically,
| though, so the same kernel from the same commit would always
| have devices return in the same order), and part of the early
| boot process was unwittingly dependent on particular network
| device ordering due to an annoying bug. The kernel has never
| made any guarantees about device ordering, so the kernel was
| behaving just fine.
|
| That one was.. fun. First time I've ever managed to identify
| dozens of commits widely dispersed within a large range, all
| seeming to be the "cause" of the bug, while clearly having nothing
| to do with anything related to it, and having commits all
| around them be good :)
| chenxiaolong wrote:
| I gave that reproducer a try and it failed after 1968
| iterations.
|
| * CPU: Intel(R) Core(TM) i9-9900KS
|
| * qemu: qemu-kvm-7.2.1-2.fc38.x86_64
|
| * host kernel: 6.3.6-200.fc38.x86_64
|
| * guest kernel: 6.4.0-0.rc6.48.fc39.x86_64 (grabbed latest from
| mirrors.kernel.org/fedora since fedoraproject.org DNS is down
| and I can't access koji)
|
| Log:
|
|     <...>
|     1966... 1967... 1968...
|     [    0.075343] LSM: initializing lsm=lockdown,capability,yama,bpf,landlock,integrity
|     [    0.075514] Yama: becoming mindful.
|     [    0.075514] LSM support for eBPF active
|     [    0.075514] landlock: Up and running.
|     [    0.075514] Mount-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
|     [    0.075514] Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
|     [    0.075514] x86/cpu: User Mode Instruction Prevention (UMIP) activated
|     [    0.075514] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
|     [    0.075514] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
|     [    0.075514] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
|     [    0.075514] Spectre V2 : Mitigation: Enhanced / Automatic IBRS
|     [    0.075514] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
|     [    0.075514] Spectre V2 : Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT
|     [    0.075514] RETBleed: Mitigation: Enhanced IBRS
|     [    0.075514] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
|     [    0.075514] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl
|     [    0.075514] TAA: Mitigation: TSX disabled
|     [    0.075514] MMIO Stale Data: Vulnerable: Clear CPU buffers attempted, no microcode
|     [    0.075514] SRBDS: Unknown: Dependent on hypervisor status
|     [    0.075514] Freeing SMP alternatives memory: 48K
|     *** ERROR OR HANG ***
|
| I'll try reverting f31dcb152a3 and testing again later. Happy
| to test anything else if needed.
| rwmj wrote:
| Yup, that's the bug. If it goes away after reverting the
| commit, that would be interesting too. I don't have any other
| suggestions.
| chenxiaolong wrote:
| I tested with 6.4.0-0.rc6.48.fc39.x86_64 + f31dcb152a3
| revert and all 10000 iterations succeeded (same hardware
| and environment as my previous post).
|
| To guarantee that there's absolutely no other difference
| between the two tests, I took the source RPM, added the
| commit f31dcb152a3 diff + `%patch -P 2 -R`, and built the
| kernel RPM with mock.
| swordbeta wrote:
| I wasn't able to reproduce this with 10k iterations on arch,
| I'm probably doing something wrong. Does the host kernel
| matter?
|
| Host kernel: 6.1.33
|
| Guest kernel: 6.4-rc6
|
| Guest config: http://oirase.annexia.org/tmp/config-bz2213346
|
| QEMU: 8.0.2
|
| Hardware: AMD Ryzen 7 3700X CPU @ 4.2GHz
| [deleted]
| rwmj wrote:
| > Does the host kernel matter?
|
| Honestly I don't know! We've seen it appear with host kernel
| 6.2.15
| (https://bugzilla.redhat.com/show_bug.cgi?id=2213346#c5) but
| I'm not aware of anyone either reproducing or not reproducing
| it with earlier host kernels. All your other config looks
| right.
| garaetjjte wrote:
| vmlinuz-6.4.0-0.rc6.48.fc39.x86_64 failed on my 6.0.0 host
| after 249 iterations.
| rwmj wrote:
| We had another report that it happens on a RHEL _8_ host,
| which is a very much older (franken) kernel.
| [deleted]
| allanrbo wrote:
| Running binary search on something that's flaky is a pain. "Noisy
| binary search" or "robust binary search" can help here:
| https://github.com/adamcrume/robust-binary-search
| hoten wrote:
| That README is light on details. How is this different from
| selecting some N (and hoping it is high enough) and repeating
| your test case that many times? You just don't have to select a
| value for N using this tool?
|
| EDIT: I missed the link to the white paper.
| IshKebab wrote:
| The paper lists the algorithm (which is relatively simple)
| but basically it is much more efficient than repeating test
| cases.
|
| You can see that that must be possible fairly easily.
| Consider two algorithms:
|
| 1. Classic binary search - test each element once and 100%
| trust the result.
|
| 2. Overkill - test each element 100 times because you don't
| trust the result one bit.
|
| The former will clearly give you the wrong result most of the
| time, and the latter is extremely inefficient. There's
| clearly an in-between solution that's more efficient without
| sacrificing accuracy.
|
| Skimming the algorithm, it looks like they maintain Bayesian
| probabilities for each element being "the one", test the
| element at the 50% cumulative-probability point each
| iteration, and then update the probabilities accordingly.
| Basically a Bayesian version of the traditional algorithm.
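|
| A minimal sketch of that loop (one-sided flakiness and all
| parameters are assumptions; this is not the library's actual
| code):
|
|     def noisy_bisect(test, n, flake=0.5, probes=100):
|         """test(m) -> True if commit m fails; bad commits pass
|         spuriously with probability `flake`, good ones never fail."""
|         post = [1.0 / n] * n  # P(first bad commit == i)
|         for _ in range(probes):
|             # Probe where the cumulative probability crosses 0.5.
|             acc, m = 0.0, 0
|             for i, p in enumerate(post):
|                 acc += p
|                 if acc >= 0.5:
|                     m = i
|                     break
|             failed = test(m)
|             for i in range(n):  # Bayes update for hypothesis i
|                 bad = i <= m    # probe at m is broken iff bug <= m
|                 if bad:
|                     post[i] *= (1 - flake) if failed else flake
|                 else:
|                     post[i] *= 0.0 if failed else 1.0
|             total = sum(post)
|             post = [p / total for p in post]
|         return max(range(n), key=lambda i: post[i])
|
|     import random
|     first_bad = 42
|     flaky_test = lambda m: m >= first_bad and random.random() < 0.5
|     print(noisy_bisect(flaky_test, 100))  # -> 42 with high probability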
| allanrbo wrote:
| Good explanation! And in the case of "I booted Linux 293k
| times in 21 hours" it wasn't just 100 times, it was 10,000
| :-)
| allanrbo wrote:
| You do still have to select an N, but it's not as critical
| that the N gives 100% guarantee of the flaky failure (which
| can be really difficult or even impossible to achieve).
| Unlike regular binary search, robust binary search doesn't
| permanently give up on the left or right half based on just a
| single result.
| NelsonMinar wrote:
| What a fantastic bug report writeup this is. Both the linked post
| and the backing LKML and QEMU bug report.
| [deleted]
| [deleted]
| sp332 wrote:
| To save anyone clicking through the email thread: there is no
| resolution in there so far.
| loeg wrote:
| Bisect points at this commit, even if the cause isn't known
| yet:
| https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...
| [deleted]
| parentheses wrote:
| It makes sense to n-sect (rather than bisect) as long as the
| tests can be run in parallel. For example, if you're searching
| 1000 commits, a 10-sect will get you there with about 30 tests
| (10 per round), but only 3 rounds. OTOH, a 2-sect needs only 10
| tests, but takes 10 sequential rounds, more than 3x the time.
|
| There's ofc always some sort of bayesian approach mentioned in
| other answers.
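|
| A quick back-of-the-envelope sketch (counting k-1 probes per
| round, so slightly fewer tests than the numbers above):
|
|     import math
|
|     def nsect_cost(n_commits, k):
|         """Rounds of parallel testing and total tests for a k-sect
|         search: k-1 probes per round, range shrinks by k each round."""
|         rounds = math.ceil(math.log(n_commits, k))
|         return rounds, rounds * (k - 1)
|
|     for k in (2, 10):
|         print(k, nsect_cost(1000, k))  # 2 -> (10, 10), 10 -> (3, 27)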
| eichin wrote:
| Yeah, I did a 4-way search like this on gcc back in the Cygnus
| days - way before git, and the build step involved "me setting
| up 4 checkouts to build at once and coming back in a few hours"
| so it was more about giving the human more to dig into at
| comparison time than actual computer time and usage. (It always
| amazes me that people _have_ bright-line tests that make the
| fully automated version useful, but I've also seen "git bisect
| exists" used as encouragement to break up changes into more
| sensible components...)
| eknkc wrote:
| No disrespect to Peter Zijlstra, I'm sure he has been a lot more
| impactful on the open source community than I will ever be but
| his immediate reply caught my attention:
|
| >> [Being tracked in this bug which contains much more detail:
| >> https://gitlab.com/qemu-project/qemu/-/issues/1696 ]
|
| > Can I please just get the detail in mail instead of having to
| go look at random websites?
|
| Maybe it's me, but if I'd booted Linux 292,612 times to find
| a bug, you might as well click a link to a repository of a major
| open source project on a major git hosting service.
|
| Is it really that weird to ask people online to check a website?
| Maybe I don't know the etiquette of these mail lists so this is a
| genuine question. I guess it is better to keep all conversation
| in a single place, would that be the intention?
| dezgeg wrote:
| Many kernel people really are stuck in their ways like that.
| They don't want to leave their Mutt (e-mail client) at any
| cost. I recall some are even to this day running in a text
| console (i.e. no X11 or Wayland).
| donalhunt wrote:
| Don't blame them. I'm fed up with browsers using gigs of ram to
| display kb of data. :(
| CommitSyn wrote:
| I am only guessing here, but I assume it's so the content of
| the mailing list archive remains. If a linked website goes down
| or changes at any time in the future, then that archive is no
| longer fulfilling its purpose of archiving important
| information.
| zxexz wrote:
| I'm pretty much 100% sure that's the reason, and a good one
| at that. Mailing lists are the lifeblood of a lot of big open
| source projects.
| cjsawyer wrote:
| This is the same logic in avoiding link-only answers on Stack
| Overflow. They're both good rules.
| sidfthec wrote:
| The irony being that he presumably wants more information on
| the mailing list to keep a good archive, while not giving
| enough information for people to understand that and follow
| the advice later.
| kevincox wrote:
| If that was the reason it would have been best to state that
| in the request.
|
| > Can I please just get the detail in mail so that it is
| archived with the list?
|
| Of course you can't expect every email written to be perfect,
| it is generally treated as an informal medium in these
| settings. But stating the reason helps people understand your
| motives and serve them better.
| enedil wrote:
| I think that hardcore kernel devs already know the reasons,
| and there is no point in raising it again. For you it might
| seem like a random requirement, but it's because of lack of
| familiarity.
| Szpadel wrote:
| i think in that case an explanation is needed even more: if
| you are a hardcore dev, then no one needs to remind you
| about such a rule; on the other hand, if you are not so
| familiar with those rules yet, an explanation would be very
| helpful
| actionfromafar wrote:
| Maybe it's so the mail threads keep the full records.
| aidenn0 wrote:
| My suspicion is that it's not about reading the bug info once,
| but having the information in the mailing-list, which is the
| archive of record for kernel bugs.
| dale_glass wrote:
| It's LKML. The volume of that list is insane, and technical
| discussion is very much the point, so they'd expect you to
| explain the problem right there, where people can quote parts
| of it, and comment on each part separately.
| nroets wrote:
| Many of the participants may also be reading it in a terminal
| emulator with no web browser nearby.
| _zoltan_ wrote:
| maybe those people should rethink how to do stuff in 2023.
| mulmen wrote:
| You're welcome to go tell the Linux kernel devs what they
| are doing wrong. Fuck around and find out as the kids
| say. Or start the Zolnux project and see how far that
| goes chasing shiny objects.
| owenmarshall wrote:
| Their software, their workflow. "Bend to it or pick
| something else" seems entirely fine to me.
| _zoltan_ wrote:
| this is not really true for open source, I think. since
| it's collaborative I think it's fair to expect people to
| be able to open a GitHub link
| snapcaster wrote:
| you're wrong. instead you should adopt the standards of
| the group you're attempting to join. Getting "tourist who
| complains about customs of country they visit" vibes from
| this comment
| owenmarshall wrote:
| I run OpenBSD on most of my systems. The OpenBSD
| development team collaborates using cvs instead of git
| because it fits their workflow well. If I wanted to
| collaborate with them, I'd use cvs too - and if I wanted
| to move them to git I'd do it _after_ becoming a core
| contributor, not before. If I'm going to send bug
| reports & patches here and there, I'm going to do it in a
| way that makes it easy for Theo and team to review.
|
| This is very much a Chesterton's fence topic, I think.
| Linux developers have settled on a workflow that works
| for them, and if you want to get time from the people who
| are doing the bulk of the work it's fair to expect _you_
| to work within their requests.
| mulmen wrote:
| It's a gitlab link, not github. And it isn't reasonable
| in this context. GitHub hosts a lot of open source
| projects but it is not the only place where open source
| happens. That's kinda the point of open source, and
| especially of git.
|
| Git itself is a satellite project of the Linux kernel. It
| can work without the web at all. That someone EEE'd it so
| hard that even Microsoft couldn't resist is no reason to
| expect the kernel devs to change their workflow.
| rblatz wrote:
| Are they on a PDP-11 or a dumb terminal?
| treeman79 wrote:
| https://en.m.wikipedia.org/wiki/Lynx_(web_browser)
|
| Used this daily for many years. Was great when connecting
| to the internet was only practical via a shell.
| Dylan16807 wrote:
| Did you try it on this site?
|
| All of the comments/updates on the bug report are loaded
| by javascript and don't work for me in lynx or elinks.
| aabbcc1241 wrote:
| Do you mean hacker news as "this site"? HN seems to be
| server side rendered, so it should display well without
| Javascript.
| jwilk wrote:
| I think they meant <https://gitlab.com/qemu-
| project/qemu/-/issues/1696>.
| inetknght wrote:
| > _Are they on...?_
|
| I've met people who seriously do use dumb terminals and
| other people who have seriously discussed using a PDP-11.
|
| So, while your question might sound sarcastic, the answer
| is definitely yes.
|
| Nerds gonna nerd. Nothing wrong with that.
|
| I personally don't like going to gitlab or github because
| I don't like the businesses behind them. That's another
| point irrespective of whether I'm browsing in a terminal
| or ancient device.
| rwmj wrote:
| I was a bit short in the original description, but luckily
| we've since reached an understanding on how to try to reproduce
| this bug.
|
| Unfortunately he's not been able to reproduce it, even though I
| can reproduce it on several machines here (and it's been
| independently reproduced by other people at Red Hat). We do
| know that it happens much less frequently on Intel hardware
| than AMD hardware (likely just because of subtle timing
| differences), and he's of course working at Intel.
| mulmen wrote:
| Asking to click a link in an email is unreasonable in this
| context. The email list is the official channel and project
| participants are expected to use it. They are not expected to
| have a web browser. The popularity of the linked site is
| irrelevant. Part of filing good bug reports is understanding a
| project's communication style. A link to supplementary
| information is fine. But like a Stack Overflow answer the email
| should stand on its own.
| sigzero wrote:
| Yes, he should have just gone and looked there. Github is not a
| "random website".
| mulmen wrote:
| The link is to gitlab, not github. But any website is
| inappropriate in this context because it's not permanent. The
| email list is, at least as far as the project is concerned.
| gfiorav wrote:
| I once had to bisect a Rails app between major versions and
| dependencies. Every bisect step required me to build the app,
| fix the dependency issues, and so on.
|
| And I thought I had it bad!
| hoten wrote:
| > For unclear reasons the bisect only got me down to a merge
| commit, I then had to manually test each commit within that which
| took about another day.
|
| Having hit this before myself... does anyone know how to finagle
| git bisect to be useful for non-linear history?
| voytec wrote:
| What was the title editorialized for, few hours after posting,
| with "21 hours" (not important, clickbait-ish)? It was not
| breaking any of guidelines[1] to my understanding.
|
| [1] https://news.ycombinator.com/newsguidelines.html
___________________________________________________________________
(page generated 2023-06-14 23:00 UTC)