Post AMcnQYPEPbLm1FHIvY by RandomDamage@mastodon.technology
(DIR) Post #AMcc7Dia0QzOrIuGdE by wolf480pl@mstdn.io
2022-08-17T14:35:05Z
1 likes, 0 repeats
When you encounter a problem with a stateful system, do you: a) debug for hours, or b) reset to the known-good state, destroying all evidence?
(DIR) Post #AMccOINxuCjsG2PG1g by quad@weeaboo.space
2022-08-17T14:38:11.683828Z
1 likes, 0 repeats
@wolf480pl c) know how your system actually works so you can fix it without wasting hours. Document your shit and don't pile your projects full of tools whose workings you can't at least vaguely comprehend.
(DIR) Post #AMccbt9wUbPy4fKShs by quad@weeaboo.space
2022-08-17T14:40:37.149953Z
0 likes, 0 repeats
@wolf480pl Unless you're talking about corruption-likely scenarios like a dying database. In that case do b) and restore a backup, which should not be particularly old if the job is done properly
(DIR) Post #AMcd9Ns1KaCOsXuydE by wolf480pl@mstdn.io
2022-08-17T14:46:41Z
0 likes, 0 repeats
@quad the really interesting bugs usually cross abstraction boundaries. Let's say you write a wayland compositor. You may know your code like the palm of your hand, but it turns out you need to chase the bug down into mesa, libdrm, and kernel. Because your code uses them in a slightly unusual way. IMO a bug that doesn't give me a "what the ACTUAL fuck" isn't interesting
(DIR) Post #AMcdLg2JcFCWh8axTE by quad@weeaboo.space
2022-08-17T14:48:44.808796Z
0 likes, 0 repeats
@wolf480pl Well, I typically work in IT. Where something breaking means someone can't work properly. So I have to do c). If c) is unavailable I have to establish an ETA for both a) and b), then pick between a) and b) on a case-by-case basis depending on ETA estimates.
(DIR) Post #AMcdkvloFsUIjQut9s by wolf480pl@mstdn.io
2022-08-17T14:53:28Z
0 likes, 0 repeats
@quad well in most cases I've seen, (b) is very fast: just restart the app / reboot the computer / kill the VM and wait 2 minutes for Kubernetes to get you a new one. The best solution I've found so far is moving "prod" to a fresh instance and keeping the old one for further inspection. But there are few situations where this is available.
(DIR) Post #AMce8BfzHadqixHbHc by quad@weeaboo.space
2022-08-17T14:57:41.026360Z
0 likes, 0 repeats
@wolf480pl I typically end up doing (b) for cases where a proprietary piece of software I have little control over keeps hanging and there's little information available, so (a)'s ETA is very high. Or things that are just known to be unfixable bullshit. Like Windows's printing stack.
(DIR) Post #AMcnQYPEPbLm1FHIvY by RandomDamage@mastodon.technology
2022-08-17T16:41:49Z
0 likes, 0 repeats
@wolf480pl Depends. If it's urgent I grab what I can off the system and go from there to b). Otherwise I do a) until it becomes urgent or I find an answer. Never underestimate the value of good logs.
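As a rough illustration of "grab what I can off the system" before going to b), here is a minimal Python sketch that bundles whatever log files still exist into a timestamped archive. The function name snapshot_evidence and any paths passed to it are hypothetical, not something from the thread; what is actually worth grabbing depends on the system.

```python
import tarfile
import time
from pathlib import Path


def snapshot_evidence(paths, dest_dir):
    """Bundle the given files/directories into a timestamped .tar.gz.

    Hypothetical helper: the name and archive layout are assumptions
    made for illustration only. Returns the path of the archive.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # UTC timestamp so archives from different hosts sort consistently.
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    archive = dest_dir / f"evidence-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for p in map(Path, paths):
            if p.exists():  # skip anything that is already gone
                # Keep the original path (minus the leading slash) in the archive.
                tar.add(p, arcname=p.as_posix().lstrip("/"))
    return archive
```

One would call this with whatever logs or state files matter, for example snapshot_evidence(["/var/log/syslog"], "/tmp/evidence") (both paths are placeholders), and only then restart or reset.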
(DIR) Post #AMdQwi9Pc2xZimCffE by sjb@mstdn.io
2022-08-18T00:04:38Z
0 likes, 0 repeats
@wolf480pl Always start with the thing that wastes the least time, probably (b)