https://utcc.utoronto.ca/~cks/space/blog/sysadmin/ChecklistsAreHardButGood

Checklists are hard (but still a good thing)

Chris Siebenmann, July 19, 2025

We recently had a big downtime at work where part of the work was me doing a relatively complex and touchy thing. Naturally I made a checklist, but also naturally my checklist turned out to be incomplete, with some things I'd forgotten and some steps that weren't quite right or complete. This is a good illustration that checklists are hard to create.

Checklists are hard partly because they require us to try to remember, reconstruct, and understand everything in what's often a relatively complex system that is too big for us to hold in our mind. If your understanding is incomplete you can overlook something and so leave out a step or a part of a step, and even if you write down a step you may not fully remember (and record) why the step has to be there. My view is that this is especially likely in system administration where we may have any number of things that have been quietly sitting in the corner for some time, working away without problems, and so they've slipped out of our minds.

(For example, one of the issues that we ran into in this downtime was not remembering all of the hosts that ran crontab jobs that used one particular filesystem. Of course we thought we did know, so we didn't try to systematically look for such crontab jobs.)

To get a really solid checklist you have to be able to test it, much like all documentation needs testing. Unfortunately, a lot of the checklists I write (or don't write) are for one-off things that we can't really test in advance for various reasons, for example because they involve a large scale change to our live systems (that requires a downtime).
If you're lucky you'll realize that you don't know something or aren't confident in something while writing the checklist, so you can investigate it and hopefully get it right, but some of the time you'll be confident you understand the problem and be wrong.

Despite any imperfections, checklists are still a good thing. An imperfect written-down checklist is better than relying on your memory and mind on the fly almost all of the time (the rare exceptions are when you wouldn't even dare do the operation without a checklist, but an imperfect checklist tempts you into doing it and fumbling).

(You can try to improve the situation by keeping notes on what was missed in the checklist and then saving or publishing these notes somewhere. You can review these after-the-fact notes if you have to do the thing again, or look through them for the specific types of things you tend to overlook and should check for the next time you're making a checklist that touches on some area.)
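
As an illustration of what systematically looking for crontab jobs that use a particular filesystem could mean in practice, here is a minimal sketch. It assumes you have already collected each host's crontab text by some means of your own; the function name `find_crontab_references`, the host names, and the `/h/archive` path are all hypothetical and not taken from the original setup. It is also only a crude substring match, so it will miss jobs that reach the filesystem indirectly (for example through a script that lives elsewhere) and may match longer paths that merely share the same prefix.

```python
def find_crontab_references(crontabs, fs_path):
    """Return (host, line_number, line) for each active crontab line mentioning fs_path.

    crontabs maps a host name to the text of a crontab collected from that host.
    fs_path is the filesystem path to look for (a plain substring match).
    """
    hits = []
    for host, text in sorted(crontabs.items()):
        for number, line in enumerate(text.splitlines(), start=1):
            stripped = line.strip()
            # Blank lines and comment lines do not run anything, so skip them.
            if not stripped or stripped.startswith("#"):
                continue
            if fs_path in stripped:
                hits.append((host, number, stripped))
    return hits


if __name__ == "__main__":
    # Hypothetical sample data standing in for crontabs gathered from real hosts.
    example = {
        "hosta": "# nightly rotation\n0 2 * * * /h/archive/bin/rotate\n",
        "hostb": "*/5 * * * * /usr/local/bin/check-mail\n",
    }
    for host, number, line in find_crontab_references(example, "/h/archive"):
        print(f"{host}:{number}: {line}")
```

A search like this does not replace understanding the system, but its output is a concrete list that can be pasted into the checklist instead of relying on memory.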