[HN Gopher] Car alarms and smoke alarms: the tradeoff between se...
___________________________________________________________________
Car alarms and smoke alarms: the tradeoff between sensitivity and
specificity
Author : lngarner
Score : 52 points
Date : 2023-04-11 19:37 UTC (3 hours ago)
(HTM) web link (blog.danslimmon.com)
(TXT) w3m dump (blog.danslimmon.com)
| rtkwe wrote:
| It's a constant pain of mine to try to get people to stop having
| business-as-usual or "successfully completed $PROCESS" emails come
| out of our batch processes on our teams at work. They absolutely
| drown my inbox, so I'm forced to filter them, and then the actual
| failures get buried in the unchecked "batch spam" folders.
| hevans66 wrote:
| My pet peeve is these $PROCESS notifications that go to slack
| channels. I worked at a company that had an #engineering_humans
| slack channel because we got chased out of #engineering by
| bots.
| justin_oaks wrote:
| I'm fine if they go to THEIR OWN slack channel. Then I can
| mute or leave that channel.
|
| Of course, it's a different problem if those notifications
| have a mix of actionable and non-actionable messages (e.g.
| both success and error messages). Then it's a signal/noise
| problem.
| WrtCdEvrydy wrote:
| The ones that push my buttons are the alarms that have no docs
| attached, so when they go off at 2AM, they just get muted
| until someone comes in and complains at 6AM.
| justin_oaks wrote:
| I had a boss who had an inbox with literally hundreds of
| thousands of unread emails. A good chunk of those emails were
| "success" messages from batch processes.
|
| It's quite correct to send a "success" message when a batch
| process is completed successfully, but it's quite wrong to send
| that message to a human. It should be sent to a machine that
| should translate a missing success message into an error
| message/alert for humans to respond to.
|
| For example, I have a set of nightly backup jobs. The last step
| of each backup process is to send a success message to my
| monitoring system. I only get a "Missing Backup" alert when the
| monitoring system detects that it didn't receive the success
| message it expected for a particular backup.
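|
| A minimal sketch of the "missing success becomes an alert" idea
| described above. The class name, job names, and the 26-hour
| silence limit are hypothetical; a real monitoring system would
| persist state and run the check on a schedule.

```python
from datetime import datetime, timedelta


class HeartbeatMonitor:
    """Tracks the last success ping per job and reports jobs that went quiet."""

    def __init__(self, max_silence: timedelta):
        self.max_silence = max_silence
        self.last_success = {}  # job name -> time of last success ping

    def expect(self, job: str, now: datetime) -> None:
        # Register a job; the clock starts at registration time.
        self.last_success.setdefault(job, now)

    def record_success(self, job: str, now: datetime) -> None:
        # Called by the last step of the job; no human sees this message.
        self.last_success[job] = now

    def missing(self, now: datetime):
        # Jobs whose last success is older than the allowed silence.
        return sorted(
            job for job, seen in self.last_success.items()
            if now - seen > self.max_silence
        )


monitor = HeartbeatMonitor(max_silence=timedelta(hours=26))
start = datetime(2023, 4, 10, 2, 0)
monitor.expect("db-backup", start)
monitor.expect("files-backup", start)

# The next night only one of the two jobs reports in.
monitor.record_success("db-backup", start + timedelta(hours=24))

for job in monitor.missing(start + timedelta(hours=27)):
    print(f"ALERT: Missing Backup: {job}")  # only files-backup is reported
```

| The success pings go to the machine; a human only hears about
| the gap.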
|
| My old boss didn't seem to understand the concept that people
| don't generally notice missing messages. Or he was too
| lazy/incompetent to use a monitoring system that could
| translate gaps in successes into errors.
| sammalloy wrote:
| The entire car alarm industry is a scam, promoted by Republican
| congressman Darrell Issa. It has seriously disrupted our lives in
| every way imaginable and has drowned out the beauty of nature. I
| can't think of a single car that has been protected by a car
| alarm since they were invented. They are useless and should be
| banned for the health and safety of mitigating noise pollution.
| GuB-42 wrote:
| > I can't think of a single car that has been protected by a
| car alarm since they were invented.
|
| Many insurance companies offer lower premiums if you install a
| car alarm. So I guess they work at least a little, otherwise
| they wouldn't lower their premiums.
|
| It may not actually stop a thief, but it may get a thief to
| choose a car that doesn't have an alarm, or maybe it is just a
| correlation, but there is at least something.
|
| Still, I think they should be made illegal, they are a
| nuisance, there are already laws against making excessive noise
| and car alarms should be included. And if they create an arms
| race, by getting thieves to prefer cars without alarms, that's
| even more reason to ban them.
| BalinKing wrote:
| > It has seriously disrupted our lives in every way imaginable
|
| I assume this is one of those things that changes dramatically
| based on where you live--for me (western US), this statement
| seems almost comically exaggerated.
| dahfizz wrote:
| Yeah, I can't remember the last time I heard a car alarm.
| birdyrooster wrote:
| Visit a city
| Eji1700 wrote:
| I live in one with more than 2 million people and it's
| something that I have to think hard about to remember the
| last time I heard one go off.
|
| Longer if I have to think of one that went off and wasn't
| some form of 'oh shit oh shit oh shit, wrong button'
| reaction from the person who accidentally turned it on.
| tayo42 wrote:
| I am jealous, I hear at least one every day. I live in an
| apartment complex in a suburb. I get woken up by them
| sometimes too.
| jimbob45 wrote:
| For poor people whose ability to live depends on having a car,
| car alarms must be at least _sort_ of useful to know if your
| car is being stolen at night. I'm sure they're just a noisy
| inconvenience to the wealthy though.
| izacus wrote:
| Does the alarm ever prevent theft?
| thrashh wrote:
| If I'm living in an area where they don't go off often
| (like right now) and a car alarm woke me up, I would
| definitely check.
|
| And I imagine if I triggered a car alarm, I would back off.
| yafbum wrote:
| I'd like to know more about the chip designer who, perhaps
| unwittingly, created the alarm-filled soundscape of most American
| cities https://youtu.be/tmCnleSBAIg. Would love to know more
| about the composition process that went into it.
| tra3 wrote:
| I need to sit down and go through the math again, I got lost in
| the middle somewhere. All I know is our alerts are way too noisy
| now to the point where they are useless.
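|
| For anyone else who wants to redo the math: a small sketch of
| the base-rate calculation, using Bayes' rule. The numbers below
| are hypothetical and are not taken from the article.

```python
def positive_predictive_value(sensitivity, specificity, base_rate):
    """P(real problem | alert fired), via Bayes' rule."""
    true_alerts = sensitivity * base_rate
    false_alerts = (1.0 - specificity) * (1.0 - base_rate)
    return true_alerts / (true_alerts + false_alerts)


# Hypothetical alert: catches 99% of real problems, stays quiet on 99%
# of healthy checks, and a real problem exists in 1 out of 1000 checks.
ppv = positive_predictive_value(sensitivity=0.99, specificity=0.99, base_rate=0.001)
print(f"{ppv:.1%} of alerts point at a real problem")
```

| With a rare underlying problem, roughly nine in ten alerts from
| this seemingly excellent check are false positives.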
| nh23423fefe wrote:
| I dunno, article doesn't seem to want me to understand. It's
| just another, "here's a random stats calculation you can't
| perform in your head, isn't English a bad way to describe
| this calculation?!?!? your intuition sucks when I don't explain
| myself....."
| sparrish wrote:
| Alert fatigue... it's common when alerts are non-actionable and
| it causes a lot of downtime.
| bluGill wrote:
| We write up stories to fix them, and upper management tracks
| progress on completion so they are not buried in the backlog.
| hevans66 wrote:
| Yes! This. This has happened to me at at least two previous
| companies I have worked at. Everybody sets up thresholds on
| every possible Datadog metric and alerts become useless. That's
| part of the ethos of monitoring at my current company. We only
| set up alerts through https://heiioncall.com/ that we are
| convinced you absolutely need to look at right now. Anything
| that is not that gets shoved to a slack channel (that I have
| long since muted).
| Syonyk wrote:
| Now, if you're annoyed by the false positive rate on your _actual
| smoke alarms,_ go replace the one nearest your kitchen with a
| photoelectric type, not the standard ionization type that's
| cheaper, the default style installed, and ought to be illegal in
| homes (IMO).
|
| There's been quite a bit of research done, generally easy to find
| if you look, that talks about the difference and tests them, but
| the short summary:
|
| - Ionization type sensors detect the products of fast flaming
| combustion and "things cooking in the kitchen." Your oven, if a
| bit dirty, will reliably trip an ionization type. They are quick
| on the draw for this. The downside is that they're very, very
| poor at detecting the sort of slow, smoking, smoldering
| combustion that is associated with house fires that kill people
| in the middle of the night.
|
| - The photoelectric type is very good at detecting smoke in the
| air - but it isn't nearly as prone to false triggers on ovens, a
| burner burning some spills off, etc.
|
| They've been A/B tested in a wide variety of conditions, and in
| some cases, the ionization type is a bit quicker. In other cases,
| the ionization type is slower, by time ranges north of _half an
| hour_ - I've seen some test reports where there was a 45 minute
| gap, while the photoelectric type was going off, before the
| ionization type fired!
|
| In general, "rapid fires during the day" are somewhat destructive
| to property, but rarely kill people. If your kitchen catches on
| fire while you're cooking, it may burn the house down, but
| generally people are able to get out.
|
| The fires that kill people are "slow starting fires during the
| night" - the sort that smolder for potentially hours, often
| slowly filling the house with toxic smoke, before actually
| bursting into open flames. On this sort of fire, the
| photoelectric type will fire long, long before the ionization
| type - in some cases, they get around to alarming quite literally
| "after the occupants are dead from the smoke."
|
| Using smoke alarms as a way to talk about monitoring systems is
| nice, but in terms of actual smoke detectors, get at least a few
| photoelectric sorts in the main areas of your home.
|
| Do _not_ get the "combined sensor" sort, since these tend to be
| and-gated and the worst of both worlds.
|
| Edited to add some resources:
|
| A presentation on the matter from a while back by one of the
| experts in this field:
| https://wahigroup.com/Resources/Documents/Ion%20vs%20Photo%2...
|
| Another paper:
| https://www.semanticscholar.org/paper/Detection-of-Smoke-%3A...
|
| > _Full-scale fire tests are carried out to study the
| effectiveness of the various types of smoke detectors to provide
| an early warning of a fire. Both optical smoke detectors and
| ionization smoke detectors have been used. Alarm times are
| related to human tenability limits for toxic effects, visibility
| loss and heat stress. During smouldering fires it is only the
| optical detectors that provide satisfactory safety. With flaming
| fires the ionization detectors react before the optical ones. If
| a fire were started by a glowing cigarette, optical detectors are
| generally recommended. If not, the response time with these two
| types of detectors are so close that it is only in extreme cases
| that this difference between optical and ionization detectors
| would be critical in saving lives._
| bluGill wrote:
| The law requires you have both types for good reason. Either
| alone will detect less than half of all house fires.
|
| Dual sensors are not and-gated. While nobody will admit what
| algorithm they use, they detect most fires unlike the single
| sensor type.
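|
| To make the gating dispute concrete, here is a sketch of what
| and-gating versus or-gating would do to detection probability.
| The per-sensor probabilities are made up, the two sensors are
| assumed to be independent, and nothing here says which logic any
| real dual-sensor alarm uses.

```python
def and_gate(p_a, p_b):
    """Alarm only if both sensors trigger (independent sensors assumed)."""
    return p_a * p_b


def or_gate(p_a, p_b):
    """Alarm if either sensor triggers (independent sensors assumed)."""
    return 1.0 - (1.0 - p_a) * (1.0 - p_b)


# Hypothetical detection probabilities for a smoldering fire.
p_ionization = 0.20
p_photoelectric = 0.95

print(f"and-gated: {and_gate(p_ionization, p_photoelectric):.2f}")
print(f"or-gated:  {or_gate(p_ionization, p_photoelectric):.2f}")
```

| Under these assumptions an and-gated pair is never better at
| detection than its weaker sensor, while an or-gated pair is never
| worse than its stronger one. The same formulas applied to false
| alarm rates show the opposite trade.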
| riceart wrote:
| Lol "the law" .. what law? Maybe in some dumb ass
| jurisdiction - but you're a bit full of yourself if you think
| where you happen to live is "the law".
| bluGill wrote:
| US fire code, though inspectors often don't check.
| Syonyk wrote:
| As far as I can tell, it's state by state, so... you want
| to cite some sources?
| Syonyk wrote:
| Where does the law require both types? I'm not aware of any
| housing codes specifically requiring photoelectric types, and
| any house I've looked at, including mine, came with purely
| ionization types. Though it's been a few years, and it may
| have changed recently - this is less of a niche concern
| lately.
|
| As for dual sensors and gating... do you actually trust your
| life to "nobody will admit what algorithm they use"?
|
| My house has all the smoke detectors wired together (they're
| on an AC circuit, with battery backup, with a signal line
| running between them all), so I have some photoelectric and
| some ionization, depending on where in the house they are.
| compumike wrote:
| I do like how the author presents the case for how damaging
| false-positives can be in SRE monitoring. But, FYI, it can get
| worse if these monitors are hooked to self-actuating feedback
| loops! I recently wrote about a production incident on the Heii
| On-Call blog, in the context of witnessing how Kubernetes
| liveness probes and CPU limits worked together to create a self-
| reinforcing CrashLoopBackOff. [1] Partially because the liveness
| probe thresholds (timeoutSeconds and failureThreshold fields)
| were too aggressive.
|
| We have a similar message about setting monitoring thresholds in
| our documentation [2] because users have to explicitly specify a
| downtime timeout before they're alerted about their website / API
| endpoint / cron job being down. The timeout / "grace period" is
| necessary because in many cases a failure is some transient
| network glitch which will fix itself before a human is alerted.
|
| If you make the timeout too short, you'll get lots of false
| positive alerts, and as the article says, your on-call engineers
| will be overwhelmed or just start ignoring the alerts.
|
| If you make the timeout too long, it just takes that many minutes
| of downtime longer before you find out about it.
|
| It may sound counterintuitive, but the latter is usually
| preferable. :)
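|
| A minimal sketch of the grace-period idea described above. The
| class name, the 30-second check interval, and the 180-second
| grace period are illustrative assumptions, not how any
| particular product implements it.

```python
class DowntimeAlert:
    """Signals an alert only after a target has been down for a full grace period."""

    def __init__(self, grace_seconds):
        self.grace_seconds = grace_seconds
        self.down_since = None  # timestamp of the first failed check in a row

    def observe(self, timestamp, is_up):
        """Feed one check result; return True while a human should be alerted."""
        if is_up:
            self.down_since = None  # any success resets the clock
            return False
        if self.down_since is None:
            self.down_since = timestamp
        return timestamp - self.down_since >= self.grace_seconds


# A 60-second transient glitch, a recovery, then a real outage.
checks = [(0, True), (30, False), (60, False), (90, True)]
checks += [(t, False) for t in range(120, 331, 30)]

alert = DowntimeAlert(grace_seconds=180)
fired_at = [t for t, up in checks if alert.observe(t, up)]
print("first alert at t =", fired_at[0])  # the glitch at t=30..60 never alerts
```

| A shorter grace period would have paged on the glitch; a longer
| one delays the page for the real outage by the same amount.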
|
| [1] https://heiioncall.com/blog/kubernetes-liveness-probes-and-c...
|
| [2] https://heiioncall.com/docs
| raldi wrote:
| When the oncall gets paged, an SLO should be in jeopardy in a way
| that requires immediate measures to be taken by a well-trained
| human as described in actionable terms in a linked playbook.
|
| No SLO in jeopardy, or no immediate measure that needs to be
| taken? Don't page the oncall; send a low-priority ticket for the
| service owner to investigate the next business day.
|
| Steps need to be taken, but they're mechanical in nature or
| otherwise don't give the SRE an opportunity to exercise their
| brain in an interesting fashion? Replace the alert with an
| automated handler that only pages the oncall if it encounters an
| exception.
|
| No playbook, or the playbook consists of useless non-actionable
| items like, "This alert means the service is running out of
| frobs"? Write a playbook that explains what the oncall is
| expected to _do_ when the service needs frobs.
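|
| One possible reading of the rules above as a routing function.
| The enum, the parameter names, and the order in which the checks
| are applied are assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    PAGE = "page the oncall"
    TICKET = "file a low-priority ticket for the next business day"
    AUTOMATE = "run an automated handler, page only on exception"
    WRITE_PLAYBOOK = "write an actionable playbook for this alert"


def route(slo_in_jeopardy, needs_immediate_action, mechanical_fix,
          has_actionable_playbook):
    # No SLO in jeopardy, or nothing to do right now: do not page.
    if not (slo_in_jeopardy and needs_immediate_action):
        return Action.TICKET
    # Purely mechanical steps belong in automation, not in a human's night.
    if mechanical_fix:
        return Action.AUTOMATE
    # A paging alert without actionable guidance is not ready to page.
    if not has_actionable_playbook:
        return Action.WRITE_PLAYBOOK
    return Action.PAGE


print(route(True, True, False, True).value)
print(route(False, True, False, True).value)
```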
|
| Edit: A dead reply asks if I've ever experienced a novel
| incident. Of course. Say, for instance, a "This should never
| happen" error-level log is suddenly happening like crazy, for the
| first time ever. In that case, you page the oncall, they do their
| best to debug it, see if they can reach the SWE service owners,
| read through the code to see if it could be an indicator that
| SLOs are being violated (e.g., user data corruption) or might be
| violated soon, and then write a stub playbook to be fleshed out
| the next business day, probably alongside a code change to handle
| this situation without spamming the logs so much.
| matthew9219 wrote:
| [dead]
| fatnoah wrote:
| In a previous life as a full-stack Engineer at a startup, this
| was my white whale. The state of logging, monitoring, and
| alerting was such that signal quality was low, and only
| indirect observations of the system were possible since the
| logging was borderline useless. The result was multiple pages
| per night, with each one resulting in a scavenger hunt because
| signal was so low that it was nigh impossible to even identify
| what playbook to run.
|
| For example, the web application crashing was logged as a DEBUG
| statement, but starting was logged at an ERROR level. This was
| clearly done at some point because DEBUG generated far too much
| log info w/millions of active users, but some Engineer wanted
| to know that the app started. Gross.
|
| I solved for this by doing a couple things. The first was to
| define standards for log levels, ability to correlate log
| statements with each other for a given request, and to define
| the level of context a "proper" log level should provide.
|
| For example, FATAL = there's no way anything can work properly.
| These are pretty rare, but incorrect configuration values were
| a common culprit. ERROR indicates something, possibly transient,
| going wrong. Every now and then it's not a big deal and can wait
| until later, but a rapid accumulation could mean something more
| serious is going on. INFO contained information about the state
| of the system, such as general measures of activity and other
| signals to indicate the system is working as expected. Most of
| our metrics capture was instrumented based off these
| statements.
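|
| A sketch of what such a standard might look like with Python's
| standard logging module: levels used consistently, and a request
| ID on every line so statements can be correlated. The logger
| name, request ID, and messages are hypothetical. Python names
| its FATAL level CRITICAL.

```python
import io
import logging

stream = io.StringIO()  # capture output so the example is self-contained
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(levelname)s request=%(request_id)s %(message)s")
)

base = logging.getLogger("webapp")
base.setLevel(logging.INFO)  # DEBUG is too noisy for production
base.addHandler(handler)
base.propagate = False


def logger_for(request_id):
    # LoggerAdapter attaches request_id to every record it emits.
    return logging.LoggerAdapter(base, {"request_id": request_id})


log = logger_for("req-42")
log.debug("cache miss for user profile")                  # dropped
log.info("handled request in 12 ms")                      # system state
log.error("upstream timeout, will retry")                 # possibly transient
log.critical("cannot start: configuration is unusable")   # nothing can work

print(stream.getvalue(), end="")
```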
|
| In terms of the messages, we rapidly evolved the quality of the
| messages. For something like the aforementioned configuration
| error, the system initially just spat out an "Unexpected error"
| and a module name. The first improvement then stated something
| like "invalid configuration value" and finally we ended up on a
| message that stated the value was incorrect, identified which
| configuration value was wrong, and had a code that referenced
| documentation and escalation owner.
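|
| A sketch of the final stage of that evolution: an error that
| says what was wrong, names the setting, and carries a code that
| points at documentation and an owner. The setting name, the code
| "CFG-001", and the owner are invented for illustration.

```python
class ConfigError(Exception):
    """Configuration error that names the setting, the problem, and who to ask."""

    def __init__(self, key, value, reason, code, owner):
        self.key, self.value, self.code, self.owner = key, value, code, owner
        super().__init__(
            f"[{code}] invalid configuration value {key}={value!r}: {reason} "
            f"(see docs entry {code}; escalate to {owner})"
        )


def parse_port(raw):
    # Accept only integers in the valid TCP port range.
    try:
        port = int(raw)
    except ValueError:
        port = None
    if port is None or not 1 <= port <= 65535:
        raise ConfigError(
            key="LISTEN_PORT",
            value=raw,
            reason="expected an integer between 1 and 65535",
            code="CFG-001",
            owner="platform-team",
        )
    return port


try:
    parse_port("80a")
except ConfigError as err:
    print(err)
```

| Compare that with a bare "Unexpected error" plus a module name:
| the person paged at night knows which value to fix and whom to
| wake if they can't.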
|
| When all was said and done, we'd reduced our downtime from
| hours per year to less than 5 minutes, eliminated over 95% of
| our pages, and reduced escalations to Engineering from several
| days per week to a level where it was hard to remember the last
| one.
|
| As the head of Engineering, I had to fight an uphill battle
| against the product & sales team for almost a year to make all
| of this happen, but I was fully vindicated when we were
| acquired and our operational maturity was lauded during the due
| diligence process.
| peteradio wrote:
| You know all that work was worth it when you get a good
| lauding.
| yamtaddle wrote:
| > When presented with this tradeoff, the path of least resistance
| is to say "Let's just keep the threshold lower. We'd rather get
| woken up when there's nothing broken than sleep through a real
| problem." And I can sympathize with that attitude. Undetected
| outages are embarrassing and harmful to your reputation. Surely
| it's preferable to deal with a few late-night fire drills.
|
| > It's a trap.
|
| > In the long run, false positives can -- and will often -- hurt
| you more than false negatives. Let's learn about the base rate
| fallacy.
|
| Not sure about anyone else, but speaking of alarms, this style of
| writing trips my "self-promoting snake-oil Internet bullshitter"
| alarm. It's like nails on a damn chalkboard, and if you're
| writing like this, you've already lost me; however, maybe I ought
| not be pointing that out, since signals are nice to have.
|
| Incidentally, I wasn't sure which way the author was gonna go
| with the core analogy. My smoke alarms have false-alarmed
| probably 10x as much as my car alarm, even counting times one of
| us has hit the alarm button on the fob by accident. I've
| certainly never been so annoyed by my car alarm that I've ripped
| it out and stuck it in a freezer, as I have with a smoke alarm.
|
| (If I were writing like the author I suppose that last part would
| have read:
|
| "I've certainly never been so annoyed by my car alarm that I've
| ripped it out and stuck it in a chest freezer.
|
| I have, with a smoke alarm."
|
| Except also I'd have found a way to use "we" and "you" a bunch.)
| raldi wrote:
| What do you mean by "this style of writing"? What aspects of
| the quote do you object to?
| jacquesm wrote:
| At a guess the bit 'let's learn about the base rate fallacy'.
| yamtaddle wrote:
| Short, choppy sentences, lots of second-person, dropping a
| "punch-line" sentence to its own paragraph like they're a
| fucking magician revealing the card you pulled earlier.
| It's some kind of cross between transparent rapport-
| building sales-psychology crap and setting off a fireworks
| display to celebrate your successfully assembling a PB&J.
|
| Like listening to a used car salesman tell a mundane story
| about their morning commute.
|
| But full of unearned and over-the-top dramatic pauses.
| burnished wrote:
| I'm not sure what you are responding to in the quoted text but
| after reading the article I think I can assure you that the
| author isn't selling you anything more salacious than you would
| find in a more interesting introduction to probability and
| statistics lecture.
| quickthrower2 wrote:
| I see a lot of this style of writing in articles submitted on
| HN. I think they are just trying to make the writing more
| lively, not trying to BS.
|
| A trope of this style is "{interesting half story} but more on
| that later".
|
| I don't think it is a big deal and I don't see much self
| promotion here other than vanilla blogging, i.e. sounds like
| this person is knowledgeable let's check their bio.
| gmuslera wrote:
| Some complementary reading could be My Philosophy on Alerting (
| https://docs.google.com/document/d/199PqyG3UsyXlwieHaqbGiWVa... )
| and https://how.complexsystems.fail/
|
| In any case, not all signals are the same. Most systems have a
| lot of interacting components, and what turns out to be dangerous
| is usually a combination of factors. In the end, what determines
| whether something was a problem is whether the system is doing
| what it should. You can set some guessed thresholds, but you must
| check them against whether the system actually works.
|
| And they should be actionable too, at least for alerts as opposed
| to slow-day notifications or metrics that give context to
| perceived problems and could take the guessing out of the
| thresholds.
| mertd wrote:
| The post is somewhat incomplete without also discussing the cost
| of the wrong decision.
|
| You obey the smoke alarm because the cost of ignoring the alarm
| when it is a true positive is potentially infinite (you die). You
| ignore the car alarm because (1) most likely it is a false
| positive but also (2) most likely it is somebody else's car.
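|
| The cost argument can be sketched as an expected-cost
| comparison. All probabilities and costs below are made-up
| illustrative units.

```python
def expected_cost_of_ignoring(p_true_positive, cost_if_real):
    """Average cost of ignoring the alarm, weighted by how often it is real."""
    return p_true_positive * cost_if_real


def should_respond(p_true_positive, cost_if_real, cost_of_responding):
    # Respond when ignoring is, on average, more expensive than checking.
    return expected_cost_of_ignoring(p_true_positive, cost_if_real) > cost_of_responding


# Smoke alarm at home: rarely real, but catastrophic when it is.
print(should_respond(p_true_positive=0.01, cost_if_real=1_000_000, cost_of_responding=10))

# Car alarm down the street: rarely real, and probably not your car.
print(should_respond(p_true_positive=0.01, cost_if_real=50, cost_of_responding=10))
```

| Both alarms have the same hypothetical true-positive rate here;
| only the cost of being wrong differs, and that alone flips the
| decision.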
| [deleted]
| dfox wrote:
| Smoke(/fire in general) alarms are not a good example of a thing
| with high specificity. You perceive it that way, but what you see
| is the result of somebody getting paged about it and then
| checking (preferably physically, but also through e.g. CCTV)
| whether there really is an emergency situation and canceling the
| alarm before its escalation timeout. Apparently, for a typical
| commercial building, false fire alarms are more or less a weekly
| occurrence.
|
| Edit: in large scale fire alarm systems there also are rules
| about combinations of triggered sensors that cause immediate
| escalation (if there is smoke and elevated temperature in two
| adjacent zones, it probably is not a false alarm and such things,
| often it even takes into account the failure modes of the
| physical alarm loop wiring). This is an interesting idea for IT
| monitoring: page someone only when multiple metrics indicate an
| issue.
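|
| A sketch of the "page only when multiple metrics agree" idea.
| The signal names and the two-signal minimum are hypothetical.

```python
def should_page(signals, min_agreeing=2):
    """Return (page?, firing signal names) given a dict of name -> is_firing."""
    firing = sorted(name for name, is_firing in signals.items() if is_firing)
    return len(firing) >= min_agreeing, firing


# One noisy metric on its own does not wake anyone up.
print(should_page({"error_rate": False, "latency_p99": True, "queue_depth": False}))

# Two independent metrics agreeing is treated as a real incident.
print(should_page({"error_rate": True, "latency_p99": True, "queue_depth": False}))
```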
| tobyjsullivan wrote:
| It was an interesting example and maybe deserved a few more
| caveats to actually serve the point. After all, we've all heard
| a fire alarm of some sort in the past year (if not the past
| month) but how many were actual fires? (Technically the author
| said smoke which helps but not really.)
|
| Where I was expecting the author to go:
|
| - Clearly was talking about residential smoke detectors, not
| commercial. That could have been explicit.
|
| - Smoke detectors do have a high false-positive rate but almost
| always at the _right time_. A home smoke alarm going off while
| I'm cooking is quite different to a smoke alarm going off when
| I'm sleeping. To the author's point, there are very few false
| positives while I'm sleeping so when they happen, I'm getting
| up.
|
| Speaking of the commercial context, I wonder what sort of
| businesses would get a lot of false alarms and how that varies
| across industries.
| cbarrick wrote:
| I think this article is missing the forest for the trees.
|
| The article is about finding the appropriate sensitivity of
| alerts on some signal in order to maximize the predictive value.
|
| But you should care more about the quality of the signals you are
| monitoring than about the sensitivity of your thresholds.
|
| The article mentions load-average as an example signal, but to
| me, that's a poor signal to monitor. Instead, if your SLO is
| defined for error rate, alert on error rate.
|
| Alerts on your SLO will have a high predictive value for
| predicting violations of your SLO, by definition. The tunable
| parameter here is the time window, not the threshold. E.g. if
| your error budget is defined for a 30d window, you may want
| alerts at the SLO threshold for 24h and 1h windows.
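|
| A rough sketch of checking the same error-rate threshold over
| several time windows. The event format, the 99% SLO, and the
| traffic pattern are assumptions for illustration; production
| setups usually compute this in the metrics system.

```python
def error_rate(events, now, window_seconds):
    """Fraction of failed requests among events in (now - window, now]."""
    recent = [is_error for ts, is_error in events
              if now - window_seconds < ts <= now]
    return sum(recent) / len(recent) if recent else 0.0


def windows_over_budget(events, now, allowed_error_rate, windows):
    """Names of the windows whose error rate exceeds the SLO's allowance."""
    return [name for name, seconds in windows.items()
            if error_rate(events, now, seconds) > allowed_error_rate]


# Hypothetical traffic: one request per minute for 24 hours,
# with three failures in the last half hour.
now = 86_400
failures = {now - 600, now - 1200, now - 1800}
events = [(ts, ts in failures) for ts in range(60, now + 1, 60)]

windows = {"1h": 3_600, "24h": 86_400}
print(windows_over_budget(events, now, allowed_error_rate=0.01, windows=windows))
```

| Here the short window trips (3 errors in 60 requests) while the
| long window still looks healthy, which is the kind of difference
| the window length is meant to tune.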
|
| Alert on symptoms, not causes.
| jacquesm wrote:
| > But you should care more about the quality of the signals you
| are monitoring than about the sensitivity of your thresholds.
|
| This is so true. Case in point: Growatt inverters have - like
| every other inverter - a maximum voltage on the grid connection
| at which they will shut down. They're pretty trigger happy
| about this and fail to take into account the resistance of the
| feed wire of the inverter to the (much lower impedance) grid
| hookup. As a result even on cabling sized properly for the
| interconnect they tend to falsely trigger well before the point
| where they should. The only way to avoid this problem is to
| either hack into the inverter somehow (which I've so far failed
| to do) or to use oversized cables (which isn't always an
| option).
|
| The sensitivity is fantastic, the quality of the signal is
| hopeless. Obviously they err on the side of caution but the
| margin is so ridiculously large that you end up losing a lot of
| usable power for no reason at all. At the very least it should
| allow a resistance for the interconnect to be specified so
| that it can take into account the voltage drop across that
| wire, which at 10A is appreciable for even short runs of fairly
| beefy cable.
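|
| A back-of-the-envelope sketch of that voltage drop, using
| V = I * R for a copper cable. The cable length and cross-section
| are hypothetical; while exporting, the inverter measures the
| grid voltage plus this drop at its own terminals.

```python
COPPER_RESISTIVITY_OHM_M = 1.68e-8  # copper at roughly 20 °C


def voltage_drop(current_a, one_way_length_m, cross_section_mm2):
    """Voltage drop over an out-and-back copper run, V = I * R."""
    area_m2 = cross_section_mm2 * 1e-6
    # Current flows out on one conductor and back on the other: 2x length.
    resistance_ohm = COPPER_RESISTIVITY_OHM_M * (2 * one_way_length_m) / area_m2
    return current_a * resistance_ohm


# Hypothetical run: 10 A over 30 m of 2.5 mm^2 cable.
drop = voltage_drop(current_a=10, one_way_length_m=30, cross_section_mm2=2.5)
print(f"{drop:.2f} V above the grid voltage at the inverter terminals")
```

| A few volts is enough to matter when the shutdown limit sits
| only slightly above the nominal grid voltage.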
___________________________________________________________________
(page generated 2023-04-11 23:00 UTC)