[HN Gopher] Notes on the Perfidy of Dashboards
       ___________________________________________________________________
        
       Notes on the Perfidy of Dashboards
        
       Author : jedixit
       Score  : 79 points
       Date   : 2021-08-27 15:00 UTC (8 hours ago)
        
 (HTM) web link (charity.wtf)
 (TXT) w3m dump (charity.wtf)
        
       | mdoms wrote:
       | This is a very stupid take from the author just intended to drum
       | up controversy.
        
       | throwaway_2047 wrote:
       | The tld is apt, wtf?
        
       | devchix wrote:
       | 1. Set up some strawmen about dashboards
       | 
       | 1a. Define dashboards in a way that suits your argument
       | 
       | 2. Knock em down!
       | 
       | 3. Profit?
       | 
       | This, https://status.cloud.google.com/, is a static dashboard.
       | One can make up one's mind about its usefulness and function. A
       | blunt example, but nobody I know "debugs" using dashboards.
       | 
        | A dashboard is automation; if you're against dashboards, you're
        | against the automation of repetitive, boring, long-winded,
        | laborious, hard-to-remember tasks. Dashboards aren't sacred;
        | once one outlives its usefulness, get rid of it.
        
       | jldugger wrote:
       | Honestly, the complaint here seems to be less about dashboards
       | and more about the data behind them.
       | 
       | Static dashboards sound like timeseries backends, where the data
       | is pre-aggregated (graphite / statsd, prometheus). You can't
       | really drill down into the metrics, or can only drill down into
       | preplanned dimensions. Grafana is a commonly used dashboarding
       | frontend here.
       | 
       | Dynamic dashboards, in contrast, are dynamically aggregating
       | data. More akin to structured logging, or maybe splunk / ELK. You
       | have granular data, and write queries to extract, filter, and
       | aggregate on demand. Tableau, PowerBI, Apache Superset all
       | compete in this space.
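        | 
        | A rough sketch of that difference in Python (toy code, not any
        | particular product's API):
        | 
        |     from collections import Counter, defaultdict
        | 
        |     # "Static": aggregate at write time. Only the label sets you
        |     # chose up front survive; per-request detail is gone.
        |     request_counts = Counter()          # (service, status) -> count
        |     request_counts[("checkout", 500)] += 1
        | 
        |     # "Dynamic": keep the whole wide event and aggregate at read
        |     # time, along whatever dimension today's question needs.
        |     events = [{"service": "checkout", "status": 500,
        |                "user_agent": "Mozilla/5.0", "screen": "390x844",
        |                "latency_ms": 87}]
        |     errors_by_ua = defaultdict(int)
        |     for e in events:
        |         if e["status"] >= 500:
        |             errors_by_ua[e["user_agent"]] += 1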
       | 
        | But by focusing on the dashboard angle, the reader doesn't think
        | too hard about why they're different, or about why you might
        | prefer one over the other. TSDBs like Prometheus are very fast
        | and, if you focus on collecting aggregate data, allow you to
        | collect a lot more metrics or sample much faster. You're
        | probably not logging in the TSDB any labels for User-Agent
        | strings, the screen size your mobile app reported, etc. By
        | paying the price in dimensionality, you get much faster queries
        | at lower cost. I'll let you guess which type of backend Charity's
        | startup represents.
       | 
       | Both have a place. I've been able to build canary dashboards that
       | work quite well using both backends, as a proof of concept that
       | something like Kayenta is feasible for my team. In fact, high
       | dimensionality works against you in release engineering. The more
       | dimensions you can compare across, the higher chance for a false
       | positive, and the more investigations engineers have to do to
       | rule them out. Worse, there are often confounding variables you
       | need to go hunting for, and the dashboard won't find them for
       | you.
       | 
       | And execs absolutely don't want to have to care about the complex
       | causality chain you need to model. They want 'a single number' to
       | improve on over time. They don't want a dashboard to dive in and
        | analyze across ten dimensions. They want to see their chosen
        | KPIs going up and to the right. Fundamentally, the dashboard is less
       | important than your audience.
        
       | clipradiowallet wrote:
       | From TFA... >every dashboard is a sunk cost
       | 
       | >every dashboard is an answer to some long-forgotten question
       | 
       | >every dashboard is an invitation to pattern-match the past
       | 
       | >instead of interrogate the present
       | 
       | >every dashboard gives the illusion of correlation
       | 
       | >every dashboard dampens your thinking
       | 
       | I disagree with this on all counts. A dashboard is a way to view
       | multiple disparate metrics in a single place. Whether they are
        | correlated isn't important (but it is helpful).
       | 
       | And the author doesn't stop there...
       | 
       | > They tend to have percentiles like 95th, 99th, 99.9th, 99.99th,
       | etc. Which can cover over a multitude of sins. You really want a
       | tool that allows you to see MAX and MIN, and heatmap
       | distributions.
       | 
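        | (For concreteness, the gap the percentile quote is pointing at,
        | with toy numbers; whether it matters is the actual disagreement:)
        | 
        |     import statistics
        |     latencies = [90] * 999 + [30_000]   # one request took 30 s
        |     p99 = statistics.quantiles(latencies, n=100)[98]
        |     print(p99, max(latencies))          # 90.0 vs 30000
        | 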
       | They "tend to"? "You really want"? The author is confusing their
       | own failures/gripes around the concept of dashboards with the
       | world's experience with dashboards. By the end of the article, I
       | was shocked they weren't selling something.
        
         | cratermoon wrote:
         | > A dashboard is a way to view multiple disparate metrics in a
         | single place.
         | 
          | This is technically correct but doesn't come anywhere near
          | addressing the criticisms the article makes.
         | 
         | The deeper questions are: how did those metrics come to be
         | collected, and why? What happened that resulted in those
          | particular metrics being aggregated and displayed the way they
         | are? What questions were being asked _at the time_ the
         | dashboards were created?
         | 
         | > a way to view multiple disparate metrics
         | 
         | So what? Why view them? Pretty graphs? A red/yellow/green? But
         | to what end? This is why the statement is technically correct,
         | but sheds no light at all on the reasons why a developer or
         | customer support troubleshooter would care to look at the
         | disparate metrics gathered in a dashboard.
         | 
         | Dashboards are created in response to certain problems and
         | events. Those problems and events may or may not be relevant
         | some time down the road. What happens when someone in the
         | present with a certain set of questions or problems looks at
         | the dashboard full of metrics capturing past questions and
         | forgets that those questions are not today's questions?
        
           | devchix wrote:
            | That's why good dashboards come with a title, subtitle,
            | legend, labeled X and Y axes, and units. Count of packets denied from
           | source IP, source port, last 4 hours. Average number of
           | requests forwarded to proxy farm, distributed by server, last
           | 7 days vs. same time last month. Who called 2049, last 24
           | hours.
        
             | arrrg wrote:
             | Yeah, but why?
             | 
             | You are talking tactics, not strategy. What are the
             | underlying goals?
        
         | _jal wrote:
         | People frequently get dashboards wrong, it is true. I like them
          | for genuinely important things - I want klaxons when there is
         | a customer-facing issue, for instance. These should be simple
         | and hard to confuse.
         | 
         | I also like them for whatever KPIs are considered important
         | this week. A slightly sneaky reason is that time has to be
         | budgeted to modify the dashboard and they are very visible, so
         | dashboards also advertise when the goalposts move. (My current
         | employer is actually not bad about this. This lesson came from
         | $job-1.)
        
         | thebeardisred wrote:
         | Hear hear.
         | 
         | At CoreOS I set up a number of dashboards specifically to
         | socialize the normative behavior of our systems. Think of it as
         | the difference between the person who just drives their car
         | versus the person who knows what feels and sounds "normal".
         | 
         | The latter can tell when they need an oil change because of
         | different vibrations in the engine and how the car sounds
         | pulling up to a stop light (because of the engine sounds being
         | reflected back into the window by parked cars).
         | 
         | Big surprise, it had the desired effect.
         | 
         | I'm especially with you on the notion of disparate metrics.
         | While correlation is not causation, it's still a useful
         | diagnostic tool.
         | 
          | Let's say someone in marketing walks by a dashboard and sees
          | the following:
          | 
          | 1) a new version has been pushed
          | 
          | 2) customer tickets are up 20% over the number they're used to
          | seeing
         | 
         | Does that mean that the new version caused the tickets? No.
          | Will that allow them to ask? Absolutely. Will that urge them
          | to reach out to support and release management to see
         | if there's an interesting story to share internally (or with
         | the world)? Hopefully.
         | 
         | You hit the nail on the head by calling out the absolutist /
         | "I'm the authority on this matter" / "there is a single correct
         | perspective" tone.
         | 
          | Your shock that the author isn't selling something can be
          | remedied. They have a dog in this fight:
         | https://www.honeycomb.io/teammember/charity-majors/
        
         | alexfromapex wrote:
          | I agree with both viewpoints in certain ways. A lot of the
          | time, visualization surfaces issues that would otherwise go
          | unnoticed, but I think the important point is that being able
          | to drill down matters for context.
        
         | spullara wrote:
         | I mean, she is selling something. She is the founder (former
         | CEO) of Honeycomb which is all about doing ad hoc queries
         | rather than setting up dashboards.
        
           | fallat wrote:
           | Get 'em
        
       | reureu wrote:
       | I spent the past four years working as a data scientist for a
       | healthcare company on population health initiatives, and started
       | building out a body of research around how to engage clinicians
       | using data (among other things, through dashboards). That's a bit
       | different than the article, but one of my key learnings was that
       | dashboards are often incredibly ineffective and only promulgated
       | by well-intentioned engineers, based on what they would want to
       | see if they were a clinician.
       | 
       | I worked with a behavioral economist, and we started running RCTs
       | looking at different approaches to sharing data, and found that
       | dashboards led to less engagement, when there was engagement it
       | was more likely to drive ineffective interventions, and generally
       | our dashboard groups had worse patient outcomes. Non-dashboard
       | approaches had 20x better engagement and anywhere from 20-100%
       | better patient outcomes (depending on the condition).
       | 
       | Unfortunately, both of us left the company when a bunch of
       | engineers moved in, scoffed at our work, and immediately said
       | "doctors need to be able to slice and dice their data" -- which,
       | by every measure we had tested, is simply not true. But the
       | "mission control" style thinking, where you have tons of dials
       | and numbers flashing on a screen, pervaded because it "feels"
       | like the right answer despite being the objectively wrong one.
        
         | robbles wrote:
         | Do you think there's an alternate approach to showing data that
         | would be effective? I can definitely buy the idea that showing
         | a bunch of fancy charts doesn't help most medical
         | professionals, but I don't like the idea of giving up on trying
         | to surface more data. Or is that what you're referring to as
         | "non-dashboard approaches"?
        
           | vitovito wrote:
           | Not the OP, but in my experience, when clinicians ask for
           | more data, they're actually asking for more lineage or
           | provenance metadata.
           | 
           | We boiled it down for our teams like this:
           | 
           | Administrators are top-down: they want a high-level view and
           | to be able to drill down from there.
           | 
           | Individual physicians are bottom-up: they want "their" data
           | about "their" patients, and maybe, sometimes, to compare to
           | the peers they personally trust.
           | 
           | As with any professional group, there's some minority
           | percentage that treats their work like a craft and knows how
           | to use data to improve their practices; but the majority want
           | qualitative data and value interpersonal relationships.
           | Giving a dashboard to the latter group at all is wasting time
           | and effort of all parties.
           | 
           | If your dashboard can't attribute all of its data and all of
           | the patients referenced to match the physician's definition
           | of "theirs," you've lost. That's the "more data" and "drill
           | down" physicians care about.
           | 
           | If your dashboard isn't timely and clinical -- which
           | generally means presented in a clinical voice, at the point
           | of care, or otherwise when they have an opportunity to make
           | the change you want them to make -- it's not going to be
           | actionable. That means surfacing some alternative action
            | _right before they see a patient who might benefit from
            | that_, which is not when they're on their computer. They
            | might be one of those doctors who is never on their computer
           | until the very end of the day. Looking at your dashboard at
           | 11pm about the patients from earlier today (or more likely,
           | earlier this past quarter of the year) is not helpful.
           | 
           | Looking at your dashboard is non-clinical work, and doctors
           | want to do clinical work. If you're going to make them go do
           | a new and non-clinical thing, it has to reduce some other
           | non-clinical thing in a way that's meaningful to them.
           | Otherwise, they're just as likely to do an end-run around
           | your application entirely, like the doctors who only use
           | index cards to write notes or who fail to enter their
           | passwords every morning and lock themselves out, so they
           | don't have to use the EMR.
        
           | reureu wrote:
           | I wrote a longer response to another comment with examples of
           | some of the experiments and learnings. But, yes, I think
           | there are effective alternatives, but I think it starts with
           | being really clear on what your success measures are. Do you
           | want to maximize patient outcomes? Do you want providers to
           | feel engaged (or, perhaps, actually engage) with data? Do you
           | want to minimize provider burnout? I was always surprised by
           | how few clinical and tech leaders could actually articulate
           | what the goals were-- it'd often just be "we need providers
           | to have more data!", which I suspect isn't actually what your
           | goal is.
           | 
           | The tldr was that telling providers directly what you want,
           | generally in an emailed newsfeed-style format was the most
           | effective at improving actual outcomes. No slicing and
           | dicing. No graphs. No comparisons. Just "hey, look at these 6
           | uncontrolled hypertensive patients, and follow-up with any
           | that need follow-up."
           | 
           | Also, to caveat: I'm talking about how to engage the worker-
           | bee providers. Not clinical leadership. Not the quality team.
           | Not the data science/analyst team. Providers who are super
           | busy with patient care, but also expected to manage patients
           | between visits. Basically every experiment we ran favored the
           | most direct, no frills, least effort approach to look at
           | data. Which, coincidentally, was the exact opposite of what
           | the engineering teams wanted to build :-/
        
             | troelsSteegin wrote:
             | With respect to the tldr - so, on one arm, so to speak, is
             | an interface where clinicians could use different
             | combinations of measures to identify patients who might
             | need some kind of intervention, like follow up attention.
             | Then the other side is an analytic system that uses a
             | specific set of measures, etc, and then messages clinicians
             | with a recommended set of actions?
             | 
             | In the first case, the clinicians have to do analytical
             | work (slice, dice) towards understanding the population of
             | patients. That sounds more like epidemiology... In the
             | latter case, how is it that clinicians will trust the
             | recommender? Is it understood that there is a clinical
             | rationale or authority behind the algorithm? It sounds like
             | "uncontrolled" in this case is based on a measure that
             | clinicians trust.
             | 
             | I think of dashboards as potentially good for monitoring
             | outcomes against expectations, EDA as potentially good for
             | focusing attention on subpopulations, and recommenders as
             | potentially good for efficiently allocating action. In a
             | broad way what you described is a monitoring system that
              | pushes recommended actions out to doers. I'd venture that
              | with busy clinicians it needs to be pretty accurate,
             | too, and/or that recommendations need both explicit
             | justification and a link to collateral information.
        
               | reureu wrote:
               | Quality measures are generally well-defined by external
               | authorities, so questions like "what defines
               | uncontrolled" are generally answered. Even when providers
               | personally disagree with this (I worked with a provider
               | who didn't believe in pre-diabetes), they still
               | acknowledge that health care organizations are being
               | judged on these measures, and that the measures are not
               | just arbitrarily defined. How you improve your quality
               | measures becomes where the question turns.
               | 
               | Your comment about epidemiology/EDA/etc really hit the
               | nail on the head. If you sit in on population health
               | meetings at your average hospital/clinic system, you'll
               | see that many people really don't get this. Further,
               | people often conflate their needs/desires with that of
               | others-- so, the data-driven administrator is quick to
               | say "we just need doctors to be able to slice and dice
               | their data, and then we'll have better quality scores."
               | But they're talking about what their needs are, and it's
               | completely not what the doctors actually need (well, and
               | from monitoring usage of dashboards for those types, I'd
               | argue it's also not what they need either, but that's a
               | different issue). And, the reason I keep saying "slice
               | and dice" is because I've heard that phrase used by every
               | vendor I've evaluated, and in practically every strategy
               | meeting regarding population health at multiple
               | institutions.
               | 
               | I'd personally shy away from describing this issue in
                | terms of a recommender, since that has a pretty specific
                | connotation in the ML world, and it doesn't really line
               | up well (e.g., there's not a well-defined objective
               | function or a clear feedback loop to train a
               | recommendation system on). However, getting away from
               | that specific concept, I think it's reasonable to say
               | that there are needs for multiple distinct but ideally-
               | related systems in the population health world: one for
               | analysis to be used by quality and data people, and one
               | specifically for the clinicians doing the work.
        
         | kfarr wrote:
         | Curious if you have any examples of "Non-dashboard approaches"
         | to compare and contrast?
        
           | reureu wrote:
           | We tried a couple of different approaches: tableau reports,
           | emailing static data to providers, sending spreadsheets of
           | patient-level data, and building a Facebook or Twitter style
           | feed. And then had different variations on each, and would
           | run trials comparing different approaches.
           | 
           | We pretty quickly found that sending data ("push") was way
           | more effective at engagement than just having a tableau
           | report they could go to ("pull"), even when that dashboard
           | was linked directly within the EHR, didn't require a login,
           | and was contextualized for the provider (basically as low
           | friction as you could get-- they would actually venture into
           | it 1-2 times per year).
           | 
           | We ran a trial where we changed how we presented data: either
           | in terms of number of patients ("screen 20 people for
           | depression this week") or in terms of quality rates ("your
           | depression screening rate is 40% and going up"). Keeping the
           | data in terms of patients led to ~20% improved screening, and
           | in the surveys led to providers expressing more trust in the
           | data (although, they also were more likely to say they
           | "didn't have enough data to do [their] job", despite actually
           | doing their job better than the other group).
           | 
           | So then we took that idea for depression screening and
           | extended it from depression screening to chronic disease
           | management (where the specific task for the provider is much
           | more variable). So we had one arm where we gave them access
           | to data marts and trained them on how to "slice and dice" the
           | data, and then compared that against a newsfeed that had the
           | data pre-"sliced and diced". The engagement was higher in the
           | newsfeed group. Interestingly, the only thing the "slice and
           | dice" group seemed to do was look for patients without a
            | primary care doc designated in the EHR and just assign them--
           | in evaluating the outcomes for this, that was the single
           | least effective intervention they could do to improve chronic
           | disease care (and this was validated in a follow-up study
           | looking explicitly at the impact of PCP-assignment on patient
           | care). So, our "newsfeed" arm ended up with, on average,
           | around 60% better outcomes than the "slice and dice" arm.
           | 
           | What's funny is that through all of this, some of the leaders
           | would also say "we need more data!!" But when we'd build a
           | tableau report for them, they'd use it once or twice and then
           | never again. Or, in one case, the leader would actually use
           | it for ineffective purposes ("we have to assign them all
           | PCPs!!") or for things that are easily automated ("we're
           | going to email each of our uncontrolled hypertensive
           | patients"). I firmly believe that for doctors and data, you
           | need to have clearly defined objectives: the goal should
           | never be "give them access to data", but rather should be
           | something like "make providers feel like they have the data
           | necessary to do their job" and "improve quality rates through
           | targeted provider intervention." Start from those first
           | principles, and validate your assumptions at each step. I'm
           | confident your solution won't end up with tableau.
        
             | kfarr wrote:
             | Thanks for writing this up, it's a valuable resource that
             | can be applied to other industries.
             | 
             | My takeaways:
             | 
             | - Start with first principles, such as "improve quality
             | rates through targeted provider intervention"
             | 
              | - Push with simple stats works better than pull with fancy
              | dashboards
              | 
              | - Slice and dice can help identify process exceptions but
              | isn't great for process improvement, whereas simple stats
              | on a regular basis improve outcomes
        
             | roveo wrote:
             | Thank you so much! I'm working on improving construction
             | management with better access to data and I think these
             | insights will transfer very well to my domain.
        
         | dmix wrote:
          | There are so many vanity metrics in business-related data
          | science that I suspect the visualization often gets blamed
          | instead of what is being visualized.
        
         | vitovito wrote:
         | Are you me? I also built out a body of research in a healthcare
         | company about how to engage clinicians using data, that stopped
         | us from making some mistakes until we got acquired. No doctor
         | (or chief of staff, or chief medical officer, or anyone besides
         | an actual data analyst) will slice and dice their data. Doctors
         | won't even listen to a non-peer clinician discuss data with
         | them, let alone their administrators.
         | 
         | In my timeline, I told the engineering team that all of their
         | work was almost certainly for naught, that all the research
         | said this product would completely fail, and we were basically
         | just doing it for managerial and contractual obligations.
         | 
         | This gave engineering the freedom to use whatever technologies
         | and conduct whatever technical experiments they wanted, since
         | no-one would ever use the product, and it'd likely be shut down
         | soon after launch for disuse.
         | 
         | A key hospital partner gave us a couple dozen docs to test it
         | with. I interviewed them about how they measured their work and
         | impact, and the data they used to improve their craft and
         | outcomes. I asked them to review every measure on the
         | dashboard, explain their understanding of it, and explain how
         | their work or behavior would change based on that.
         | 
         | Almost to a person, the doctors said there was nothing of use
         | to them there, as the research predicted. Some of these doctors
         | were on the committee that specified the measures they would
         | themselves be seeing.
         | 
         | The product was launched with great managerial acclaim, and
         | promptly sunset 12 months later from disuse.
        
           | [deleted]
        
           | reureu wrote:
           | I dunno, but it sounds like we should get a beer sometime.
           | 
           | Not sure if this resonates also, but the engineers that took
           | over all came from outside healthcare and had a strong "I'm
           | going to apply what I know from Ticketmaster to solve
           | healthcare!" mentality. Those of us that have 15 years of
           | experience in healthcare would, at best, have a 10 minute
           | "knowledge sharing" meeting with the engineering and product
           | managers. And then we'd sit back and watch them make some
           | really naive mistakes. [to be clear, I'm not about
           | gatekeeping people from being involved in health tech, but
           | rather I'm just exhausted at interacting with people with no
           | self-awareness about the amount of things they don't know
           | about a particular domain]
           | 
           | I'm still a bit bummed because I think we were actually just
           | starting to get to some really cool, actually innovative,
           | population health approaches that seemed effective for both
           | improving outcomes and minimizing provider burnout. :(
        
         | matheusmoreira wrote:
         | Can you provide concrete examples of data that would be shown
         | in such dashboards? I'm having trouble visualizing it.
        
           | reureu wrote:
            | The dashboards would generally show quality measure
           | performance, often outcomes but sometimes process measures.
           | So, things like "percent of eligible patients who have had
           | cervical cancer screening" or "percent of hypertensive
           | patients who have controlled blood pressure" (outcome
           | measure), and perhaps "percent of patients missing source
           | documentation" or "percent of patients with no appointment
           | scheduled" (process measures). The initial tableau report
           | would let you drill down to see which patients were non-
           | compliant for that particular quality measure, presumably
           | allowing them to follow-up as needed.
           | 
           | By the time I left, we had a newsfeed that would just tell
           | them what needed to happen ("you have N patients who have
           | uncontrolled hypertension with no follow-up appointment
           | booked. Consider messaging them to book an appointment."). No
           | rates, just that story in their feed, with a link to the list
           | of patients that corresponds to the "N patients". That would
           | all be thrown into an email and sent weekly, and we'd track
           | when they opened, clicked, and what ended up happening with
           | the individual patients.
        
             | matheusmoreira wrote:
             | That's interesting. In my country that's usually handled by
             | clinic administration and secretaries. I assume the
              | presence of private medical information, such as hypertension
              | status (controlled or uncontrolled, etc.), makes it
             | necessary for doctors to do it.
             | 
             | In my experience, when patients have uncontrolled chronic
             | disease a follow up appointment is automatically set up.
             | Doctors prescribe medication and ask the patient to return
             | after some time so they can be reevaluated. When they miss
             | appointments, clinic administration knows and can follow
             | up.
        
         | [deleted]
        
         | lazyasciiart wrote:
         | Sounds interesting, is any of that body of research available
         | publicly?
        
           | reureu wrote:
           | Some, not all. I'm hesitant to directly post links on here
           | since it will out both me and the company I worked for. If
           | you're interested in this kind of work, you should check out
           | the conference proceedings for AMIA (amia.org)
        
       | simonw wrote:
       | I have trouble trusting any dashboard if it doesn't have the
       | equivalent of a "view source" button that lets me see what data
       | sources were used for it and how they were manipulated.
       | 
       | Sadly dashboard systems that encourage this are extremely rare.
        
       | rocgf wrote:
       | I disagree so much with this that I wouldn't even know where to
       | start arguing against it.
        
         | numlock86 wrote:
         | Agree. I wonder why this mindless rant even made it to the
         | frontpage.
        
       | hbosch wrote:
       | I read this and still don't know exactly what the author is
       | asking for. Is an "exploratory, queryable interface" _not_
       | exactly how you would describe a modern dashboard?
        
         | tcard wrote:
         | The author is asking for, and actually built,
         | https://honeycomb.io.
        
         | lmeyerov wrote:
         | They want you to use _their_ interactive dashboards backed by
         | _their_ database using features their PMs prioritized, like
          | wide columns and drill-downs. But you're right, other systems
          | do support these. Ex: Splunk has done wide columns with index-
         | on-write columnar analytics and point-and-click
         | pivoting/drilldowns since ~day 1. So, as they add features,
         | they get very definitional on each one. Most startup people
         | (myself included!) have a lot to learn from their successful
         | developer/IT marketing style.
        
       | brazzy wrote:
       | > That's not debugging, that's pattern-matching. That's ...
       | eyeball racing.
       | 
       | Um...yes. And that is a _very good thing_. Because if there is
        | anything the human brain is good at, it's pattern matching.
       | Especially on visual data.
       | 
       | It's an extremely quick and efficient way to find out where to
       | _start_ the detailed debugging.
       | 
       | And there is a lot of value in that.
        
         | someguydave wrote:
         | the brain is also good at fooling itself
        
       | SkipperCat wrote:
        | Dashboards are invaluable. Humans can take in a lot of data from
        | images, and there is no better way to grok data than a graph.
       | 
       | We've spent a lot of time building Grafana dashboards and they've
       | been extremely helpful with debugging. It doesn't solve all
       | problems but it certainly helps narrow down where to look.
       | 
       | Sure, we still look at log files, use htop and a lot of other
       | tools, but our first stop is always Grafana.
       | 
        | I suggest almost any book by Edward Tufte. There you'll see
       | the beauty and value of visual information.
        
         | tetha wrote:
         | > We've spent a lot of time building Grafana dashboards and
         | they've been extremely helpful with debugging. It doesn't solve
         | all problems but it certainly helps narrow down where to look.
         | 
         | This is what I was about to write. Most of our services have 1
         | or 2 dashboards showing some service KPIs - for example HTTP
          | request throughput and response time, and also interface
          | metrics to other subsystems - queries to postgres, messages to
         | the message bus and so on.
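          | 
          | Wiring up those KPIs is usually only a few lines of
          | instrumentation. A minimal Python sketch with prometheus_client
          | (assuming a Prometheus/Grafana stack; names are illustrative):
          | 
          |     from prometheus_client import Counter, Histogram, start_http_server
          | 
          |     REQUESTS = Counter("http_requests_total", "Requests served",
          |                        ["route", "status"])
          |     LATENCY = Histogram("http_request_duration_seconds",
          |                         "Response time", ["route"])
          |     DB_LATENCY = Histogram("db_query_duration_seconds",
          |                            "Postgres query time")
          | 
          |     def handle(route, do_work):
          |         # time the request and count it by route and status code;
          |         # e.g. wrap each query with DB_LATENCY.time() as well
          |         with LATENCY.labels(route).time():
          |             status = do_work()
          |         REQUESTS.labels(route, str(status)).inc()
          | 
          |     start_http_server(9100)  # expose /metrics for scraping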
         | 
         | With dashboards like this, you can very quickly build a
         | deduction chain of "The customer opened a ticket, well because
         | our 75%ile of the response time went to 20 seconds, well,
         | because our database response times spiked to an ungodly
         | number".
         | 
         | And then you can go to the dashboards about the database, and
         | quickly narrow it down to the subsystem of the database - is
         | something consuming the entire CPU of the database, is the IO
         | blocked, is the network there slow.
         | 
         | In the happy cases, you can go from "The cluster is down" to
         | "our database is blocked by a query" within a minute by looking
         | at a few boards. That's very, very powerful and valuable.
         | 
         | And sure, at that point, the dashboards aren't useful anymore.
         | But a map doesn't lose value because you can now see your
         | target.
        
           | ethbr0 wrote:
            | This is where dashboards have been most useful to me: extract
           | and expose key metrics of the system, in as _unbiased and
           | raw_ of a form as possible.
           | 
           | Or, to put it another way, looking at a dashboard should tell
           | you reliable _facts_ about the system, that lead you to
           | further exploration.
           | 
           | As the post puts it, _" They're great for getting a high
           | level sense of system functioning, and tracking important
           | stats over long intervals. They are a good starting point for
           | investigations."_
           | 
           | Dashboards should not attempt to interpret anything, without
           | being very clear about how, why, and what they're doing.
           | 
           | Example: response time statistics vs "responsive green/red
           | light"
           | 
           | If it's important enough to have logic built on top of it,
           | that's an alert, and that's something different.
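            | 
            | A tiny sketch of that split (thresholds are made up):
            | 
            |     def panel_value(latencies_ms):
            |         # dashboard: report the fact, no interpretation baked in
            |         ordered = sorted(latencies_ms)
            |         return {"p95_ms": ordered[int(0.95 * (len(ordered) - 1))]}
            | 
            |     def should_page(p95_ms, threshold_ms=500):
            |         # alert: the interpretation (and its threshold) lives
            |         # here, explicitly, not hidden in a green/red light
            |         return p95_ms > threshold_ms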
        
         | drewcoo wrote:
         | A dashboard in a car gives you largely active feedback to help
         | you drive. Mechanics' diagnostic tools give them a deep dive
         | into information about repairing the vehicle, not limited to
         | real time or to just driving activity. Those two things are
         | very different.
         | 
         | I don't think a debugging "dashboard" is a dashboard. That
         | conflates two very different ideas. We need a different name
         | for that.
        
           | SkipperCat wrote:
           | What about the "check engine" light or the "low oil" light?
           | Both are debugging indicators about the health of your
           | engine.
           | 
            | But snark aside, I do agree with you about the intent of a
           | dashboard. I've always told people dashboards are there to
           | give you historical and real-time info about the performance
           | and rates of your systems. It doesn't show you exactly what's
           | wrong, but it is very helpful in showing where to start
           | looking.
        
           | ethbr0 wrote:
           | "Pre-configured EDA"
           | 
           | https://en.m.wikipedia.org/wiki/Exploratory_data_analysis
        
         | cratermoon wrote:
         | > We've spent a lot of time building Grafana dashboards and
         | they've been extremely helpful with debugging. It doesn't solve
         | all problems but it certainly helps narrow down where to look.
         | 
         | And then once the bugs that led to the creation of that
         | dashboard are fixed or retired, what's left for that data? It
         | just sits there with its pretty graphs and eye-catching
         | visualizations to snare the unwary who are looking for help
         | debugging a different problem. In fact, they'd be best served
         | by ignoring existing dashboards and creating new ones specific
         | to the issue in the present, not some dead husk of a problem
         | that looks like it might be related.
        
           | mdoms wrote:
           | How do things like message queue sizes, transactions per
           | second, errors per minute, memory usage etc become less
           | useful after solving one bug?
        
           | lazyasciiart wrote:
           | This is like advocating to get rid of log statements because
           | they might not log the cause of the next bug.
        
         | landryraccoon wrote:
         | I don't think dashboards help with debugging at all, and they
         | shouldn't be designed with that goal in mind.
         | 
         | Dashboards tell you that a problem exists. Just demonstrating
         | that SOMETHING is broken means the dashboard has accomplished
          | its task in full. The engineer's job is then to fix it.
        
       | landryraccoon wrote:
       | Dashboards aren't a debugging tool. They're a QA tool.
       | 
       | The point of the dashboard is so someone can say "hey, I'm not an
       | engineer but new user signups sure are taking a nosedive this
       | week. Can we get someone on this asap?"
       | 
       | Then you can point at the dashboard and say "this is a problem".
        
       | tomrod wrote:
       | I disagree strongly with the content.
       | 
        | But man, why is it designers think a light gray background with
       | mid-gray text is a good idea? Almost unreadable for me.
        
       | ziggus wrote:
       | Interesting that most of this rant is focused on static
       | dashboards - does anyone really use static dashboards? I can see
       | if you're stuck using 1998 technology like Excel that you might
       | throw together some static charts/graphs/KPIs, but even in the
       | most primitive dashboard tools available now (PowerBI, I'm
       | looking at you) the default is a dashboard that's tied to a
       | 'live' dataset.
        
         | cwillu wrote:
         | The "static" refers to the structure of the data, not the
         | freshness/liveness of it.
        
       | buremba wrote:
       | The main problem is that it's not easy to drill down into a
       | metric in most BI tools because the connection between the
       | dashboard and the source data is usually missing. Looker is one
        | of the first companies to target this specific issue: you
        | transform and model the data and define your metrics before
        | creating your first dashboard. It takes too much effort (and
        | money) to
       | create a (basic?) dashboard because it's not just a "dashboard".
       | 
       | Instead, as data analysts, we usually want to write a bunch of
       | SQL queries, create charts from them and expose the data to our
       | business stakeholders. While they can see the underlying SQL
       | queries of the metrics, it's not easy for them to modify these
       | SQL queries, so they often get lost.
       | 
       | The dashboards have a long tail. For me, you need to get these
       | four steps done beforehand:
       | 
       | 1. Move all the company data into a data warehouse and use it as
       | the single source of truth.
       | 
       | 2. Model the data with a transformation tool such as dbt and
       | Airflow.
       | 
       | 3. Define metrics in one place on top of your data models in a
        | collaborative way. (This layer is new, and we're tackling it at
        | https://metriql.com; see the rough sketch after this list.)
       | 
       | 4. Use an interactive BI tool that lets you create dashboards on
       | top of these metrics with drill-down capability.
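        | 
        | As a rough sketch of what step 3 can look like (not metriql's
        | actual format, just the idea of metrics defined once and reused
        | by both dashboards and ad hoc queries):
        | 
        |     # one shared definition per metric: expression, source table,
        |     # and the dimensions you're allowed to drill down into
        |     METRICS = {
        |         "active_users": {
        |             "sql": "count(distinct user_id)",
        |             "table": "analytics.events",
        |             "dimensions": ["country", "platform"],
        |         },
        |     }
        | 
        |     def query_for(metric, dimension=None):
        |         m = METRICS[metric]
        |         select = f"{dimension}, " if dimension else ""
        |         sql = f"select {select}{m['sql']} as {metric} from {m['table']}"
        |         return sql + (f" group by {dimension}" if dimension else "")
        | 
        |     print(query_for("active_users", "country"))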
        
       | nixpulvis wrote:
       | The last paragraph really got me thinking about _regression_.
       | 
       | > raise your hand if you've ever been in an incident review where
       | one of the follow up tasks was, "create a dashboard that will
       | help us find this next time"
       | 
       | As a disciplined software engineer, I aspire to have each and
       | every user facing bug captured first as an automated test. This
       | helps form trust in the software. Ideally the users themselves
       | can choose to write the tests and submit them for me.
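        | 
        | Something like this (a pytest-style sketch; the bug and names are
        | invented for illustration):
        | 
        |     def parse_quantity(s: str) -> int:
        |         # fix: used to crash on inputs with a leading "+"
        |         return int(s.lstrip("+"))
        | 
        |     def test_regression_leading_plus_sign():
        |         # regression test shipped with the fix, so it stays fixed
        |         assert parse_quantity("+3") == 3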
       | 
       | This is akin to metrics. I completely agree, system-KPI metrics
       | should be relevant and short. But there's nothing stopping you
       | from collecting an archive of previous data experiment formulas.
        
       | iamthepieman wrote:
       | The most effective dashboards I've worked on have been glorified
       | interactive spreadsheets with the ability to graph the data in
       | various ways. High density, not pretty. Sometimes there was a map
       | if the data was geospatial.
       | 
        | Also, does anyone remember the covid dashboard from Johns
        | Hopkins? That one is pretty useful.
        
       ___________________________________________________________________
       (page generated 2021-08-27 23:01 UTC)