[HN Gopher] The Economist's excess deaths model
       ___________________________________________________________________
        
       The Economist's excess deaths model
        
       Author : raldu
       Score  : 163 points
       Date   : 2021-05-25 13:25 UTC (9 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | evilpotatoes wrote:
       | Isn't this based on a rather questionable assumption that all
       | excess deaths in this time period were due to undiagnosed COVID?
       | There's been a large spike in drug overdoses, and some increase
       | in suicides has been reported as well. Would it not be reasonable
       | to assume that a number of these deaths were caused by lockdowns,
       | and various other heavy-handed measures authorities used to try
       | to control outcomes?
        
         | thebruce87m wrote:
         | All the data on suicides I have seen so far shows that there
         | are no significant changes to the suicide rates outside
         | existing trends.
         | 
         | For your other point about excess mortality being caused by
         | lockdowns, you can view charts of excess mortality here:
         | https://www.euromomo.eu/graphs-and-maps/
         | 
         | You can see that some countries, e.g. Norway, had lockdowns but
         | did not incur any severe excess mortality - though note that
         | this only proves that you _can_ have lockdowns with no excess
         | mortality, not that every lockdown is equal.
        
       | physPop wrote:
       | Forgive the naive question: but is code like this typical for
       | "data science"? It seems more like something out of a masters
       | project... Huge script files with zero testing?
       | 
       | There's a huge amount of data reshaping, mutating, and general
       | wrangling going on here; how can one be confident without even a
       | known input->output integration-type test?
        
         | MattGaiser wrote:
         | For something like this, the code is more an aid to analysis
         | rather than the product itself.
        
         | mistrial9 wrote:
         | in R particularly, there is a wide range of code quality, by
         | developer standards, yes
         | 
         | It should be said that the statistics themselves do make for
         | some 'guide rails' .. the stats are quite demanding, require a
         | lot of upper-division training to use correctly, and visual
         | feedback can course-correct as the results are iteratively
         | found.
         | 
         | As said in other comments, the content is often associated with
         | a research goal that is weighted more heavily than code quality.
         | In contrast, a general-purpose development language has very
         | broad application, and can probably go wrong in very broad ways.
         | The research coder is getting feedback on results; the
         | general-purpose developer instead gets lots of code review and
         | an expectation of professional results.
         | 
         | I picked out an R ETL sequence from an article last year, and I
         | still pull it out once in a while to look at the extensive,
         | clever and (to my eye) really hard to read R data ingestion and
         | manipulation. Personally I think it is a fair tradeoff to say
         | that the expertise needed to use this environment is a bit of a
         | filter, and the rigor that the results (and therefore the
         | intermediate results) demand does move the expectations on code
         | quality...
         | 
         | As for tests, there is probably no defense for not having tests
         | either... testing is probably going to become more common as
         | the field of R and data analysis inevitably grows.
        
         | da39a3ee wrote:
         | Yes, you can see the divide in the popularity of interactive
         | notebooks like jupyter and observable. To programmers who use a
         | standard git-based workflow, the interactive notebooks are
         | anathema because you can't use version control to compare code
         | state and monitor progress, and because of the global
         | variables. But to many quantitative researchers for whom
         | software version control isn't a large part of their
         | professional worldview, it seems normal to just keep adding to
         | the same document until you hit a state that seems to work,
         | without much emphasis on checkpointing and navigation between
         | checkpoints, and testing at each checkpoint.
        
         | physPop wrote:
         | Would love some commentary on why this is an unpopular
         | question.
         | 
         | Good software practices don't only have to be for "production"
         | or CI or large teams of coders. Testing for correctness could
         | be seen as part of the work of delivering a high quality
         | product/graph/prediction/model.
        
         | ethanbond wrote:
         | In my experience, yes. Data science is highly
         | creative/improvisational and the resulting artifacts of work
         | reflect that. FWIW, normal science is pretty similar. The clear
         | step 1, step 2, ... you see in scientific publications is
         | effectively a retcon, done for the benefit of the reader's
         | understanding.
        
         | nerdponx wrote:
         | Yes, sadly.
         | 
         | Testing is hard. The researchers who write this code typically
         | don't have the necessary hands-on experience to write good
         | tests, even if they had enough time in a day/week to actually
         | do it.
         | 
         | Edit: Also the code tends to be very "high level", chaining
         | lots of high-level API functions together, and even coming up
         | with assertions to test tends to be a bit of a challenge.
         | Testing such code turns out to be surprisingly difficult; you
         | might end up just rewriting big chunks of your code in the test
         | suite.
         | 
         | In my data science work, I've focused on writing tests for the
         | complicated sections (e.g. lower-level string processing
         | routines) and otherwise just trying to keep the other stuff
         | very clean and readable.
        
           | StavrosK wrote:
           | What would "good tests" look like for something like this?
        
             | physPop wrote:
             | Generate synthetic data -> run model -> check expected
             | outputs. Yes, it's a lot of work, but you're reaching
             | millions of people with this model and correctness is
             | paramount!
             | 
             | Similarly, even such simple test harnesses help when you or
             | others go to modify the code. Having a flag for "did I
             | break something?" is very important.
        
               | woeirua wrote:
               | Synthetics only take you so far though. Real data almost
               | always has unusual values (outliers), or the data will
               | violate various assumptions that you've made along the
               | way. In many cases there is no way to know ahead of time
               | how the data will break your model. Which is why
               | monitoring for model drift is so important.
        
               | StavrosK wrote:
               | The deliverable here isn't the code, it's the output. If
               | "check expected outputs" is a subjective step anyway, why
               | not just check that the output of the code when run on
               | the data is as expected and skip the test altogether?
        
               | stonemetal12 wrote:
               | > If "check expected outputs" is a subjective step anyway
               | 
               | It isn't. It is the evidence that calculations in the
               | code were done correctly.
               | 
               | It is like the problems in high school math class. The
               | teacher controls the inputs so that the outputs are
               | known. The teacher then runs the test problem past the
               | student (aka the code). If the known correct answer isn't
               | generated then we know the student didn't do it right.
        
               | StavrosK wrote:
               | How do you tell that the calculations were done
               | correctly? Presumably you have some way of doing that.
               | 
               | Then, why don't you apply that way to the output? You
               | don't need tests if you only need to do it once.
        
               | pigeonhole123 wrote:
               | Checking that the code outputs what you think it does is
               | a pretty low bar to clear. The fact that you can think of
               | higher bars doesn't invalidate the value of checking
               | this.
        
             | nerdponx wrote:
             | That's part of the problem.
             | 
             | In general (and in my opinion), a "good test" is one that
             | asserts that an invariant is always so, or that a property
             | that is expected to hold under certain conditions does
             | indeed hold under those conditions. Defining such
             | invariants and properties for "data science code" tends to
             | be difficult, and even when you define them it might be
             | difficult or impossible to test them in a straightforward
             | fashion.
             | 
             | And that's even before you get to the probabilistic stuff.
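             | 
             | For what it's worth, some invariants are at least cheap to
             | state. A sketch, assuming a hypothetical tidy results frame
             | (one row per country-week, not the repo's actual structure):
             | 
             |     # hypothetical results frame for illustration only
             |     res <- data.frame(country         = c("A", "A", "B"),
             |                       week            = c(1, 2, 1),
             |                       expected_deaths = c(100, 100, 50),
             |                       total_deaths    = c(120, 90, 70),
             |                       excess_deaths   = c(20, -10, 20))
             | 
             |     stopifnot(
             |       # no silent NAs in the headline figure
             |       !anyNA(res$excess_deaths),
             |       # excess is exactly observed minus expected
             |       all(abs(res$excess_deaths -
             |               (res$total_deaths - res$expected_deaths)) < 1e-8),
             |       # one row per country-week, no silent duplication from joins
             |       !anyDuplicated(res[, c("country", "week")])
             |     )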
        
         | alexpetralia wrote:
         | In general I think data science code is more "research-
         | oriented", meaning that it is not run continuously like a
         | software app and its requirements often evolve as more
         | information is learned from the data. Research code produces
         | results maybe once or twice a day (manually triggered) while
         | software apps potentially service hundreds of thousands of
         | requests a day.
         | 
         | Research code, because its requirements are not fixed and it is
         | not frequently run, doesn't need to be as "stable" as a
         | bona fide app. For a bona fide app, requirements do not change
         | as often and the app is run virtually 24/7.
         | 
         | Once the research code becomes "productionized" however - i.e.
         | it is deployed in an online system where uptime & accuracy
         | matter - then I think absolutely it becomes more engineering-
         | heavy and looks quite a bit less like this code.
         | 
         | Would be curious to hear others' thoughts on this distinction
         | between research vs. production code however.
        
           | physPop wrote:
           | The purpose of testing in my example is for correctness, not
           | continuous integration. I would argue regardless of
           | "production" use, the fact that this is publishing a widely
           | read newspaper makes it highly impactful and more likely to
           | influence policy and decisionmaking, thus we want to know if
           | its correct !
        
             | whymauri wrote:
             | Well... that's partially why it's open-sourced. What fuller
             | transparency of correctness is there than open-sourcing the
             | entire analysis?
        
           | admissionsguy wrote:
           | Many of the "good practices" in software development are
           | primarily meant to reduce the cognitive effort on the part of
           | the person reading the code.
           | 
           | I suspect it will be a downvotably unpopular opinion, but a
           | typical researcher has a significantly larger cognitive
           | capacity than a typical software developer. Consequently, a
           | piece of code of certain complexity will look simpler, and be
           | easier to manipulate for a researcher than for an average
           | software developer.
        
             | woeirua wrote:
             | I think most research code bases are used by a
             | significantly smaller number of people. Hence it's easier
             | for the researcher to know what's going on in their code:
             | they probably wrote a significant portion of the code they
             | use, and so understand the assumptions and limitations that
             | are baked into the code. Whereas if you
             | try to use a library normally you have to kind of figure
             | those things out on your own.
             | 
             | That said, having used a lot of research oriented software
             | packages over the years, researchers typically do not
             | produce high quality software. Their objective is to
             | produce papers, not software that other people can use. If
             | someone else can use it too, then great, but that's not the
             | primary motivation.
        
             | codyb wrote:
             | Huh? A researcher has greater cognitive capacity? That's a
             | funny take.
             | 
             | My impression was ->
             | 
             | Software developers' main focus is developing software, so
             | they spend a HUGE amount of time developing software,
             | maintaining software, noticing patterns and bugs and
             | pitfalls in software, and thus they get pretty decent at
             | writing software.
             | 
             | Data scientists' main focus is developing models, so they
             | spend a HUGE amount of time developing models, tweaking
             | models, finding data, and cleaning data, and they write
             | basic software to achieve some of those goals.
             | 
             | I wouldn't expect Albert Einstein, Mozart, or Beyonce to be
             | some fantastic software developer just cause they're smart
             | individuals. I'd expect people who spend a lot of time
             | writing software to generally be the ones who write decent
             | software.
        
             | naomisperfume wrote:
             | This is a valid point, but I think in part the data science
             | / machine learning cycle doesn't reward careful development
             | like software engineering does.
             | 
             | Most of the time you're just testing the viability of
             | things and just want to fail fast if they don't work.
             | Getting too attached to some pipeline or abstractions is
             | usually a bad idea.
             | 
             | I don't think this excuses bad code, but I kinda get it
             | when it happens, since you can't just apply best practices
             | from the beginning the way you can in software development.
        
             | MengerSponge wrote:
             | As a researcher, this is flattering, but probably wrong.
             | Reading other people's code always sucks, but reading
             | somebody's solo project hero code that doesn't follow any
             | typical design patterns? That's nightmarish.
             | 
             | I'd argue it's more about startup time, or whatever the
             | technical term is for the time between when you first start
             | reading code and when you've built enough of the model in
             | your working memory to be able to touch it without breaking
             | everything.
             | 
             | That heroic developer doesn't need to concern themselves
             | with other people's experiences, and they can just write
             | whatever weird idioms suit their brain.
        
             | patates wrote:
             | It's always easy if you're the only one writing/maintaining
             | the code.
             | 
             | Also, if your job is writing code, I'd bet you'd have less
             | difficulty manipulating spaghetti code.
        
         | admissionsguy wrote:
         | This code is above average by the standards of academia I've
         | dealt with. You've got comments announcing what each part of
         | the code is doing, and there is only one copy of each script.
         | Normally, you would expect to have multiple nested directories
         | with various variations of the same code, plus multiple
         | variations of each script with cryptic suffixes.
        
         | fullshark wrote:
         | Yes, data scientists frequently are working alone and don't
         | care about readability so much as results. Also the code isn't
         | going into production so they don't care about optimizing it by
         | and large.
        
           | checker wrote:
           | Sure, but physPop's concern still stands - how can one be
           | confident about the results? Is it just eyeballing "this
           | looks right"? If so, how many anomalies are missed due to
           | handwaving, and how many results are inaccurate?
           | 
           | I'm genuinely curious because I understand the need to move
           | fast but is accuracy a necessary sacrifice? (or is there a
           | trick I don't know about)
        
             | fullshark wrote:
             | There's definitely a greater risk of a bug leading to
             | misleading results. There's no real unique trick other than
             | possibly someone else trying to replicate the results and
             | catching an error, or trying to use the code on another
             | data set and catching a mistake.
        
             | epistasis wrote:
             | As other comments indicate, how could you be sure the tests
             | are testing the right thing? It's easy enough to slap on a
             | few assert statements (or stopifnot statements, since it's
             | R) about number of rows or something, but that's no
             | replacement for manual inspection of the data. No code-based
             | test can ever be invented that will substitute for looking
             | at the data in the raw, plotting it, and verifying it with
             | your full mental capacities.
             | 
             | The only way to make sure it's right is the same way you'd
             | do full verification of other code: going through it line
             | by line and making sure it's doing the right thing. Tests
             | cannot do that; they can only assert some forms of intent,
             | and they are not good at catching the types of errors that
             | result from ingesting varied data from lots of sources and
             | getting it into modelable form. It's ETL plus a bunch of
             | other stuff going on here.
        
               | checker wrote:
               | I didn't make the claim that a code-based test can be
               | invented that will substitute for looking at the data in
               | the raw, plotting it, and verification using your full
               | mental capacities.
               | 
               | However, "the only way to make sure it's right is to ...
               | go through it line by line" is a bold claim. There are
               | multiple named functions and some unnamed functions in
               | this example that could be verified for programmer
               | mistakes (such as typing 1000 vs 10000) and edge case
               | handling (edge cases that often arise from messy ingested
               | data).
               | 
               | But even if they were tested, I'll concede that mistakes
               | can still be made.
        
             | alexpetralia wrote:
             | If I understand correctly, tests are for what the code
             | _should_ do. There is some business logic and you are
             | asserting that the business logic does what it should do
             | (the test). If you manipulate the business logic, and it no
             | longer does what the test says the code should do, the test
             | fails.
             | 
             | Here it is not as clear what the code _should_ do. What
             | _should_ the amount of excess deaths be? In what ways would
             | we change the logic such that the test case would break? If
             | the input data set is static, isn't it more like a mock
             | anyway?
             | 
             | I think for this reason you often see more sanity checks in
             | research code because the _should_ case is not as clearly
             | defined.
        
               | stonemetal12 wrote:
               | >What should the amount of excess deaths be?
               | 
               | With several fake known inputs and their associated
               | outputs we should be able to determine if the calculation
               | is right.
               | 
               | The result on the real world data is not known but when
               | calculating a statistic you should be able to figure out
               | if you are calculating the right statistic or returning
               | 42 for all inputs.
        
             | cinntaile wrote:
             | That's why it's a good thing that the code is open. The
             | usual standard is the same type of code but without public
             | access. This is definitely a step in the right direction.
        
         | lanevorockz wrote:
         | That is quite common for the R language; it is a scripting
         | language that you are supposed to explore as you use it. So if
         | you want to test it, it's very simple ... Just run the script
         | until the part you are interested in and then plot the hell out
         | of it.
         | 
         | It's widely known in the "theory of testing" that data-oriented
         | systems can't be tested without an immense amount of effort. If
         | you are interested, there are plenty of academic talks about
         | the subject.
        
         | kspacewalk2 wrote:
         | As already pointed out, this isn't an engineering product.
         | They've released the source, did some sanity checks, and moved
         | on with their busy lives. If you find an error - cool, go ahead
         | and report it.
        
           | Glavnokoman wrote:
           | By sanity check, you mean it shows the numbers they wanted to
           | see? Not saying there's necessarily something wrong in this
           | particular code, but I know first hand how little effort is
           | put into validating scientific code and how much of it
           | produces just random crap.
        
             | kspacewalk2 wrote:
             | Which is why releasing their code is critical. They lay it
             | all out for you, with their pride/reputation on the line if
             | it produces 'just random crap'. Writing exhaustive unit
             | tests is probably not something they are good at, nor is it
             | a requirement for them, nor is it the best use of their
             | time.
             | 
             | I too know first hand that students and even academics who
             | aren't properly trained and given the right tooling will
             | write prototype code that's not production ready.
             | Occasionally there's even a bona fide mistake, though
             | usually it's just very un-generalizable. It's part of my
             | job to make some of it more production ready. That's the
             | best use of _my_ time and training. Conversely, I'm shit
             | at generating useful research ideas or writing papers. Just
             | don't have the right combination of
             | intuition/training/experience. Good thing my academic
             | institution has the resources to employ them _and_ me.
        
             | epistasis wrote:
             | How can you know which numbers you want to see?
             | 
             | With tests it is often best practice to write the tests
             | before writing the code that gets tested. But what if you
             | don't know ahead of time, and can't possibly ever know
             | ahead of time?
             | 
             | Writing "tests" for this would be about adding stuff
             | afterwards, like "the max and min values of this column
             | were X and Y". But that is expected to break, if anything
             | changes, because it's not testing anything useful.
             | 
             | My question is: what is one concrete test here that would
             | be useful and actually provide confidence that the code is
             | doing what it should be doing, and how does that test
             | provide better sanity checking than inspecting the tables
             | at each step of the process?
        
         | omginternets wrote:
         | This is pretty good by data-science (and other research)
         | standards.
        
       | da39a3ee wrote:
       | Is this estimating the right quantity? Wouldn't it be more
       | relevant to estimate the total number of years of human life lost
       | due to covid, rather than the number of deaths attributable to
       | covid?
        
         | fullshark wrote:
         | Economists and public policy wonks should do this for sure, as
         | we make sense of the costs and benefits of certain policies. I
         | imagine there will be 10 years of papers to come from this, and
         | no agreement at all on what policies were worth it.
        
         | ceejayoz wrote:
         | I'd imagine that's a lot more difficult.
        
           | falcor84 wrote:
           | It would be more difficult, but I don't think it has to be "a
           | lot more difficult". The code could make a lookup in the
           | actuarial life tables for each country to get the expected
           | remaining years of life for a person of that age, and then
           | just aggregate these, no?
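           | 
           | A toy sketch of that aggregation in R; the life_table and
           | covid_deaths frames here are made-up illustrations, not data
           | from the repo:
           | 
           |     # hypothetical per-age expected remaining years for one country
           |     life_table <- data.frame(age = 0:100,
           |                              remaining_years = pmax(85 - 0:100, 2))
           | 
           |     # hypothetical count of covid deaths by age at death
           |     covid_deaths <- data.frame(age = c(45, 72, 88),
           |                                n   = c(10, 120, 300))
           | 
           |     merged <- merge(covid_deaths, life_table, by = "age")
           |     years_of_life_lost <- sum(merged$n * merged$remaining_years)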
        
             | klmadfejno wrote:
             | Ballpark yes, but that's likely a huge overestimate: even
             | normalized for age, it's people with co-morbidities who
             | account for the lion's share of deaths.
        
               | falcor84 wrote:
               | Yeah, you can of course keep refining the approximation.
               | The next adjustment would probably be to use something
               | like quality-adjusted life years (QALYs) based on the
               | person's co-morbidities, and then (if you want to) you
               | could also take it the other way and reduce QALYs for
               | survivors with long covid.
        
           | codyb wrote:
           | Definitely an interesting problem though.
           | 
           | You'd probably want to gather life expectancy figures for
           | individuals once they've reached five years of age (since
           | infant mortality necessarily lowers life expectancy, but past
           | that cliff the people who make it tend to live longer than an
           | average dragged down by a bunch of 0s, 1s, and 2s) for each
           | country in question.
           | 
           | Then you'd need stats on the age of the dead in each country,
           | which I guess you don't really have in most cases, since the
           | deaths aren't aggregated anywhere because they haven't been
           | attributed to COVID.
           | 
           | Thanks to the Economist for publishing their model. Would be
           | neat to see more stuff like this. I haven't delved too deeply
           | into the code (several 1000 line scripts which would take
           | some sussing out) but it's nice people will have real world
           | projects they can delve into.
        
         | bloak wrote:
         | I don't think that "total number of years of human life lost"
         | is a good general-purpose measure. Specialists could perhaps
         | apply that measure to the deaths of old people (people over
         | retirement age?) but it would be crazy, in my opinion, to apply
         | it to the death of a baby or a teenager.
         | 
         | As the hypothetical trolley controller, how many 50-year-olds
         | would you kill to save one 16-year-old? And would you kill the
         | 16-year-old to save a one-week-old baby?
        
           | fullshark wrote:
           | I thought regulatory bodies took age into account when
           | weighing regulations and the value of a statistical life.
           | Hence why regulations regarding kids' toys / child safety
           | products are so much stronger than, say, those for a toy
           | aimed at adults. A quick Google search doesn't confirm or
           | refute this; it merely shows that the average VSL in America
           | is $10 million.
           | 
           | https://en.wikipedia.org/wiki/Value_of_life
           | 
           | This does have this tidbit:
           | 
           | "Historically, children were valued little monetarily, but
           | changes in cultural norms have resulted in a substantial
           | increase as evinced by trends in damage compensation from
           | wrongful death lawsuits.[38]"
        
           | benlivengood wrote:
           | > As the hypothetical trolley controller, how many 50-year-
           | olds would you kill to save one 16-year-old? And would you
           | kill the 16-year-old to save a one-week-old baby?
           | 
           | Organ donation prioritization does this analysis all the
           | time. They don't publish the exact rules they use for
           | matching but the most important are urgency, compatibility,
           | locality, and survival benefit. Survival benefit is almost
           | literally expected quality-adjusted life years (QALYs) from an
           | organ transplant. A 98-year-old will not get an organ that
           | could give a 50-year-old another 20 years of life, but a
           | 20-year-old that can survive for another year will not get an
           | organ that the 50-year-old needs this week. Compatibility and
           | locality are the biggest problems apart from basic
           | availability (too few organ donors, too many patients needing
           | transplants). Once we can print/grow organs the calculus will
           | reduce to prioritization on urgency and then QALYs if supply
           | is still constrained in some way.
        
         | efxhoy wrote:
         | Change in quality-adjusted life years, or variants thereof, is
         | one standard for health-economic evaluation, but it is obviously
         | very hard to get quality data on.
         | 
         | That would capture not only years of life lost but also the
         | quality of those years. Long-covid seems to be a serious issue
         | for a lot of people and an accurate estimate of that impact
         | would be very valuable.
        
         | Taek wrote:
         | You could make an argument that society as a whole should start
         | looking at things that way, but personally I'd have no basis
         | for understanding how that translated to damage. 3 million
         | people dead is a number that I have a lot of mental models
         | around, it's easy for me to get a sense of magnitude that way.
         | 50 million human years lost is a lot less relatable number and
         | I'm not sure how to contextualize it.
         | 
         | It might also be interesting to measure the cost of the
         | pandemic in terms of economic harm, how much the GDP dropped,
         | how much production was lost, how many stores closed, etc.
        
           | da39a3ee wrote:
           | > 50 million human years lost is a lot less relatable number
           | and I'm not sure how to contextualize it.
           | 
           | One way of contextualizing it that doesn't equate lives with
           | economic output is to divide by life expectancy: if life
           | expectancy is 70 years then 50 million years lost ~= 0.7
           | million human lives that never happened, or 0.7 million
           | healthy newborns that died preventable deaths.
        
             | Taek wrote:
             | In some sense that's not interesting though, right? A
             | newborn that never made it also had very few resources
             | poured into them. From a purely utilitarian perspective,
             | the worst age to die is 20-22, when you've got an education
             | and a long history of people devoting time to making you a
             | productive member of society, but you've had no space to
             | contribute anything back yet.
        
               | da39a3ee wrote:
               | I think there's a straightforward counter to that:
               | 
               | > From a purely utilitarian perspective, the worst age to
               | die is 20-22,
               | 
               | that's easily adjustable: if 95% of newborns become
               | healthy 20 yr olds, then just multiply the number of
               | newborns by 0.95. The harder part is estimating total
               | number of years lost, but it doesn't sound particularly
               | out of the ordinary for a research problem in
               | statistics/epidemiology.
        
           | xwdv wrote:
           | It's easy to contextualize. 50 million human years lost means
           | that people would have to put in 50 million human years worth
           | of work to recover the economic loss of COVID-19.
           | 
           | Consider the average human worker. In their life, they will
           | output around 10.5 human years worth of work. 50 million
           | years represents the total output of about 4.76 million human
           | lives.
           | 
           | So if you delete the production of 3 million people due to
           | COVID deaths plus all the time spent in lockdowns, 50 million
           | human years lost sounds about right, but probably could be
           | higher. I'd say it's probably about 80 million human years
           | lost.
        
             | germinalphrase wrote:
             | Perhaps the value of a human life is that they are alive,
             | not the hours of work they produce during that lifetime.
        
             | bb611 wrote:
             | Valuing people as primarily their economic output is anti-
             | social to the point of being dystopian, and we have much
             | better economic indicators than arbitrary heuristics when
             | we do want to understand the economic damage of the
             | pandemic.
        
               | NovemberWhiskey wrote:
               | Right. Think of this as "50 million years of
               | opportunities for grandchildren to get to know their
               | grandparents or 50 million years of time spent as a widow
               | after your elderly husband dies" as well as just the
               | economic cost.
        
             | Polygator wrote:
             | This assumes that the lost human years are productive
             | years, which I think is dubious at best given the
             | demographics of Covid death.
             | 
             | We could count "years of productive work lost", but that
             | feels like a very cynical way to look at it.
        
           | anoncake wrote:
           | Compare it to the world population. It's currently about 7.8
           | billion. So if 50 million years are lost in total, everyone
           | lost on average 50 million years / 7.8 billion ~= 2.4 days of
           | life.
        
             | mensetmanusman wrote:
             | This way of looking at human suffering might be counter
             | productive:
             | 
             | "oh, those slaves in [country] only cost me 0.03$/year, no
             | big deal."
        
         | thebruce87m wrote:
         | This article tries to do this:
         | 
         | "1.5 million potential years of life lost to COVID-19 in the
         | UK, with each life cut short by 10 years on average"
         | 
         | https://www.health.org.uk/news-and-comment/news/1.5-million-...
        
       | dang wrote:
       | Recent and related:
       | 
       |  _There have been 7M-13M excess deaths worldwide during the
       | pandemic_ - https://news.ycombinator.com/item?id=27177503 - May
       | 2021 (457 comments)
        
       | fortran77 wrote:
       | It would be interesting if there are fewer deaths than expected
       | in the next few years.
        
         | dublinben wrote:
         | Possibly. This is known as mortality displacement.
         | 
         | https://en.wikipedia.org/wiki/Mortality_displacement
        
         | cm2187 wrote:
         | I don't know what fancy stats The Economist does (and I don't
         | trust fancy stats), but I looked at the French stats [1]
         | recently and my back-of-the-envelope approach [2] suggests 50k
         | excess deaths (the arithmetic is sketched after the links
         | below). There were 55.7k more deaths in 2020 vs 2019 (which
         | didn't seem to be an extraordinary year). Over a 25y period,
         | the number of deaths increases by 3.4k every year on average
         | (population growth).
         | 
         | The number of cumulative covid deaths in France is 109k, of
         | which 65k up to the 1st of Jan 2021 [3]. So the excess deaths
         | seem to be in line with covid deaths.
         | 
         | Now you could get a very different result for another country
         | depending on how well covid deaths are reported. It is also
         | difficult to predict what this will look like in 2021. Covid
         | deaths seem to be predominantly people near their end of life,
         | but on the other hand the delay of lots of medical procedures
         | as a result of the lockdowns should produce its own excess
         | deaths, plus the impact of changes in crime, accidents,
         | increases in poverty, etc. God knows what the net effect of all
         | this will be.
         | 
         | [1] https://www.insee.fr/fr/statistiques/2383440
         | 
         | [2]
         | https://zbpublic.blob.core.windows.net/public/excessdeathfr....
         | 
         | [3] https://www.statista.com/statistics/1103422/coronavirus-
         | fran...
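         | 
         | The back-of-the-envelope arithmetic above, as a few lines of R
         | (the figures are the ones quoted in this comment, not outputs
         | of The Economist's model):
         | 
         |     deaths_increase_2020 <- 55700  # 2020 deaths minus 2019 deaths
         |     trend_growth         <- 3400   # average yearly increase over ~25y
         |     excess_2020 <- deaths_increase_2020 - trend_growth
         |     excess_2020                    # ~52,300, i.e. roughly 50k
         | 
         |     reported_covid_deaths_to_jan1 <- 65000
         |     excess_2020 / reported_covid_deaths_to_jan1  # ~0.8, same ballpark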
        
         | Angostura wrote:
         | Looking at the official numbers in the UK, deaths seem to fall
         | below the expected levels for a week or so, following a big
         | spike in covid deaths.
        
         | phreeza wrote:
         | It shouldn't, because expected deaths in a time period should
         | be proportional to the living population. If half your
         | population dies in a war, your model should predict half as
         | many deaths the following year, unless you don't update the
         | model.
        
           | klmadfejno wrote:
           | That's only true if a random sample of the population dies.
           | If we assume (and I'm making this up) that in the steady
           | state, the elderly comprise 90% of deaths, then if a war
           | kills only young people, you'll expect a substantial increase
           | in proportional death rate.
        
             | phreeza wrote:
             | I was simplifying, the point is that the model should
             | always reflect the current state of the population. You
             | shouldn't expect the kind of "catch up" effects that OP is
             | referring to, unless you have static predictions that don't
             | take into account actual deaths and births.
        
         | mrcartmenez wrote:
         | In the long run we'll all be dead.
         | 
         | So if you killed everyone on earth the mortality rate for the
         | next eternity would be 0.
         | 
         | Please don't tell the singularity
        
           | jackbrookes wrote:
           | Mortality rate would be undefined (0/0)
           | 
           | ;)
        
         | JshWright wrote:
         | If it "averages out", that still means millions of life-years
         | lost.
        
         | froh wrote:
         | the point with death is that once you're dead it's over. and it
         | won't "average out" in the sense of life expectancy. it puts a
         | dent in life expectancy growth. it will only average out in
         | the sense that everybody alive will die some day.
        
           | sumtechguy wrote:
           | I wonder if they mean something like what people say about
           | DST causing deaths. But if you look at the large scale, that
           | blip does not exist. Standupmaths did a segment on it a few
           | years ago. Basically the idea was 'yes, you were going to
           | die, and DST seemed to make it happen' _but_ it seemed to
           | only make it happen a few days sooner than it normally would
           | have. Maybe that is what they are talking about? I personally
           | know a couple of people who were marked as dying from covid.
           | But the reality was they were going to die very soon. They
           | both had very advanced stages of Alzheimer's.
        
             | froh wrote:
             | as long as it's very clear that the age distribution of
             | surplus mortality in times of covid draws heavily from
             | groups other than the Alzheimer's dead-soon-anyhow; as long
             | as it's very clear that covid-19 kills otherwise healthy
             | diabetics, overweight people, heart patients,
             | can't-afford-healthcare people,
             | was-misguided-about-indoor-events people, multiple
             | sclerosis patients, the list goes on, and on and on, ....
             | 
             | as long as that is crystal clear, that "two old dead soon
             | anyhow Alzheimer's patients" completely misrepresent the
             | 500,000+ dead US citizens...
             | 
             | ... such an anecdote is perfectly fine ...
        
             | CrazyStat wrote:
             | There have been a number of studies (e.g. [1], [2], [3])
             | published estimating how many years of life have been lost
             | to COVID-19. The estimates vary between ~10-15 average
             | years lost per COVID death.
             | 
             | [1] https://www.nature.com/articles/s41598-021-83040-3
             | 
             | [2] https://www.news-medical.net/news/20201021/More-
             | than-25-mill...
             | 
             | [3] https://www.sciencedaily.com/releases/2020/09/200923124
             | 557.h...
        
         | jmull wrote:
         | There will still be years of life lost per dead person.
         | 
         | We'll all be dead eventually, but most of us aren't OK with
         | losing X years of it.
        
         | seventytwo wrote:
         | It might... but it would still indicate that many living-years
         | had been lost.
        
       | narush wrote:
       | No matter the quality of the code presented here, thanks to The
       | Economist for posting this! We all really appreciate it, even if
       | we don't always seem like it :)
       | 
       | Even if it seems justified to shit on the code as is (bad style,
       | lack of comments, no tests, whatever), all it does is discourage
       | similar companies / researchers / folks in academia from posting
       | their code as well. So then we end up with the same code, but now
       | no one can see it - which is the worst of all worlds.
       | 
       | Let's support / work towards a culture of sharing our code before
       | anything else. And if there's something you think can be
       | improved, maybe consider opening a PR with improvements to really
       | _show the value_ to all these parties of posting their code!
        
         | pkilgore wrote:
         | Code like this is like (not) caring about memory leaks in a
         | Missile Control module where the explosive is the garbage
         | collector:
         | 
         | When code is one shot, accomplishes its goal, and does not need
         | to be extended or relied upon in the future, why do all these
         | things?
         | 
         | Is there any goal above that is not also serviced, at least in
         | significant part, by open-sourcing it?
         | 
         | Bravo to The Economist for releasing it. This is unabashedly a
         | Good Thing.
        
         | Taek wrote:
         | Amen!
         | 
         | When we have the code, we can all work together to improve the
         | model, add the tests, find the bugs.
         | 
         | And honestly, research shouldn't be considered valid if all of
         | the code and data isn't available. Reproducibility is one of
         | the pillars of science, and that's not possible without code
         | and data.
        
         | epistasis wrote:
         | I would love to see the tests that people come up for this...
         | 
         | Tests can be helpful and informative, or they can be make-work
         | boilerplate that will never reveal any flaws. And in this type
         | of code base, it's hard to imagine a test written in code
         | that would actually be informative.
         | 
         | When dealing with data, "testing" is most often done visually,
         | with plotting.
         | 
         | The process of exploratory data analysis is not anything at all
         | like writing code to an architected specification, and the same
         | practices are not as effective for different types of work.
         | Insisting that there should be automated tests for something
         | like this reminds me of the cultural phenomenon of extreme
         | table-phobia in the 2000s that happened as a reaction to using
         | tables for layout. I had trouble getting people to believe that
         | it was OK to use tables for tabular data, which was extremely
         | frustrating. Or in the last 5-10 years, with design trends
         | preferring spare layouts with lots of empty space, it was super
         | hard to get information dense designs into production, because
         | they were so averse to the idea that people may need
         | information density for some things.
         | 
         | I'm also reminded of this video about the deficiencies of
         | notebooks:
         | 
         | https://youtu.be/7jiPeIFXb6U
         | 
         | Yes, there are huge deficiencies to notebooks. But using them
         | is an entirely different task than the type of code that one
         | would write in an IDE, for code that gets put into a library
         | and will be run by others. I have two work modes: 1) IDE
         | (actually Emacs) where I write tests and the code is meant to
         | be reusable, even if it's a one-off script, then 2) exploratory
         | data analysis and processing, in notebooks. Though both of
         | these use code, they do not and should not share much else in
         | the way of practices.
         | 
         | We develop best practices in fields to work against our worst
         | instincts. And once we establish best practices, we become
         | strict about them to make sure we don't fall back into bad
         | practices. But when approaching a different field it's good to
         | reevaluate these best practices to see if they still fit.
        
           | jfoutz wrote:
           | I think it always makes sense to have an oracle, or a few.
           | Nothing super fancy, just a few values you _know_ have a
           | certain outcome.
           | 
           |     f(4,5) == 6
           | 
           | is good enough. Just write down whatever you type in to test
           | as you develop the code, and save it for later. I
           | mean, yeah, you can go nuts with boilerplate. But, I always
           | seem to regret not having written down a handful of cases. If
           | nothing else, it helps me remember what the heck this
           | function is for, and a few samples of expected input and
           | output, along with source are more immediately useful (to me)
           | than comments.
           | 
           | Opinions differ. This isn't a hill I'll die on - as evidenced
           | by my lack of tests from time to time. But I've never
           | regretted the few minutes to provide them, and I have
           | regretted not providing them
        
             | epistasis wrote:
             | That sort of test doesn't really apply here in any
             | meaningful way.
             | 
             | Suppose you want to validate that a certain column is in
             | the file. You could place your assumptions around that
             | explicitly in the read_tsv function call and have it
             | generate the error for you on reading the file, or it can
             | generate an error when you try to use that column. Either
             | way, there's no "test" yet the intention is clearly stated
             | in the code, and the code will simply fail out rather than
             | proceed if it encounters input that can't be meaningfully
             | used.
             | 
             | This code is in many ways a declarative use of other code,
             | describing the data, doing some joins and filtering, and
             | plugging it into well tested libraries of code. The data
             | structures that store the data and the library functions
             | that deal with the data are designed in a way that bugs
             | often result in errors immediately.
             | 
             | In many ways it's like writing a SQL statement. One could
             | make test data tables and make sure that the SQL statement
             | does what one expects, but the SQL statement already
             | embodies intent in the same way that a test of code would
             | embody intent.
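             | 
             | A sketch of what "stating the assumption in the code" can
             | look like; the file name and column names here are
             | hypothetical, not the repo's actual schema:
             | 
             |     library(readr)
             | 
             |     raw <- read_tsv("deaths.tsv", col_types = cols(
             |       country = col_character(),
             |       date    = col_date(),
             |       deaths  = col_double()
             |     ))
             | 
             |     # fail immediately if an expected column is missing or
             |     # misnamed, rather than later in some downstream join
             |     stopifnot(all(c("country", "date", "deaths") %in% names(raw)))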
        
               | jfoutz wrote:
               | So, I'm not an R guy, but I pawed through some of the
               | code. I get what you're saying, there's an environment,
               | it needs to be just so, or it won't work. that's fine.
               | And yeah, it does feel a lot like SQL.
               | 
               | I would point out this guy -
               | https://github.com/TheEconomist/covid-19-the-economist-
               | globa...
               | 
               | that sort of has the look of something that took a pass
               | or two to get right. And I think it would be nice to have
               | a test case (sample call).
               | 
               | Again, not a hill I'm going to die on.
               | 
               | Overall, I don't think we have a big difference of
               | opinion. Yeah, it looks like gluing together libraries -
               | as a non-native R speaker, I think I can make sense of
               | the project, it's cool they put it out there, and I don't
               | think there's anything _wrong_ with what they've got.
        
               | nojito wrote:
               | That function is pretty standard.
               | 
               | If you want to run a loess, it runs the loess function;
               | if not, it runs a windowed average.
               | 
               | What exactly would you test it for?
        
               | epistasis wrote:
               | This is actually one case where I could see jfoutz's
               | point!
               | 
               | If I were developing this, I'd probably be running it in
               | a notebook, and inspecting output on a few rows from the
               | data frame. It would be easy enough to capture both the
               | input and output and throw it into a stopifnot to record
               | the testing that was performed interactively.
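               | 
               | Something like this, where smooth_series() is a
               | hypothetical stand-in for the repo's helper (loess or
               | windowed average), and the expected values were first
               | checked by hand in the console:
               | 
               |     # hypothetical smoother: centered moving average
               |     smooth_series <- function(x, window = 3) {
               |       w <- rep(1 / window, window)
               |       as.numeric(stats::filter(x, w, sides = 2))
               |     }
               | 
               |     out <- smooth_series(c(1, 2, 3, 4, 5))
               |     # spot-check done interactively, then frozen in place
               |     stopifnot(isTRUE(all.equal(out[2:4], c(2, 3, 4))))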
        
         | lanevorockz wrote:
         | Data should only be taken seriously when the full context is
         | shared for reference. Somehow the internet created a generation
         | of TLDRs and ELI5s, which is very susceptible to lies and
         | manipulation. Even though I can build trust in my sources, I
         | still think it is mandatory to have access to the raw data.
        
           | mikepurvis wrote:
           | The Economist has been publishing since 1843. I'd imagine
           | that for the vast majority of that time, readers have trusted
           | them with either no details at all about methodology or
           | little more than a high level overview of what was done--
           | certainly there was no expectation of being able to reproduce
           | and iterate on the full analysis locally, even thirty years
           | ago, much less in the 19th century.
           | 
           | Don't get me wrong-- this is a good direction to be pushing
           | in. But it's something that's been very much enabled by the
           | internet, which is why I'm finding it weird to be blaming the
           | prior status quo on internet culture attention span memes.
        
           | IAmEveryone wrote:
           | > Somehow internet created a generation of TLDRs or ELI5
           | 
           | No, it has created a generation of conspiracy-mongering
           | freaks.
           | 
           | Or, more precisely, it has allowed that particularly seedy
           | underbelly of society to become more noticeable.
        
         | jarenmf wrote:
         | The code is not even that bad. It's a simple script generating
         | a plot. I think it's too much to expect tests and documentation
         | for it.
        
           | stephc_int13 wrote:
           | I took a quick look, and I found it readable and easy to
           | follow, contrary to 98% of the codebases I see on GitHub.
        
         | rich_sasha wrote:
         | > Even if it seems justified to shit on the code as is (bad
         | style, lack of comments, no tests, whatever), all it does is
         | discourage similar companies / researchers / folks in academia
         | from posting their code as well.
         | 
         | I agree that code should be published alongside papers, and
         | great that the Economist, being a newspaper, publishes their
         | code too.
         | 
         | But when it is the most reputable sources (researchers,
         | universities etc.) publishing code of such low quality that it
         | cannot possibly be correct (looking at you, Imperial), it
         | totally should be scrutinised and critiqued.
         | 
         | I think the root cause is, where research is code-based (I
         | guess 90% of science), it should pass the most basic
         | correctness tests _before_ a paper is accepted, or indeed
         | turned into national policy. There are many angles to bite this
         | apple from, but what remains unacceptable is shit code being
         | taken at face value.
        
           | monkey_monkey wrote:
           | > publishing code of quality so low cannot possibly be
           | correct (looking at you Imperial)
           | 
           | If this is referring to the quality of the model developed by
           | Neil Ferguson @ Imperial College, John Carmack audited the
           | code and didn't have a big problem with it:
           | https://twitter.com/ID_AA_Carmack/status/1254872368763277313
        
         | nojito wrote:
         | The code doesn't even matter in this instance.
         | 
         | The discussion should be on the model design and whether it's
         | sound.
         | 
         | Leave it to HN to start criticizing code whenever anything is
         | posted.
        
           | CrazyStat wrote:
           | The code absolutely does matter, as we need to be confident
           | that it is a correct implementation of the model. To turn
           | your first sentence around, if the code is incorrect then the
           | model doesn't even matter.
           | 
           | I've done my share of implementing statistical models in code
           | and have seen plenty of examples of incorrect code failing to
           | implement the model correctly.
        
             | nojito wrote:
             | Your second paragraph doesn't match the first.
             | 
             | Simply clone the repo, download the data, and verify the
             | results.
             | 
             | The discussion needs to not be on the code but the quality
             | of the assumptions in their model.
             | 
             | Which you can read here.
             | 
             | https://www.economist.com/graphic-detail/2021/05/13/how-
             | we-e...
        
       | splithalf wrote:
       | Some muddled thinking up in this thread. Testing is good for code
       | implementation but the risk for code such as this mostly lies in
       | untestable aspects like the assumptions built into certain types
       | of statistical routines and measurement/definition problems. The
       | only answer for these issues is independent replication. Software
       | thinking emphasizes reusing code, but science should want the
       | opposite. Replication really matters, even if we don't want to
       | accept that inconvenient fact. Some great work with perfect code
       | will fail to replicate, and not because there were problems with
       | the initial work; that's statistically certain.
        
       | lanevorockz wrote:
       | Why do they feel the need to fabricate numbers? Covid death
       | numbers counted even people who tested negative, which means that
       | even the first numbers were already inflated.
       | 
       | This is just getting way out of hand.
        
         | eckmLJE wrote:
         | The purpose of the excess death model is to measure the real
         | number of deaths without having to argue about whether they
         | were caused by covid infection or not. It is simply the
         | difference between the typical number of deaths in a given time
         | and the real number of deaths -- unless you're saying that
         | death certificates are being fabricated altogether?
        
           | lanevorockz wrote:
           | You are being silly here ... We know as a matter of FACT
           | that the death count was exaggerated in order to make sure
           | the pandemic could be tackled. For example, people who died
           | and had direct contact with someone with covid are
           | AUTOMATICALLY counted as covid deaths.
           | 
           | I know that The Economist is not an honest source but we
           | should still be rational to think about these. If we succumb
           | to eternal politicisation, there is no hope for this society
           | anymore.
        
             | scld wrote:
             | I haven't heard any stories about entire death certificates
             | being fabricated... you'd either have hundreds of living
             | people who were considered dead by the state, or you'd have
             | entirely fabricated people showing up as "dead".
        
             | bitexploder wrote:
             | You completely ignored their point. We don't care about
             | COVID-19 in this discussion. We care about how many people
             | were dying from any cause. Then you attack The Economist as
             | a dishonest source when they literally provided their data
             | and made it very easy to audit and verify their claims.
             | Join the actual conversation here, please.
        
       ___________________________________________________________________
       (page generated 2021-05-25 23:01 UTC)