hngopher.com

       [HN Gopher] The Data Science Manifesto (2020)
       ___________________________________________________________________
        
       Author : alexmolas
       Score  : 24 points
       Date   : 2023-09-17 20:17 UTC (2 hours ago)
        
 (HTM) web link (datasciencemanifesto.org)
 (TXT) w3m dump (datasciencemanifesto.org)
        
       | mkl95 wrote:
       | > Clever use of computation over convenient assumptions
       | 
       | Making falsifiable assumptions is legit. I don't know what
       | "clever use of computation" is supposed to mean.
        
       | RcouF1uZ4gsC wrote:
       | I think pretty much all manifestos, especially in software, are
       | worse than useless and do more harm than good.
       | 
       | Can anyone point to a counter example of a helpful manifesto?
        
         | tomrod wrote:
         | I've greatly enjoyed Agile, personally, though I have heard and
         | read about horror stories.
        
       | cactusfrog wrote:
       | I don't see any value in these principles.
       | 
       | Minimal Viable Products over prototypes
       | 
       | What's the difference?
       | 
       | APIs over databases
       | 
       | I feel like this is a terrible decision that has nothing to do
       | with data science.
       | 
       | Clever use of computation over convenient assumptions
       | 
       | Why not both?
       | 
       | Dashboards over reports
       | 
       | Dislike this as well. Dashboards don't contain analysis.
       | 
       | Validation, scrutiny and repeatability over convention and ad
       | verecundiam
       | 
       | Sure, but this is not controversial
        
         | mhh__ wrote:
         | > Why not both?
         | 
         | A financial example would be that some models are designed to
         | be particularly tractable when those in the know could use a
         | tool like autodiff to make a more sophisticated model more
         | usable at the same scale
        
         | mhh__ wrote:
         | Also, I think the API thing, if done right, is the right way.
         | Databases can become an unversioned blob far too quickly.
         | 
         | I don't think raw APIs are really it either, maybe a persistent
         | airflow-y DAG type thing that's secretly backed by a proper
         | database?
        
       | zmk5 wrote:
       | I think it needs a more memorable intro. It's a manifesto! It
       | should express more emotion if you are issuing proclamations like
       | this otherwise you are just giving bullet points.
        
       | [deleted]
        
       | tomrod wrote:
       | Builds on https://agilemanifesto.org/, maybe overly so.
       | 
       | I disagree with some of these points:
       | 
       | - Minimal Viable Products over prototypes: depends on use case.
       | Prototypes within a timebox evaluation are helpful and don't have
       | the overhead of delivery. Maybe better: prototypes during
       | discovery
       | 
       | - APIs over databases: nah, use both.
       | 
       | - Clever use of computation over convenient assumptions : if the
       | assumptions are well founded, calibrated from external
       | references, etc., then no issue. For example, you don't need to
       | perform raw research to understand how many joules heat water by
       | a degree Celsius
       | 
       | - Dashboards over reports: depends on the use case. Dashboards
       | generally limit user choice.
       | 
       | - Validation, scrutiny and repeatability over convention and ad
       | verecundiam: Reasonable (though _argument from authority_ is the
       | more common name for _ad verecundiam_).
        
         | czep wrote:
         | I lean towards your viewpoint as well. Their assumptions
         | (axioms, postulates?) are highly controversial, while the
         | actual principles seem quite sound to me.
         | 
         | The only issue I can see is with #5. I would argue for decision
         | making, you absolutely need a single metric, otherwise the
         | process collapses into bickering over which measure is more
         | important at the time (often for political or interpersonal
         | reasons). The point is a bit vague on what exactly is being
         | evaluated (product quality, which means what?). For launching
         | products or running A/B tests, aim for a single metric as your
         | decision framework. If you must have more than one, then be
         | explicit about the tradeoffs in a flowchart: e.g., "if X > 0, we
         | launch. If X <= 0 but Y > 2%, we launch; otherwise, no launch".
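
The two-metric launch rule in the comment above can be written down as a small function. This is only a minimal sketch: the comment names the metrics just X and Y, so the function name, the parameter names, and the choice to express the 2% threshold as the fraction 0.02 are all assumptions made for illustration.

```python
def launch_decision(x: float, y: float, y_threshold: float = 0.02) -> bool:
    """Return True if the rule says to launch.

    x and y stand in for the comment's unnamed metrics X and Y
    (hypothetical names). y_threshold assumes Y is a fraction,
    so 0.02 means 2%.
    """
    # Primary metric: any strictly positive X means launch.
    if x > 0:
        return True
    # Fallback: X <= 0, so launch only if Y strictly clears its threshold.
    return y > y_threshold


if __name__ == "__main__":
    print(launch_decision(x=0.5, y=0.0))    # X > 0
    print(launch_decision(x=-0.1, y=0.03))  # X <= 0 but Y > 2%
    print(launch_decision(x=0.0, y=0.01))   # neither condition holds
```

Writing the tradeoff as code before the test runs forces the boundary cases (X exactly 0, Y exactly 2%) to be settled up front rather than argued over afterwards.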
        
       | alex_lav wrote:
       | The only one of these that I think is even remotely valid is:
       | 
       | > Validation, scrutiny and repeatability over convention and ad
       | verecundiam
       | 
       | The rest I just flat-out disagree with. This sort of feels like a
       | Project Manager's Data Science Manifesto. Which...is fine if it's
       | titled as such. Otherwise...no thanks.
        
       | TrackerFF wrote:
       | Once you become the "dashboard guy" at your organization, it's
       | game over - that is all you'll be doing - there is literally a
       | never-ending demand for dashboards in any org that works with lots
       | of data, and those dashboards will grow into full-blown apps,
       | sooner or later.
        
         | quickthrower2 wrote:
         | Is "game over" good or bad in this context? Maybe good for job
         | security, bad for career?
        
         | [deleted]
        
       | marius_k wrote:
       | From now on you are not doing the real data science. But you can
       | hire me to tell how the real data science is done.
        
       | mrkeen wrote:
       | > Validation, scrutiny and repeatability over convention and ad
       | verecundiam
       | 
       | Does repeatability mean you can't change your product based on
       | the outcome of an A/B test? Because you probably won't get a
       | chance to repeat those conditions.
        
       | nologic01 wrote:
       | An HTTPS site would be advisable. Check out Let's Encrypt.
        
         | [deleted]
        
       | mrkeen wrote:
       | > 2. Data science is about solving problems, not models or
       | algorithms.
       | 
       | I don't like this. How about:
       | 
       | You may not have a problem. If you have a problem, you may not be
       | able to detect it. If you detect it, it may be a false positive.
       | Your intervention may make things worse. You might not detect
       | that you made things worse. The whole operation may have been a
       | statistical illusion made up of small sample size, insufficient
       | blinding, and insufficient control.
        
       | quickthrower2 wrote:
       | They have an interesting links page:
       | http://datasciencemanifesto.org/links/
       | 
       | This one seems like a better manifesto actually:
       | https://statisticsblog.com/manifesto/
       | 
       | "APIs over databases" links to Martin Fowler's microservices article:
       | https://martinfowler.com/articles/microservices.html
        
       | civilized wrote:
       | I've been a data scientist for over a decade and have delivered
       | on a wide range of high-impact projects and products. I don't
       | recognize this as meaningful or helpful to my field. More of a
       | me-too from an inexperienced perspective.
       | 
       | The bulleted list at the top seems to be a random collection of
       | irrelevant, almost unintelligible thoughts, stuffed cargo cult-
       | style into the template of the agile manifesto. What does "APIs
       | over databases" mean? It sounds like "oranges over apples" to me.
       | 
       | The numbered list of principles is better, but still not always
       | that helpful. There's a strange spirit of perfectionism and
       | inflexibility, e.g. in #3 and #5. Maybe some data scientists have
       | time to automate everything they do, but I don't think it's a
       | good general principle. #4, 6, and 7 are better. But overall
       | there is a disorganized, random, unmotivated feel to both lists.
       | 
       | Perhaps something better could be developed, but for now, I think
       | data scientists looking for a manifesto would be best served by
       | going back to the original agile manifesto
       | (https://agilemanifesto.org/) and reflecting on how it applies to
       | our field. Which it doesn't, at least not universally. But it
       | didn't apply universally to software engineering either. Just
       | replace "software" with "data analysis" or "predictive analytics"
       | or what have you and it all carries over pretty well.
        
       | leetrout wrote:
       | > Clever use of computation over convenient assumptions
       | 
       | What does that mean?
        
       | smfjaw wrote:
       | I'm not trying to be a dick, but I'm having a hard time relating
       | this to data science or agile, especially the specific mentions of
       | things like APIs over databases. What's the goal of this?
        
         | mhh__ wrote:
         | It reminds me of
         | https://en.m.wikipedia.org/wiki/Financial_Modelers%27_Manife...
         | 
         | Does it not make sense? A lot of people doing data "science"
         | are spraying bullshit out through their teeth, unknowingly even
         | (there are too many people in the field). Far too often the
         | dominant approach seems to be jamming a model onto unfamiliar
         | data.
        
         | [deleted]
        
       | axpy906 wrote:
       | Add 2020 to title
        
       ___________________________________________________________________
       (page generated 2023-09-17 23:00 UTC)