[HN Gopher] The Data Science Manifesto (2020)
___________________________________________________________________
The Data Science Manifesto (2020)
Author : alexmolas
Score : 24 points
Date : 2023-09-17 20:17 UTC (2 hours ago)
(HTM) web link (datasciencemanifesto.org)
(TXT) w3m dump (datasciencemanifesto.org)
| mkl95 wrote:
| > Clever use of computation over convenient assumptions
|
| Making falsifiable assumptions is legit. I don't know what
| "clever use of computation" is supposed to mean.
| RcouF1uZ4gsC wrote:
| I think pretty much all manifestos, especially in software, are
| worse than useless and do more harm than good.
|
| Can anyone point to a counterexample of a helpful manifesto?
| tomrod wrote:
| I've greatly enjoyed Agile, personally, though I have heard and
| read about horror stories.
| cactusfrog wrote:
| I don't see any value in these principles.
|
| > Minimal Viable Products over prototypes
|
| What's the difference?
|
| > APIs over databases
|
| I feel like this is a terrible decision that has nothing to do
| with data science.
|
| > Clever use of computation over convenient assumptions
|
| Why not both?
|
| > Dashboards over reports
|
| Dislike this as well. Dashboards don't contain analysis.
|
| > Validation, scrutiny and repeatability over convention and ad
| verecundiam
|
| Sure, but this is not controversial.
| mhh__ wrote:
| > Why not both?
|
| A financial example: some models are designed to be particularly
| tractable, when those in the know could use a tool like autodiff
| to make a more sophisticated model usable at the same scale.
| mhh__ wrote:
| Also, I think the API point is roughly the right way to go.
| Databases can become an unversioned blob far too quickly.
|
| I don't think raw APIs are really it either. Maybe a persistent,
| Airflow-style DAG that's secretly backed by a proper database?
| zmk5 wrote:
| I think it needs a more memorable intro. It's a manifesto! If you
| are issuing proclamations like this, it should express more
| emotion; otherwise you are just giving bullet points.
| tomrod wrote:
| Builds on https://agilemanifesto.org/, maybe overly so.
|
| I disagree with some of these points:
|
| - Minimal Viable Products over prototypes: depends on the use
| case. Prototypes with a timeboxed evaluation are helpful and
| don't have the overhead of delivery. Maybe better: prototypes
| during discovery.
|
| - APIs over databases: nah, use both.
|
| - Clever use of computation over convenient assumptions: if the
| assumptions are well founded, calibrated from external
| references, etc., then there's no issue. For example, you don't
| need to perform raw research to know how many joules it takes to
| heat water by one degree Celsius.
|
| - Dashboards over reports: depends on the use case. Dashboards
| generally limit user choice.
|
| - Validation, scrutiny and repeatability over convention and ad
| verecundiam: reasonable (though _argument from authority_ is the
| more common name than _ad verecundiam_).
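This is a minimal sketch of the kind of "convenient assumption calibrated from external references" tomrod describes. The energy needed to heat water follows Q = m * c * delta_T, with the specific heat of liquid water taken as roughly 4.186 J/(g*°C) near room temperature. The constant name and the function name are illustrative assumptions. The real specific heat varies slightly with temperature.

```python
# Approximate specific heat capacity of liquid water near room temperature,
# in joules per gram per degree Celsius. This is an externally calibrated
# constant (an assumption here), not something we need to measure ourselves.
WATER_SPECIFIC_HEAT_J_PER_G_C = 4.186


def heat_to_warm_water(mass_g: float, delta_c: float) -> float:
    """Return the energy in joules to raise mass_g grams of water by delta_c degrees C.

    Uses Q = m * c * delta_T with a constant c, a reasonable assumption for
    small temperature changes well away from freezing or boiling.
    """
    return mass_g * WATER_SPECIFIC_HEAT_J_PER_G_C * delta_c


if __name__ == "__main__":
    # One gram by one degree: roughly 4.2 J.
    print(heat_to_warm_water(1, 1))
    # One kilogram by one degree: roughly 4.2 kJ.
    print(heat_to_warm_water(1000, 1))
```

No experiment is needed. A single well-established constant turns the question into one multiplication.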
| czep wrote:
| I lean towards your viewpoint as well. Their assumptions
| (axioms, postulates?) are highly controversial, while the
| actual principles seem quite sound to me.
|
| The only issue I can see is with #5. I would argue that for
| decision making you absolutely need a single metric; otherwise
| the process collapses into bickering over which measure is more
| important at the time (often for political or interpersonal
| reasons). The point is a bit vague on what exactly is being
| evaluated (product quality, which means what?). For launching
| products or running A/B tests, aim for a single metric as your
| decision framework. If you must have more than one, then be
| explicit about the tradeoffs in a flowchart, e.g., "If x > 0, we
| launch. If x <= 0 but y > 2%, we launch. Otherwise, no launch."
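czep's flowchart can be written down as an explicit rule, which fixes the tradeoff before any results come in. This is a minimal sketch. The function name, the interpretation of x as the primary metric's change, and expressing "2%" as the fraction 0.02 are all illustrative assumptions.

```python
def launch_decision(x: float, y: float) -> bool:
    """Two-metric launch rule following the flowchart in the comment above.

    x: change in the primary metric (hypothetical; e.g. lift in the main KPI)
    y: change in a secondary metric, as a fraction (0.02 means 2%)
    """
    # Primary metric improved: launch regardless of the secondary metric.
    if x > 0:
        return True
    # Here x <= 0, so launch only if the secondary metric rose by more than 2%.
    if y > 0.02:
        return True
    # Otherwise, no launch.
    return False


if __name__ == "__main__":
    print(launch_decision(0.5, 0.0))    # primary metric up: launch
    print(launch_decision(-0.1, 0.03))  # primary flat/down, secondary up 3%: launch
    print(launch_decision(-0.1, 0.01))  # neither condition met: no launch
```

Because the thresholds are written down in advance, nobody can argue after the fact about which metric "really" mattered.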
| alex_lav wrote:
| The only one of these that I think is even remotely valid is:
|
| > Validation, scrutiny and repeatability over convention and ad
| verecundiam
|
| The rest I just flat out disagree with. This sort of feels like a
| Project Manager's Data Science Manifesto. Which... is fine if it's
| titled as such. Otherwise... no thanks.
| TrackerFF wrote:
| Once you become the "dashboard guy" at your organization, it's
| game over: that is all you'll be doing. There is literally a
| never-ending demand for dashboards in any org that works with
| lots of data, and those dashboards will grow into full-blown apps
| sooner or later.
| quickthrower2 wrote:
| Is "game over" good or bad in this context? Maybe good for job
| security, bad for career?
| marius_k wrote:
| From now on you are not doing the real data science. But you can
| hire me to tell you how the real data science is done.
| mrkeen wrote:
| > Validation, scrutiny and repeatability over convention and ad
| verecundiam
|
| Does repeatability mean you can't change your product based on
| the outcome of an A/B test? Because you probably won't get a
| chance to repeat those conditions.
| nologic01 wrote:
| An HTTPS site would be advisable. Check out Let's Encrypt.
| mrkeen wrote:
| > 2. Data science is about solving problems, not models or
| algorithms.
|
| I don't like this. How about:
|
| You may not have a problem. If you have a problem, you may not be
| able to detect it. If you detect it, it may be a false positive.
| Your intervention may make things worse. You might not detect
| that you made things worse. The whole operation may have been a
| statistical illusion made up of small sample size, insufficient
| blinding, and insufficient control.
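mrkeen's point about small sample sizes can be illustrated with a quick simulation. The code runs many A/B tests in which both arms share the same true conversion rate, then measures how large the observed difference looks anyway. This is a minimal sketch. The 10% base rate, the sample sizes, the trial counts, and the function names are arbitrary assumptions for illustration.

```python
import random
import statistics


def observed_lift(n: int, rate: float, rng: random.Random) -> float:
    """Simulate one A/B test where both arms have the same true conversion rate.

    Returns the observed difference in conversion rate (B minus A). Any
    nonzero value is pure noise, because there is no real effect.
    """
    conversions_a = sum(rng.random() < rate for _ in range(n))
    conversions_b = sum(rng.random() < rate for _ in range(n))
    return conversions_b / n - conversions_a / n


def typical_noise(n: int, rate: float = 0.1, trials: int = 500, seed: int = 0) -> float:
    """Mean absolute observed lift across many null A/B tests with n users per arm."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    return statistics.mean(abs(observed_lift(n, rate, rng)) for _ in range(trials))


if __name__ == "__main__":
    for n in (20, 200, 2000):
        # Smaller samples show larger apparent "effects" even though none exists.
        print(n, round(typical_noise(n), 4))
```

The apparent effect shrinks roughly in proportion to 1/sqrt(n). A lift that looks impressive with 20 users per arm is often indistinguishable from noise.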
| quickthrower2 wrote:
| They have an interesting links page:
| http://datasciencemanifesto.org/links/
|
| This one seems like a better manifesto actually:
| https://statisticsblog.com/manifesto/
|
| "APIs over Databases" links to Martin Fowler's article on
| microservices: https://martinfowler.com/articles/microservices.html
| civilized wrote:
| I've been a data scientist for over a decade and have delivered
| on a wide range of high-impact projects and products. I don't
| recognize this as meaningful or helpful to my field. More of a
| me-too from an inexperienced perspective.
|
| The bulleted list at the top seems to be a random collection of
| irrelevant, almost unintelligible thoughts, stuffed cargo cult-
| style into the template of the agile manifesto. What does "APIs
| over databases" mean? It sounds like "oranges over apples" to me.
|
| The numbered list of principles is better, but still not always
| that helpful. There's a strange spirit of perfectionism and
| inflexibility, e.g. in #3 and #5. Maybe some data scientists have
| time to automate everything they do, but I don't think it's a
| good general principle. #4, 6, and 7 are better. But overall
| there is a disorganized, random, unmotivated feel to both lists.
|
| Perhaps something better could be developed, but for now, I think
| data scientists looking for a manifesto would be best served by
| going back to the original agile manifesto
| (https://agilemanifesto.org/) and reflecting on how it applies to
| our field. Which it doesn't, at least not universally. But it
| didn't apply universally to software engineering either. Just
| replace "software" with "data analysis" or "predictive analytics"
| or what have you and it all carries over pretty well.
| leetrout wrote:
| > Clever use of computation over convenient assumptions
|
| What does that mean?
| smfjaw wrote:
| I'm not trying to be a dick, but I'm having a hard time relating
| this to data science or agile, and even more so the specific
| mentions of things like APIs over Databases. What's the goal of
| this?
| mhh__ wrote:
| It reminds me of
| https://en.m.wikipedia.org/wiki/Financial_Modelers%27_Manife...
|
| Does it not make sense? A lot of people doing data "science"
| are spraying bullshit out through their teeth, even unknowingly
| (there are too many people in the field). Far too often the
| dominant approach seems to be jamming a model onto unfamiliar
| data.
| axpy906 wrote:
| Add 2020 to title
___________________________________________________________________
(page generated 2023-09-17 23:00 UTC)