[HN Gopher] Python is not a great language for data science
       ___________________________________________________________________
        
       Python is not a great language for data science
        
       Author : speckx
       Score  : 320 points
       Date   : 2025-11-25 16:38 UTC (1 days ago)
        
 (HTM) web link (blog.genesmindsmachines.com)
 (TXT) w3m dump (blog.genesmindsmachines.com)
        
       | lenerdenator wrote:
       | > I think people way over-index Python as the language for data
       | science. It has limitations that I think are quite noteworthy.
       | There are many data-science tasks I'd much rather do in R than in
       | Python. I believe the reason Python is so widely used in data
       | science is a historical accident, plus it being sort-of Ok at
       | most things, rather than an expression of its inherent
       | suitability for data-science work.
       | 
       | Python doesn't need to be the best at any one thing; it just has
       | to be serviceable for a lot of things. You can take someone who
       | has expertise in a completely different domain in software (web
       | dev, devops, sysadmin, etc.) and introduce them to the data
       | science domain without making them learn an entirely new language
       | and toolchain.
        
         | dmurray wrote:
         | That's not why it's used in data science though. Lots of data
         | scientists use Python all day and have no concept of ever
         | working in a different field.
         | 
         | It's used in data science because it's used in data science.
        
           | mohaine wrote:
           | But data science usually isn't an island.
           | 
           | Use whatever you want on your one off personal projects but
           | use something more non-data science friendly if you ever want
           | your model to run directly in a production workflow.
           | 
           | Productionizing R models is quite painful. The normal way is
           | to just rewrite it not in R.
        
             | dmurray wrote:
             | I've soured a lot on directly productionizing data science
             | code. It's normally an unmaintainable mess.
             | 
             | If you write it in R and then rewrite it in C (better:
             | rewrite it in English with the R as helpful annotations,
             | then have someone else rewrite it in C), at least there is
             | some chance you've thought about the abstractions and
             | operations that are actually necessary for your problem.
        
           | lenerdenator wrote:
           | That's probably true now, but at one point, they were looking
           | for people to start doing data science, and were pulling
           | people from other domains.
        
           | vkazanov wrote:
           | It's used in data science because no other language has this
           | level of library support.
           | 
           | And it got this unprecedented level of support because right
           | from the start it made its focus clear syntax and (perceived)
           | simplicity.
           | 
           | There is also a sort of cumulative effect from being nice for
           | algorithmic work.
           | 
           | Guido's long-term strategy won over numerous other strong
           | candidates for this role.
        
             | passivegains wrote:
             | I think the key thing not obvious to most data scientists
             | is they're not using python because it meets their needs,
             | it's because we've failed them. twice.
             | 
             | 1. data scientists _aren't_ programmers, so why do they
             | need a programming language? the tools they should be using
             | don't exist. they'd need programmers to make them, and all
             | we have to offer is... more programming languages.
             | 
             | 2. the giant problem at the heart of modern software: the
             | most important feature of a modern programming language is
             | being easy to read and write. this feature is conspicuously
             | absent from most important languages.
             | 
             | they're trapped. they can't do what they need without a
             | programming language but there are only a handful they can
             | possibly use. the real reason python ended up with such
             | good library support is they never really had a choice.
        
               | vkazanov wrote:
               | When the first scientific libraries were written for
               | python, most alternatives didn't even consider being
               | readable, or convenient. The choice was more like
               | C/Cpp/Fortran vs Python.
               | 
               | And then Python went into a self-reinforcing loop, with the
               | scientific community coming up with more and more ways to
               | improve Python support for the kind of interactive work
               | that was required for data analysis. Think ipython ->
               | jupyter -> jupyter forks and other python-centric
               | notebook systems.
               | 
               | So when data analysis evolved into data science and
               | machine learning, gpu-first library vendors already faced
               | a crowd of people knowing python.
               | 
               | It is crazy how right now one can utilize 100s of gpus
               | through these bits of dirty python wrapped in json.
        
               | aragilar wrote:
               | I think you're forgetting perl (plus other unix utils)
               | and matlab. PDL (perl data language) was a thing, as was
               | IDL (and other similar tools).
        
           | bsder wrote:
           | Partially, but it's _also_ because 90% of your work in "data
           | science" isn't direct analysis.
           | 
           | You need to get the data from somewhere. Do you need to
           | scrape that because Python is okay at scraping? Oh, after it's
           | scraped, we looked at it and it's in
           | ObtuseBinaryFormat0.0.LOL.Beta and, what do you know,
           | somebody wrote a converter for that for Python. And we need
           | to clean all the broken entries out of that and Python is
           | decent at that. etc.
           | 
           | The trick is that while Python may or may not be anybody's
           | first choice for a particular task, Python is an okay second
           | or third choice for most tasks.
           | 
           | So, you can learn Python. Or you learn <best language> and
           | <something else>. And if <something else> is Python, was
           | <best language> sufficiently better than Python to be worth
           | spending the time learning?
        
       | forgotpwd16 wrote:
       | Article is well written but fails to address its own thesis by
       | postponing it to a sequel article. In its current state it only
       | alludes that Python is not great because it requires specialized
       | packages. (And the counterexample is R, for which the author also
       | used a package.)
        
         | stevenpetryk wrote:
         | Totally agree. The author's most significant example is two
         | code snippets that are quite similar and both pretty nice.
        
         | puzzlingcaptcha wrote:
         | The 'sequel' is also online:
         | https://blog.genesmindsmachines.com/p/python-is-not-a-great-...
        
           | forgotpwd16 wrote:
           | Thanks! In such serial articles there's usually a link at the
           | end pointing to the next one, so, since there wasn't any, I
           | thought the next one hadn't been written. This one indeed
           | addresses the thesis. The TL;DR, taken directly from the
           | article,
           | 
           | >The core problems I see with Python as a language for data
           | science are call-by-reference semantics, lack of built-in
           | concepts of missing values, lack of built-in vectorization,
           | and lack of non-standard evaluation.
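A rough illustration of the first three items in that list, as a minimal stdlib-only Python sketch. The helper and variable names are made up for this example and do not come from the article; what the article calls call-by-reference is, strictly speaking, Python passing references to mutable objects.

```python
import math

def add_one(values):
    """Increment every element in place (hypothetical helper)."""
    for i in range(len(values)):
        values[i] += 1
    return values

original = [1, 2, 3]
result = add_one(original)  # the caller's list is modified as well

# No built-in vectorization: list + int raises TypeError, so an explicit
# loop or comprehension is needed for element-wise arithmetic.
try:
    [1, 2, 3] + 1
    vectorized_ok = True
except TypeError:
    vectorized_ok = False
doubled = [x * 2 for x in [1, 2, 3]]

# No built-in missing value: None breaks arithmetic outright, while
# float("nan") propagates silently and only exists for floats.
try:
    total_with_none = sum([1.0, None, 3.0])
except TypeError:
    total_with_none = None
total_with_nan = sum([1.0, float("nan"), 3.0])
```

Libraries such as NumPy and pandas add vectorized operations and missing-value handling on top of the language, which is the "specialized packages" point raised earlier in the thread.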
        
       | yeahwhatever10 wrote:
       | A little late for this
        
         | ASalazarMX wrote:
         | "Not great" doesn't necessarily mean "bad", it can be
         | interpreted as "good", or even "very good". An honest title
         | would have explicitly qualified how suitable the author found
         | it was.
         | 
         | That the author avoided saying Python was a bad language
         | outright speaks a great deal of its suitability. Well, that,
         | and the majority of data science in practice.
        
       | iLemming wrote:
       | From many practical points, Clojure is great for data. And you
       | can even leverage python libs via clj-python.
        
         | phforms wrote:
         | In the past few years I have seen some serious efforts from the
         | Clojure community to make Clojure more attractive for data
         | science. Check out the Scicloj[1] group and their data science
         | stack/toolkit Noj[2] (still in beta) as well as the high-
         | performance tabular data processing library tech.ml.dataset
         | (TMD)[3].
         | 
         | - [1] https://scicloj.github.io
         | 
         | - [2] https://scicloj.github.io/noj
         | 
         | - [3] https://github.com/techascent/tech.ml.dataset
        
           | geokon wrote:
           | What's worth emphasizing is that you're not marrying in to an
           | ecosystem of libs. There are a lot of separate pieces that
           | you can typically use separately. I do climate data work
           | without most of Scicloj's tools, but I do use tech.ml.dataset
           | extensively
        
       | paulfharrison wrote:
       | R is so good in part because of the efforts of people like Di
       | Cook, Hadley Wickham, and Yihui Xie to create a software
       | environment that they like working in.
       | 
       | It also helps that in R any function can completely change how
       | its arguments are evaluated, allowing the tidyverse packages to
       | do things like evaluate arguments in the context of a data frame
       | or add a pipe operator as a new language feature. This is a very
       | dangerous feature to put in the hands of statisticians, but it
       | allows more syntactic innovation than is possible in Python.
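For contrast, Python evaluates arguments before a function is called, so a bare `age >= 18` cannot be handed to a function unevaluated. One common workaround is to pass the expression as a string and evaluate it with the column names as variables; pandas' `DataFrame.query` takes a similar string-based route. The sketch below is stdlib-only, and `filter_rows` and the dict-of-lists table layout are assumptions made for illustration, not any library's API.

```python
def filter_rows(table, expr):
    """Keep rows where `expr` is true; column names act as variables.

    `table` is a dict mapping column name -> list of values (an assumed,
    simplified stand-in for a data frame).
    """
    code = compile(expr, "<expr>", "eval")
    names = list(table)
    n_rows = len(table[names[0]]) if names else 0
    keep = []
    for i in range(n_rows):
        row = {name: table[name][i] for name in names}
        # Evaluate the expression with this row's columns as local names.
        # eval() on untrusted strings is unsafe; this is only a sketch.
        if eval(code, {"__builtins__": {}}, row):
            keep.append(i)
    return {name: [table[name][i] for i in keep] for name in names}

people = {"name": ["Ann", "Bob", "Cy"], "age": [31, 17, 45]}
adults = filter_rows(people, "age >= 18")
```

The expression lives in a string, so editors and linters cannot check it, which is part of the syntactic gap the comment describes.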
        
         | cb321 wrote:
         | Like Python, R is a 2 (+...) language system. C/Fortran
         | backends are needed for performance as problems scale up.
         | 
         | Julia and Nim [1] are dynamic and static approaches
         | (respectively) to 1 language systems. They both have both user-
         | defined operators and macros. Personally, I find the surface
         | syntax of Julia rather distasteful and I also don't live in
         | PLang REPLs / emacs all day long. Of course, neither Julia nor
         | Nim are impractical enough to make calling C/Fortran all that
         | hard, but the communities do tend to implement in the new
         | language without much prompting.
         | 
         | [1] https://nim-lang.org/
        
       | Lyngbakr wrote:
       | I was a bit disappointed to discover that this was essentially an
       | R vs. Python article, which is a data science trope. I've been in
       | the field for 20+ years now and while I used to be firmly on team
       | R, I now think that we don't really have a good language for data
       | science. I had high hopes for Julia and even Clojure's data
       | landscape looks interesting, but given the momentum of Python I
       | don't see how it could be usurped at this point.
        
         | vkazanov wrote:
         | It is EVERYWHERE. I recently had to interview a bunch of data
         | scientists, and only one of them knew SQL. Surely, all of them
         | worked with python. I bet none of them even heard of R.
        
           | Lyngbakr wrote:
           | Yikes. Were they experienced data scientists or straight out
           | of school? I find it very odd (and a bit scary) that they
           | didn't know SQL.
        
             | garciasn wrote:
             | Experienced Data Scientists and/or those straight out of
             | school are EXTREMELY lacking in valuable SQL experience and
             | always have been. Take a DS with 25 years experience in
             | SAS, _many of them_ are great with DATAstep, but have far
             | less experience using PROC SQL for querying the data in the
             | most effective way--even if they were pulling the data down
             | with pass-through via SAS/ACCESS.
             | 
             | Often they'd be doing very simplistic querying and then
             | manipulating via DATAstep prior to running whatever
             | modeling and/or reporting PROCs later, rather than pushing
             | it upstream into a far faster native database SQL pull via
             | pass-through.
             | 
             | Back in 2008/2009, I saved 30h+ runtime on a regular report
             | by refactoring everything in SQL via pass-through as
             | opposed to the data scientists' original code that simply
             | pulled the data down from the external source and
             | manipulated it in DATAstep. Moving from 30h to 3m (Oracle
             | backend) freed up an entire FTE from babysitting a
             | long-running job, and the report went from running 3x a week
             | to multiple times per day.
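The same idea, pushing the aggregation into the database instead of pulling raw rows down and processing them client-side, can be sketched in Python with the stdlib `sqlite3` module. The `sales` table and its columns are invented for this example; the comment's actual setup was SAS with an Oracle backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("west", 5.0), ("east", 2.5)],
)

# Client-side: fetch every row, then aggregate in Python.
totals_client = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    totals_client[region] = totals_client.get(region, 0.0) + amount

# Pushed down: the database aggregates and returns one row per group.
totals_db = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
```

On three rows the difference is invisible; the gain the comment describes comes from not shipping the full table over the network and from letting the database use its own indexes and query planner.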
        
           | garciasn wrote:
           | SAS > R > Python.
           | 
           | The focus of SAS and R was primarily limited to data
           | science-related fields; however, Python is a far more generic
           | programming language, thus the number of folks exposed to it
           | is wider and thus the hiring pool of those who come in
           | exposed to Python is FAR LARGER than SAS/R ever were, even
           | when SAS was actively taught/utilized in
           | undergraduate/graduate programs.
           | 
           | As a hiring leader in the Data Science and Engineering space,
           | I have extensive experience with all of these + SQL, among
           | others. Hiring has become much easier to go cross-field/post-
           | secondary experience and find capable folks who can hit the
           | ground running.
        
             | username135 wrote:
             | you beat me to it. i understand why sas gets hate but I
             | think that comes with simply not understanding how powerful
             | it is.
        
               | garciasn wrote:
               | It was a great language, but it was/is extremely cost-
               | prohibitive plus it simply fell out of favor in academia,
               | for many of the same reasons, and thus was supplanted by
               | free alternatives.
        
         | SiempreViernes wrote:
         | What would it even mean to be a "good language for data
         | science"?
         | 
         | In the first place data science is more a label someone put on
         | bag full of cats, rather than a vast field covered by similarly
         | sized boxes.
        
         | username135 wrote:
         | SAS has entered the chat
        
       | RobinL wrote:
       | I think a lot of this comes down to the question: Why aren't
       | tables first class citizens in programming languages?
       | 
       | If you step back, it's kind of weird that there's no mainstream
       | programming language that has tables as first class citizens.
       | Instead, we're stuck learning multiple APIs (polars, pandas)
       | which are effectively programming languages for tables.
       | 
       | R is perhaps the closest, because it has data.frame as a 'first
       | class citizen', but most people don't seem to use it, and use
       | e.g. tibbles from dplyr instead.
       | 
       | The root cause seems to be that we still haven't figured out the
       | best language to use to manipulate tabular data yet (i.e. the way
       | of expressing this). It feels like there's been some convergence
       | on some common ideas. Polars is kind of similar to dplyr. But no
       | standard, except perhaps SQL.
       | 
       | FWIW, I agree that Python is not great, but I think it's also
       | true R is not great. I don't agree with the specific comparisons
       | in the piece.
        
         | jna_sh wrote:
         | I know the primary data structure in Lua is called a table, but
         | I'm not very familiar with them or whether they map to what's
         | expected from tables in data science.
        
           | TheSoftwareGuy wrote:
           | IIRC those are basically hash tables, which are first-class
           | citizens in many languages already
        
           | Jtsummers wrote:
           | Lua's tables are associative arrays, at least fundamentally.
           | There's more to it than that, but it's not the same as the
           | tables/data frames people are using with pandas and similar
           | systems. You could build that kind of framework on _top_ of
           | Lua's tables, though.
           | 
           | https://www.lua.org/pil/2.5.html
        
         | kelipso wrote:
         | People use data.table in R too (my favorite among those but
         | it's been a few years). data.table compared to dplyr is quite a
         | contrast in terms of language to manipulate tabular data.
        
         | CivBase wrote:
         | What is a table other than an array of structs?
        
           | RobinL wrote:
           | I would argue that's about how the data is stored. What I'm
           | trying to express is the idea of the programming language
           | itself supporting high level tabular
           | abstractions/transformations such as grouping, aggregation,
           | joins and so on.
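To make the distinction concrete: with only an array of structs and no tabular abstraction, even a simple "group by, then count and average" has to be spelled out by hand. A minimal Python sketch, where `group_aggregate` and the sample rows are hypothetical:

```python
from collections import defaultdict

def group_aggregate(rows, key, value):
    """Group `rows` (a list of dicts) by `key`; return count and mean of `value`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {
        k: {"count": len(v), "mean": sum(v) / len(v)}
        for k, v in groups.items()
    }

penguins = [
    {"species": "Adelie", "mass_g": 3700},
    {"species": "Gentoo", "mass_g": 5000},
    {"species": "Adelie", "mass_g": 3900},
]
summary = group_aggregate(penguins, "species", "mass_g")
```

A language with tables as first-class citizens would presumably offer this as built-in syntax or operators rather than a helper each project rewrites.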
        
             | CivBase wrote:
             | Ah, that makes more sense. Thanks for the clarification.
        
             | camdenreslink wrote:
             | Sounds a lot like LINQ in .NET (which is usually compatible
             | with ORMs actually querying tables).
        
             | p1necone wrote:
             | Implementing all of those things is an order of magnitude
             | more complex than any other first class primitive datatype
             | in most languages, and there's no obvious "one right way"
             | to do it that would fit everyone's use cases - seems like
             | libraries and standalone databases are the way to do it,
             | and that's what we do now.
        
             | pjc50 wrote:
             | Yeah, that's LINQ+EF. People have hated ORMs for so long
             | (with some justification) that perhaps they've forgotten
             | what the use case is.
             | 
             | (and yes there's special language support for LINQ so it
             | counts as "part of the language" rather than "a library")
        
             | redwall_hp wrote:
             | Map/filter/reduce are idiomatic Java/Kotlin/Scala.
             | 
             | SELECT thing1, thing2 FROM things WHERE thing2 != 2;
             | 
             | val thingMap = things.filter { it.thing2 != 2 }
             |     .map { it.thing1 to it.thing2 }
             | 
             | Then you've got distinct(), sorting methods, take/drop for
             | limits, count/sumOf/average/minOf/maxOf.
             | 
             | There are set operations, so you can do unions and
             | differences, check for presence, etc.
             | 
             | Joins are the hard part, but map() and some lambda work can
             | pull it off.
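One way the "map() and some lambda work" for joins might look is a hash join: index one side by its key, then look up each row of the other side. The sketch below is in Python rather than Kotlin, and `hash_join`, the table contents, and the column names are all assumptions for illustration.

```python
from collections import defaultdict

def hash_join(left, right, left_key, right_key):
    """Inner join of two lists of dicts on left[left_key] == right[right_key]."""
    index = defaultdict(list)
    for row in right:
        index[row[right_key]].append(row)
    # Merge each left row with every matching right row; rows without a
    # match are dropped (inner-join behaviour).
    return [
        {**l, **r}
        for l in left
        for r in index.get(l[left_key], [])
    ]

orders = [
    {"order_id": 1, "customer_id": 10, "amount": 25.0},
    {"order_id": 2, "customer_id": 11, "amount": 5.0},
    {"order_id": 3, "customer_id": 99, "amount": 7.0},
]
customers = [{"id": 10, "name": "Ada"}, {"id": 11, "name": "Grace"}]
joined = hash_join(orders, customers, "customer_id", "id")
```

If both sides share a column name, the right-hand value silently wins in `{**l, **r}`; real table libraries handle that with suffixes or explicit column selection.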
        
           | thom wrote:
           | It's not that you can't model data that way (or indeed with
           | structs of arrays), it's just that the user experience starts
           | to suck. You might want a dataset bigger than RAM, or that
           | you can transparently back by the filesystem, RAM or VRAM.
           | You might want to efficiently index and query the data. You
           | might want to dynamically join and project the data with
           | other arrays of structs. You might want to know when you're
           | multiplying data of the wrong shapes together. You might want
           | really excellent reflection support. All of this is obviously
           | possible in current languages because that's where it
           | happens, but it could definitely be easier and feel more of a
           | first class citizen.
        
           | ModernMech wrote:
           | The difference is semantics.
           | 
           | What is a paragraph but an array of sentences? What is a
           | sentence but an array of words? What's a word but an array of
           | letters? You can do this all the way down. Eventually you
           | need to assign meaning to things, and when you do, it helps
           | to know _what_ the thing actually is, specifically, because
           | an array of structs can be many things that aren't a table.
        
           | FridgeSeal wrote:
           | Well it could be a struct of arrays.
           | 
           | Nitpicking aside, a nice library for doing "table stuff"
           | without "the whole ass big table framework" would be nice.
           | 
           | It's not hard to roll this stuff by hand, but again, a nicer
           | way wouldn't be bad.
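The two layouts mentioned here, array of structs (one record per row) and struct of arrays (one list per column), carry the same information and convert into each other mechanically. A small hand-rolled Python sketch; `to_columns` and `to_rows` are hypothetical helper names, and the code assumes every row has the same keys.

```python
def to_columns(rows):
    """Array of structs -> struct of arrays (list of dicts -> dict of lists)."""
    if not rows:
        return {}
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name in columns:
            columns[name].append(row[name])
    return columns

def to_rows(columns):
    """Struct of arrays -> array of structs (dict of lists -> list of dicts)."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]

rows = [{"x": 1, "y": "a"}, {"x": 2, "y": "b"}]
columns = to_columns(rows)
round_trip = to_rows(columns)
```

Column-oriented storage tends to suit whole-column operations such as sums and filters, while row-oriented storage suits record-at-a-time access.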
        
         | kevinhanson wrote:
         | this is my biggest complaint about SAS--everything is either a
         | table or text.
         | 
         | most procs use tables as both input and output, and you better
         | hope the tables have the correct columns.
         | 
         | you want a loop? you either get an implicit loop over rows in a
         | table, write something using syscalls on each row in a table,
         | or you're writing macros (all text).
        
         | nextos wrote:
         | I don't think this is the real problem. In R and Julia tables
         | are great, and they are libraries. The key is that these
         | languages are very expressive and malleable.
         | 
         | Simplifying a lot, R is heavily inspired by Scheme, with some
         | lazy evaluation added on top. Julia is another take at the
         | design space first explored by Dylan.
        
           | Iwan-Zotow wrote:
           | R was a clone of S
        
         | paddleon wrote:
         | > R is perhaps the closest, because it has data.frame as a
         | 'first class citizen', but most people don't seem to use it,
         | and use e.g. tibbles from dplyr instead.
         | 
         | You're forgetting R's data.table,
         | https://cran.r-project.org/web/packages/data.table/vignettes...,
         | 
         | which is amazing. Tibbles only win because they fought the
         | docs/onboarding battle better, and dplyr ended up getting
         | industry buy-in.
        
           | elehack wrote:
           | And readability. data.table is very capable, but the
           | incantations to use it are far less obvious (both for reading
           | and writing) than dplyr.
           | 
           | But you can have the best of both worlds with
           | https://dtplyr.tidyverse.org/, using data.table's performance
           | improvements with dplyr syntax.
        
           | extr wrote:
           | Yeah data.table is just about the best-in-class tool/package
           | for true high-throughput "live" data analysis. Dplyr is great
           | if you are learning the ropes, or want to write something
           | that your colleagues with less experience can easily spot
           | check. But in my experience if you chat with people working
           | in the trenches of banks, lenders, insurance companies, who
           | are running hundreds of hand-spun crosstabs/correlational
           | analyses daily, you will find a lot of data.table users.
           | 
           | Relevant to the author's point, Python is pretty poor for
           | this kind of thing. Pandas is a perf mess. Polars, duckdb,
           | dask etc, are fine perhaps for production data pipelines but
           | quite verbose and persnickety for rapid iteration. If you put
           | a gun to my head and told me to find some nuggets of insight
           | in some massive flat files, I would ask for an RStudio cloud
           | instance + data.table hosted on a VM with 256GB+ of RAM.
        
         | RodgerTheGreat wrote:
         | There are a number of dynamic languages to choose from where
         | tables/dataframes are truly first-class datatypes: perhaps most
         | notably Q[0]. There are also emerging languages like Rye[1] or
         | my own Lil[2].
         | 
         | I suspect that in the fullness of time, mainstream languages
         | will eventually fully incorporate tabular programming in much
         | the same way they have slowly absorbed a variety of idioms
         | traditionally seen as part of functional programming, like
         | map/filter/reduce on collections.
         | 
         | [0]
         | https://en.wikipedia.org/wiki/Q_(programming_language_from_K...
         | 
         | [1] https://ryelang.org/blog/posts/comparing_tables_to_python/
         | 
         | [2] http://beyondloom.com/tools/trylil.html
        
           | middayc wrote:
           | Another page about Rye tables:
           | https://ryelang.org/cookbook/working-with/tables/
        
           | liveranga wrote:
           | Nushell is another one with tables built-in:
           | 
           | https://www.nushell.sh/book/working_with_tables.html
        
             | middayc wrote:
             | It's interesting how often there are similarities between
             | Nushell, Rye and Lil, although I think they are from
             | different influences. I guess it's sort of current
             | zeitgeist if you want something light, high level and
             | interactive.
        
           | mncharity wrote:
           | Interesting links - tnx. Apropos the optimism of
           | "eventually", I think of language support for say key-value
           | pair collections, namespaces, as still quite impoverished.
           | With each language supporting only a small subset of the
           | concision, apis, and datastructures, found useful in some
           | other. This some 3 decades after becoming mainstream, and the
           | core of multiple mainstream languages. Diminishing returns,
           | silos, segregation of application domains, divergence of
           | paradigm/orientation/idioms, assorted dysfunctions as a
           | field, etc... "eventually" can be decades. Maybe LLMs can
           | quicken that... or perhaps call an end to this era,
           | permitting a "no, we collectively just never got around to
           | creating any one language which supported all of {X}".
        
         | constantcrying wrote:
         | >Why aren't tables first class citizens in programming
         | languages?
         | 
         | Matlab has them, in fact it has multiple competing concepts of
         | it.
        
         | alexnewman wrote:
         | APL is great
        
           | 7thaccount wrote:
           | Perfect solution for doing analysis on tables. Wes McKinney
           | (inventor of pandas) is rumored to have been inspired by it
           | too.
           | 
           | My problem with APL is 1.) the syntax is less amazing at
           | other more mundane stuff, and 2.) the only production worthy
           | versions are all commercial. I'm not creating something that
           | requires me to pay for a development license as well as
           | distribution royalties.
        
           | smartmic wrote:
           | Agreed. I once used it for data preparation for a data
           | science project (GNU APL). After a steep learning curve, it
           | felt very much like writing math formulas -- it was fun and
           | concise, and I liked it very much. However, it has zero
           | adoption in today's data science landscape. Sharing your work
           | is basically impossible. If you're doing something just for
           | yourself, though, I would probably give it a chance again.
        
         | 127 wrote:
         | Because there's no obvious universal optimal data structure for
         | heterogeneous N-dimensional data with varying distributions?
         | You can definitely do that, but it requires an order of
         | magnitude more resource use as baseline.
        
         | riskassessment wrote:
         | > R is perhaps the closest, because it has data.frame as a
         | 'first class citizen', but most people don't seem to use it,
         | and use e.g. tibbles from dplyr instead.
         | 
         | Everyone in R uses data.frame because tibble (and data.table)
         | inherits from data.frame. This means that "first class" (base
         | R) functions work directly on tibble/data.table. It also makes
         | it trivial to convert between tibble, data.table, and
         | data.frames.
        
         | ModernMech wrote:
         | It makes sense from a historical perspective. Tables _are_ a
         | thing in many languages, just not the ones that mainstream devs
         | use. In fact, if you rank programming languages by usage
         | outside of devs, the top languages _all_ have a table-ish
         | metaphor (SQL, Excel, R, Matlab).
         | 
         | The languages devs use are largely Algol derived. Algol is a
         | language that was used to express algorithms, which were
         | largely abstractions over Turing machines, which are based
         | around an infinite 1D tape of memory. This model of 1D memory
         | was built into early computers, and early operating systems and
         | early languages. We call it "mechanical sympathy".
         | 
         | Meanwhile, other languages at the same time were invented that
         | weren't tied so closely to the machine, but were more for the
         | purpose of doing science and math. They didn't care as much
         | about this 1D view of the world. Early languages like Fortran
         | and Matlab had notions of 2D data matrices because math and
         | science had notions of 2D data matrices. Languages like C were
         | happy to support these things by using an array of pointers
         | because that mapped nicely to their data model.
         | 
         | The same thing can be said for 1-based and 0-based indexing --
         | languages like Matlab, R, and Excel are 1-based because that's
         | how people index tables; whereas languages like C and Java are
         | 0-based because that's how people index memory.
        
           | cb321 wrote:
           | As a slight refinement of your point, C does have storage map
           | based N-D arrays/tensors like Fortran, just with the old
           | column-major/row-major difference and a clunky "multiple
           | [][]" syntax. There was just a restriction early on to need
           | compile-time known dimensions to the arrays (up to the final
           | dimension, anyway) because it was a somewhat half-done/half-
           | supported thing - and because that _also_ fit the linear data
           | model well. So, it is also common to see char *argv[] like
           | arrays of pointers or in numerics sometimes libraries which
           | do their own storage map equations from passed dimensions.
           | 
           | Also, the linear memory model itself is not really _only_
           | because of Algol/Turing machines/theoretical CS/"early"
           | hardware and mechanical sympathy. DRAM has rows & columns
           | internally, but byte addressability leads to hiding that from
           | HW client systems (unless someone is doing a rowhammer attack
           | or something). More random access than tape rewind/fast
           | forward is indeed a huge deal, but I think the actual
           | popularity of linearity just comes from its simplicity as an
           | interface more than anything else. E.g., segmented x86
           | memory with near/far pointers was considered ugly relative to
           | a big 32-bit address space, and disk files and other
           | allocation arenas internally have large linear address/seek
           | spaces. People just want to defer using >1 number until they
           | really need to. People learn univariate-X before they learn
           | multivariate-X where X could be calculus, statistics, etc.,
           | etc.
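The "storage map equations" mentioned above can be sketched in a few lines. This is only an illustration, not code from any particular library: the function names are made up, and it assumes a dense 2-D array flattened into a 1-D buffer.

```python
def row_major_offset(i, j, n_rows, n_cols):
    """C-style layout: the elements of one row sit next to each other."""
    return i * n_cols + j


def col_major_offset(i, j, n_rows, n_cols):
    """Fortran-style layout: the elements of one column sit next to each other."""
    return j * n_rows + i


# The same 2x3 matrix [[1, 2, 3], [4, 5, 6]] flattened both ways.
row_major_buffer = [1, 2, 3, 4, 5, 6]
col_major_buffer = [1, 4, 2, 5, 3, 6]

# Element (row 1, column 2) is 6 under either layout.
a = row_major_buffer[row_major_offset(1, 2, 2, 3)]
b = col_major_buffer[col_major_offset(1, 2, 2, 3)]
```

Either way the "2D" array is just one number computed from two, which is the sense in which it still fits the linear memory model.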
        
         | IgorPartola wrote:
         | SQL is not just about a table but multiple tables and their
         | relationships. If it was just about running queries against a
         | single table then basic ordering, filtering, aggregation, and
         | annotation would be easy to achieve in almost any language.
         | 
         | Soon as you start doing things like joins, it gets complicated
         | but in theory you could do something like an API of an ORM to
         | do most things. With using just operators you quickly run into
         | the fact that you have to overload (abuse) operators or write a
         | new language with different operator semantics:
         | orders * customers
         |   | (customers.id == orders.customer_id)
         |   | (orders.amount > Decimal('10.00'))
         | 
         | Where * means cross product/outer join and | means filter. Once
         | you add an ordering operator, a group by, etc. you basically
         | get SQL with extra steps.
         | 
         | But it would be nice to have it built in so talking to a
         | database would be a bit more native.
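A rough sketch of what such operator overloading could look like in plain Python. Everything here is hypothetical: the `Relation` class, its `table` helper, and the sample rows are invented for illustration, `*` is taken to mean cross product, and `|` to mean filter. Predicates are written as lambdas because a bare `customers.id == orders.customer_id` would need yet more overloading.

```python
from decimal import Decimal
from itertools import product


class Relation:
    """Toy in-memory relation: a list of dicts keyed by 'table.column'."""

    def __init__(self, rows):
        self.rows = list(rows)

    @classmethod
    def table(cls, name, rows):
        # Qualify every column with the table name so joins do not collide.
        return cls({f"{name}.{k}": v for k, v in row.items()} for row in rows)

    def __mul__(self, other):
        # '*' is the cross product of the two relations.
        return Relation({**a, **b} for a, b in product(self.rows, other.rows))

    def __or__(self, predicate):
        # '|' keeps only the rows for which the predicate is true.
        return Relation(r for r in self.rows if predicate(r))


orders = Relation.table("orders", [
    {"id": 1, "customer_id": 10, "amount": Decimal("25.00")},
    {"id": 2, "customer_id": 11, "amount": Decimal("5.00")},
])
customers = Relation.table("customers", [
    {"id": 10, "name": "Ada"},
    {"id": 11, "name": "Grace"},
])

big_orders = (
    orders * customers
    | (lambda r: r["customers.id"] == r["orders.customer_id"])
    | (lambda r: r["orders.amount"] > Decimal("10.00"))
)
```

This works because `*` binds tighter than `|` in Python, but it also shows the "abuse" part: neither operator means anything like its usual arithmetic or bitwise sense.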
        
           | sgarland wrote:
           | Every time I see stuff like this (Google's new SQL-ish
           | language with pipes comes to mind), I am baffled. SQL to me
           | is eminently readable, and flows beautifully.
           | 
           | For reference, I think the same is true of Python, so it's
           | not like I'm a Perl wizard or something.
        
             | IgorPartola wrote:
             | Oh I agree. The problem is that they are two different
             | languages. Inside a Python file, SQL is just a string. No
             | syntax highlighting, no compile time checking, etc. A
             | Kwisatz Haderach of languages that incorporates both its
             | own language and SQL as first class concepts would be very
             | nice but the problem is that SQL is just too different.
             | 
             | For one thing, SQL is not really meant to be dynamically
             | constructed in SQL. But we often need to dynamically
             | construct a query (for example, a customer applied several
             | filters to the product listing). The SQL way to handle that
             | would be to have a general purpose query with a thousand
             | if/elses or stored procedures which I think takes it from
             | "flows beautifully" to "oh god who wrote this?" Or you
             | could just do string concatenation in a language that
             | handles that well, like Python. Then wrap the whole thing
             | in functions and objects and you get an ORM.
             | 
             | I still have not seen a language that incorporates anything
             | like SQL into it that would allow for even basic ORM-like
             | functionality.
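For illustration, the "string concatenation wrapped in a function" approach might look like the sketch below. It uses the stdlib `sqlite3` module; the `products` table, its columns, and `build_product_query` are hypothetical names chosen for the example. Only the fixed clause text is concatenated, while the user-supplied values go through `?` placeholders.

```python
import sqlite3


def build_product_query(filters):
    """Build a SELECT from whichever optional filters were supplied."""
    clauses, params = [], []
    if "category" in filters:
        clauses.append("category = ?")
        params.append(filters["category"])
    if "max_price" in filters:
        clauses.append("price <= ?")
        params.append(filters["max_price"])

    sql = "SELECT name FROM products"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY name"
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("lamp", "home", 20.0), ("desk", "home", 150.0), ("pen", "office", 2.0)],
)

sql, params = build_product_query({"category": "home", "max_price": 50})
names = [row[0] for row in conn.execute(sql, params)]
```

Add a few more optional filters, sorting, and joins, and the function starts to look like the beginnings of an ORM query builder, which is the progression described above.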
        
               | kelipso wrote:
               | Are you thinking of query generators like Ecto in Elixir?
        
         | riidom wrote:
         | PyTorch was first only Torch, and in Lua. I didn't follow it
         | too closely at the time, but apparently due to popular demand
         | it got redone in Python and voila PyTorch.
        
         | RA_Fisher wrote:
         | R's the best, bc it's been a statistical analysis language from
         | the beginning in 1974 (and was built and developed for the
         | purpose of analysis / modeling). Also, the tidyverse is
         | marvelous. It provides major productivity in organizing and
         | augmenting the data. Then there's ggplot, the undisputed best
         | graphical visualization system + built-ins like barplot(), or
         | plot().
         | 
         | But ultimately data analysis is going beyond Python and R into
         | the realm of Stan and PyMC3, probabilistic programming
         | languages. It's because we want to do nested integrals and
         | those software ecosystems provide the best way to do it (among
         | other probabilistic programming languages). They allow us to
         | understand complex situations and make good / valuable
         | decisions.
        
         | OkayPhysicist wrote:
         | There's a number of structures that I think are missing in our
         | major programming languages. Tables are one. Matrices are
         | another. Graphs, and relatedly, state machines are tools that
         | are grossly underused because of bad language-level support.
         | Finally, not a structure per se, but I think most languages
         | that are batteries-included enough to include a regex engine
         | should have a full-fledged PEG parsing engine. Most, if not
         | all, Regex horror stories derive from a simple "Regex is built
         | in".
         | 
         | What tools are _easily_ available in a language, by default,
         | shape the pretty path, and by extension, the entire feel of the
         | language. An example that we've largely come around on is key-
         | value stores. Today, they're table stakes for a standard
         | library. Go back to the '90s, and the most popular languages _at
         | best_ treated them as second-class citizens, more like imported
         | objects than something fundamental like arrays. Sure, you can
         | implement a hash map in any language, or import someone else's
         | implementation, but oftentimes you'll instead end up with
         | nightmarish, hopefully-synchronized arrays, because those are
         | built-in, and ready at hand.
        
           | throwaway2037 wrote:
           | > There's a number of structures that I think are missing in
           | our major programming languages. Tables are one. Matrices are
           | another.
           | 
           | I disagree. Most programmers will go their entire career and
           | never need a matrix data structure. Sure, they will use
           | libraries that use matrices, but never use them directly
           | themselves. It seems fine that matrices are not a separate
           | data type in most modern programming languages.
        
             | OkayPhysicist wrote:
             | Unless you think "most programmers" === "shitty webapp
             | developers", I strongly disagree. Matrices are first class,
             | important components in statistics, data analysis,
             | graphics, video games, scientific computing, simulation,
             | artificial intelligence and so, so much more.
             | 
             | And all of those programmers are either using specialized
             | languages, (suffering problems when they want to turn their
             | program into a shitty web app, for example), or committing
             | crimes against syntax like
             | 
             | rotation_matrix.matmul(vectorized_cat)
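As a point of comparison, Python itself has had a dedicated matrix-multiplication operator, `@`, since version 3.5 (PEP 465), even though the standard library ships no matrix type. The sketch below is a toy: the `Matrix` class is invented for the example and is no substitute for a real array library.

```python
class Matrix:
    """Minimal dense matrix stored as a list of rows."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __matmul__(self, other):
        # Called for 'self @ other': plain row-by-column multiplication.
        cols = list(zip(*other.rows))
        return Matrix(
            [sum(a * b for a, b in zip(row, col)) for col in cols]
            for row in self.rows
        )


rotate_90 = Matrix([[0, -1], [1, 0]])   # counter-clockwise quarter turn
point = Matrix([[1], [0]])              # column vector (1, 0)
rotated = rotate_90 @ point
```

So the syntax hook exists; what is missing is a built-in type to hang it on.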
        
               | lock1 wrote:
               | That's needlessly aggressive. Ignoring webapps, you could
               | do gamedev without even knowing what a matrix is.
               | 
               | You don't even need such construction in most native
               | applications, embedded systems, and OS kernel
               | development.
        
               | theamk wrote:
               | I am working in embedded. Had to optimize weights for an
               | embedded algorithm, decided to use linear regression and
               | thus needed matrices.
               | 
               | And if you do robotics, the chances of encountering a
               | matrix are very high.
        
               | throwaway2037 wrote:
               | This is exactly my point. Even in a highly specialised
               | library for pricing securities, the amount of code that
               | uses matrices is surprisingly small.
        
               | voidUpdate wrote:
               | To be fair, I do use matrices a reasonable amount in
               | gamedev. And if you're writing your engine from scratch,
               | rather than using something like Unity, you will almost
               | certainly need matrices.
        
               | habinero wrote:
               | I don't see why the majority of engineers need to cater
               | to your niche use cases. It's a programming language, you
               | can just make the library if it doesn't exist. Nobody's
               | stopping you.
               | 
               | Plus, plenty of third party projects have been
               | incorporated into the Python standard library.
        
               | Koshkin wrote:
               | At least in C++ you don't need 'matmul'
        
           | jltsiren wrote:
           | When there is no clear canonical way of implementing
           | something, adding it to a programming language (or a standard
           | library) is risky. All too often, you realize too late that
           | you made a wrong choice, and then you add a second version.
           | And a third. And so on. And then you end up with a confusing
           | language full of newbie traps.
           | 
           | Graphs are a good example, as they are a large family of
           | related structures. For example, are the edges undirected,
           | directed, or something more exotic? Do the nodes/edges have
           | identifiers and/or labels? Are all nodes/edges of the same
           | type, or are there multiple types? Can you have duplicate
           | edges between the same nodes? Does that depend on the types
           | of the nodes/edges, or on the labels?
        
             | WorldMaker wrote:
             | Even the raw storage for graphs doesn't have just one
             | answer: you could store edge lists or you could store
             | adjacency matrixes. Some algorithms work better with one,
             | some work better with the other. You probably don't want to
             | store both because that can be extra memory overhead as
             | well as a locking problem if you need to atomically update
             | both at once. You probably don't want to automatically flip
             | back and forth between representations because that could
             | cause garbage collector churn if not also long breadth or
             | depth searches, and you may not want to encourage manual
             | conversions between data structures either (to avoid
             | providing a performance footgun to your users). So you
             | probably want the edge list Graph type and the adjacency
             | matrix Graph type to look very different, even though they
             | are trivially convertible (they may be expensive to convert,
             | as mentioned), and yeah that's the under-the-hood storage
             | mechanism. From there you get into possible exponential
             | explosion as you start to get into the other higher level
             | distinctions between types of graphs (DAGs versus Trees
             | versus cyclic structures and so forth, and all the
             | variations on what a node can be, if edges can be weighted
             | or labeled, etc).
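To make the two storage choices concrete, here is a small sketch of an edge list and an adjacency matrix for the same directed, unweighted graph, with conversions in both directions. The function names are made up for the example, and nodes are assumed to be numbered from 0.

```python
def edges_to_adjacency_matrix(n_nodes, edges):
    """O(n^2) memory regardless of how many edges exist."""
    matrix = [[0] * n_nodes for _ in range(n_nodes)]
    for src, dst in edges:
        matrix[src][dst] = 1
    return matrix


def adjacency_matrix_to_edges(matrix):
    """Has to scan every cell, even for a very sparse graph."""
    return [
        (i, j)
        for i, row in enumerate(matrix)
        for j, connected in enumerate(row)
        if connected
    ]


edges = [(0, 1), (0, 2), (2, 1)]
matrix = edges_to_adjacency_matrix(3, edges)
round_trip = adjacency_matrix_to_edges(matrix)
```

The conversion is only a few lines, but its cost grows with the square of the node count, which is why silently flipping between the two representations would be a performance trap.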
        
           | HelloNurse wrote:
           | > I think most languages that are batteries-included enough
           | to include a regex engine should have a full-fledged PEG
           | parsing engine
           | 
           | Then there would be more PEG horror stories. In addition,
           | strings and indices in regex processing are universal, while a
           | parser is necessarily more framework-like, far more complex
           | and doomed to be mismatched for many applications.
        
           | fluorinerocket wrote:
           | Would love to see a language in which hierarchical state
           | machines, math/linear algebra, I/O to sensors and actuators,
           | and time/timing were first class citizens.
           | 
           | Mainly for programming control systems for robotics and
           | aerospace applications
        
         | dm319 wrote:
         | Dplyr is quite happy with data.frame. R is built around tabular
         | data. Other statistical languages are too, such as Stata.
        
         | getnormality wrote:
         | Saying that SQL is the standard for manipulating tabular data
         | is like saying that COBOL is the standard for financial
         | transactions. It may be true based on current usage, but nobody
         | thinks it's a good idea long term. They're both based on the
         | outdated idea that a programming language should look like
         | pidgin English rather than math.
        
           | Iwan-Zotow wrote:
           | In R data.table is basically SQL in another shape
        
         | maest wrote:
         | > Why aren't tables first class citizens in programming
         | languages?
         | 
         | They are in q/kdb and it's glorious. SQL expressions are also
         | first class citizens and it makes it very pleasant to write
         | code.
        
         | don-bright wrote:
         | Every copy of Microsoft Excel includes Power Query which is in
         | the M language and has tables as a type. Programs are
         | essentially transformations of table columns and rows. Not sure
         | if it's mainstream, but it is widely available. M language is
         | also included in other tools like PowerBI and Power Automate.
        
         | m_mueller wrote:
         | Fortran gives you that and more, it has first class
         | multidimensional arrays, including matrix operations.
        
         | genidoi wrote:
         | This is an interesting observation. One possible explanation
         | for a lack of robust first class table manipulation support in
         | mainstream languages could be due to the large variance in
         | real-world table sizes and the mutually exclusive subproblems
         | that come with each respective jump in order-of-magnitude row
         | size.
         | 
         | The problems that one might encounter in dealing with a 1m row
         | table are quite different to a 1b row table, and a 1b row table
         | is a rounding error compared to the problems that a 1t row
         | table presents. A standard library needs to support these
         | massive variations at least somewhat gracefully and that's not
         | a trivial API surface to design.
        
         | brikym wrote:
         | This. I really really want some kind of data frame which has
         | actual compile time typing my LSP/IDE can understand. Kusto
         | query language (Azure Data Explorer) has it and the auto
         | completion and error checking is extremely useful. But kusto
         | query language is really just limited to one cloud product.
        
         | poulpy123 wrote:
         | > Why aren't tables first class citizens in programming
         | languages?
         | 
         | Because they were created before the need for it, and maybe
         | before their invention.
         | 
         | Manipulating numeric arrays and matrices in Python is a bit
         | clunky because it was not designed as a scientific computing
         | language, so they were added as a library. It's much more
         | integrated and natural in scientific computing languages
         | such as Matlab. However the reverse is also true: because
         | Matlab wasn't designed to do what Python does, it's a bit
         | clunkier to use outside scientific computing.
        
           | jstanley wrote:
           | Tables were definitely around before programming languages.
           | 
           | There are clay tablets from ancient Sumeria that represent
           | information using tables.
        
         | HenriTEL wrote:
         | Well you nailed it, the language you're looking for is SQL.
         | There's a reason why duckdb got such traction over the last
         | few years. I think data scientists overlook SQL and Excel-like
         | tooling.
        
           | RobinL wrote:
           | Out of the current options, I strongly agree - I even wrote a
           | blog post! https://www.robinlinacre.com/recommend_sql/
           | 
           | But on the other hand, that doesn't mean SQL is ideal - far
           | from it. When using DuckDB with Python, to make things more
           | succinct, reusable and maintainable, I often fall into the
           | pattern of writing Python functions that generate SQL
           | strings.
           | 
           | But that hints at the drawbacks of SQL: it's mostly not
           | composable as a language (compared to general purpose
           | languages with first-class abstractions). DuckDB syntax does
           | improve on this a little, but I think it's mostly fundamental
           | to SQL. All I'm saying is that it feels like something better
           | is possible.
        
         | hermitcrab wrote:
         | There are a number of data-focussed no-code/visual/drag-and-
         | drop tools where data tables/frames are very much a first class
         | citizen (e.g. Easy Data Transform, Alteryx, Knime).
        
         | SubjectToChange wrote:
         | Mathematica recently added the Tabular command, for what it's
         | worth. I haven't used it much yet, but it seems to be quite
         | capable.
        
         | timbit42 wrote:
         | The 3rd edition of Dartmouth BASIC, back in the 1960's, had a
         | MAT command for dealing with matrices.
        
       | serjester wrote:
       | Seems like their critique boils down to two areas - pandas
       | limitations and fewer built-ins to lean on.
       | 
       | Personally I've found polars has solved most of the "ugly"
       | problems that I had with pandas. It's way faster, has an
       | ergonomic API, seamless pandas interop and amazing support for
       | custom extensions. We have to keep in mind Pandas is almost 20
       | years old now.
       | 
       | I will agree that Shiny is an amazing package, but I would argue
       | it's less important now that LLMs will write most of your code.
        
       | kasperset wrote:
       | R data science people generally come to the data science field
       | from the life science or stats fields. Python data science people
       | generally originate from other fields that are mostly engineering
       | focused. Again this may not apply to all the cases but that is my
       | general observation.
       | 
       | Recently I am seeing that Python is heavily pushed for all data
       | science related things. Sometimes objectively Python may not be
       | the best option especially for stats. It is hard to change
       | something after it becomes the "norm" regardless of its
       | usability.
        
       | exabrial wrote:
       | The problem is there's so much momentum behind it that it's hard
       | to course correct. PyTorch is now a goliath.
        
       | jakobnissen wrote:
       | Excellent article - except that the author probably should have
       | gated their substantiation of the claim behind a cliffhanger, as
       | other commenters have mentioned.
       | 
       | The author's priorities are sensible, and indeed with that set of
       | priorities, it makes sense to end up near R. However, they're not
       | universal among data scientists. I've been a data scientist for
       | eight years, and have found that this kind of plotting and
       | dataframe wrangling is only part of the work. I find there is
       | usually also some file juggling, parsing, and what the author
       | calls "logistics". And R is terrible at logistics. It's also bad
       | at writing maintainable software.
       | 
       | If you care more about logistics and maintenance, your conclusion
       | is pushed towards Python - which still does okay in the
       | dataframes department. If you're ALSO frequently concerned about
       | speed, you're pushed towards Julia.
       | 
       | None of these are wrong priorities. I wish Julia was better at
       | being R, but it isn't, and it's very hard to be both R and useful
       | for general programming.
       | 
       | Edit: Oh, and I should mention: I also teach and supervise
       | students, and I KEEP seeing students use pandas to solve non-
       | table problems, like trying to represent a graph as a dataframe.
       | Apparently some people are heavily drawn to use dataframes for
       | everything - if you're one of those people, reevaluate your
       | tools, but also, R is probably for you.
        
         | ActorNightly wrote:
         | >Excellent article
         | 
         | Except it's not. Data science in python pretty much requires you
         | to use numpy. So his example of mean/variance code is a dumb
         | comparison. Numpy has mean and variance functions built in for
         | arrays.
         | 
         | Even when using raw python in his example, some syntax can be
         | condensed quite a bit:
         | 
         | from collections import defaultdict
         | groups = defaultdict(list)
         | [groups[(row['species'], row['island'])].append(row['body_mass_g'])
         |  for row in filtered]
         | 
         | It takes the same amount of mental effort to learn python/numpy
         | as it does to learn R. The difference is, the former allows you
         | to integrate your code into any other application.
        
           | ModernMech wrote:
           | I dunno. Numpy has its own data types, its own collections,
           | its own semantics which are all different enough from Python,
           | I think it's fair to consider it a DSL on its own. It'd be
           | one thing if it was just, operator overloading to provide
           | broadcasting for python, but Numpy's whole existence is to
           | patch the various shortcomings Python has in DS.
        
           | dragonwriter wrote:
           | > Numpy has mean and variance functions built in for arrays.
           | 
           | Even outside of Numpy, the stdlib has the statistics package,
           | which provides mean, variance, population/sample standard
           | deviation, and other statistics functions for normal
           | iterables. The attempt to make Python out-of-the-box code
           | look bad was either deliberately constructed to exaggerate
           | the problems complained of, or was the product of a very
           | convenient ignorance of the applicable parts of Python and
           | its stdlib.
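A minimal sketch of the stdlib-only version being described, using `collections.defaultdict` and the `statistics` module. The rows below are made-up placeholder values, not the article's data; only the column names (`species`, `island`, `body_mass_g`) come from the snippet quoted upthread.

```python
from collections import defaultdict
from statistics import mean, variance

# Placeholder rows for illustration only (not real measurements).
rows = [
    {"species": "Adelie", "island": "Torgersen", "body_mass_g": 3700},
    {"species": "Adelie", "island": "Torgersen", "body_mass_g": 3800},
    {"species": "Gentoo", "island": "Biscoe", "body_mass_g": 5000},
    {"species": "Gentoo", "island": "Biscoe", "body_mass_g": 5200},
    {"species": "Gentoo", "island": "Biscoe", "body_mass_g": 5400},
]

# Group body masses by (species, island).
groups = defaultdict(list)
for row in rows:
    groups[(row["species"], row["island"])].append(row["body_mass_g"])

# statistics.variance is the sample variance (n - 1 denominator);
# statistics.pvariance would give the population variance instead.
summary = {key: (mean(vals), variance(vals)) for key, vals in groups.items()}
```

It is still more ceremony than a one-line grouped summary in a dataframe library, but no hand-written mean or variance code is needed.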
        
         | a_bonobo wrote:
         | >I find there is usually also some file juggling, parsing,
         | [...]
         | 
         | I'd say I'm 50/50 Python/R for exactly this reason: I write
         | Python code on HPC or a server to parse many, many files, then
         | I get some kind of MB-scale summary data I analyse locally in
         | R.
         | 
         | R is _not good_ at looping over hundreds of files in the
         | gigabytes, Python is _not good_ at making pretty insights from
         | the summary. A tool for every task.
        
         | puzzlingcaptcha wrote:
         | The second part of the article is right here:
         | https://blog.genesmindsmachines.com/p/python-is-not-a-great-...
        
       | whyenot wrote:
       | What makes Python a great language for data science, is that so
       | many people are familiar with it, and that it is an easy language
       | to read. If you use a more obscure language like Clojure, Common
       | Lisp, Julia, etc., many people will not be familiar with the
       | language and unable to read or review your code. Peer review is
       | fundamental to the scientific endeavor. If you only optimize on
       | what is the best language for the _task_ , there are clearly
       | better languages than Python. If you optimize on what is best for
       | _science_ then I think it is hard not to argue that Python (and
       | R) are the best choices. In science, just getting things done is
       | not enough. Other people need to be able to read and understand
       | what you are doing.
       | 
       | BTW AI is not helping and in fact is leading to a generation of
       | scientists who know how to write prompts, but do not understand
       | the code those prompts generate or have the ability to peer
       | review it.
        
         | iLemming wrote:
         | I can't speak for Julia - never used it; never used Common Lisp
         | for analyzing data (I don't think it's very "data-oriented" for
         | the modern age and the shape of data), but Clojure is really
         | not "obscure" - it only looks weird for the first fifteen
         | minutes or so; once you start using it - it is one of the most
         | straightforward and reasonable languages out there - it is in
         | fact simpler than Python and Javascript. Immutable-by-default
         | makes it much easier to reason about the code. And OMG, it
         | is so much more data-oriented - it's crazy that more people
         | don't use it. Most have never even heard about it.
        
           | MarsIronPI wrote:
           | Common Lisp fan here, but not a data scientist. Why do you
           | say to avoid CL for data analysis? Not trying to flame or
           | anything, just curious about your experience with it.
        
             | iLemming wrote:
             | I don't have great experience of using CL for analyzing
             | data, because of "why?", if I already have another Lisp
             | that is simply amazing for data.
             | 
             | Clojure, unlike traditional Lisps with their lists, is
             | based on a composable, unified abstraction for its
             | collections. They are lazy by default and are literal,
             | readable data structures; they are far easier to introspect
             | and not so "opaque" compared to anything - not just CL (even
             | Python) - and they are superb for dealing with heterogeneous
             | data. Clojure's cohesive data manipulation story is
             | something Common Lisp's lists-and-symbols just can't match.
        
               | dreamcompiler wrote:
               | Homework assignments notwithstanding, very few serious
               | Common Lisp programs use lists and symbols as their
               | primary data structures. This has been true since around
               | 1985.
               | 
               | Common Lisp has O(1) vectors, multidimensional arrays,
               | hash-tables (what Clojure calls maps), structs, and
               | objects. It has set operations too but it doesn't enforce
               | membership uniqueness. It also has bignums, several sizes
               | of floats, infinite-precision rationals, and complex
               | numbers. Not to mention characters, strings, and logical
               | operations on individual bits. The main difference from
               | Clojure is that CL data structures are not immutable. But
               | that's an orthogonal issue to the suggestion that CL
               | doesn't contain a rich library of modern data structures.
               | 
               | Common Lisp has never been limited to "List Processing."
        
               | iLemming wrote:
               | I wasn't trying to denigrate Common Lisp, I'm sorry if I
               | hurt your feelings. It does have comprehensive support
               | for all kinds of data structures. I wasn't talking about
               | it being limited to "list processing". SBCL is great for
               | many things, but from many practical points Clojure is
               | actually much better suited for data analysis.
               | 
               | Your saying "hash-tables (what Clojure calls maps)" is
               | not only inaccurate, you're also hand-waving Clojure's
               | core design philosophy (immutability, structural sharing,
               | lazy sequences) as orthogonal. But those aren't cosmetic
               | differences - they're the reason why Clojure's data
               | structures are fundamentally better for data analysis. I
               | think you're confusing "having equivalent data types"
               | with "solving the same problem the same way"
        
           | 7thaccount wrote:
           | I tried to get into Clojure, but a lot of the JVM hosted
           | languages require some Java experience. Same thing with Scala
           | and Kotlin or F# on .NET.
           | 
           | The early tooling was also pretty dependent on Vim or Emacs.
           | Maybe it's all easier now with VSCode or something like that.
        
             | iLemming wrote:
             | None of this is even remotely true. I've gotten into Clojure
             | without knowing jackshit about Java, almost ten years
             | later, after tons of things successfully built and
             | deployed, still don't know jackshit about Java. Mia, co-
             | host of 'Clojure apropos' podcast was my colleague, we've
             | worked together on multiple teams, she learned Clojure as
             | her very first PL. Later she tried learning some Java and
             | she was shocked how impossibly weird it looked compared to
             | Clojure. Besides, you can use Clojure without any JVM -
             | e.g., with nbb. I use it for things like browser automation
             | with Playwright.
             | 
             | The tooling story is also very solid - I use Emacs, but
             | many of my friends and colleagues use IntelliJ, Vim,
             | Sublime and VSCode, and some of them migrated to it from
             | Atom.
        
               | 7thaccount wrote:
               | It might not be a problem for you, but it has been for
               | many. I did start by reading through 3 Clojure books. The
               | repl and the basic stuff like using lists is all easy of
               | course, but the tooling was pretty poor compared to what
               | I was used to (I like lisp, but Emacs is a commitment).
               | Also, a lot of tutorials at the time definitely assumed
               | java familiarity, especially with debugging java stack
               | traces.
        
               | iLemming wrote:
               | > It might not be a problem for you, but it has been for
               | many
               | 
               | Do you have a habit of referring to yourself in plural,
               | or do you typically like to generalize things based on
               | your personal experiences?
               | 
               | I personally know many Clojurists who never had problems
               | you're describing - hundreds of people. Sure, that could
               | be the case of survivorship bias, perhaps I just don't
               | befriend people who struggled with getting into Clojure
               | specifically in a way you're describing. But like they
               | say: "Those who are willing to make the effort will find
               | the solutions. Those who aren't will find the excuses."
               | 
               | Clojure undeniably had challenges in the past, and still
               | has some today. But not the things you're talking about.
               | This is literally not an exaggeration - it's as easy as
               | installing the Calva extension for VSCode - that's all one
               | needs to mess around with Clojure.
        
             | geokon wrote:
             | It doesn't require any Java but the docs do at times sort
             | of assume you understand the JVM to some extent - which was
             | a bit frustrating when first learning the language. It'll
             | use terms like "classpath" without explaining what that is.
             | However nowadays with LLMs these are insignificant
             | speedbumps.
             | 
             | If you want to use Java you also don't really need to know
             | Java beyond "you create instances of classes and call
             | methods on them". I really don't want to learn a dinosaur
             | like Java, but having access to the universe of Java libs
             | has saved me many times. It's super fun and nice to use and
             | poke around mature Java libs interactively with a REPL :)
             | 
             | All that said I'd have no idea how to write even a
             | helloworld in Java
             | 
             | PS: Agreed on Emacs. I love Emacs.. but it's for turbo
             | nerds. Having to learn Emacs and Clojure in parallel was a
             | crazy barrier. (and no, Emacs is not as easy as people make
             | it out to be)
        
         | hekkle wrote:
         | > What makes Python a great language for data science, is that
         | so many people are familiar with it
         | 
         | While I agree with you in principle, this also leads to what I
         | call the "VB Effect". Back in the day VB was taught at every
         | school as part of the standard curriculum. This made every kid
         | a 'computer whizz'. I have had to fix many a legacy codebase
         | that was started by someone's nephew the whizz kid.
        
         | aethor wrote:
         | Peer review is fundamental to scientific endeavor but... in ML
         | fields, reviewers almost never check the code and Python
         | package management is hardly reproducible. So clearly we are
         | not there, Python or not.
        
         | flexagoon wrote:
         | That's ok, I don't think _anyone_ knows how to properly write
         | Julia. After using it for a while and following the community
         | (watching talks, checking the forum etc), I don't think it has
         | a concept of code quality. You just throw random code at the
         | wall until it starts working. Which makes sense, considering
         | most of the users are scientists.
        
       | huherto wrote:
       | For what it's worth, the Kotlin folks have been adding some cool
       | features and tools for data analysis.
       | https://kotlinlang.org/docs/data-analysis-overview.html
        
       | niemandhier wrote:
       | Python is just a language that:
       | 
       | 1. Is easy to read
       | 
       | 2. Was easy to extend in languages that people who work with
       | scientific data happen to like.
       | 
       | When I did my masters we hacked around in the numpy source and
       | contributed here and there while doing astrophysics.
       | 
       | Stuff existed in Java and R, but we had learned C in the first
       | semester and python was easier to read and contrary to MATLAB
       | numpy did not need a license.
       | 
       | When data science came into the picture, the field was full of
       | physicists that had done similar things. They brought their tools
       | as did others.
        
         | jillesvangurp wrote:
         | The main feature of Python is that it is approachable by people
         | who have never programmed before. They might have a vague
         | notion of wanting to instruct a computer to first do this and
         | then do that. Imperative programming is their starting point.
         | And Python delivers that. It was designed as a scripting
         | language whose primary use indeed was to script together other
         | things. It always was good at that and that was the main thing
         | it was used for in the nineties.
         | 
         | It got popular once Linux distributions started relying on a
         | lot of python scripts (e.g. Red Hat and Debian). As a side
         | effect it was present on a lot of Linux and Unix systems early
         | on. Scientists in the early 2000s and late nineties had access
         | to workstations running Linux and Unix. So, Python was simply
         | the approachable thing that was just there already.
         | 
         | And because it's so easy, there are lots of people getting into
         | Python. So it got its own dynamic of generations of researchers
         | in all sorts of fields knowing about Python being the go-to
         | thing to reach for. It never really was the best at anything it
         | does. That wasn't even a goal. It's a bit slow. A bit
         | verbose/clumsy compared to some of the alternatives that some
         | data scientists prefer. It lacks a lot of features other
         | languages have. Etc. This doesn't matter because it is simple
         | and easy. The type of users that are new to programming are
         | looking for something simple that they can understand. Not the
         | platonic ideal of a language that mathematicians or computer
         | scientists might prefer.
         | 
         | Python is the modern equivalent of BASIC which had this role
         | before python was created. It wasn't that amazing. But early
         | home computers had it as part of their OS. E.g. the Commodore
         | 64 that was my first computer had an interactive Basic shell
         | with the ability to load games from a tape as the main OS
         | experience.
        
       | jswelker wrote:
       | Inherited Python code is a mixed bag. Inherited R code is a
       | nightmare.
        
       | mushufasa wrote:
       | Languages inherently have network effects; most people around the
       | world learn English so they can talk with other professionals who
       | also know English, not because they are passionate about Charles
       | Dickens.
       | 
       | My take (and my own experience) is that python won because the
       | rest of the team knows it. I prefer R but our web developers
       | don't know it, and it's way better for me to write code that the
       | rest of our team can review, extend, and maintain.
        
       | pacbard wrote:
       | When you think about a data science pipeline, you really have
       | three separate steps:
       | 
       | [Data Preparation] --> [Data Analysis] --> [Result Preparation]
       | 
       | Neither Python nor R does a good job at all of these.
       | 
       | The original article seems to focus on challenges in using Python
       | for data preparation/processing, mostly pointing out challenges
       | with Pandas and "raw" Python code for data processing.
       | 
       | This could be solved by switching to something like duckdb and
       | SQL to process data.
       | 
       | As far as data analysis, both Python and R have their own niches,
       | depending on field. Similarly, there are other specialized
       | languages (e.g., SAS, Matlab) that are still used for domain-
       | specific applications.
       | 
       | I personally find result preparation somewhat difficult in both
       | Python and R. Stargazer is ok for exporting regression tables but
       | it's not really that great. Graphing is probably better in R
       | within the ggplot universe (I'm aware of the python port).
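
A rough sketch of the "push data preparation into SQL" idea from the comment above. It uses Python's built-in sqlite3 module purely as a stand-in for duckdb, so it runs without extra packages; the table name, column names, and sample rows are all made up for illustration.

```python
import sqlite3

# Hypothetical sample rows (group, value); not taken from the article.
ROWS = [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", None), ("c", 4.0)]


def group_means(rows):
    """Return {group: mean of the non-null values}, computed in SQL."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE measurements (grp TEXT, value REAL)")
        con.executemany("INSERT INTO measurements VALUES (?, ?)", rows)
        cur = con.execute(
            """
            SELECT grp, AVG(value)
            FROM measurements
            WHERE value IS NOT NULL
            GROUP BY grp
            ORDER BY grp
            """
        )
        # Each result row is a (grp, mean) pair, so dict() builds the mapping.
        return dict(cur.fetchall())
    finally:
        con.close()


if __name__ == "__main__":
    print(group_means(ROWS))
```

The cleaning step (dropping missing values) and the aggregation both live in the query, so the Python side only loads rows and collects the result.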
        
       | huherto wrote:
       | Isn't the author saying that Python + Pandas is almost as good as
       | R, but Python without Pandas is less powerful than R.
       | 
       | I can't help but conclude that Python is as good as R because I
       | still have the choice of using Pandas when I need it. What did I
       | get wrong?
        
         | paddleon wrote:
         | you missed the "almost as" in your first sentence.
         | 
         | also, we didn't define "good".
        
       | programmertote wrote:
       | Disclaimer: I have nothing against R or Python and I'm not
       | partial to either.
       | 
       | Python, the language itself, might not be a great language for
       | data science. BUT the author can use Pandas or Polars or another
       | data-science-related library/framework in Python to get the job
       | done that s/he was trying to write in R. I could read both her R
       | and Pandas code snippets and understand them equally.
       | 
       | This article reads just like, "Hey, I'm cooking everything by
       | making all ingredients from scratch and see how difficult it
       | is!".
        
       | NuSkooler wrote:
       | You could end it with "Python is not a great language".
       | 
       | Now, is Python a SUCCESSFUL language? Very.
        
       | rdtsc wrote:
       | They basically advocate using R. I think it depends what they
       | mean by "data science" and if the person will be doing just data
       | science. If that's the case then R may be better. As in their
       | whole career is going to be built on that domain. But let's say they
       | are on a general computer science track, now they'll probably
       | benefit from learning Python more than R, simply because they can
       | use it for other purposes.
       | 
       | > Either way, I'll not discuss it further here. I'll also not
       | consider proprietary languages such as Matlab or Mathematica, or
       | fairly obscure languages lacking a wide ecosystem of useful
       | packages, such as Octave.
       | 
       | I feel, to most programming folks R is in the same category. R is
       | to them what Octave is to the author. R is nice, but do they
       | really want to learn a "niche" language, even if it has some
       | better features than Python? Is holding a whole new paradigm,
       | syntax, library ecosystem in your head worth it?
        
       | solatic wrote:
       | Shell is the best language for data science. Pick the best tools
       | for each of getting data, cleaning data, transforming data, and
       | visualizing data, then stitch them together by sheer virtue of
       | the fact that text is the universal interoperable protocol and
       | files are the universal way of saving intermediate stages of
       | data.
       | 
       | Best part is, write a --help, and you can load them into LLMs as
       | tools to help the LLMs figure it out for you.
       | 
       | Fight me.
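
For what the "text in, text out, plus a --help" idea above could look like as one pipeline stage, here is a minimal sketch in Python using only the standard library. The tool itself, its flag names (`--column`, `--min`), and the tab-separated input format are assumptions made for the example, not anything the commenter specified.

```python
import argparse
import sys


def build_parser():
    """Build the command-line interface; argparse generates --help from it."""
    parser = argparse.ArgumentParser(
        description="Keep tab-separated rows whose numeric column is at least a threshold."
    )
    parser.add_argument("--column", type=int, default=0,
                        help="zero-based index of the numeric column (default: 0)")
    parser.add_argument("--min", type=float, default=0.0, dest="minimum",
                        help="smallest value to keep (default: 0.0)")
    return parser


def filter_lines(lines, column, minimum):
    """Yield the lines whose chosen column parses as a number >= minimum."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        try:
            value = float(fields[column])
        except (IndexError, ValueError):
            continue  # skip header lines and malformed rows
        if value >= minimum:
            yield line


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Read from stdin and write to stdout so the tool composes with pipes.
    for line in filter_lines(sys.stdin, args.column, args.minimum):
        sys.stdout.write(line)


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as keep_rows.py, it would sit in a pipeline like `cat data.tsv | python keep_rows.py --column 1 --min 2 | sort`.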
        
         | xn wrote:
         | redo[1] with shell scripts has become my go-to method of dealing
         | with multi-step data problems. It makes it easy to review each
         | step of data retrieval, clean-up, transformation, etc.
         | 
         | I use mlr, sqlite, rye, souffle, and goawk in the shell
         | scripts, and visidata to interactively review the intermediate
         | files.
         | 
         | 1. https://redo.readthedocs.io/en/latest/
        
       | spicybbq wrote:
       | Part 2 is here:
       | 
       | https://blog.genesmindsmachines.com/p/python-is-not-a-great-...
        
       | drchaim wrote:
       | Python was a great language for data science when data science
       | became a mainstream thing.
       | 
       | It was easy to think about the structures (iterators), it was
       | easy to extend, and it had a good community.
       | 
       | And because of that, people started extending it via libraries.
       | 
       | There are plenty more alternatives now.
        
       | thom wrote:
       | I think this expectation that data science code is a thing you
       | write basically top to bottom to get some answers out, put them
       | in a graph and move on with your life is not a useful lens
       | through which to evaluate two programming languages. R definitely
       | is an efficient DSL for doing stats this way, but it's a painful
       | way to build a durable piece of software. Python is nowhere near
       | perfect but I've seen fewer codebases that made my eyes bleed,
       | however pretty the graphs might look.
        
       | amai wrote:
       | The example would better be written in SQL. So according to the
       | author that would make SQL a great language for data science. SQL
       | also supports tables natively. This conclusion is of course
       | ridiculous and shows the shallow reasoning in this article.
        
       | constantcrying wrote:
       | Python is also an embarrassingly bad language for numerics. It
       | comes without support for different floating point types, does
       | not have an n-D Array data type, and is extremely slow.
       | 
       | At the same time it is an absolute necessity to know if you are
       | doing numerics. What this shows, at least to me, is that it is
       | "good enough" and that the million integrations, examples and
       | pieces of documentation matter more than whether the
       | peculiarities of the language work in favor of its given use
       | case, as long as the shortcomings can be mostly addressed.
        
         | slashdave wrote:
         | Native python is hopeless for numerics, which is why just about
         | everyone just uses numpy, which solves all of these issues. Of
         | course, a separate package. But the strength of python is that
         | it can fairly seamlessly incorporate these kinds of packages
         | that add core capabilities. Another important example: pytorch.
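
To make the "native Python" side of this exchange concrete, here is a small standard-library-only sketch. It assumes the usual CPython setup where the built-in float is a 64-bit double; the helper names `as_float32` and `matmul` are invented for the example. The `array` and `struct` modules can store or round to 32-bit values, but arithmetic still happens in doubles, and a matrix is just a list of lists.

```python
import struct
from array import array


def as_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


def matmul(a, b):
    """Multiply two matrices stored as nested lists, in pure Python."""
    # zip(*b) iterates over the columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


if __name__ == "__main__":
    # Single precision cannot represent the same value as the 64-bit 0.1.
    print(as_float32(0.1) == 0.1)
    # Bytes per element for single- and double-precision storage.
    print(array("f", [0.1]).itemsize, array("d", [0.1]).itemsize)
    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

This is the gap that numpy fills with typed n-dimensional arrays and compiled loops.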
        
           | kbr2000 wrote:
           | https://en.wikipedia.org/wiki/Ousterhout's_dichotomy
        
       | coolThingsFirst wrote:
       | Python just has poor aesthetics. __init__(self) is unacceptable
       | in a language in 2025. Ruby would've been a much better choice.
       | Sloppiness in language design is just a bad idea.
        
         | stOneskull wrote:
         | there's @dataclass in 2025
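
A quick side-by-side of the two styles being discussed, as a sketch only. The class names `PointManual` and `Point` are made up for the example; `dataclasses.dataclass` is the standard-library decorator the reply refers to.

```python
from dataclasses import dataclass


class PointManual:
    """Hand-written initializer: every field is spelled out three times."""

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y


@dataclass
class Point:
    """The decorator generates __init__, __repr__ and __eq__ from the fields."""

    x: float
    y: float


if __name__ == "__main__":
    print(Point(1.0, 2.0))
    print(Point(1.0, 2.0) == Point(1.0, 2.0))
    # Without a hand-written __eq__, instances compare by identity.
    print(PointManual(1.0, 2.0) == PointManual(1.0, 2.0))
```

The dunder method still exists under the hood; the decorator just writes it for you.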
        
       | semiinfinitely wrote:
       | correct, it's only the best one that we have
        
       | keeeba wrote:
       | As a fairly extensive user of both Python and R, I net out
       | similarly.
       | 
       | If I want to wrangle, explore, or visualise data I'll always
       | reach for R.
       | 
       | If I want to build ML/DL models or work with LLMs I will usually
       | reach for Python.
       | 
       | Often in the same document - nowadays this is very easy with
       | Quarto.
        
         | Joel_Mckay wrote:
         | Python has a list of issues fundamentally broken in the
         | language, and relies heavily on integrated library bindings to
         | operate at reasonable speeds/accuracy.
         | 
         | Julia allows embedding both R and Python code, and has some
         | very nice tools for drilling down into datasets:
         | 
         | https://www.queryverse.org/
         | 
         | It is the first language I've seen in decades that reduces
         | entire paradigms into single character syntax, often
         | outperforming both C and Numpy in many cases. =3
        
           | pphysch wrote:
           | Deeply ironic for a Julia proponent to smear a popular
           | language as "fundamentally broken" without evidence.
           | 
           | https://yuri.is/not-julia/
        
             | Joel_Mckay wrote:
             | Python threading and computational errata issues go back a
             | long time. It is a popular integration "glue" language, but
             | is built on SWiG wrappers to work around its many
             | unresolved/unsolvable problems.
             | 
             | Not a "smear", but rather a well known limitation of the
             | language. Perhaps your environment context works
             | differently than mine.
             | 
             | It is bizarre people get emotionally invested in something
             | so trivial and mundane. Julia is at v1.12.2 so YMMV, but
             | Queryverse is a lot of fun =3
        
             | kelipso wrote:
             | This is like one of those people posting Dijkstra's letter
             | advocating for 0-based indexing without ever having read or
             | understood what they posted.
        
               | pphysch wrote:
               | What does indexing syntax have to do with Julia having a
               | rough history of correctness bugs and footguns?
        
               | Joel_Mckay wrote:
               | Sure, all software is terrible if looking at bug
               | frequency history...
               | 
               | https://github.com/python/cpython/issues
               | 
               | Griefers ranting about years old _closed_ tickets on
               | v1.0.5 versions on a blog as some sort of proof of
               | lameness... is a poorly structured argument. Julia
               | includes regression testing features built into even its
               | plotting library output, and thus issues usually stay
               | resolved due to pedantic reproducibility. Also, running
               | sanity-checks in any llvm language code is usually wise.
               | 
               | Best of luck =3
        
               | pphysch wrote:
               | Just saying, "other languages have bug reports" is an
               | exceptionally poor way to promote Julia =3
        
               | Joel_Mckay wrote:
               | To be blunt: Moore's law is now effectively dead, and
               | chasing the monolithic philosophy with lazy monads will
               | eventually limit your options.
               | 
               | Languages like Julia trivially handle conditional
               | parallelism much more cleanly with the broadcast
               | operator, and transparent remote host process instancing
               | over ssh (still needs a lot of work to reach OTP like
               | cluster functionality.)
               | 
               | Much like Go, library resources ported into the native
               | language quietly move devs away from the same polyglot
               | issues that hit Python.
               | 
               | Best of luck. =3
        
       | IshKebab wrote:
       | Python's not a great language for anything. Maybe for teaching
       | programming I guess (except then you end up with people that only
       | know Python).
        
       | aorist wrote:
       | > Examples include converting boxplots into violins or vice
       | versa, turning a line plot into a heatmap, plotting a density
       | estimate instead of a histogram, performing a computation on
       | ranked data values instead of raw data values, and so on.
       | 
       | Most of this is not about Python, it's about matplotlib. If you
       | want the admittedly very thoughtful design of ggplot in Python,
       | use plotnine
       | 
       | > I would consider the R code to be slightly easier to read
       | (notice how many quotes and brackets the Python code needs)
       | 
         | This isn't about Python, it's about the tidyverse. The reason you
         | can use this simpler syntax in R is because its
         | non-standard-evaluation allows packages to extend the syntax in
         | a way Python does not expose:
         | http://adv-r.had.co.nz/Computing-on-the-language.html
        
         | npalli wrote:
         | Python is nothing without its batteries.
        
           | jskherman wrote:
           | Python _is_ its batteries.
        
           | pphysch wrote:
           | The design and success of e.g. Golang is pretty strong
           | support for the idea that you can't and shouldn't separate a
           | language from its broader ecosystem of tooling and packages.
        
             | LtWorf wrote:
             | The success of python is due to not needing a broader
             | ecosystem for A LOT of things.
             | 
             | They are of course now abandoning this idea.
        
               | lmm wrote:
               | > The success of python is due to not needing a broader
               | ecosystem for A LOT of things.
               | 
               | I honestly think that was a coincidence. Perl and Ruby
               | had other disadvantages, Python won despite having bad
               | package management and a bloated standard library, not
               | because of it.
        
               | rjzzleep wrote:
               | It's because Ruby captured the web market and Python
               | everything else, and I guess everything is more timeless
               | than a single segment.
        
               | vkazanov wrote:
               | Ruby was _competing_ on the web market and lost to many
               | others, including Python. In part, because python had a
               | much broader ecosystem, and php had wide adoption through
               | wordpress and others, and javascript was expanding from
               | browsers.
        
               | procaryote wrote:
               | The bloated standard library is the only reason I kept
               | using python in spite of the packaging nightmare. I can
               | do most things with no dependencies, or with one
               | dependency I need over and over like matplotlib
               | 
               | If python had been lean and needed packages to do
               | anything useful, while still having a packaging
               | nightmare, it would have been unusable
        
               | lmm wrote:
               | Well, sure, but equally I think there would have been a
               | lot more effort to fix the packaging nightmare if it had
               | been more urgent.
        
               | ModernMech wrote:
               | There was a massive effort though, the proliferation of
               | several different package managers is evidence of that.
        
               | LtWorf wrote:
               | The bloated standard library is the reason why you can
               | send around a single .py file to others and they can
               | execute it instantly.
               | 
               | Most Python users are neither aware of nor able to use
               | venv, uv, pip and all of that.
        
           | 1vuio0pswjnm7 wrote:
           | What language is used to write the batteries
        
             | logicprog wrote:
             | C/C++, in large part
        
               | saboot wrote:
               | And below that, FORTRAN :)
        
               | JPKab wrote:
               | These days it's a whole lot of Rust.
        
               | volemo wrote:
               | These days it's still a whole lot of Fortran, with some
               | Rust sprinkled on top. (:
        
               | pjmlp wrote:
               | Which since Fortran 2003, or even Fortran 95, has gotten
               | rather nice to use.
        
               | Koshkin wrote:
               | IDK it's become too verbose IMHO, looks almost like COBOL
               | now. (I think it was Fortran 66 that was the last Fortran
               | true to its nature as a "Formula Translator"...)
        
               | pjmlp wrote:
               | We are way beyond comparing languages to COBOL, now that
               | plenty folks type whole book sized descriptions into tiny
               | chat windows for their AI overloads.
        
           | throwaway2037 wrote:
           | I hear this so much from Python people -- almost like they
           | are paid by the word to say it. Is it different from Perl,
           | Ruby, Java, or C# (DotNet)? Not in my experience, except
           | people from those communities don't repeat that phrase so
           | much.
           | 
           | The irony here: We are talking about data science. 98% of
           | "data science" Python projects start by creating a virtual
           | env and adding Pandas and NumPy which have numerous (really:
           | squillions of) dependencies outside the foundation library.
        
             | m55au wrote:
             | Someone correct me if I'm completely wrong, but by default
             | (i.e. precompiled wheels) numpy has 0 dependencies and
             | pandas has 5, one of which is numpy. So not really
             | "squillions" of dependencies.
             | 
             | pandas==2.3.3
             | +-- numpy [required: >=1.22.4, installed: 2.2.6]
             | +-- python-dateutil [required: >=2.8.2, installed: 2.9.0.post0]
             | | +-- six [required: >=1.5, installed: 1.17.0]
             | +-- pytz [required: >=2020.1, installed: 2025.2]
             | +-- tzdata [required: >=2022.7, installed: 2025.2]
        
               | noitpmeder wrote:
               | I don't know about _squillions_, but numpy definitely has
               | _requirements_, even if they're not represented as such
               | in the python graph.
               | 
               | e.g.
               | 
               | https://github.com/numpy/numpy/blob/main/.gitmodules
               | (some source code requirements)
               | 
               | https://github.com/numpy/numpy/tree/main/requirements
               | (mostly build/ci/... requirements)       ...
        
               | m55au wrote:
               | They're not represented, because those are build-time
               | dependencies. Most users when they do pip install numpy
               | or equivalent, just get the precompiled binaries and none
               | of those get installed. And even if you compile it
               | yourself, you still don't need those for running numpy.
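
One way to check the install-time side of this argument yourself is to ask the standard library what an installed distribution declares, sketched below. `importlib.metadata.requires` is a real standard-library call; the helper name `declared_requirements` and the simple "extra ==" string check for optional extras are assumptions made for the example. It only sees Python-level requirements, not build-time or native-library ones.

```python
from importlib import metadata


def declared_requirements(dist_name):
    """Return the requirement strings a distribution declares.

    Returns None when the distribution is not installed.
    """
    try:
        # requires() itself returns None when nothing is declared.
        return metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("numpy", "pandas"):
        reqs = declared_requirements(name)
        if reqs is None:
            print(f"{name}: not installed")
            continue
        # Requirements guarded by an "extra ==" marker are optional extras.
        required = [r for r in reqs if "extra ==" not in r]
        print(f"{name}: {len(required)} required, "
              f"{len(reqs) - len(required)} optional")
```

Anything linked in natively, such as a BLAS implementation, will not show up in this list.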
        
               | nonameiguess wrote:
               | Read https://numpy.org/devdocs/building/blas_lapack.html.
               | 
               | NumPy _will_ fall back to internal and very slow BLAS and
               | LAPACK implementations if your system does not have a
               | better one, but assuming you're using NumPy for its
               | performance and not just the convenience of adding array
               | programming features to Python, you're really gonna want
               | better ones, and what that is heavily depends on the
               | computer you're using.
               | 
               | This isn't really a Python thing, though. It's a hard
               | problem to solve with any kind of scientific computing.
               | If you insist on using a dynamic interpreted language,
               | which you probably have to do for exploratory interactive
               | analysis, and you still need speed over large datasets,
               | you're gonna need to have a native FFI and link against
               | native libraries. Thanks to standardization, you'll have
               | many choices and which is fastest depends heavily on your
               | hardware setup.
        
               | aragilar wrote:
               | The wheels will most likely come with openblas, so while
               | you can get the original blas (which is really only slow
               | by comparison, for small tasks it's likely users won't
               | notice), this is generally not an issue.
        
         | dm319 wrote:
         | > This isn't about Python, it's about the tidyverse.
         | 
         | > its non-standard-evaluation allows packages to extend the
         | syntax in a way Python does not expose
         | 
         | Well this is a fundamental difference between Python and R.
        
           | debtta wrote:
           | The point is that the ability to extend the syntax of R leads
           | to chaos and mess (in general) but when used correctly and
           | effectively in the tidyverse, improves the experience of
           | writing and reading code.
        
         | robot-wrangler wrote:
         | >> I would consider the R code to be slightly easier to read
         | (notice how many quotes and brackets the Python code needs)
         | 
         | Oh god no, do people write R like that, pipes at the end?
         | Elixir style pipe-operators at the beginning is the way.
         | 
         | And if you really wanted to "improve" readability by confusing
         | arguments/functions/vars just to omit quotes, python can do
         | that, you'll just need a wrapper object and getattr hacks to
         | get from `my_magic_strings.foo` -> `'foo'`. As for the
         | brackets.. ok that's a legitimate improvement, but again not
         | language related, it's library API design for function sigs.
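
The "wrapper object and getattr hacks" mentioned above could look roughly like this. It is a sketch, not a recommendation: `my_magic_strings.foo` comes from the comment, while the class name `MagicStrings`, the `select` helper, and the sample row are invented for illustration.

```python
class MagicStrings:
    """Attribute access returns the attribute's own name as a string."""

    def __getattr__(self, name: str) -> str:
        # Leave dunder lookups alone so copy, pickle, etc. behave normally.
        if name.startswith("__"):
            raise AttributeError(name)
        return name


my_magic_strings = MagicStrings()


def select(rows, *columns):
    """Keep only the named keys from each dict in rows."""
    return [{c: row[c] for c in columns} for row in rows]


if __name__ == "__main__":
    c = my_magic_strings
    # Hypothetical data, just to show the quote-free call style.
    rows = [{"species": "Adelie", "mass": 3750, "island": "Torgersen"}]
    print(select(rows, c.species, c.mass))
```

The quotes are gone, but a typo such as `c.speceis` now fails late, as a KeyError inside `select`, rather than being caught as an undefined name.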
        
           | rtaylorgarlock wrote:
           | Upvoted for pipes at the beginning
        
           | medstrom wrote:
           | IIRC, putting pipe operator `|>` at end of line prevents the
           | expression from terminating early. Otherwise the newline
           | would terminate it.
        
           | tmtvl wrote:
           | The right way is putting the pipe operator at the beginning
           | of the expression.
           | 
           |     (-> (gather-some-data)
           |         (map 'Vector #'some-functor)
           |         (filter #'some-predicate)
           |         (reduce #'some-gatherer))
           | 
           | Or for those who have an irrational fear of brackets:
           | 
           |     ->
           |         gather-some-data
           |         map 'Vector #'some-functor
           |         filter #'some-predicate
           |         reduce #'some-gatherer
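
For comparison, Python can approximate the same leading-operator pipeline layout. Inside parentheses a newline does not end the expression, so the operator can start each line. The `Pipe` wrapper below is a toy written for this example (Python has no built-in pipe operator), and the data and lambdas are stand-ins for the placeholder functions in the comments above.

```python
from functools import reduce


class Pipe:
    """Wrap a value so that `|` applies the function on its right-hand side."""

    def __init__(self, value):
        self.value = value

    def __or__(self, func):
        return Pipe(func(self.value))


def gather_some_data():
    """Stand-in data source for the example."""
    return [3, 1, 4, 1, 5, 9, 2, 6]


# The surrounding parentheses keep the expression open across lines,
# so each step can begin with the pipe operator.
result = (
    Pipe(gather_some_data())
    | (lambda xs: map(lambda x: x * x, xs))          # square each item
    | (lambda xs: filter(lambda x: x % 2 == 1, xs))  # keep odd squares
    | (lambda xs: reduce(lambda a, b: a + b, xs))    # sum what is left
).value

if __name__ == "__main__":
    print(result)
```

Without the outer parentheses, the line break after the first step would end the statement, which is the same early-termination problem described for R's `|>`.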
        
         | evolighting wrote:
         | R is more of a statistical software than a programming
         | language. So, if you are a so-called "statistician," then R
         | will feel familiar to you
        
           | UniverseHacker wrote:
           | No, R is a serious general purpose programming language that
           | is great for building almost any type of complex scientific
           | software with. Projects like Bioconductor are a good example.
        
             | evolighting wrote:
             | Perhaps in the context of a comparison with Python?
             | 
             | In my limited experience, using R feels like using
             | JavaScript in the browser: it's a platform heavily focused
             | on advanced, feature-rich objects (such as DataFrames and
             | specialized plot objects), but you could also just build
             | almost anything with it.
        
             | blubber wrote:
             | No, it's not. Even established packages have bugs caused by
             | R weirdness. I like it nevertheless.
        
               | Cosi1125 wrote:
               | Care to give some examples?
        
               | northlondoner wrote:
               | Yes, R is a proper general purpose programming language.
               | Turing complete, functional, procedural, object
               | oriented...
        
               | steine65 wrote:
               | Just in case someone reads this far and sees blubber's
               | confident "No." Blubber is definitely wrong here. I used
               | to do all of my programming in R. Throw the question into
               | an LLM if you're wondering if R has a package like ___ in
               | python.
        
         | getnormality wrote:
         | It's not about Python, it's about how R lets you do something
         | Python can't?
        
         | isolli wrote:
         | Or seaborn. It was built exactly for this purpose: abstracting
         | some of the annoying kinks of matplotlib while still offering a
         | rich set of features.
         | 
         | https://seaborn.pydata.org/tutorial/introduction.html
        
         | jampekka wrote:
         | I wonder what the last example of "logistics without libraries"
         | would look like in R. Based on my experience of having to do
         | "low-level" R, it's gonna be a true horror show.
         | 
         | In R it's often the case that things for which there are
         | ready-made libraries and recipes are easy, but when those don't
         | exist, things become extremely hard. And the usual approach is
         | that if something is not easy with a library recipe, it just is
         | not done.
        
           | m000 wrote:
           | The way you describe it, can we say that R was AI-first
           | without even knowing?
        
             | nerdponx wrote:
             | R is overtly and heavily inspired by Lisp which was a big
             | deal in AI at one point. They knew what they were doing.
        
           | debtta wrote:
           | Python: easy things are easy, hard things are hard.
           | 
           | R: easy things are hard, hard things are easy.
        
         | blubber wrote:
         | "The reason you can use this simpler syntax in R is because
         | it's non-standard-evaluation ..."
         | 
         | So it actually is about Python vs R.
         | 
         | That said, while this kind of non-standard evaluation is nice
         | when working interactively on the command line, I don't think
         | it's that relevant when writing code for more elaborate
         | analyses. In that context, I'd actually see this as a
         | disadvantage of R because you suddenly have to jump through
         | hoops to make trivial things work with that non-standard
         | evaluation.
        
           | _Wintermute wrote:
           | The increasing prevalence of non-standard evaluation in R
           | packages was one of the major reasons I switched from R to
           | python for my work. The amount of ceremony and constant API
           | changes just to have something as an argument in a function
           | drove me mad.
        
             | disgruntledphd2 wrote:
             | > and constant API changes
             | 
             | Yeah, this was so very very painful. I once ended up
             | maintaining a library that basically used all the different
             | NSE approaches, which was not very much fun at all.
        
       | drnick1 wrote:
       | I suppose it depends on what exactly is meant by "data science."
       | I find that for stochastic simulations, C++ and the Eigen
       | library are unbeatable. You get the readability of high-level
       | code with the performance of low-level code thanks to the "zero-
       | cost abstractions" of Eigen.
       | 
       | If by data science you mean loading data to memory and running
       | canned routines for regression, classification and other
       | problems, then Python is great and mostly calls C/FORTRAN
       | binaries under the hood, so Python itself has relatively little
       | overhead.
        
       | KaiserPro wrote:
       | The observation I make here is in that first python example with
       | the penguins, what the fuck is that?
       | 
       | It makes it look like perl, on a bad day, or worse autogenerated
       | javascript.
       | 
       | Why on earth is it so many levels deep in objects?
        
       | johnea wrote:
       | They could have just left the last three words off of that title
       | 8-/
       | 
       | Python is not a great language
       | 
       | First, the white space requirements are a bad flashback to 1970s
       | fortran.
       | 
       | Second, it is the language that is least compatible with itself.
        
       | taeric wrote:
       | I'm heavily inclined to agree with the general thought, but I
       | balk at the low level code showing why a language is bad at
       | something. In this specific case, without the tidyverse, R isn't
       | exactly peaches and cream.
       | 
       | As annoying as it is to admit it, python is a great language for
       | data science almost strictly because it has so many people doing
       | data science with it. The popularity is, itself, a benefit.
        
       | progval wrote:
       | The pure Python code in the last example is more verbose than it
       | needs to be.
       | 
       |     groups = {}
       |     for row in filtered:
       |         key = (row['species'], row['island'])
       |         if key not in groups:
       |             groups[key] = []
       |         groups[key].append(row['body_mass_g'])
       | 
       | can be rewritten as:
       | 
       |     groups = collections.defaultdict(list)
       |     for row in filtered:
       |         groups[(row['species'], row['island'])].append(row['body_mass_g'])
       | 
       | and
       | 
       |     variance = sum((x - mean) ** 2 for x in values) / (n - 1)
       |     std_dev = math.sqrt(variance)
       | 
       | as:
       | 
       |     std_dev = statistics.stdev(values)
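       | 
       | A minimal runnable sketch of both rewrites together. The sample
       | rows are made up for illustration and are not taken from the
       | penguins dataset; `summary` is a hypothetical name.

```python
import collections
import statistics

# Hypothetical sample rows standing in for the filtered penguin records.
filtered = [
    {'species': 'Adelie', 'island': 'Dream', 'body_mass_g': 3700.0},
    {'species': 'Adelie', 'island': 'Dream', 'body_mass_g': 3900.0},
    {'species': 'Gentoo', 'island': 'Biscoe', 'body_mass_g': 5000.0},
    {'species': 'Gentoo', 'island': 'Biscoe', 'body_mass_g': 5200.0},
]

# defaultdict(list) creates the empty list the first time a key is accessed,
# which replaces the explicit "if key not in groups" check.
groups = collections.defaultdict(list)
for row in filtered:
    groups[(row['species'], row['island'])].append(row['body_mass_g'])

# statistics.stdev is the sample standard deviation (it divides by n - 1).
summary = {
    key: (statistics.fmean(values), statistics.stdev(values))
    for key, values in groups.items()
}
```

       | Note that statistics.stdev needs at least two values per group,
       | so a real dataset with singleton groups would need extra handling.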
        
         | ashdev wrote:
         | Disagree.
         | 
         | In the first instance, the original code is readable and tells
         | me exactly what's what. In your example, you're sacrificing
         | readability for being clever.
         | 
         | Clear code(even if verbose) is better than being clever.
        
           | billyoyo wrote:
           | Using a very common utility in the standard library to avoid
           | reinventing the wheel is not "clean code"?
           | 
           | defaultdict is ubiquitous in modern python, and is far from a
           | complicated concept to grasp.
        
             | ux266478 wrote:
             | I don't think that's the right metaphor to use here, it
             | exists at a different level than what I would consider
             | "reinventing the wheel". That to me is more some attempt to
             | make a novel outward-facing facet of the program when
             | there's not much reason to do so. For example,
             | reimplementing shared memory using a custom kernel driver
             | as your IPC mechanism, despite it not doing anything that
             | shared memory doesn't already do.
             | 
             | The difference between the examples is so trivial I'm not
             | really sure why the parent comment felt compelled to
             | complain.
        
           | MarsIronPI wrote:
           | I think code clarity is subjective. I find the second easier
           | to read because I have to look at less code. When I read
           | code, I instinctively take it apart and see how it fits
           | together, so I have no problem with the second approach.
           | Whereas the first approach is twice as long so it takes me
           | roughly twice as long to read.
        
           | explodes wrote:
           | The 2nd version is the most idiomatic.
        
           | pphysch wrote:
           | I would keep the explicit key= assignment since it's more
           | than just a single literal but otherwise the second version
           | is more idiomatic and readable.
        
           | ashdev wrote:
           | Interesting! Thanks for the responses. I'm not python native
           | and haven't worked as extensively with python as some of you
           | here.
           | 
           | That said, I'll change my mind here and agree on using std
           | library, but I'd still have separate 'key' assignment here
           | for more clarity.
        
           | freehorse wrote:
           | Imo, if you read such code the first time, you may prefer the
           | first. If you read it for the 20th time, you may prefer the
           | second. Once you understand what you are doing, often one
           | prefers more concise syntax that helps in handling complexity
           | within a larger project. But it can seem a bit "too clever"
           | in the beginning.
        
             | rkomorn wrote:
             | This happened to me with comprehensions in python, and with
             | JS' love for anonymous/arrow functions.
             | 
             | Once you get used to a language's "quirks" (so long as
             | they're considered idiomatic), they no longer feel quirky,
             | and it's usually pretty quick.
        
               | freehorse wrote:
               | You get to the same point with non-considered idiomatic
               | syntax also, the only problem being that it will be only
               | you who understands it.
        
               | rkomorn wrote:
               | Only so long as you keep the habit going.
               | 
               | I've definitely written some things that I came back to
               | much later and had to relearn (which is somewhere between
               | embarrassing and humbling).
        
         | roadside_picnic wrote:
         | > (n - 1)
         | 
         | It's also funny that one would write their own standard
         | deviation function and _include_ Bessel's correction. Usually
         | if I'm manually re-implementing a standard deviation function
         | it's because I'm afraid the implementors blindly applied the
         | correction without considering whether or not it's actually
         | meaningful for the given analysis. At the very least, the
         | correct name for what's implemented there should really be
         | `sample_std_dev`.
        
           | m55au wrote:
           | It is sadly really inconsistent. The stdlib statistics has
           | two separate functions, stdev for sample and pstdev for
           | population. Numpy and pandas both have .std() with ddof
           | (delta degrees of freedom) as a parameter, but numpy defaults
           | to 0 (population) and pandas to 1 (sample).
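           | 
           | A small stdlib-only sketch of the difference, using made-up
           | numbers. By the defaults described above, numpy's std() would
           | match pstdev here and pandas' .std() would match stdev.

```python
import math
import statistics

# Made-up values chosen so the population result is a round number.
values = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(values)

# Population standard deviation: squared deviations divided by n.
population_sd = statistics.pstdev(values)

# Sample standard deviation: divided by n - 1 (Bessel's correction).
sample_sd = statistics.stdev(values)

# The two results differ only by the factor sqrt(n / (n - 1)).
ratio = sample_sd / population_sd
expected_ratio = math.sqrt(n / (n - 1))
```

           | The gap shrinks as n grows, which is why mixing the two
           | defaults tends to go unnoticed until the groups are small.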
        
         | gcbirzan wrote:
         | There's also itertools.groupby, maybe not much shorter (need to
         | define the keyfunc, sort, then iterate), but it does make the
         | intent obvious.
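         | 
         | The sort step matters because groupby only merges adjacent
         | items. A rough sketch with made-up rows (keyfunc and the data
         | are hypothetical):

```python
from itertools import groupby

# Made-up (species, body mass) pairs, deliberately not sorted by species.
rows = [('Adelie', 3700), ('Gentoo', 5000), ('Adelie', 3900)]


def keyfunc(row):
    """Group rows by species, the first element of each pair."""
    return row[0]


# Without sorting, groupby starts a new group whenever the key changes,
# so 'Adelie' appears as two separate groups.
unsorted_keys = [key for key, _ in groupby(rows, key=keyfunc)]

# Sorting by the same key first yields exactly one group per species.
grouped = {
    key: [mass for _, mass in group]
    for key, group in groupby(sorted(rows, key=keyfunc), key=keyfunc)
}
```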
        
       | shevy-java wrote:
       | > I think people way over-index Python as the language for data
       | science. It has limitations that I think are quite noteworthy.
       | There are many data-science tasks I'd much rather do in R than in
       | Python.
       | 
       | R is kind of a super-specialized language. Python is much more
       | general purpose.
       | 
       | R failed to evolve, let's be honest. Python won via jupyter - I
       | see this used ALL the time in universities. R is used too, but
       | mostly for statistics related courses only, give or take.
       | 
       | Perhaps R is better for its niche, but Python has more momentum
       | and thus dominates over R. That's simply the reality of the
       | situation. It is like the bulldozer moving forward, at a fast
       | speed.
       | 
       | > I say "This is great, but could you quickly plot the data in
       | this other way?"
       | 
       | Ok so ... he would have to adjust R code too, right? And finding
       | good info on that is simply harder. He says he has experience
       | with universities. Well, I do too, and my experience is that
       | people are WAY better with python than with R. You simply see
       | that more students will drop out from R than from python. That's
       | also simply the reality of the situation.
       | 
       | > They appear to be sufficiently cumbersome or confusing that
       | requests that I think should be trivial frequently are not.
       | 
       | I am sure the reverse also applies. Pick some python library, do
       | something awesome, then tell the R students to do the same. I bet
       | he will have the same problems.
       | 
       | > So many times, I felt that things that would be just a few
       | lines of simple R code turned out to be quite a bit longer and
       | fairly convoluted.
       | 
       | Ok, so here he is trolling. Flat out - I said it.
       | 
       | I wrote a LOT of python and quite a bit of R. There is no way in
       | life that the R code is more succinct than the python code for
       | about 90% of the use cases out there. Sorry, that's simply not
       | the case. R is more verbose.
       | 
       | > Here is the relevant code in R, using the tidyverse approach:
       | 
       |     penguins |>
       |       filter(!is.na(body_mass_g)) |>
       |       group_by(species, island) |>
       |       summarize(
       | 
       | This is like perl. They also don't adapt. R is going to lose
       | ground.
       | 
       | This professor just hasn't realised that he is slowly becoming a
       | fossil himself, by being unable to see that x is better than y.
        
         | oivey wrote:
         | > R failed to evolve, let's be honest. Python won via jupyter
         | 
         | Ju = Julia
         | Pyt = Python
         | Er = R
         | 
         | R is not only supported in Jupyter, it was there from the
         | start. I've never written a single line of R. It is bizarre how
         | little people know about their tools.
        
           | aragilar wrote:
           | But it used to be iPython (and the notebook interface did
           | come out when it was still iPython).
        
             | oivey wrote:
             | Yeah. The extra language support is partially why they
             | renamed it.
        
       | rob_c wrote:
       | Refuses to learn tool so tool is broken... There is no problem
       | with python for this. If you hate boilerplate, join the club, get
       | llms to generate it for you and move on to doing real work (or
       | get involved in improving the language or libraries directly)
        
       | hekkle wrote:
       | For those who thought the article was TL;DR, the author argues:
       | 
       | - A General programming language like Python is good enough for
       | data science but isn't specifically designed for it.
       | 
       | - A language that is specifically designed for Data Science like
       | R is better at Data Science.
       | 
       | Who would have thought?
        
       | fnord77 wrote:
       | Python is not a great language
        
         | hekkle wrote:
         | Not great at what?
         | 
         | I agree that Python is not great at anything specifically, but
         | it is good at almost everything, and that's what makes it
         | great.
        
       | psunavy03 wrote:
       | I'm not sure what that last example is meant to be other than an
       | anti-Python caricature. If you're implementing calculating things
       | like standard deviations by hand, that's not real-world coding,
       | that's the undergraduate harassment package which should end with
       | a STEM bachelor's.
       | 
       | Of course there's a bunch of loops and things; you're exposing
       | what has to happen in both R and Python under the hood of all
       | those packages.
        
         | roadside_picnic wrote:
         | > that's not real-world coding
         | 
         | It's pretty clear the post is focused on the context of work
         | being done in an academic research lab. In that context I think
         | most of the points are pretty valid, but most of the real world
         | benefit I've experienced from using Python is being able to work
         | more closely with engineering (even on non-Python teams).
         | 
         | I shipped R code to a production environment once over my
         | career and it felt incredibly fragile.
         | 
         | R is _great_ for EDA, but really doesn't work well for
         | iteratively building larger software projects. R has a great
         | package system, but it's not so great when you need abstraction
         | in between.
        
           | SubiculumCode wrote:
           | Yeah, to me, R has never really been a language I'd choose to
           | program with...it's a statistical powerhouse to analyze
           | datasets with great packages / SOTA statistical methods, etc,
           | not a production tool.
        
       | aussieguy1234 wrote:
       | I felt forced to use python when I gave langgraph agents a go.
       | 
       | Worked quite well, but the TS/JS langgraph version is way behind.
       | React agents are just a few lines of code, compared to 50 odd
       | lines for the same thing in JS/TS.
       | 
       | Better to use a different language, even one I'm not familiar
       | with, to be able to maintain a few lines of code vs 50 lines.
        
       | yodsanklai wrote:
       | Maybe R is fine for people who use it all the time? But as a SWE
       | that occasionally needs to do some data analysis, I find it much
       | easier to rely on tools I know rather than R. R is pretty
       | convoluted as a language.
        
       | dragonwriter wrote:
       | The bare python/stdlib example used (as well as bare python and
       | avoiding add-on data science oriented libraries not being the way
       | most people would use python for data science) is just...bad?
       | (And, by bad here I mean showing signs of deliberately avoiding
       | stdlib features in order to increase the appearance of the things
       | the author then complains about.)
       | 
       | A better stdlib-only version would be:
       | 
       |     from palmerpenguins import load_penguins
       |     import math
       |     from itertools import groupby
       |     from statistics import fmean, stdev
       | 
       |     penguins = load_penguins()
       | 
       |     # Convert DataFrame to list of dictionaries
       |     penguins_list = penguins.to_dict('records')
       | 
       |     # create key function for grouping/sorting by species/island
       |     def key_func(x):
       |         return x['species'], x['island']
       | 
       |     # Filter out rows where body_mass_g is missing and sort by
       |     # species and island
       |     filtered = sorted(
       |         (row for row in penguins_list
       |          if not math.isnan(row['body_mass_g'])),
       |         key=key_func)
       | 
       |     # Group by species and island
       |     groups = groupby(filtered, key=key_func)
       | 
       |     # Calculate mean and standard deviation for each group
       |     results = []
       |     for (species, island), group in groups:
       |         values = [row['body_mass_g'] for row in group]
       |         mean_value = fmean(values)
       |         sd_value = stdev(values, xbar=mean_value)
       |         results.append({
       |             'species': species,
       |             'island': island,
       |             'body_weight_mean': mean_value,
       |             'body_weight_sd': sd_value
       |         })
        
       | rossdavidh wrote:
       | Speaking as a python programmer who has occasionally done work in
       | R: yes, of course. Python is not a great language for anything;
       | it's a pretty good language for just about anything. That is, and
       | always has been, its strength.
       | 
       | If you're doing data science all day, you should learn R, even if
       | it's so weird at first (for somebody coming from a C-style
       | language) that it seems way harder; R is made for the way
       | statisticians work and think, not the way computer programmers
       | work and think. If you're doing data science all day, you should
       | start thinking and working like a statistician and working in R,
       | and the fact that it seems to bend your mind is probably at least
       | in part good, because a statistician needs to think differently
       | than a programmer.
       | 
       | I work in python, though, almost all of the time.
        
         | ZhiqiangWang wrote:
         | Agree, the argument is well made in sklearn API design paper
         | https://arxiv.org/abs/1309.0238
        
       | actuallyalys wrote:
       | As much as I like Python and personally prefer it to R, I don't
       | really disagree. But I'm not sure R is a _great_ language for
       | data science either--it has its own weaknesses, e.g., writing
       | custom loops (or functional equivalents with map or reduce) was
       | pretty clunky last I tried it.
       | 
       | The other thing is that a lot of R's strengths are really the
       | tidyverse's. Some of that is to R's credit as an extensible
       | language that enables a skilled API designer to really shine of
       | course, but I think there's no reason Python the language
       | couldn't have similar libraries. In fact it has, in plotnine. (I
       | haven't tried Polars yet but it does at least seem to have a more
       | consistent API.)
        
       | dcreater wrote:
       | Fixed title: Python is not a great language for data science if
       | pandas/polars/ibis did not exist
        
         | mike_ivanov wrote:
         | Please read the article. It literally shows pandas code as an
         | example.
        
       | roadside_picnic wrote:
       | So I've been writing Python for around 20 years now, and doing
       | data science/ML work for around 15. Despite being a Python
       | programmer first I spent a good 5 years using R exclusively.
       | There's a lot of things I genuinely love about R and I strongly
       | believe that R is unfairly maligned by devs... but there's a good
       | reason I have written exclusively Python for DS work for the last
       | 5 years.
       | 
       | > Python is pretty good for deep learning. There's a reason
       | PyTorch is the industry standard. When I'm talking about data
       | science here, I'm specifically excluding deep learning.
       | 
       | I've written very little deep learning code over my career, but
       | made very frequent use of the GPU and differentiable programming
       | for non-deep learning specific tasks. In general Python is much
       | easier to write quantitative programs that make use of the
       | hardware, and you have a lot more options when your problem
       | doesn't fit into RAM.
       | 
       | > I have been running a research lab in computational biology for
       | over two decades.
       | 
       | I've been working nearly exclusively in industry for these two
       | decades and a _major_ reason I find Python just better is it's
       | much, much easier to interface with other parts of engineering
       | when you're using a truly general purpose PL. I've actually never
       | worked for a pure Python shop, but it's generally much easier to
       | get production ML/DS solutions into prod when working with
       | Python.
       | 
       | > Data science as I define it here involves a lot of interactive
       | exploration of data and quick one-off analyses or experiments
       | 
       | This re-iterates the previous difference. In my experience I
       | would call this "step one" in all my DS related work. The first
       | step is to understand the problem and de-risk. But the vast
       | majority of code and work is related to delivering a scalable
       | product.
       | 
       | You can say that's not part of "data science", but if you did
       | you'd have a hard time finding a job on most of the teams I've
       | worked on.
       | 
       | All that said, my R vs Python experience has boiled down to: If
       | your end result is a PDF report, R is superior. If your end
       | result is shipping a product, then Python is superior. And my
       | experience has been that, outside of university labs, there
       | aren't a lot of jobs out there for DS folks who only want to
       | deliver PDFs.
        
       | UniverseHacker wrote:
       | Doing computational biology for several decades in about a dozen
       | languages, I do think R is a much better language for data
       | science, but in practice I end up using Python almost every time
       | because it has more libraries, and it's easier to find software
       | engineers and collaborators to work on Python. However, R makes
       | for much simpler cleaner code, less silent errors, and the 1
       | indexing makes dealing with biological sequences much less
       | hassle.
        
         | 3eb7988a1663 wrote:
         | Pardon? Less silent errors? R has quite a few foot guns around
         | permissively parsing user intention. Which does make it handy
         | for exploratory analysis, but a lot more fragile when you want
         | production code.
         | 
         | Just a simple one that can get you, R is 1-indexed. Yet if you
         | have a vector, accessing myvec[0] is not an error.
         | Alternatively, if you had say, a vector length of 3 and do
         | myvec[10] that gets NA (an otherwise legal value). Or you could
         | make an assignment past the end of the vector, myvec[15] <- 3.14,
         | which will silently extend the array, inserting NAs.
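         | 
         | For contrast, a rough sketch of how a plain Python list reacts
         | to the same out-of-range accesses. The helper names are
         | hypothetical, and Python has its own quirk here: negative
         | indices are legal and count from the end.

```python
myvec = [1.0, 2.0, 3.0]


def try_read(seq, index):
    """Return the element, or the name of the exception that was raised."""
    try:
        return seq[index]
    except IndexError as exc:
        return type(exc).__name__


def try_write(seq, index, value):
    """Assign in place; return the exception name on failure, else None."""
    try:
        seq[index] = value
        return None
    except IndexError as exc:
        return type(exc).__name__


# Reading past the end raises instead of returning a missing value.
read_past_end = try_read(myvec, 10)

# Assigning past the end also raises; the list is not extended.
write_past_end = try_write(myvec, 15, 3.14)

# A negative index silently returns the last element.
read_negative = try_read(myvec, -1)
```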
        
         | _Wintermute wrote:
         | In my experience R is king of happily chugging along spitting
         | out nonsense results when it should have errored 100 lines ago.
        
       | drtournier wrote:
       | JavaScript is not a great language for web development either,
       | yet...
        
       | janalsncm wrote:
       | Python is versatile which is what makes it popular. You can load
       | back and forth from a GPU using well-tested libraries. You can
       | memmap things if you need to. If your loops are too slow you can
       | rewrite the hot loops in rust or C. You can read and write from
       | most file formats in a couple of lines.
        
       | plaidfuji wrote:
       | Python is a pretty bad language for tabular data analysis and
       | plotting, which seems to be the actual topic of this post. R is
       | certainly better, hell Tableau, Matlab, JMP, Prism and even Excel
       | are all better in many cases. Pandas+seaborn has done a lot, but
       | seaborn still has frustrating limits. And pandas is essentially a
       | separate programming language.
       | 
       | If your data is already in a table, and you're using Python,
       | you're doing it because you want to learn Python for your next
       | job. Not because it's the best tool for your current job. The one
       | thing Python has on all those other options is $$$. You will be
       | far more employable than if you stick to R.
       | 
       | And the reason for that is because Python is one of the _best_
       | languages for data and ML _engineering_ , which is about 80% of
       | what a data science job actually entails.
        
         | getnormality wrote:
         | ...unless your data engineering job happens on a database, in
         | which case R's dbplyr is far better than anything Python has to
         | offer.
        
         | jampekka wrote:
         | > And pandas is essentially a separate programming language.
         | 
         | I'd say dplyr/tidyverse is a lot more a separate programming
         | language to R than pandas is to Python.
        
       | _ZeD_ wrote:
       | Sooo... Is this a post about python envy?
        
       | morshu9001 wrote:
       | Data science is the one thing I consider Python especially good
       | at
        
       | culebron21 wrote:
       | This was underwhelming. I work with Python and Pandas, and I can
       | show examples of much clumsier workflows I run into. Most
       | often, you get dataframe[(dataframe.column1 == something) &
       | ~dataframe.column2.isna()] constructs, which show that python
       | syntax falls short here, and isn't suitable for such
       | manipulations. Unfortunately, there's no alternative, and I don't
       | see R as much easier, there are plenty of ugly things as well
       | there.
       | 
       | There's Julia -- it has serious drawbacks, like slow cold start
       | if you launch a Julia script from the shell, which makes it
       | unsuitable for CLI workflows.
       | 
       | Otherwise you have to switch to compiled languages, with their
       | tradeoffs.
        
         | markkitti wrote:
         | > Unfortunately, there's no alternative, and I don't see R as
         | much easier, there are plenty of ugly things as well there.
         | 
         | Have you tried Polars? It really discourages the inefficient
         | creation of intermediate boolean arrays such as in the code
         | that you are showing.
         | 
         | > There's Julia -- it has serious drawbacks, like slow cold
         | start if you launch a Julia script from the shell, which makes
         | it unsuitable for CLI workflows.
         | 
         | Julia has gotten significantly better over time with regard to
         | startup, especially with regard to plotting. There is
         | definitely a preference for REPL or notebook based development
         | to spread the costs of compilation over many executions.
         | Compilation is increasingly modular with package based
         | precompilation as well as ahead-of-time compilation modes. I do
         | appreciate that typical compilation is an implicit step making
         | the workflow much more similar to a scripting language than a
         | traditionally compiled language.
         | 
         | I also do appreciate that traditional ahead-of-time static
         | compilation to binary executable is also available now for
         | deployment.
         | 
         | After a day of development in R or Python, I usually start
         | regretting that I am not using Julia because I know yesterday's
         | code could be executing much faster if I did. The question
         | really becomes do I want to pay with time today or over the
         | lifetime of the project.
        
           | jampekka wrote:
           | > Have you tried Polars? It really discourages the
           | inefficient creation of intermediate boolean arrays such as
           | in the code that you are showing.
           | 
           | The problem is not usually inefficiency, but syntactic noise.
           | Polars does remove that in some cases, but in general gets
           | even more verbose (apparently by design), which gets annoying
           | fast when doing explorative data analysis.
        
       | slowhadoken wrote:
       | Sounds like a skill issue
        
       | gyulai wrote:
       | I think, the lesson learned from > Python v. R < is that people
       | prefer doing data science in a _general purpose_ language that is
       | also okay-ish for data science over a language that's
       | purpose-built for data science but suffers from diseconomies.
       | Specifically: Imagine a new database or something like that has
       | just come out. Now, the audience that wants to wire it into
       | applications and the audience that wants to tap it to extract
       | data for analytics put their weight together to create the demand
       | for the Python library. The economies for that work out better
       | than if you had to create two different libraries in two
       | different languages to satisfy those two groups of demand.
        
         | LanceH wrote:
         | You mention a good point of using Python to put out the
         | results.
         | 
         | I think munging the input into a clean enough data set that you
         | can work on is another place Python excels compared to analysis
         | specific tools like R.
        
       | Surac wrote:
       | At the moment I'm trying to learn Python as a hobby language. I
       | use C, C++ and C# to earn my money. My biggest problem is finding
       | good examples that are up to date. I spent a whole day learning
       | that there are four (I think) ways to format strings. This
       | "bloat" in syntax makes even a simple print very heavy to
       | digest. I don't even bother with Python v2, only v3. Also, using
       | whitespace to block things together sounds appealing, but in
       | reality you need to use editors that can indent and unindent
       | whole blocks or I never get it right.
        
         | lenkite wrote:
         | 15 years ago, Python programmers used to mock Perl by quoting
         | the Zen of Python: "There should be one - and preferably only
         | one - obvious way to do it.". This was in stark contrast to
         | Perl's TIMTOWTDI motto: "There Is More Than One Way To Do It."
         | 
         | The Zen of Python is sadly now an absolute lie.
        
           | gbacon wrote:
           | Rather than an absolute lie, I'm more inclined to
           | characterize it as naive or black-and-white thinking, outside
           | of CRUD apps and undergraduate intro projects.
        
         | Stratoscope wrote:
         | You seem to be making things more difficult for yourself than
         | they need to be.
         | 
         | For the strings, just use f-strings and forget all the others.
         | You can even do things like this for debugging:
         |     >>> class User:
         |     ...     pass
         |     ... user = User()
         |     ... user.name = "Surac"
         |     ...
         |     >>> print(f"{user.name=}")
         |     user.name='Surac'
         |     >>>
         | 
         | For the block indenting, what editor are you using? Pretty much
         | every modern editor lets you select a block and indent/unindent
         | with Tab/Shift+Tab.
         | 
         | VS Code and PyCharm are both free and are great for Python
         | coding. They each have a full debugger, which is invaluable
         | when you are learning a language.
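
For anyone new to Python following this subthread, here is a rough side-by-side of the string-formatting styles being discussed. It is only a sketch: the variable names and values are made up for illustration, and "four ways" is the count from the parent comment, not an official number.

```python
from string import Template

# Hypothetical values, purely for illustration
name = "Surac"
langs = 3

# 1. printf-style formatting with the % operator (the oldest style)
percent_style = "%s uses %d languages" % (name, langs)

# 2. str.format()
format_style = "{} uses {} languages".format(name, langs)

# 3. string.Template from the standard library
template_style = Template("$name uses $langs languages").substitute(
    name=name, langs=langs
)

# 4. f-strings (Python 3.6+), the style recommended in the comment above
fstring_style = f"{name} uses {langs} languages"

# The "=" specifier (Python 3.8+) echoes the expression itself, handy for debugging
debug_style = f"{langs=}"
```

All four produce the same text here, which is roughly why "just use f-strings" works as a default for new code.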
        
           | louistsi wrote:
           | I think their point is that it's not clear to someone with 0%
           | Python experience which of the /many/ different ways of doing
           | things (like string interpolation) is the "correct" /
           | idiomatic way.
        
         | IshKebab wrote:
         | > but in reality you need to use editors that can indent and
         | unindent whole blocks or I never get it right
         | 
         | What editor are you using that can't do that? Notepad?
        
       | Havoc wrote:
       | Realistically it's winning because it's accessible rather than
       | perfectly suited
        
       | willvarfar wrote:
       | My experience was that data science was doable but clunky and
       | ugly with pandas. It got slightly better with polars. Only really
       | slightly better. Then, for me at least, it jumped lightyears
       | ahead with duckdb.
       | 
       | These days I run some big query on an OLAP database and download
       | the results to parquet stored on the local disk of a cloud
       | notebook VM and then mine it to bits with duckdb reading straight
       | from these parquet files.
       | 
       | The notebooks end up with very clear SQL queries and results
       | (most notebook servers support SQL cells with highlighting and
       | completion etc), and small pockets of python cells for doing
       | those corner case things that an imperative language makes
       | easier.
       | 
       | So when I get to the bottom of the article where it shows the
       | difference between Python and R, I'm screaming "wouldn't that
       | look better in SQL?!" :)
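
To make the "wouldn't that look better in SQL?" point concrete, here is a minimal sketch of the group-and-summarize step written as a query. Assumptions, stated loudly: it uses the standard-library sqlite3 module as a stand-in for DuckDB so that it runs without extra installs, the rows are invented, and the column names are borrowed from the penguins example discussed in the thread. SQLite has no built-in standard deviation, so the sample variance is computed from sums.

```python
import math
import sqlite3

# Hypothetical rows: (species, island, body_mass_g); None marks a missing value
rows = [
    ("Adelie", "Torgersen", 3750.0),
    ("Adelie", "Torgersen", 3800.0),
    ("Adelie", "Torgersen", None),
    ("Gentoo", "Biscoe", 4500.0),
    ("Gentoo", "Biscoe", 5700.0),
    ("Gentoo", "Biscoe", 5400.0),
]

# sqlite3 is a stand-in here, not the DuckDB setup described in the comment
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE penguins (species TEXT, island TEXT, body_mass_g REAL)")
con.executemany("INSERT INTO penguins VALUES (?, ?, ?)", rows)

query = """
    SELECT species,
           island,
           AVG(body_mass_g) AS body_weight_mean,
           -- sample variance from sums; assumes at least two rows per group
           (SUM(body_mass_g * body_mass_g)
              - SUM(body_mass_g) * SUM(body_mass_g) / COUNT(body_mass_g))
             / (COUNT(body_mass_g) - 1) AS body_weight_var
    FROM penguins
    WHERE body_mass_g IS NOT NULL
    GROUP BY species, island
    ORDER BY species, island
"""

# Take the square root in Python to turn the variance into a standard deviation
summary = [
    (species, island, mean, math.sqrt(var))
    for species, island, mean, var in con.execute(query)
]
```

The filter, grouping and aggregation each map to one clause, which is the readability argument being made above.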
        
         | mettamage wrote:
         | Huh, as a frequent polars user, I'll try duckdb.
        
           | knorke wrote:
           | well, duckdb works very well with pandas, too
        
         | goatlover wrote:
         | So you're saying you prefer SQL to dataframes. I prefer
         | dataframes and staying in the native language.
        
           | willvarfar wrote:
           | Duckdb can see and manipulate dataframes too. Duckdb has its
           | own storage, but other table storage - e.g. the parquet files
           | I mentioned or even csv files or even dataframes from pandas
           | and polars - are first-class citizens. Duckdb lets you query
           | them quickly and expressively.
        
       | sheepscreek wrote:
       | I really didn't understand the author's grievances. The only
       | concrete example they illustrated was one where they concluded
       | that Python without Pandas is verbose and ugly to achieve the
       | same outcome, hence Python is not great for Data Science.
       | 
       | That's a bad argument or a naive and obvious one; depending on
       | how you look at it.
       | 
       | Python wasn't designed for Data Science. It is not a DSL for it.
       | MATLAB was arguably designed for scientific computing, and yet
       | it's the most disliked language in the StackOverflow
       | liked/disliked index.
       | 
       | Here's a different way to look at it. A good programming language
       | is like the weather in a city. I would love to live somewhere
       | where it's 72F/23C all year round. But if it's in the middle of
       | nowhere and I've got no friends to hang out with, would I? I
       | don't think so.
       | 
       | FWIW, Python is like Sweden or Finland, with shitty weather for 6
       | months of the year yet thriving against all odds.
       | 
       | PS: I think the article's topic is a bit click-baity (not a
       | particularly useful discussion) because it's polarizing and no
       | one will be 100% right about it. It's perhaps best thought of as
       | an opinion piece.
        
       | moi2388 wrote:
       | You had me at "Python is not a great language"
        
       | neuropacabra wrote:
       | I expected the author to complain, rightfully, about the tooling,
       | including linters, formatters and package managers. Things
       | improved drastically over the years with Astral's ruff, uv and
       | alpha stage ty.
       | 
       | But the article says that very exotic syntax is more readable. I
       | think this is mostly about the libraries, where honestly I
       | equally dislike matplotlib and R's ggplot. But I would not
       | call it a language problem.
       | 
       | I was hoping to find some performance benchmarks or something
       | more than feelings about a certain block of code. Don't get me
       | wrong, I am also not a die-hard fan of Python, although I have
       | written a lot of production code in it. As for bloated,
       | boilerplate code... I am afraid the author should look at Java or
       | any modern JavaScript project.
        
       | BiteCode_dev wrote:
       | Notice how the article's load_penguins() example starts neatly
       | after all the messy parts of data science are done and stops
       | right before the next pain starts.
       | 
       | It lives in a sterile, idealized world.
       | 
       | Python is a great language for data science in practice because
       | it turns out data science is also:
       |     - gluing a lot of data sources
       |     - cleaning up a ton of terribly shaped data
       |     - validation and error handling
       |     - I/O, networking, and format conversion
       |     - onboarding non-programmers into programming
       |     - wrapping a lot of compiled languages' libs or plugging system
       |     - prototyping stuff and exposing that prototype to some people
       |     - turning prototypes into more permanent projects
       | 
       | And it turns out Python and its ecosystem are good at those while
       | remaining decent at the other things.
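
As a small illustration of the "cleaning up terribly shaped data" and "validation and error handling" items in the list above, here is a sketch using only the standard library. The messy input, the column names and the `clean_rows` helper are all hypothetical; real exports are usually worse.

```python
import csv
import io

# Hypothetical messy export: stray whitespace, mixed-case labels,
# and missing or malformed numbers
raw = """species, island ,body_mass_g
 Adelie,Torgersen,3750
adelie,Torgersen,
Gentoo,Biscoe,n/a
GENTOO,Biscoe,5700
"""


def clean_rows(text):
    """Yield (species, island, body_mass_g) tuples, skipping rows without a usable mass."""
    reader = csv.DictReader(io.StringIO(text))
    # Normalise header names such as " island " to "island"
    reader.fieldnames = [name.strip() for name in reader.fieldnames]
    for row in reader:
        try:
            mass = float(row["body_mass_g"])
        except (TypeError, ValueError):
            continue  # missing or malformed value: drop the row
        yield row["species"].strip().title(), row["island"].strip(), mass


cleaned = list(clean_rows(raw))
```

Nothing clever is going on, which is rather the point: this kind of glue code is where a general-purpose language with a large standard library tends to feel comfortable.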
       | 
       | There are other languages excellent at some of those, or some of
       | the other things, but rarely good at most. And because humanity
       | is vast, diverse, and constantly renewing, being the second best
       | at those is eventually always winning.
       | 
       | Because whoever you are, you will be annoyed at not having the
       | best experience at task X. But you would be mortified if you had
       | the worst experience at doing task Y and Z. And task X, Y, and Z
       | change depending on who you ask.
       | 
       | And you want to get things done, while days have 24 hours.
       | 
       | As usual, to understand the Python phenomenon, you have to see
       | the whole picture. Not your little corner of the bubble. Not the
       | ideal world in your head either. Life is not a maths problem with
       | a clearly laid out premise and an elegant answer.
       | 
       | That's the same debate about why PHP won the web in 2000 no
       | matter the size of the spaghetti plate, why Windows stayed used
       | for so long despite it being terrible, why people keep using
       | iphones after all the abuses, etc. There is more to it than the
       | use case you have every day. People have needs you haven't
       | thought about.
       | 
       | So it's not "let the language war begin". It's, "dude, get more
       | experience, go work with accountants, ngos, govs and logistic
       | chains, go work in china, africa and south america, go from a
       | startup to schools to corporate, satisfy the geeks, the artists
       | and the business people, then we'll talk".
        
       | codeptualize wrote:
       | Wait, so there is one example, which shows the R and Python
       | equivalents are pretty much the same..
       | 
       | I was all hyped up, ready to see the amazing examples and
       | arguments that would convince me to pick up R, and it gave me
       | absolutely nothing (except quotes and brackets..).
       | 
       | Disappointing.
        
       | jakubmazanec wrote:
       | I wish people used Julia more. A few years ago I reimplemented
       | some MATLAB code for a novel algorithm [1] I wanted to use in my
       | dissertation about psychometrics and Julia was a great language
       | to work with - and also the code ran for 20 minutes instead of 60.
       | 
       | [1] https://link.springer.com/article/10.1007/s11336-017-9581-x
        
         | maratc wrote:
         | How important was this saving of 40 minutes for the whole
         | timeline of the project of writing your dissertation about
         | psychometrics?
        
           | jakubmazanec wrote:
           | Very important. This was only a simulated dataset; the final
           | analysis would be done on a much larger one (sadly, in the
           | end I didn't finish it, for unrelated reasons). Also,
           | the rewrite didn't take long; the final Julia code was small,
           | a few hundred, or maybe a thousand, lines.
        
       | mfld wrote:
       | This really calls for an A/B speed programming test of Python vs.
       | R practitioners.
        
       | HelloNurse wrote:
       | Guess what, doing a relatively complex but standard task
       | (filtering and aggregating example penguins) with a specialized
       | and ossified library (Pandas) is better than doing it
       | "bare-handed" with basic lists and dicts.
       | 
       | More terse, more efficient, less error prone, hopefully more
       | numerically accurate, as if Python had an ecosystem of well
       | designed libraries on par with R.
        
       | sarusso wrote:
       | The main flaw of this article is comparing a general-purpose
       | language built with production systems in mind (Python) with a
       | domain-specific language designed for interactive analysis (R)...
       | Beware of comparing apples and oranges, because productizing R
       | code typically requires rewriting it in another language.
        
       | poulpy123 wrote:
       | But python is a great language for data science. As the anglos
       | say: the proof is in the pudding, and the fact it is massively
       | used for data science proves it is great at data science.
       | 
       | You will say that not everything that is successful is great, and
       | you will be right, but the success of python came organically,
       | and not because of advertisement, de facto monopoly, politics,
       | money, or first-mover advantage.
       | 
       | Although there is one cause that isn't intrinsic to python but
       | comes from the people who built numpy. The fact that there is a
       | single numerical library for the whole ecosystem, extremely easy
       | to use, fast and extensive, was huge.
        
       | Pinegulf wrote:
       | Once the data is clean and neatly in standard format this becomes
       | a matter of preference.
       | 
       | Work experience says that 90% of work is gathering, cleaning and
       | transforming data from different sources. In this capacity Python
       | has more options available.
        
       | markkitti wrote:
       | I tried this in Julia with TidierData.jl, and it looks quite
       | similar to the R version.
       |     using TidierData, DataFrames
       |     using PalmerPenguins: load
       |     penguins = load()
       |     @chain penguins begin
       |       DataFrame
       |       @drop_missing(body_mass_g)
       |       @group_by(species, island)
       |       @summarize(
       |         body_weight_mean = mean(body_mass_g),
       |         body_weight_std = std(body_mass_g)
       |       )
       |       show(_, allrows=true)
       |     end
        
       | ebonnafoux wrote:
       | In the article
       | 
       | > Contrast this with equivalent code that is full of logistics,
       | where I'm using only basic Python language features and no
       | special data wrangling package:
       |     n = len(values)
       |     # Calculate mean
       |     mean = sum(values) / n
       |     # Calculate standard deviation
       |     variance = sum((x - mean) ** 2 for x in values) / (n - 1)
       |     std_dev = math.sqrt(variance)
       | 
       | He doesn't know about the statistics package in the standard
       | library of Python
       | (https://docs.python.org/3/library/statistics.html). Of course,
       | if you do not know how to use Python, you will have a lot of
       | boilerplate.
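
For reference, a minimal sketch of what the standard-library version looks like next to the hand-rolled calculation quoted from the article. The sample values are invented for illustration; `statistics.mean` and `statistics.stdev` are the documented functions from the module linked above.

```python
import math
import statistics

# Hypothetical sample values, standing in for one group's body masses
values = [3750.0, 3800.0, 3250.0, 3450.0]

# Standard-library equivalents of the hand-written calculation
mean = statistics.mean(values)
std_dev = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)

# The manual version quoted from the article, kept for comparison
n = len(values)
manual_mean = sum(values) / n
manual_std = math.sqrt(sum((x - manual_mean) ** 2 for x in values) / (n - 1))
```

Both routes should agree up to floating-point rounding; the library call just removes the bookkeeping.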
        
       | analog31 wrote:
       | >>> Without fail, from the students that use Python, the response
       | is: "This will take me a bit. Let me sit down at my desk and
       | figure it out and then I'll be back."
       | 
       | This is completely aside, but I wouldn't hold this against the
       | students or Python. The students may be following an age-old rule
       | of office politics: "Never troubleshoot in front of an audience."
       | And why this is more prevalent among the students who use Python,
       | well... sample size of 30.
        
       | Vaslo wrote:
       | My team has all moved slowly from R to Python. There was no
       | pressure to do so. R has a clunky feel with a bunch of modules
       | that can be a challenge to automate. Python's general purpose use
       | beats whatever superior modules R has all day. If someone wants
       | the same package on Python from R it's probably out there.
       | 
       | While plotting may be clunky, I just don't see R as much better.
       | Plus in 2025 I can just provide a sample of data and what plot I
       | want to an LLM and I get zero shot code of the plot I want.
       | 
       | Author sounds very academic to me.
        
       | orochimaaru wrote:
       | It's not. Julia is better, much better. But Julia came too late.
       | 
       | A lot of data science code is already in Python. That's where
       | it's going to stay because rewriting code is time consuming. My
       | guess is we will continue to improve Python gradually and keep
       | refactoring the code.
        
         | 1gn15 wrote:
         | > It's not. Julia is better, much better. But Julia came too
         | late.
         | 
         | Sounds a lot like "worse is better". Python is the worse
         | option, incomplete and inelegant, but is much more practical
         | due to being there first and receiving the bulk of the
         | attention.
        
         | fithisux wrote:
         | Julia macros are game changers.
         | 
         | You do not need a DSL.
        
       | hmokiguess wrote:
       | My issue with Python is that it makes it too easy to do things
       | wrong, it accepts all and anyone. It's too inclusive and
       | permissive, which is great for expression and creativity but bad
       | for exact sciences and rigid disciplines. In certain matters
       | opinions and cargo cult programming are often a detriment for
       | science. Unfortunately for high level abstractions it's not that
       | simple to do it right without sacrificing speed, so the industry
       | forces the hand of the community in a lot of ways.
        
       | skeeter2020 wrote:
       | This is not about "Python is not a great language for data
       | science" but the author's expertise and affection for R. I guess
       | that title wouldn't get as many clicks.
        
       | fithisux wrote:
       | Personally I use R for the occasional script or some tidyverse
       | quick processing.
       | 
       | But the language has many rough edges
       | 
       | 1. non standard eval is very weird, rlang fixes these
       |    shortcomings
       | 2. unintuitive names or functions not belonging to packages,
       |    base has a mix of functions
       | 3. S3 mixes with naming, no problem personally with S3 and S7 is
       |    even better, but mixing S3 names with ordinary names is
       |    unintuitive, keep snake case
       | 4. data.frames are unintuitive, tidyverse fixes this
       | 5. f(a=) seriously? or working with unintuitive functions in body
       |    for discrete ranges of function arguments?
       | 6. no imports per file in packages, I can live with this ..
       |    still ...
       | 7. AST functions are unintuitive
       | 
       | R has some excellent parts:
       | 
       | non-standard evaluation, AST in the base language, lazy
       | evaluation
       | 
       | but it is being killed by the bad parts
       | 
       | I think all the external fixes and sanity in names should go into
       | base
       | 
       | but it will take a lot of time if it ever happens due to legacy.
       | 
       | Julia fixes many of these, not as elegantly as R, but its
       | pragmatic approach is too attractive.
        
       | nyrikki wrote:
       | > Contrast this with equivalent code that is full of logistics,
       | where I'm using only basic Python language features and no
       | special data wrangling package
       | 
       | While I am not a python cheerleader, but a user because the
       | reality is that it is a pretty good glue language, the above is a
       | bit of a problem.
       | 
       | Duckdb, pandas, numpy etc.. is what makes python nice.
       | 
       | About a decade ago I worked at a major BI software company and
       | ran into another silly problem when trying to evangelize R: wikis,
       | KBs and search engines don't like single letter search terms.
       | 
       | So it didn't matter how much better R was at the time, people
       | found learning it more difficult than it should have been.
        
       | another_twist wrote:
       | I think TypeScript will shine here. Especially for data output
       | pipelines so we can emit strongly typed datasets.
       | 
       | Add to that the fact that TS based exploratory code can
       | potentially plot SVG via d3 and maybe even be exported to a webpage.
        
       | Decabytes wrote:
       | Python pays the bills. If it was up to me I'd use a different
       | language, but there is no denying that it's got a strong story in
       | just about every field now. As I've gotten older, I've come to
       | realize that programming languages are vehicles for solving
       | computer based problems, and I've learned to find joy in solving
       | those problems in whatever language my company/project is using.
       | 
       | But in my personal projects, my favorite language to use is Dart.
        
       | prepend wrote:
       | Doesn't need to be great, just needs to be good enough.
        
       | northlondoner wrote:
       | There is a similar thread regarding the lifetime of projects, such
       | as which ecosystem is better for long-term maintainability:
       | https://news.ycombinator.com/item?id=46055463
        
       | northlondoner wrote:
       | Has anybody else noticed how much Python took from Scala for type
       | hints? I was using Scala around 2015 and when I see type hints,
       | I immediately recognise the similarity to Scala's approach.
        
       | CephalopodMD wrote:
       | Python is the 2nd best language for almost everything
        
         | iLemming wrote:
         | Yup, paradoxically, it's also the 2nd worst language for almost
         | everything.
        
       | knorke wrote:
       | okay, click bait worked on me. but the claims are weak. basically
       | "Python is not a great language... because it's not as much of a
       | domain language as R"
       | 
       | mediocre!
        
       ___________________________________________________________________
       (page generated 2025-11-26 23:01 UTC)