[HN Gopher] Python is not a great language for data science
___________________________________________________________________
Python is not a great language for data science
Author : speckx
Score : 320 points
Date : 2025-11-25 16:38 UTC (1 days ago)
(HTM) web link (blog.genesmindsmachines.com)
(TXT) w3m dump (blog.genesmindsmachines.com)
| lenerdenator wrote:
| > I think people way over-index Python as the language for data
| science. It has limitations that I think are quite noteworthy.
| There are many data-science tasks I'd much rather do in R than in
| Python. I believe the reason Python is so widely used in data
| science is a historical accident, plus it being sort-of Ok at
| most things, rather than an expression of its inherent
| suitability for data-science work.
|
| Python doesn't need to be the best at any one thing; it just has
| to be serviceable for a lot of things. You can take someone who
| has expertise in a completely different domain in software (web
| dev, devops, sysadmin, etc.) and introduce them to the data
| science domain without making them learn an entirely new language
| and toolchain.
| dmurray wrote:
| That's not why it's used in data science though. Lots of data
| scientists use Python all day and have no concept of ever
| working in a different field.
|
| It's used in data science because it's used in data science.
| mohaine wrote:
| But data science usually isn't an island.
|
| Use whatever you want on your one off personal projects but
| use something more non-data science friendly if you ever want
| your model to run directly in a production workflow.
|
| Productionizing R models is quite painful. The normal way is
| to just rewrite it not in R.
| dmurray wrote:
| I've soured a lot on directly productionizing data science
| code. It's normally an unmaintainable mess.
|
| If you write it in R and then rewrite it in C (better:
| rewrite it in English with the R as helpful annotations,
| then have someone else rewrite it in C), at least there is
| some chance you've thought about the abstractions and
| operations that are actually necessary for your problem.
| lenerdenator wrote:
| That's probably true now, but at one point, they were looking
| for people to start doing data science, and were pulling
| people from other domains.
| vkazanov wrote:
| It's used in data science because no other language has this
| level of library support.
|
| And it got this unprecedented level of support because right
| from the start it made its focus clear syntax and (perceived)
| simplicity.
|
| There is also a sort of cumulative effect from being nice for
| algorithmic work.
|
| Guido's long-term strategy won over numerous other strong
| candidates for this role.
| passivegains wrote:
| I think the key thing not obvious to most data scientists
| is they're not using python because it meets their needs,
| it's because we've failed them. twice.
|
| 1. data scientists _aren't_ programmers, so why do they
| need a programming language? the tools they should be using
| don't exist. they'd need programmers to make them, and all
| we have to offer is... more programming languages.
|
| 2. the giant problem at the heart of modern software: the
| most important feature of a modern programming language is
| being easy to read and write. this feature is conspicuously
| absent from most important languages.
|
| they're trapped. they can't do what they need without a
| programming language but there are only a handful they can
| possibly use. the real reason python ended up with such
| good library support is they never really had a choice.
| vkazanov wrote:
| When the first scientific libraries were written for
| python, most alternatives didn't even consider being
| readable, or convenient. The choice was more like
| C/Cpp/Fortran vs Python.
|
| And then Python went into a self-reinforcing loop, with
| scientific community coming up with more and more ways to
| improve Python support for the kind of interactive work
| that was required for data analysis. Think ipython ->
| jupyter -> jupyter forks and other python-centric
| notebook systems.
|
| So when data analysis evolved into data science and
| machine learning, gpu-first library vendors already faced
| a crowd of people knowing python.
|
| It is crazy how right now one can utilize 100s of gpus
| through these bits of dirty python wrapped in json.
| aragilar wrote:
| I think you're forgetting perl (plus other unix utils)
| and matlab. PDL (perl data language) was a thing, as was
| IDL (and other similar tools).
| bsder wrote:
| Partially, but it's _also_ because 90% of your work in "data
| science" isn't direct analysis.
|
| You need to get the data from somewhere. Do you need to
| scrape that because Python is okay at scraping? Oh, after it's
| scraped, we looked at it and it's in
| ObtuseBinaryFormat0.0.LOL.Beta and, what do you know,
| somebody wrote a converter for that for Python. And we need
| to clean all the broken entries out of that and Python is
| decent at that. etc.
|
| The trick is that while Python may or may not be anybody's
| first choice for a particular task, Python is an okay second
| or third choice for most tasks.
|
| So, you can learn Python. Or you learn <best language> and
| <something else>. And if <something else> is Python, was
| <best language> sufficiently better than Python to be worth
| spending the time learning?
| forgotpwd16 wrote:
| Article is well written but fails to address its own thesis by
| postponing it to a sequel article. In its current state it only
| suggests that Python is not great because it requires specialized
| packages. (And the counterexample is R, for which the author also
| used a package.)
| stevenpetryk wrote:
| Totally agree. The author's most significant example is two
| code snippets that are quite similar and both pretty nice.
| puzzlingcaptcha wrote:
| The 'sequel' is also online:
| https://blog.genesmindsmachines.com/p/python-is-not-a-great-...
| forgotpwd16 wrote:
| Thanks! In such serial articles there's usually a link at the
| end pointing to the next one so, since there wasn't any, I
| thought the next one hadn't been written. This one indeed
| addresses the thesis. The TL;DR, taken directly from the
| article,
|
| >The core problems I see with Python as a language for data
| science are call-by-reference semantics, lack of built-in
| concepts of missing values, lack of built-in vectorization,
| and lack of non-standard evaluation.
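|
| Three of the points in the quoted list can be seen in plain Python
| without any packages. This is only a minimal sketch; the names
| (add_total, scores, values) are made up for illustration and are
| not taken from the article.
|

```python
import math

# 1. Arguments are passed as references to objects, so a function can
#    mutate the caller's list in place.
def add_total(row):
    row.append(sum(row))  # modifies the object the caller passed in
    return row

scores = [1, 2, 3]
result = add_total(scores)  # scores itself has now changed

# 2. No built-in missing value: None and NaN are separate stand-ins,
#    and built-in aggregations do not skip either automatically.
values = [1.0, float("nan"), 3.0]
naive_sum = sum(values)                                  # NaN propagates
clean_sum = sum(v for v in values if not math.isnan(v))  # manual filtering

# 3. No built-in vectorization: * on a list repeats it, so element-wise
#    arithmetic needs an explicit loop or comprehension.
repeated = [1, 2, 3] * 2
doubled = [x * 2 for x in [1, 2, 3]]
```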
| yeahwhatever10 wrote:
| A little late for this
| ASalazarMX wrote:
| "Not great" doesn't necessarily mean "bad", it can be
| interpreted as "good", or even "very good". An honest title
| would have explicitly qualified how suitable the author found
| it was.
|
| That the author avoided saying Python was a bad language
| outright speaks a great deal of its suitability. Well, that,
| and the majority of data science in practice.
| iLemming wrote:
| From many practical points, Clojure is great for data. And you
| can even leverage python libs via clj-python.
| phforms wrote:
| In the past few years I have seen some serious efforts from the
| Clojure community to make Clojure more attractive for data
| science. Check out the Scicloj[1] group and their data science
| stack/toolkit Noj[2] (still in beta) as well as the high-
| performance tabular data processing library tech.ml.dataset
| (TMD)[3].
|
| - [1] https://scicloj.github.io
|
| - [2] https://scicloj.github.io/noj
|
| - [3] https://github.com/techascent/tech.ml.dataset
| geokon wrote:
| What's worth emphasizing is that you're not marrying in to an
| ecosystem of libs. There are a lot of separate pieces that
| you can typically use separately. I do climate data work
| without most of Scicloj's tools, but I do use tech.ml.dataset
| extensively
| paulfharrison wrote:
| R is so good in part because of the efforts of people like Di
| Cook, Hadley Wickham, and Yihui Xie to create a software
| environment that they like working in.
|
| It also helps that in R any function can completely change how
| its arguments are evaluated, allowing the tidyverse packages to
| do things like evaluate arguments in the context of a data frame
| or add a pipe operator as a new language feature. This is a very
| dangerous feature to put in the hands of statisticians, but it
| allows more syntactic innovation than is possible in Python.
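|
| For contrast, a rough sketch of what Python does instead: arguments
| are evaluated before the call, so a bare column expression fails
| and has to be wrapped in a lambda or passed as a string. The helper
| names (filter_rows, filter_expr) and the data are hypothetical.
|

```python
rows = [
    {"amount": 1, "tag": "a"},
    {"amount": 5, "tag": "b"},
    {"amount": 9, "tag": "c"},
]

def filter_rows(rows, predicate):
    """Keep the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]

# The argument is evaluated eagerly in the caller's scope, where no
# variable called `amount` exists, so this raises NameError.
try:
    filter_rows(rows, amount > 3)
    eager_failed = False
except NameError:
    eager_failed = True

# Workaround 1: delay evaluation with a lambda.
big = filter_rows(rows, lambda row: row["amount"] > 3)

# Workaround 2: pass the expression as a string and evaluate it with
# each row as the local namespace. eval() is unsafe on untrusted input.
def filter_expr(rows, expr):
    code = compile(expr, "<expr>", "eval")
    return [row for row in rows if eval(code, {"__builtins__": {}}, row)]

big_too = filter_expr(rows, "amount > 3")
```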
| cb321 wrote:
| Like Python, R is a 2 (+...) language system. C/Fortran
| backends are needed for performance as problems scale up.
|
| Julia and Nim [1] are dynamic and static approaches
| (respectively) to 1 language systems. They both have both user-
| defined operators and macros. Personally, I find the surface
| syntax of Julia rather distasteful and I also don't live in
| PLang REPLs / emacs all day long. Of course, neither Julia nor
| Nim are impractical enough to make calling C/Fortran all that
| hard, but the communities do tend to implement in the new
| language without much prompting.
|
| [1] https://nim-lang.org/
| Lyngbakr wrote:
| I was a bit disappointed to discover that this was essentially an
| R vs. Python article, which is a data science trope. I've been in
| the field for 20+ years now and while I used to be firmly on team
| R, I now think that we don't really have a good language for data
| science. I had high hopes for Julia and even Clojure's data
| landscape looks interesting, but given the momentum of Python I
| don't see how it could be usurped at this point.
| vkazanov wrote:
| It is EVERYWHERE. I recently had to interview a bunch of data
| scientists, and only one of them knew SQL. Surely, all of them
| worked with python. I bet none of them even heard of R.
| Lyngbakr wrote:
| Yikes. Were they experienced data scientists or straight out
| of school? I find it very odd (and a bit scary) that they
| didn't know SQL.
| garciasn wrote:
| Experienced Data Scientists and/or those straight out of
| school are EXTREMELY lacking in valuable SQL experience and
| always have been. Take a DS with 25 years experience in
| SAS, _many of them_ are great with DATAstep, but have far
| less experience using PROC SQL for querying the data in the
| most effective way--even if they were pulling the data down
| with pass-through via SAS/ACCESS.
|
| Often they'd be doing very simplistic querying and then
| manipulating via DATAstep prior to running whatever
| modeling and/or reporting PROCs later, rather than pushing
| it upstream into a far faster native database SQL pull via
| pass-through.
|
| Back in 2008/2009, I saved 30h+ runtime on a regular report
| by refactoring everything in SQL via pass-through as
| opposed to the data scientists' original code that simply
| pulled the data down from the external source and
| manipulated it in DATAstep. Moving from 30h to 3m (Oracle
| backend) freed up an entire FTE to do more than babysit a
| long-running job 3x a week to multiple times per day.
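|
| The same pull-down versus push-down contrast can be sketched in
| Python with the standard-library sqlite3 module. This is not the
| SAS/Oracle setup described above; the table and column names are
| invented, and an in-memory database stands in for the remote one.
|

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("south", 5.0), ("north", 2.5), ("south", 1.5)],
)

# Pull-then-process: every row crosses the connection, then the client
# does the aggregation itself.
totals_client = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    totals_client[region] = totals_client.get(region, 0.0) + amount

# Push-down: the database aggregates and returns one row per group.
totals_db = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
```

| Both dictionaries hold the same totals; the difference is how many
| rows have to leave the database to get there.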
| garciasn wrote:
| SAS > R > Python.
|
| The focus of SAS and R was primarily limited to data
| science-related fields; however, Python is a far more generic
| programming language, thus the number of folks exposed to it
| is wider and thus the hiring pool of those who come in
| exposed to Python is FAR LARGER than SAS/R ever were, even
| when SAS was actively taught/utilized in
| undergraduate/graduate programs.
|
| As a hiring leader in the Data Science and Engineering space,
| I have extensive experience with all of these + SQL, among
| others. Hiring has become much easier to go cross-field/post-
| secondary experience and find capable folks who can hit the
| ground running.
| username135 wrote:
| you beat me to it. i understand why sas gets hate but I
| think that comes with simply not understanding how powerful
| it is.
| garciasn wrote:
| It was a great language, but it was/is extremely cost-
| prohibitive plus it simply fell out of favor in academia,
| for many of the same reasons, and thus was supplanted by
| free alternatives.
| SiempreViernes wrote:
| What would it even mean to be a "good language for data
| science"?
|
| In the first place data science is more a label someone put on
| bag full of cats, rather than a vast field covered by similarly
| sized boxes.
| username135 wrote:
| SAS has entered the chat
| RobinL wrote:
| I think a lot of this comes down to the question: Why aren't
| tables first class citizens in programming languages?
|
| If you step back, it's kind of weird that there's no mainstream
| programming language that has tables as first class citizens.
| Instead, we're stuck learning multiple APIs (polars, pandas)
| which are effectively programming languages for tables.
|
| R is perhaps the closest, because it has data.frame as a 'first
| class citizen', but most people don't seem to use it, and use
| e.g. tibbles from dplyr instead.
|
| The root cause seems to be that we still haven't figured out the
| best language to use to manipulate tabular data yet (i.e. the way
| of expressing this). It feels like there's been some convergence
| on some common ideas. Polars is kind of similar to dplyr. But no
| standard, except perhaps SQL.
|
| FWIW, I agree that Python is not great, but I think it's also
| true R is not great. I don't agree with the specific comparisons
| in the piece.
| jna_sh wrote:
| I know the primary data structure in Lua is called a table, but
| I'm not very familiar with them or whether they map to what's
| expected from tables in data science.
| TheSoftwareGuy wrote:
| IIRC those are basically hash tables, which are first-class
| citizens in many languages already
| Jtsummers wrote:
| Lua's tables are associative arrays, at least fundamentally.
| There's more to it than that, but it's not the same as the
| tables/data frames people are using with pandas and similar
| systems. You could build that kind of framework on _top_ of
| Lua's tables, though.
|
| https://www.lua.org/pil/2.5.html
| kelipso wrote:
| People use data.table in R too (my favorite among those but
| it's been a few years). data.table compared to dplyr is quite a
| contrast in terms of language to manipulate tabular data.
| CivBase wrote:
| What is a table other than an array of structs?
| RobinL wrote:
| I would argue that's about how the data is stored. What I'm
| trying to express is the idea of the programming language
| itself supporting high level tabular
| abstractions/transformations such as grouping, aggregation,
| joins and so on.
| CivBase wrote:
| Ah, that makes more sense. Thanks for the clarification.
| camdenreslink wrote:
| Sounds a lot like LINQ in .NET (which is usually compatible
| with ORMs actually querying tables).
| p1necone wrote:
| Implementing all of those things is an order of magnitude
| more complex than any other first class primitive datatype
| in most languages, and there's no obvious "one right way"
| to do it that would fit everyone's use cases - seems like
| libraries and standalone databases are the way to do it,
| and that's what we do now.
| pjc50 wrote:
| Yeah, that's LINQ+EF. People have hated ORMs for so long
| (with some justification) that perhaps they've forgotten
| what the use case is.
|
| (and yes there's special language support for LINQ so it
| counts as "part of the language" rather than "a library")
| redwall_hp wrote:
| Map/filter/reduce are idiomatic Java/Kotlin/Scala.
|
| SELECT thing1, thing2 FROM things WHERE thing2 != 2;
|
| val thingPairs = things.filter { it.thing2 != 2 }
|     .map { it.thing1 to it.thing2 }
|
| Then you've got distinct(), sorting methods, take/drop for
| limits, count/sumOf/average/minOf/maxOf.
|
| There are set operations, so you can do unions and
| differences, check for presence, etc.
|
| Joins are the hard part, but map() and some lambda work can
| pull it off.
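|
| The same operations, including the join, can be sketched in plain
| Python over lists of dicts. The tables (customers, orders) and
| their fields are assumptions for the example, not anything from
| the comment above.
|

```python
from collections import defaultdict

customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]
orders = [
    {"customer_id": 1, "amount": 20},
    {"customer_id": 2, "amount": 5},
    {"customer_id": 1, "amount": 7},
]

# WHERE + SELECT: filter first, then project the wanted columns.
large = [(o["customer_id"], o["amount"]) for o in orders if o["amount"] > 6]

# GROUP BY customer_id with SUM(amount).
totals = defaultdict(int)
for o in orders:
    totals[o["customer_id"]] += o["amount"]

# INNER JOIN: build a hash index on the join key, then probe it.
by_id = {c["id"]: c for c in customers}
joined = [
    {"name": by_id[o["customer_id"]]["name"], "amount": o["amount"]}
    for o in orders
    if o["customer_id"] in by_id
]
```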
| thom wrote:
| It's not that you can't model data that way (or indeed with
| structs of arrays), it's just that the user experience starts
| to suck. You might want a dataset bigger than RAM, or that
| you can transparently back by the filesystem, RAM or VRAM.
| You might want to efficiently index and query the data. You
| might want to dynamically join and project the data with
| other arrays of structs. You might want to know when you're
| multiplying data of the wrong shapes together. You might want
| really excellent reflection support. All of this is obviously
| possible in current languages because that's where it
| happens, but it could definitely be easier and feel more of a
| first class citizen.
| ModernMech wrote:
| The difference is semantics.
|
| What is a paragraph but an array of sentences? What is a
| sentence but an array of words? What's a word but an array of
| letters? You can do this all the way down. Eventually you
| need to assign meaning to things, and when you do, it helps
| to know _what_ the thing actually is, specifically, because
| an array of structs can be many things that aren't a table.
| FridgeSeal wrote:
| Well it could be a struct of arrays.
|
| Nitpicking aside, a nice library for doing "table stuff"
| without "the whole ass big table framework" would be nice.
|
| It's not hard to roll this stuff by hand, but again, a nicer
| way wouldn't be bad.
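|
| A small hand-rolled sketch of the two layouts, array of structs
| versus struct of arrays, in Python. The Reading type and its
| fields are invented for the example.
|

```python
from dataclasses import dataclass

@dataclass
class Reading:
    station: str
    temp_c: float

# Array of structs: one object per row.
aos = [Reading("a", 10.0), Reading("b", 12.5), Reading("c", 9.5)]

# Struct of arrays: one list per column.
soa = {
    "station": [r.station for r in aos],
    "temp_c": [r.temp_c for r in aos],
}

# Column-wise work touches a single list in the SoA layout...
mean_soa = sum(soa["temp_c"]) / len(soa["temp_c"])
# ...but has to visit every row object in the AoS layout.
mean_aos = sum(r.temp_c for r in aos) / len(aos)

# Row-wise access is the reverse: direct in AoS, reassembled in SoA.
row1_aos = aos[1]
row1_soa = {col: column[1] for col, column in soa.items()}
```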
| kevinhanson wrote:
| this is my biggest complaint about SAS--everything is either a
| table or text.
|
| most procs use tables as both input and output, and you better
| hope the tables have the correct columns.
|
| you want a loop? you either get an implicit loop over rows in a
| table, write something using syscalls on each row in a table,
| or you're writing macros (all text).
| nextos wrote:
| I don't think this is the real problem. In R and Julia tables
| are great, and they are libraries. The key is that these
| languages are very expressive and malleable.
|
| Simplifying a lot, R is heavily inspired by Scheme, with some
| lazy evaluation added on top. Julia is another take at the
| design space first explored by Dylan.
| Iwan-Zotow wrote:
| R was a clone of S
| paddleon wrote:
| > R is perhaps the closest, because it has data.frame as a
| 'first class citizen', but most people don't seem to use it,
| and use e.g. tibbles from dplyr instead.
|
| You're forgetting R's data.table,
| https://cran.r-project.org/web/packages/data.table/vignettes...,
|
| which is amazing. Tibbles only win because they fought the
| docs/onboarding battle better, and dplyr ended up getting
| industry buy-in.
| elehack wrote:
| And readability. data.table is very capable, but the
| incantations to use it are far less obvious (both for reading
| and writing) than dplyr.
|
| But you can have the best of both worlds with
| https://dtplyr.tidyverse.org/, using data.table's performance
| improvements with dplyr syntax.
| extr wrote:
| Yeah data.table is just about the best-in-class tool/package
| for true high-throughput "live" data analysis. Dplyr is great
| if you are learning the ropes, or want to write something
| that your colleagues with less experience can easily spot
| check. But in my experience if you chat with people working
| in the trenches of banks, lenders, insurance companies, who
| are running hundreds of hand-spun crosstabs/correlational
| analyses daily, you will find a lot of data.table users.
|
| Relevant to the author's point, Python is pretty poor for
| this kind of thing. Pandas is a perf mess. Polars, duckdb,
| dask etc, are fine perhaps for production data pipelines but
| quite verbose and persnickety for rapid iteration. If you put
| a gun to my head and told me to find some nuggets of insight
| in some massive flat files, I would ask for an RStudio cloud
| instance + data.table hosted on a VM with 256GB+ of RAM.
| RodgerTheGreat wrote:
| There are a number of dynamic languages to choose from where
| tables/dataframes are truly first-class datatypes: perhaps most
| notably Q[0]. There are also emerging languages like Rye[1] or
| my own Lil[2].
|
| I suspect that in the fullness of time, mainstream languages
| will eventually fully incorporate tabular programming in much
| the same way they have slowly absorbed a variety of idioms
| traditionally seen as part of functional programming, like
| map/filter/reduce on collections.
|
| [0]
| https://en.wikipedia.org/wiki/Q_(programming_language_from_K...
|
| [1] https://ryelang.org/blog/posts/comparing_tables_to_python/
|
| [2] http://beyondloom.com/tools/trylil.html
| middayc wrote:
| Another page about Rye tables:
| https://ryelang.org/cookbook/working-with/tables/
| liveranga wrote:
| Nushell is another one with tables built-in:
|
| https://www.nushell.sh/book/working_with_tables.html
| middayc wrote:
| It's interesting how often there are similarities between
| Nushell, Rye and Lil, although I think they are from
| different influences. I guess it's sort of current
| zeitgeist if you want something light, high level and
| interactive.
| mncharity wrote:
| Interesting links - tnx. Apropos the optimism of
| "eventually", I think of language support for say key-value
| pair collections, namespaces, as still quite impoverished.
| With each language supporting only a small subset of the
| concision, apis, and datastructures, found useful in some
| other. This some 3 decades after becoming mainstream, and the
| core of multiple mainstream languages. Diminishing returns,
| silos, segregation of application domains, divergence of
| paradigm/orientation/idioms, assorted dysfunctions as a
| field, etc... "eventually" can be decades. Maybe LLMs can
| quicken that... or perhaps call an end to this era,
| permitting a "no, we collectively just never got around to
| creating any one language which supported all of {X}".
| constantcrying wrote:
| >Why aren't tables first class citizens in programming
| languages?
|
| Matlab has them, in fact it has multiple competing concepts of
| it.
| alexnewman wrote:
| APL Is great
| 7thaccount wrote:
| Perfect solution for doing analysis on tables. Wes McKinney
| (inventor of pandas) is rumored to have been inspired by it
| too.
|
| My problem with APL is 1.) the syntax is less amazing at
| other more mundane stuff, and 2.) the only production worthy
| versions are all commercial. I'm not creating something that
| requires me to pay for a development license as well as
| distribution royalties.
| smartmic wrote:
| Agreed. I once used it for data preparation for a data
| science project (GNU APL). After a steep learning curve, it
| felt very much like writing math formulas -- it was fun and
| concise, and I liked it very much. However, it has zero
| adoption in today's data science landscape. Sharing your work
| is basically impossible. If you're doing something just for
| yourself, though, I would probably give it a chance again.
| 127 wrote:
| Because there's no obvious universal optimal data structure for
| heterogeneous N-dimensional data with varying distributions?
| You can definitely do that, but it requires an order of
| magnitude more resource use as baseline.
| riskassessment wrote:
| > R is perhaps the closest, because it has data.frame as a
| 'first class citizen', but most people don't seem to use it,
| and use e.g. tibbles from dplyr instead.
|
| Everyone in R uses data.frame because tibble (and data.table)
| inherits from data.frame. This means that "first class" (base
| R) functions work directly on tibble/data.table. It also makes
| it trivial to convert between tibble, data.table, and
| data.frames.
| ModernMech wrote:
| It makes sense from a historical perspective. Tables _are_ a
| thing in many languages, just not the ones that mainstream devs
| use. In fact, if you rank programming languages by usage
| outside of devs, the top languages _all_ have a table-ish
| metaphor (SQL, Excel, R, Matlab).
|
| The languages devs use are largely Algol derived. Algol is a
| language that was used to express algorithms, which were
| largely abstractions over Turing machines, which are based
| around an infinite 1D tape of memory. This model of 1D memory
| was built into early computers, and early operating systems and
| early languages. We call it "mechanical sympathy".
|
| Meanwhile, other languages at the same time were invented that
| weren't tied so closely to the machine, but were more for the
| purpose of doing science and math. They didn't care as much
| about this 1D view of the world. Early languages like Fortran
| and Matlab had notions of 2D data matrices because math and
| science had notions of 2D data matrices. Languages like C were
| happy to support these things by using an array of pointers
| because that mapped nicely to their data model.
|
| The same thing can be said for 1-based and 0-based indexing --
| languages like Matlab, R, and Excel are 1-based because that's
| how people index tables; whereas languages like C and Java are
| 0-based because that's how people index memory.
| cb321 wrote:
| As a slight refinement of your point, C does have storage map
| based N-D arrays/tensors like Fortran, just with the old
| column-major/row-major difference and a clunky "multiple
| [][]" syntax. There was just a restriction early on to need
| compile-time known dimensions to the arrays (up to the final
| dimension, anyway) because it was a somewhat half-done/half-
| supported thing - and because that _also_ fit the linear data
| model well. So, it is also common to see char *argv[] like
| arrays of pointers or in numerics sometimes libraries which
| do their own storage map equations from passed dimensions.
|
| Also, the linear memory model itself is not really _only_
| because of Algol/Turing machines/theoretical CS/"early"
| hardware and mechanical sympathy. DRAM has rows & columns
| internally, but byte addressability leads to hiding that from
| HW client systems (unless someone is doing a rowhammer attack
| or something). More random access than tape rewind/fast
| forward is indeed a huge deal, but I think the actual
| popularity of linearity just comes from its simplicity as an
| interface more than anything else. E.g., segmented x86
| memory with near/far pointers was considered ugly relative to
| a big 32-bit address space, and disk files and other
| allocation arenas internally have large linear address/seek
| spaces. People just want to defer using >1 number until they
| really need to. People learn univariate-X before they learn
| multivariate-X where X could be calculus, statistics, etc.,
| etc.
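|
| The "storage map equations" mentioned above boil down to one line
| each. A minimal Python sketch for the 2-D case, with hypothetical
| helper names, showing the row-major (C) and column-major (Fortran)
| conventions:
|

```python
def row_major_index(i, j, n_rows, n_cols):
    """Offset of element (i, j) when rows are stored contiguously (C)."""
    return i * n_cols + j

def col_major_index(i, j, n_rows, n_cols):
    """Offset of element (i, j) when columns are stored contiguously (Fortran)."""
    return j * n_rows + i

n_rows, n_cols = 2, 3
matrix = [[1, 2, 3],
          [4, 5, 6]]

# Flatten the same 2-D data into linear memory under each convention.
flat_row_major = [matrix[i][j] for i in range(n_rows) for j in range(n_cols)]
flat_col_major = [matrix[i][j] for j in range(n_cols) for i in range(n_rows)]

# Either map recovers the same logical element from its own layout.
a = flat_row_major[row_major_index(1, 2, n_rows, n_cols)]
b = flat_col_major[col_major_index(1, 2, n_rows, n_cols)]
```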
| IgorPartola wrote:
| SQL is not just about a table but multiple tables and their
| relationships. If it was just about running queries against a
| single table then basic ordering, filtering, aggregation, and
| annotation would be easy to achieve in almost any language.
|
| Soon as you start doing things like joins, it gets complicated
| but in theory you could do something like an API of an ORM to
| do most things. With using just operators you quickly run into
| the fact that you have to overload (abuse) operators or write a
| new language with different operator semantics:
| orders * customers | (customers.id == orders.customer_id) |
| (orders.amount > Decimal('10.00'))
|
| Where * means cross product/outer join and | means filter. Once
| you add an ordering operator, a group by, etc. you basically
| get SQL with extra steps.
|
| But it would be nice to have it built in so talking to a
| database would be a bit more native.
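|
| A toy version of that operator abuse can be written in Python with
| __mul__ and __or__. This is only a sketch: the Table class is
| hypothetical, predicates are lambdas rather than column
| expressions, and column names are made unique by hand.
|

```python
class Table:
    """A list of row dicts with * as cross product and | as filter."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __mul__(self, other):
        # Cross product: merge every pair of rows into one wider row.
        return Table({**a, **b} for a in self.rows for b in other.rows)

    def __or__(self, predicate):
        # Filter: keep the rows for which the predicate holds.
        return Table(r for r in self.rows if predicate(r))

orders = Table([
    {"order_customer_id": 1, "amount": 15},
    {"order_customer_id": 2, "amount": 5},
    {"order_customer_id": 1, "amount": 3},
])
customers = Table([
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Bob"},
])

# * binds tighter than |, and | chains left to right.
result = (
    orders * customers
    | (lambda r: r["customer_id"] == r["order_customer_id"])
    | (lambda r: r["amount"] > 10)
)
```

| Getting from lambdas to something like customers.id ==
| orders.customer_id needs expression objects that overload == and >,
| which is roughly where an ORM's query builder starts.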
| sgarland wrote:
| Every time I see stuff like this (Google's new SQL-ish
| language with pipes comes to mind), I am baffled. SQL to me
| is eminently readable, and flows beautifully.
|
| For reference, I think the same is true of Python, so it's
| not like I'm a Perl wizard or something.
| IgorPartola wrote:
| Oh I agree. The problem is that they are two different
| languages. Inside a Python file, SQL is just a string. No
| syntax highlighting, no compile time checking, etc. A
| Kwisatz Haderach of languages that incorporates both its
| own language and SQL as first class concepts would be very
| nice but the problem is that SQL is just too different.
|
| For one thing, SQL is not really meant to be dynamically
| constructed in SQL. But we often need to dynamically
| construct a query (for example customer applied several
| filters to the product listing). The SQL way to handle that
| would be to have a general purpose query with a thousand
| if/elses or stored procedures which I think takes it from
| "flows beautifully" to "oh god who wrote this?" Or you
| could just do string concatenation in a language that
| handles that well, like Python. Then wrap the whole thing
| in functions and objects and you get an ORM.
|
| I still have not seen a language that incorporates anything
| like SQL into it that would allow for even basic ORM-like
| functionality.
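|
| The "string concatenation wrapped in functions" approach looks
| roughly like this in Python with the standard-library sqlite3
| module. The products table, the filter names, and
| build_product_query are all made up for the sketch; values go
| through ? placeholders rather than into the string.
|

```python
import sqlite3

def build_product_query(filters):
    """Return (sql, params) for whichever filters the customer applied."""
    clauses, params = [], []
    if "category" in filters:
        clauses.append("category = ?")
        params.append(filters["category"])
    if "max_price" in filters:
        clauses.append("price <= ?")
        params.append(filters["max_price"])
    sql = "SELECT name FROM products"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY name", params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("lamp", "home", 30.0), ("sofa", "home", 400.0), ("pen", "office", 2.0)],
)

sql, params = build_product_query({"category": "home", "max_price": 100})
names = [row[0] for row in conn.execute(sql, params)]

all_sql, all_params = build_product_query({})
all_names = [row[0] for row in conn.execute(all_sql, all_params)]
```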
| kelipso wrote:
| Are you thinking of query generators like Ecto in Elixir?
| riidom wrote:
| PyTorch was first only Torch, and in Lua. I didn't follow it
| too close at the time, but apparently due to popular demand it
| got redone in Python and voila PyTorch.
| RA_Fisher wrote:
| R's the best, bc it's been a statistical analysis language from
| the beginning in 1974 (and was built and developed for the
| purpose of analysis / modeling). Also, the tidyverse is
| marvelous. It provides major productivity in organizing and
| augmenting the data. Then there's ggplot, the undisputed best
| graphical visualization system + built-ins like barplot(), or
| plot().
|
| But ultimately data analysis is going beyond Python and R into
| the realm of Stan and PyMC3, probabilistic programming
| languages. It's because we want to do nested integrals and
| those software ecosystems provide the best way to do it (among
| other probabilistic programming languages). They allow us to
| understand complex situations and make good / valuable
| decisions.
| OkayPhysicist wrote:
| There's a number of structures that I think are missing in our
| major programming languages. Tables are one. Matrices are
| another. Graphs, and relatedly, state machines are tools that
| are grossly underused because of bad language-level support.
| Finally, not a structure per se, but I think most languages
| that are batteries-included enough to include a regex engine
| should have a full-fledged PEG parsing engine. Most, if not
| all, Regex horror stories derive from a simple "Regex is built
| in".
|
| What tools are _easily_ available in a language, by default,
| shape the pretty path, and by extension, the entire feel of the
| language. An example that we've largely come around on is key-
| value stores. Today, they're table stakes for a standard
| library. Go back to the '90s, and the most popular languages _at
| best_ treated them as second-class citizens, more like imported
| objects than something fundamental like arrays. Sure, you can
| implement a hash map in any language, or import someone else's
| implementation, but oftentimes you'll instead end up with
| nightmarish, hopefully-synchronized arrays, because those are
| built-in, and ready at hand.
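|
| A short Python illustration of the "hopefully-synchronized arrays"
| failure mode next to a built-in key-value store. The names and
| ages are invented.
|

```python
# Parallel arrays: position i in each list describes the same record,
# and nothing in the language enforces that.
names = ["ada", "bob", "cy"]
ages = [36, 41, 29]

def age_of_parallel(name):
    return ages[names.index(name)]  # linear scan, then positional lookup

# Removing a record from only one list silently shifts every later one.
names_broken = [n for n in names if n != "ada"]
wrong_age = ages[names_broken.index("bob")]  # ages was never updated

# Key-value store: the association itself is the data structure.
age_by_name = dict(zip(names, ages))
del age_by_name["ada"]  # removes the name and the age together
```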
| throwaway2037 wrote:
| > There's a number of structures that I think are missing in
| our major programming languages. Tables are one. Matrices are
| another.
|
| I disagree. Most programmers will go their entire career and
| never need a matrix data structure. Sure, they will use
| libraries that use matrices, but never use them directly
| themselves. It seems fine that matrices are not a separate
| data type in most modern programming languages.
| OkayPhysicist wrote:
| Unless you think "most programmers" === "shitty webapp
| developers", I strongly disagree. Matrices are first class,
| important components in statistics, data analysis,
| graphics, video games, scientific computing, simulation,
| artificial intelligence and so, so much more.
|
| And all of those programmers are either using specialized
| languages, (suffering problems when they want to turn their
| program into a shitty web app, for example), or committing
| crimes against syntax like
|
| rotation_matrix.matmul(vectorized_cat)
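|
| For what it's worth, Python itself has had a dedicated @ operator
| for matrix multiplication since 3.5 (PEP 465); a class opts in by
| defining __matmul__. A bare-bones sketch with a hypothetical
| Matrix class, reusing the variable names from the line above:
|

```python
class Matrix:
    """Tiny dense matrix stored as a list of rows."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __matmul__(self, other):
        # Each output entry is a row of self dotted with a column of other.
        cols = list(zip(*other.rows))
        return Matrix(
            [sum(a * b for a, b in zip(row, col)) for col in cols]
            for row in self.rows
        )

rotation_matrix = Matrix([[0, -1], [1, 0]])  # 90 degrees counter-clockwise
vectorized_cat = Matrix([[1], [0]])          # column vector (1, 0)

rotated = rotation_matrix @ vectorized_cat
```

| No built-in type implements @, so the matrix type still has to
| come from a library or be written by hand as here.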
| lock1 wrote:
| That's needlessly aggressive. Ignoring webapps, you could
| do gamedev without even knowing what a matrix is.
|
|         You don't even need such a construct in most native
| applications, embedded systems, and OS kernel
| development.
| theamk wrote:
| I am working in embedded. Had to optimize weights for an
| embedded algorithm, decided to use linear regression and
| thus needed matrices.
|
| And if you do robotics, the chances of encountering a
| matrix are very high.
| throwaway2037 wrote:
|           This is exactly my point. Even in a highly specialised
| library for pricing securities, the amount of code that
| uses matrices is surprisingly small.
| voidUpdate wrote:
| To be fair, I do use matrices a reasonable amount in
| gamedev. And if you're writing your engine from scratch,
|         rather than using something like Unity, you will almost
|         certainly need matrices.
| habinero wrote:
| I don't see why the majority of engineers need to cater
| to your niche use cases. It's a programming language, you
| can just make the library if it doesn't exist. Nobody's
| stopping you.
|
| Plus, plenty of third party projects have been
| incorporated into the Python standard library.
| Koshkin wrote:
| At least in C++ you don't need 'matmul'
| jltsiren wrote:
| When there is no clear canonical way of implementing
| something, adding it to a programming language (or a standard
| library) is risky. All too often, you realize too late that
| you made a wrong choice, and then you add a second version.
| And a third. And so on. And then you end up with a confusing
| language full of newbie traps.
|
| Graphs are a good example, as they are a large family of
| related structures. For example, are the edges undirected,
| directed, or something more exotic? Do the nodes/edges have
| identifiers and/or labels? Are all nodes/edges of the same
| type, or are there multiple types? Can you have duplicate
| edges between the same nodes? Does that depend on the types
| of the nodes/edges, or on the labels?
| WorldMaker wrote:
| Even the raw storage for graphs doesn't have just one
| answer: you could store edge lists or you could store
|         adjacency matrices. Some algorithms work better with one,
| some work better with the other. You probably don't want to
| store both because that can be extra memory overhead as
| well as a locking problem if you need to atomically update
| both at once. You probably don't want to automatically flip
| back and forth between representations because that could
| cause garbage collector churn if not also long breadth or
| depth searches, and you may not want to encourage manual
| conversions between data structures either (to avoid
| providing a performance footgun to your users). So you
| probably want the edge list Graph type and the adjacency
|         matrix Graph type to look very different, even though they
|         are trivially convertible (they may be expensive to convert,
|         as mentioned), and yeah, that's the under-the-hood storage
| mechanism. From there you get into possible exponential
| explosion as you start to get into the other higher level
| distinctions between types of graphs (DAGs versus Trees
| versus cyclic structures and so forth, and all the
| variations on what a node can be, if edges can be weighted
| or labeled, etc).
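A small Python sketch of the two storage choices discussed above, an edge list and an adjacency matrix, for the same directed graph. The graph itself is made up, and plain lists are an assumption chosen for brevity rather than a recommendation.

```python
# A small directed graph on nodes 0..3, stored as an edge list.
edges = [(0, 1), (0, 2), (2, 3)]
n = 4

# Edge list -> adjacency matrix: O(n^2) memory, O(1) edge lookup.
adjacency = [[0] * n for _ in range(n)]
for src, dst in edges:
    adjacency[src][dst] = 1

# Adjacency matrix -> edge list: requires a full O(n^2) scan.
recovered = [(i, j) for i in range(n) for j in range(n) if adjacency[i][j]]

print(adjacency[0][2])     # 1
print(recovered == edges)  # True
```

The conversion is only a few lines in either direction, but each direction walks or allocates the whole structure, which is the cost that argues against flipping between representations automatically.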
| HelloNurse wrote:
|       > I think most languages that are batteries-included enough
|       to include a regex engine should have a full-fledged PEG
|       parsing engine
|
| Then there would be more PEG horror stories. In addition,
|       strings and indices in regex processing are universal, while a
| parser is necessarily more framework-like, far more complex
| and doomed to be mismatched for many applications.
| fluorinerocket wrote:
| Would love to see a language in which hierarchical state
| machines, math/linear algebra, I/O to sensors and actuators,
| and time/timing were first class citizens.
|
| Mainly for programming control systems for robotics and
| aerospace applications
| dm319 wrote:
| Dplyr is quite happy with data.frame. R is built around tabular
| data. Other statistical languages are too, such as Stata.
| getnormality wrote:
| Saying that SQL is the standard for manipulating tabular data
| is like saying that COBOL is the standard for financial
| transactions. It may be true based on current usage, but nobody
| thinks it's a good idea long term. They're both based on the
| outdated idea that a programming language should look like
| pidgin English rather than math.
| Iwan-Zotow wrote:
| In R data.table is basically SQL in another shape
| maest wrote:
| > Why aren't tables first class citizens in programming
| languages?
|
|   They are in q/kdb and it's glorious. SQL expressions are also
|   first class citizens and it makes it very pleasant to write
|   code.
| don-bright wrote:
| Every copy of Microsoft Excel includes Power Query which is in
| the M language and has tables as a type. Programs are
|   essentially transformations of table columns and rows. Not sure
|   if it's mainstream, but it is widely available. The M language is
|   also included in other tools like Power BI and Power Automate.
| m_mueller wrote:
|   Fortran gives you that and more: it has first-class
|   multidimensional arrays, including matrix operations.
| genidoi wrote:
| This is an interesting observation. One possible explanation
| for a lack of robust first class table manipulation support in
| mainstream languages could be due to the large variance in
| real-world table sizes and the mutually exclusive subproblems
| that come with each respective jump in order-of-magnitude row
| size.
|
| The problems that one might encounter in dealing with a 1m row
| table are quite different to a 1b row table, and a 1b row table
| is a rounding error compared to the problems that a 1t row
| table presents. A standard library needs to support these
| massive variations at least somewhat gracefully and that's not
| a trivial API surface to design.
| brikym wrote:
| This. I really really want some kind of data frame which has
|   actual compile time typing my LSP/IDE can understand. Kusto
|   Query Language (Azure Data Explorer) has it and the auto
|   completion and error checking is extremely useful. But Kusto
|   Query Language is really just limited to one cloud product.
| poulpy123 wrote:
| > Why aren't tables first class citizens in programming
| languages?
|
|   Because those languages were created before the need for it, and
|   maybe before the invention of tables.
|
|   Manipulating numeric arrays and matrices in Python is a bit
|   clunky because it was not designed as a scientific computing
|   language, so they were added as a library. It's much more
|   integrated and natural to use in scientific computing languages
|   such as MATLAB. However, the reverse is also true: because
|   MATLAB wasn't designed to do what Python does, it's a bit
|   clunkier to use outside scientific computing.
| jstanley wrote:
| Tables were definitely around before programming languages.
|
| There are clay tablets from ancient Sumeria that represent
| information using tables.
| HenriTEL wrote:
| Well you nailed it, the language you're looking for is SQL.
|   There's a reason why DuckDB got such traction over the last
|   years. I think data scientists overlook SQL and Excel-like
|   tooling.
| RobinL wrote:
| Out of the current options, I strongly agree - I even wrote a
| blog post! https://www.robinlinacre.com/recommend_sql/
|
|     But on the other hand, that doesn't mean SQL is ideal - far
| from it. When using DuckDB with Python, to make things more
| succinct, reusable and maintainable, I often fall into the
| pattern of writing Python functions that generate SQL
| strings.
|
| But that hints at the drawbacks of SQL: it's mostly not
| composable as a language (compared to general purpose
| languages with first-class abstractions). DuckDB syntax does
| improve on this a little, but I think it's mostly fundamental
| to SQL. All I'm saying is that it feels like something better
| is possible.
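A rough sketch of the "Python functions that generate SQL strings" pattern described above. The helper name `grouped_mean` is hypothetical, the rows are made up, and the standard-library `sqlite3` module stands in for DuckDB purely so the example runs without extra installs.

```python
import sqlite3


def grouped_mean(table, value_col, group_cols):
    """Return a SQL string computing the mean of value_col per group."""
    # Identifiers are interpolated directly, so they must come from trusted code.
    keys = ", ".join(group_cols)
    return (
        f"SELECT {keys}, AVG({value_col}) AS mean_{value_col} "
        f"FROM {table} GROUP BY {keys} ORDER BY {keys}"
    )


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE penguins (species TEXT, island TEXT, body_mass_g REAL)")
con.executemany(
    "INSERT INTO penguins VALUES (?, ?, ?)",
    [
        ("Adelie", "Dream", 3700.0),
        ("Adelie", "Dream", 3900.0),
        ("Gentoo", "Biscoe", 5000.0),
    ],
)
query = grouped_mean("penguins", "body_mass_g", ["species", "island"])
rows = con.execute(query).fetchall()
print(rows)  # [('Adelie', 'Dream', 3800.0), ('Gentoo', 'Biscoe', 5000.0)]
```

The reusable part lives in Python, not in SQL, which illustrates the composability gap: the abstraction is built by string assembly around the query language rather than inside it.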
| hermitcrab wrote:
| There are a number of data-focussed no-code/visual/drag-and-
| drop tools where data tables/frames are very much a first class
| citizen (e.g. Easy Data Transform, Alteryx, Knime).
| SubjectToChange wrote:
| Mathematica recently added the Tabular command, for what it's
| worth. I haven't used it much yet, but it seems to be quite
| capable.
| timbit42 wrote:
| The 3rd edition of Dartmouth BASIC, back in the 1960's, had a
| MAT command for dealing with matrices.
| serjester wrote:
| Seems like their critique boils down to two areas - pandas
| limitations and fewer built-ins to lean on.
|
| Personally I've found polars has solved most of the "ugly"
| problems that I had with pandas. It's way faster, has an
| ergonomic API, seamless pandas interop and amazing support for
| custom extensions. We have to keep in mind Pandas is almost 20
| years old now.
|
| I will agree that Shiny is an amazing package, but I would argue
| it's less important now that LLMs will write most of your code.
| kasperset wrote:
| R data science people generally come to the data science field from
| the life sciences or statistics. Python data science people generally
| originate from other fields that are mostly engineering focused.
| Again, this may not apply to all cases, but that is my general
| observation.
|
| Recently I am seeing that Python is heavily pushed for all data
| science related things. Sometimes objectively Python may not be
| the best option especially for stats. It is hard to change
| something after it becomes the "norm" regardless of its
| usability.
| exabrial wrote:
| The problem is there's so much momentum behind it that it's hard to
| course-correct. PyTorch is now a goliath.
| jakobnissen wrote:
| Excellent article - except that the author probably should not have
| gated their substantiation of the claim behind a cliffhanger, as
| other commenters have mentioned.
|
| The author's priorities are sensible, and indeed with that set of
| priorities, it makes sense to end up near R. However, they're not
| universal among data scientists. I've been a data scientist for
| eight years, and have found that this kind of plotting and
| dataframe wrangling is only part of the work. I find there is
| usually also some file juggling, parsing, and what the author
| calls "logistics". And R is terrible at logistics. It's also bad
| at writing maintainable software.
|
| If you care more about logistics and maintenance, your conclusion
| is pushed towards Python - which still does okay in the
| dataframes department. If you're ALSO frequently concerned about
| speed, you're pushed towards Julia.
|
| None of these are wrong priorities. I wish Julia was better at
| being R, but it isn't, and it's very hard to be both R and useful
| for general programming.
|
| Edit: Oh, and I should mention: I also teach and supervise
| students, and I KEEP seeing students use pandas to solve non-
| table problems, like trying to represent a graph as a dataframe.
| Apparently some people are heavily drawn to use dataframes for
| everything - if you're one of those people, reevaluate your
| tools, but also, R is probably for you.
| ActorNightly wrote:
| >Excellent article
|
|   Except it's not. Data science in python pretty much requires you
| to use numpy. So his example of mean/variance code is a dumb
| comparison. Numpy has mean and variance functions built in for
| arrays.
|
|   Even when using raw python in his example, some syntax can be
|   condensed quite a bit:
|
|   from collections import defaultdict
|   groups = defaultdict(list)
|   for row in filtered:
|       groups[(row['species'], row['island'])].append(row['body_mass_g'])
|
|   It takes the same amount of mental effort to learn python/numpy
|   as it does to learn R. The difference is, the former allows you to
|   integrate your code into any other application.
| ModernMech wrote:
| I dunno. Numpy has its own data types, its own collections,
| its own semantics which are all different enough from Python,
| I think it's fair to consider it a DSL on its own. It'd be
|     one thing if it was just operator overloading to provide
| broadcasting for python, but Numpy's whole existence is to
| patch the various shortcomings Python has in DS.
| dragonwriter wrote:
| > Numpy has mean and variance functions built in for arrays.
|
|     Even outside of Numpy, the stdlib has the statistics module,
|     which provides mean, variance, population/sample standard
| deviation, and other statistics functions for normal
| iterables. The attempt to make Python out-of-the-box code
| look bad was either deliberately constructed to exaggerate
| the problems complained of, or was the product of a very
| convenient ignorance of the applicable parts of Python and
| its stdlib.
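A short sketch of the standard-library route mentioned above, using `statistics.mean` and `statistics.variance` on grouped values. The rows are made up; the name `filtered` and the column names are borrowed from the snippet quoted earlier in the thread.

```python
from collections import defaultdict
from statistics import mean, variance

# Made-up rows in the shape used in the thread's example.
filtered = [
    {"species": "Adelie", "island": "Dream", "body_mass_g": 3700},
    {"species": "Adelie", "island": "Dream", "body_mass_g": 3900},
    {"species": "Gentoo", "island": "Biscoe", "body_mass_g": 5000},
    {"species": "Gentoo", "island": "Biscoe", "body_mass_g": 5400},
]

groups = defaultdict(list)
for row in filtered:
    groups[(row["species"], row["island"])].append(row["body_mass_g"])

# statistics.variance is the sample variance (n - 1 denominator).
summary = {key: (mean(vals), variance(vals)) for key, vals in groups.items()}
print(summary)
```

Note that `variance` raises `StatisticsError` for a group with fewer than two values, so real data with singleton groups would need a guard.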
| a_bonobo wrote:
| >I find there is usually also some file juggling, parsing,
| [...]
|
| I'd say I'm 50/50 Python/R for exactly this reason: I write
| Python code on HPC or a server to parse many, many files, then
| I get some kind of MB-scale summary data I analyse locally in
| R.
|
| R is _not good_ at looping over hundreds of files in the
| gigabytes, Python is _not good_ at making pretty insights from
| the summary. A tool for every task.
| puzzlingcaptcha wrote:
| The second part of the article is right here:
| https://blog.genesmindsmachines.com/p/python-is-not-a-great-...
| whyenot wrote:
| What makes Python a great language for data science, is that so
| many people are familiar with it, and that it is an easy language
| to read. If you use a more obscure language like Clojure, Common
| Lisp, Julia, etc., many people will not be familiar with the
| language or able to read or review your code. Peer review is
| fundamental to the scientific endeavor. If you only optimize on
| what is the best language for the _task_ , there are clearly
| better languages than Python. If you optimize on what is best for
| _science_ then I think it is hard not to argue that Python (and
| R) are the best choices. In science, just getting things done is
| not enough. Other people need to be able to read and understand
| what you are doing.
|
| BTW AI is not helping and in fact is leading to a generation of
| scientists who know how to write prompts, but do not understand
| the code those prompts generate or have the ability to peer
| review it.
| iLemming wrote:
| I can't speak for Julia - never used it; never used Common Lisp
| for analyzing data (I don't think it's very "data-oriented" for
| the modern age and the shape of data), but Clojure is really
| not "obscure" - it only looks weird for the first fifteen
| minutes or so; once you start using it - it is one of the most
| straightforward and reasonable languages out there - it is in
| fact simpler than Python and Javascript. Immutable-by-default
|   makes it far easier to reason about the code. And OMG, it
|   is so much more data-oriented - it's crazy that more people
|   don't use it. Most have never even heard of it.
| MarsIronPI wrote:
| Common Lisp fan here, but not a data scientist. Why do you
| say to avoid CL for data analysis? Not trying to flame or
| anything, just curious about your experience with it.
| iLemming wrote:
|       I don't have much experience using CL for analyzing data,
|       because why would I, if I already have another Lisp that is
|       simply amazing for data?
|
|       Clojure's collections, unlike lists in traditional Lisps, are
|       based on a composable, unified abstraction. They are lazy by
|       default and are literal, readable data structures; they are
|       far easier to introspect and not so "opaque" compared to
|       anything - not just CL (even Python); and they are superb for
|       dealing with heterogeneous data. Clojure's cohesive data
|       manipulation story is something Common Lisp's lists-and-symbols
|       just can't match.
| dreamcompiler wrote:
| Homework assignments notwithstanding, very few serious
| Common Lisp programs use lists and symbols as their
| primary data structures. This has been true since around
| 1985.
|
|         Common Lisp has O(1) vectors, multidimensional arrays,
| hash-tables (what Clojure calls maps), structs, and
| objects. It has set operations too but it doesn't enforce
| membership uniqueness. It also has bignums, several sizes
| of floats, infinite-precision rationals, and complex
| numbers. Not to mention characters, strings, and logical
| operations on individual bits. The main difference from
| Clojure is that CL data structures are not immutable. But
| that's an orthogonal issue to the suggestion that CL
| doesn't contain a rich library of modern data structures.
|
| Common Lisp has never been limited to "List Processing."
| iLemming wrote:
| I wasn't trying to denigrate Common Lisp, I'm sorry if I
| hurt your feelings. It does have comprehensive support
|           for all kinds of data structures. I wasn't talking about it
|           being limited to "list processing". SBCL is great for
|           many things, but from many practical standpoints Clojure
|           is actually much better suited for data analysis.
|
|           Saying "hash-tables (what Clojure calls maps)" is not only
|           inaccurate; you're also hand-waving away Clojure's core
|           design philosophy (immutability, structural sharing, lazy
|           sequences) as orthogonal. But those aren't cosmetic
| differences - they're the reason why Clojure's data
| structures are fundamentally better for data analysis. I
| think you're confusing "having equivalent data types"
| with "solving the same problem the same way"
| 7thaccount wrote:
| I tried to get into Clojure, but a lot of the JVM hosted
| languages require some Java experience. Same thing with Scala
| and Kotlin or F# on .NET.
|
| The early tooling was also pretty dependent on Vim or Emacs.
| Maybe it's all easier now with VSCode or something like that.
| iLemming wrote:
|     None of this is even remotely true. I've gotten into Clojure
| without knowing jackshit about Java, almost ten years
| later, after tons of things successfully built and
| deployed, still don't know jackshit about Java. Mia, co-
| host of 'Clojure apropos' podcast was my colleague, we've
| worked together on multiple teams, she learned Clojure as
| her very first PL. Later she tried learning some Java and
| she was shocked how impossibly weird it looked compared to
| Clojure. Besides, you can use Clojure without any JVM -
| e.g., with nbb. I use it for things like browser automation
| with Playwright.
|
| The tooling story is also very solid - I use Emacs, but
| many of my friends and colleagues use IntelliJ, Vim,
| Sublime and VSCode, and some of them migrated to it from
| Atom.
| 7thaccount wrote:
| It might not be a problem for you, but it has been for
| many. I did start by reading through 3 Clojure books. The
| repl and the basic stuff like using lists is all easy of
| course, but the tooling was pretty poor compared to what
| I was used to (I like lisp, but Emacs is a commitment).
| Also, a lot of tutorials at the time definitely assumed
| java familiarity, especially with debugging java stack
| traces.
| iLemming wrote:
| > It might not be a problem for you, but it has been for
| many
|
| Do you have a habit of referring to yourself in plural,
| or do you typically like to generalize things based on
| your personal experiences?
|
| I personally know many Clojurists who never had problems
| you're describing - hundreds of people. Sure, that could
| be the case of survivorship bias, perhaps I just don't
| befriend people who struggled with getting into Clojure
| specifically in a way you're describing. But like they
| say: "Those who are willing to make the effort will find
| the solutions. Those who aren't will find the excuses."
|
| Clojure undeniably had challenges in the past, and still
| has some today. But not the things you're talking about.
| This is literally not an exaggeration - it's as easy as
|         installing the Calva extension for VSCode - that's all one
| needs to mess around with Clojure.
| geokon wrote:
| It doesn't require any Java but the docs do at times sort
| of assume you understand the JVM to some extent - which was
| a bit frustrating when first learning the language. It'll
| use terms like "classpath" without explaining what that is.
| However nowadays with LLMs these are insignificant
| speedbumps.
|
| If you want to use Java you also don't really need to know
| Java beyond "you create instances of classes and call
| methods on them". I really don't want to learn a dinosaur
| like Java, but having access to the universe of Java libs
| has saved me many times. It's super fun and nice to use and
| poke around mature Java libs interactively with a REPL :)
|
| All that said I'd have no idea how to write even a
| helloworld in Java
|
| PS: Agreed on Emacs. I love Emacs.. but it's for turbo
| nerds. Having to learn Emacs and Clojure in parallel was a
|     crazy barrier. (And no, Emacs is not as easy as people make it
|     out to be.)
| hekkle wrote:
| > What makes Python a great language for data science, is that
| so many people are familiar with it
|
|   While I agree with you in principle, this also leads to what I
|   call the "VB Effect". Back in the day VB was taught at every
|   school as part of the standard curriculum. This made every kid
|   a 'computer whizz'. I have had to fix many a legacy codebase
|   that was started by someone's nephew the whizz kid.
| aethor wrote:
| Peer review is fundamental to scientific endeavor but... in ML
| fields, reviewers almost never check the code and Python
| package management is hardly reproducible. So clearly we are
| not there, Python or not.
| flexagoon wrote:
| That's ok, I don't think _anyone_ knows how to properly write
| Julia. After using it for a while and following the community
|   (watching talks, checking the forum etc), I don't think it has
| a concept of code quality. You just throw random code at the
| wall until it starts working. Which makes sense, considering
| most of the users are scientists.
| huherto wrote:
| For what it's worth, the Kotlin folks have been adding some cool
| features and tools for data analysis.
| https://kotlinlang.org/docs/data-analysis-overview.html
| niemandhier wrote:
| Python is just a language that:
|
| 1. Is easy to read
|
| 2. Was easy to extend in languages that people who work with
| scientific data happen to like.
|
| When I did my masters we hacked around in the numpy source and
| contributed here and there while doing astrophysics.
|
| Stuff existed in Java and R, but we had learned C in the first
| semester, Python was easier to read, and, contrary to MATLAB,
| numpy did not need a license.
|
| When data science came into the picture, the field was full of
| physicists that had done similar things. They brought their tools
| as did others.
| jillesvangurp wrote:
| The main feature of Python is that it is approachable by people
| who have never programmed before. They might have a vague
| notion of wanting to instruct a computer to first do this and
| then do that. Imperative programming is their starting point.
| And Python delivers that. It was designed as a scripting
| language whose primary use indeed was to script together other
| things. It always was good at that and that was the main thing
| it was used for in the nineties.
|
| It got popular once Linux distributions started relying on a
| lot of python scripts (e.g. Red Hat and Debian). As a side
| effect it was present on a lot of Linux and Unix systems early
| on. Scientists in the early 2000s and late nineties had access
| to workstations running Linux and Unix. So, Python was simply
| the approachable thing that was just there already.
|
| And because it's so easy, there are lots of people getting into
| Python. So it got its own dynamic of generations of researchers
|   in all sorts of fields knowing about Python being the go-to
| thing to reach for. It never really was the best at anything it
| does. That wasn't even a goal. It's a bit slow. A bit
| verbose/clumsy compared to some of the alternatives that some
| data scientists prefer. It lacks a lot of features other
| languages have. Etc. This doesn't matter because it is simple
| and easy. The type of users that are new to programming are
| looking for something simple that they can understand. Not the
| platonic ideal of a language that mathematicians or computer
| scientists might prefer.
|
| Python is the modern equivalent of BASIC which had this role
|   before Python was created. It wasn't that amazing. But early
|   home computers had it as part of their OS. E.g. the Commodore
|   64 that was my first computer had an interactive BASIC shell
| with the ability to load games from a tape as the main OS
| experience.
| jswelker wrote:
| Inherited Python code is a mixed bag. Inherited R code is a
| nightmare.
| mushufasa wrote:
| Languages inherently have network effects; most people around the
| world learn English so they can talk with other professionals who
| also know English, not because they are passionate about Charles
| Dickens.
|
| My take (and my own experience) is that python won because the
| rest of the team knows it. I prefer R but our web developers
| don't know it, and it's way better for me to write code that the
| rest of our team can review, extend, and maintain.
| pacbard wrote:
| When you think about a data science pipeline, you really have
| three separate steps:
|
| [Data Preparation] --> [Data Analysis] --> [Result Preparation]
|
| Neither Python nor R does a good job at all of these.
|
| The original article seems to focus on challenges in using Python
| for data preparation/processing, mostly pointing out challenges
| with Pandas and "raw" Python code for data processing.
|
| This could be solved by switching to something like DuckDB and
| SQL to process data.
|
| As far as data analysis, both Python and R have their own niches,
| depending on field. Similarly, there are other specialized
| languages (e.g., SAS, Matlab) that are still used for domain-
| specific applications.
|
| I personally find result preparation somewhat difficult in both
| Python and R. Stargazer is ok for exporting regression tables but
| it's not really that great. Graphing is probably better in R
| within the ggplot universe (I'm aware of the python port).
| huherto wrote:
| Isn't the author saying that Python + Pandas is almost as good as
| R, but Python without Pandas is less powerful than R.
|
| I can't help but conclude that Python is as good as R because I
| still have the choice of using Pandas when I need it. What did I
| get wrong?
| paddleon wrote:
| you missed the "almost as" in your first sentence.
|
| also, we didn't define "good".
| programmertote wrote:
| Disclaimer: I have nothing against R or Python and I'm not
| partial to either.
|
| Python, the language itself, might not be a great language for
| data science. BUT the author can use Pandas or Polars or another
| data-science-related library/framework in Python to get the job
| done that s/he was trying to write in R. I could read both her R
| and Pandas code snippets and understand them equally.
|
| This article reads just like, "Hey, I'm cooking everything by
| making all ingredients from scratch and see how difficult it
| is!".
| NuSkooler wrote:
| You could end it with "Python is not a great language".
|
| Now, is Python a SUCCESSFUL language? Very.
| rdtsc wrote:
| They basically advocate using R. I think it depends what they
| mean by "data science" and if the person will be doing just data
| science. If that's the case then R may be better. As in their
| whole career is going to be built on that domain. But let's say they
| are on a general computer science track, now they'll probably
| benefit from learning Python more than R, simply because they can
| use it for other purposes.
|
| > Either way, I'll not discuss it further here. I'll also not
| consider proprietary languages such as Matlab or Mathematica, or
| fairly obscure languages lacking a wide ecosystem of useful
| packages, such as Octave.
|
| I feel, to most programming folks R is in the same category. R is
| to them what Octave is to the author. R is nice, but do they
| really want to learn a "niche" language, even if it has some
| better features than Python? Is holding a whole new paradigm,
| syntax, library ecosystem in your head worth it?
| solatic wrote:
| Shell is the best language for data science. Pick the best tools
| for each of getting data, cleaning data, transforming data, and
| visualizing data, then stitch them together by sheer virtue of
| the fact that text is the universal interoperable protocol and
| files are the universal way of saving intermediate stages of
| data.
|
| Best part is, write a --help, and you can load them into LLMs as
| tools to help the LLMs figure it out for you.
|
| Fight me.
| xn wrote:
|   redo[1] with shell scripts has become my go-to method of dealing
| with multi-step data problems. It makes it easy to review each
| step of data retrieval, clean-up, transformation, etc.
|
| I use mlr, sqlite, rye, souffle, and goawk in the shell
| scripts, and visidata to interactively review the intermediate
| files.
|
| 1. https://redo.readthedocs.io/en/latest/
| spicybbq wrote:
| Part 2 is here:
|
| https://blog.genesmindsmachines.com/p/python-is-not-a-great-...
| drchaim wrote:
| Python was a great language for data science when data science
| became a mainstream thing.
|
| It was easy to think about the structures (iterators), it was easy
| to extend, and it had a good community.
|
| Because of that, people started extending it via libraries.
|
| There are plenty more alternatives now.
| thom wrote:
| I think this expectation that data science code is a thing you
| write basically top to bottom to get some answers out, put them
| in a graph and move on with your life is not a useful lens
| through which to evaluate two programming languages. R definitely
| is an efficient DSL for doing stats this way, but it's a painful
| way to build a durable piece of software. Python is nowhere near
| perfect but I've seen fewer codebases that made my eyes bleed,
| however pretty the graphs might look.
| amai wrote:
| The example would better be written in SQL. So according to the
| author that would make SQL a great language for data science. SQL
| also supports tables natively. This conclusion is of course
| ridiculous and shows the shallow reasoning in this article.
| constantcrying wrote:
| Python is also an embarrassingly bad language for numerics. It
| comes without support for different floating point types, does not
| have an n-D array data type, and is extremely slow.
|
| At the same time it is an absolute necessity to know if you are
| doing numerics. What this shows, at least to me, is that it is
| "good enough" and that the million integrations, examples and
| pieces of documentation matter more than whether the
| peculiarities of the language work in favor of its given use
| case, as long as the shortcomings can be mostly addressed.
| slashdave wrote:
| Native python is hopeless for numerics, which is why just about
| everyone just uses numpy, which solves all of these issues. Of
| course, a separate package. But the strength of python is that
| it can fairly seamlessly incorporate these kinds of packages
| that add core capabilities. Another important example: pytorch.
| kbr2000 wrote:
| https://en.wikipedia.org/wiki/Ousterhout's_dichotomy
| coolThingsFirst wrote:
| Python just has poor aesthetics. __init__(self) is unacceptable
| in a language in 2025. Ruby would've been a much better choice.
| Sloppiness in language design is just a bad idea.
| stOneskull wrote:
| there's @dataclass in 2025
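A minimal sketch of what the `@dataclass` decorator from the standard-library `dataclasses` module provides: the `__init__` boilerplate is generated from the field annotations. The `Penguin` class and its fields are hypothetical, chosen only to match the thread's running example.

```python
from dataclasses import dataclass


@dataclass
class Penguin:
    # __init__, __repr__ and __eq__ are generated from these annotations.
    species: str
    island: str
    body_mass_g: float


p = Penguin("Adelie", "Dream", 3700.0)
print(p)  # Penguin(species='Adelie', island='Dream', body_mass_g=3700.0)
```

No hand-written `__init__(self, ...)` is needed here, although the dunder methods still exist underneath and can be overridden when required.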
| semiinfinitely wrote:
| Correct, it's only the best one that we have.
| keeeba wrote:
| As a fairly extensive user of both Python and R, I net out
| similarly.
|
| If I want to wrangle, explore, or visualise data I'll always
| reach for R.
|
| If I want to build ML/DL models or work with LLMs I will usually
| reach for Python.
|
| Often in the same document - nowadays this is very easy with
| Quarto.
| Joel_Mckay wrote:
| Python has a list of issues fundamentally broken in the
| language, and relies heavily on integrated library bindings to
| operate at reasonable speeds/accuracy.
|
| Julia allows embedding both R and Python code, and has some
| very nice tools for drilling down into datasets:
|
| https://www.queryverse.org/
|
| It is the first language I've seen in decades that reduces
| entire paradigms into single character syntax, often
| outperforming both C and Numpy in many cases. =3
| pphysch wrote:
| Deeply ironic for a Julia proponent to smear a popular
| language as "fundamentally broken" without evidence.
|
| https://yuri.is/not-julia/
| Joel_Mckay wrote:
| Python threading and computational errata issues go back a
| long time. It is a popular integration "glue" language, but
| is built on SWiG wrappers to work around its many
| unresolved/unsolvable problems.
|
| Not a "smear", but rather a well known limitation of the
| language. Perhaps your environment context works
| differently than mine.
|
| It is bizarre people get emotionally invested in something
| so trivial and mundane. Julia is at v1.12.2 so YMMV, but
| Queryverse is a lot of fun =3
| kelipso wrote:
| This is like one of those people posting Dijkstra's letter
| advocating for 0-based indexing without ever having read or
| understood what they posted.
| pphysch wrote:
| What does indexing syntax have to do with Julia having a
| rough history of correctness bugs and footguns?
| Joel_Mckay wrote:
| Sure, all software is terrible if looking at bug
| frequency history...
|
| https://github.com/python/cpython/issues
|
| Griefers ranting about years old _closed_ tickets on
| v1.0.5 versions on a blog as some sort of proof of
| lameness... is a poorly structured argument. Julia
| includes regression testing features built into even its
| plotting library output, and thus issues usually stay
| resolved due to pedantic reproducibility. Also, running
| sanity-checks in any llvm language code is usually wise.
|
| Best of luck =3
| pphysch wrote:
| Just saying, "other languages have bug reports" is an
| exceptionally poor way to promote Julia =3
| Joel_Mckay wrote:
| To be blunt: Moore's law is now effectively dead, and
| chasing the monolithic philosophy with lazy monads will
| eventually limit your options.
|
| Languages like Julia trivially handle conditional
| parallelism much more cleanly with the broadcast
| operator, and transparent remote host process instancing
| over ssh (still needs a lot of work to reach OTP like
| cluster functionality.)
|
| Much like Go, library resources ported into the native
| language quietly move devs away from the same polyglot
| issues that hit Python.
|
| Best of luck. =3
| IshKebab wrote:
| Python's not a great language for anything. Maybe for teaching
| programming I guess (except then you end up with people that only
| know Python).
| aorist wrote:
| > Examples include converting boxplots into violins or vice
| versa, turning a line plot into a heatmap, plotting a density
| estimate instead of a histogram, performing a computation on
| ranked data values instead of raw data values, and so on.
|
| Most of this is not about Python, it's about matplotlib. If you
| want the admittedly very thoughtful design of ggplot in Python,
| use plotnine
|
| > I would consider the R code to be slightly easier to read
| (notice how many quotes and brackets the Python code needs)
|
| This isn't about Python, it's about the tidyverse. The reason you
| can use this simpler syntax in R is that its non-standard
| evaluation allows packages to extend the syntax in a way Python
| does not expose:
| http://adv-r.had.co.nz/Computing-on-the-language.html
| npalli wrote:
| Python is nothing without its batteries.
| jskherman wrote:
| Python _is_ its batteries.
| pphysch wrote:
| The design and success of e.g. Golang is pretty strong
| support for the idea that you can't and shouldn't separate a
| language from its broader ecosystem of tooling and packages.
| LtWorf wrote:
| The success of python is due to not needing a broader
| ecosystem for A LOT of things.
|
| They are of course now abandoning this idea.
| lmm wrote:
| > The success of python is due to not needing a broader
| ecosystem for A LOT of things.
|
| I honestly think that was a coincidence. Perl and Ruby
| had other disadvantages, Python won despite having bad
| package management and a bloated standard library, not
| because of it.
| rjzzleep wrote:
| It's because Ruby captured the web market and Python
| everything else, and I get everything is more timeless
| than a single segment.
| vkazanov wrote:
| Ruby was _competing_ on the web market and lost to many
| others, including Python. In part, because python had a
| much broader ecosystem, and php had wide adoption through
| wordpress and others, and javascript was expanding from
| browsers.
| procaryote wrote:
| The bloated standard library is the only reason I kept
| using python in spite of the packaging nightmare. I can
| do most things with no dependencies, or with one
| dependency I need over and over like matplotlib
|
| If python had been lean and needed packages to do
| anything useful, while still having a packaging
| nightmare, it would have been unusable
| lmm wrote:
| Well, sure, but equally I think there would have been a
| lot more effort to fix the packaging nightmare if it had
| been more urgent.
| ModernMech wrote:
| There was a massive effort though, the proliferation of
| several different package managers is evidence of that.
| LtWorf wrote:
| The bloated standard library is the reason why you can
| send around a single .py file to others and they can
| execute it instantly.
|
| Most Python users are neither able to use nor aware of venv,
| uv, pip and all of that.
| 1vuio0pswjnm7 wrote:
| What language is used to write the batteries
| logicprog wrote:
| C/C++, in large part
| saboot wrote:
| And below that, FORTRAN :)
| JPKab wrote:
| These days it's a whole lot of Rust.
| volemo wrote:
| These days it's still a whole lot of Fortran, with some
| Rust sprinkled on top. (:
| pjmlp wrote:
| Which since Fortran 2003, or even Fortran 95, has gotten
| rather nice to use.
| Koshkin wrote:
| IDK it's become too verbose IMHO, looks almost like COBOL
| now. (I think it was Fortran 66 that was the last Fortran
| true to its nature as a "Formula Translator"...)
| pjmlp wrote:
| We are way beyond comparing languages to COBOL, now that
| plenty folks type whole book sized descriptions into tiny
| chat windows for their AI overloads.
| throwaway2037 wrote:
| I hear this so much from Python people -- almost like they
| are paid by the word to say it. Is it different from Perl,
| Ruby, Java, or C# (DotNet)? Not in my experience, except
| people from those communities don't repeat that phrase so
| much.
|
| The irony here: We are talking about data science. 98% of
| "data science" Python projects start by creating a virtual
| env and adding Pandas and NumPy which have numerous (really:
| squillions of) dependencies outside the foundation library.
| m55au wrote:
| Someone correct me if I'm completely wrong, but by default
| (i.e. precompiled wheels) numpy has 0 dependencies and
| pandas has 5, one of which is numpy. So not really
| "squillions" of dependencies.
|
|     pandas==2.3.3
|     +-- numpy [required: >=1.22.4, installed: 2.2.6]
|     +-- python-dateutil [required: >=2.8.2, installed: 2.9.0.post0]
|     |   +-- six [required: >=1.5, installed: 1.17.0]
|     +-- pytz [required: >=2020.1, installed: 2025.2]
|     +-- tzdata [required: >=2022.7, installed: 2025.2]
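|
| One way to check the declared runtime dependencies yourself is
| the standard library's importlib.metadata. This is only a
| sketch: declared_requirements is a hypothetical helper name, and
| the output depends on what is installed in your environment. The
| returned strings also include optional extras, which carry an
| "extra == ..." marker and are not installed by default.

```python
from importlib.metadata import PackageNotFoundError, requires
from typing import List, Optional


def declared_requirements(package: str) -> Optional[List[str]]:
    """Return the requirement strings from an installed package's metadata.

    Returns None if the package is not installed, and an empty list if it
    is installed but declares no requirements.
    """
    try:
        return requires(package) or []
    except PackageNotFoundError:
        return None


for name in ("numpy", "pandas"):
    print(name, declared_requirements(name))
```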
| noitpmeder wrote:
| I don't know about _squillions_, but numpy definitely has
| _requirements_, even if they're not represented as such
| in the python graph.
|
| e.g.
|
| https://github.com/numpy/numpy/blob/main/.gitmodules (some
| source code requirements)
|
| https://github.com/numpy/numpy/tree/main/requirements (mostly
| build/ci/... requirements) ...
| m55au wrote:
| They're not represented, because those are build-time
| dependencies. Most users when they do pip install numpy
| or equivalent, just get the precompiled binaries and none
| of those get installed. And even if you compile it
| yourself, you still don't need those for running numpy.
| nonameiguess wrote:
| Read https://numpy.org/devdocs/building/blas_lapack.html.
|
| NumPy _will_ fall back to internal and very slow BLAS and
| LAPACK implementations if your system does not have a
| better one, but assuming you're using NumPy for its
| performance and not just the convenience of adding array
| programming features to Python, you're really gonna want
| better ones, and what that is heavily depends on the
| computer you're using.
|
| This isn't really a Python thing, though. It's a hard
| problem to solve with any kind of scientific computing.
| If you insist on using a dynamic interpreted language,
| which you probably have to do for exploratory interactive
| analysis, and you still need speed over large datasets,
| you're gonna need to have a native FFI and link against
| native libraries. Thanks to standardization, you'll have
| many choices and which is fastest depends heavily on your
| hardware setup.
| aragilar wrote:
| The wheels will most likely come with openblas, so while
| you can get the original blas (which is really only slow
| by comparison, for small tasks it's likely users won't
| notice), this is generally not an issue.
| dm319 wrote:
| > This isn't about Python, it's about the tidyverse.
|
| > its non-standard evaluation allows packages to extend the
| syntax in a way Python does not expose
|
| Well this is a fundamental difference between Python and R.
| debtta wrote:
| The point is that the ability to extend the syntax of R leads
| to chaos and mess (in general) but when used correctly and
| effectively in the tidyverse, improves the experience of
| writing and reading code.
| robot-wrangler wrote:
| >> I would consider the R code to be slightly easier to read
| (notice how many quotes and brackets the Python code needs)
|
| Oh god no, do people write R like that, pipes at the end?
| Elixir style pipe-operators at the beginning is the way.
|
| And if you really wanted to "improve" readability by confusing
| arguments/functions/vars just to omit quotes, python can do
| that, you'll just need a wrapper object and getattr hacks to
| get from `my_magic_strings.foo` -> `'foo'`. As for the
| brackets.. ok that's a legitimate improvement, but again not
| language related, it's library API design for function sigs.
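|
| A minimal sketch of that wrapper-object trick. MagicStrings and
| cols are hypothetical names, and the row is made-up data; this
| only removes the quotes, it does not give Python real
| non-standard evaluation.

```python
class MagicStrings:
    """Hypothetical helper: attribute access returns the attribute's name."""

    def __getattr__(self, name: str) -> str:
        # __getattr__ only runs when normal lookup fails. Refuse dunder
        # names so copy/pickle machinery does not get bogus strings.
        if name.startswith("__"):
            raise AttributeError(name)
        return name


cols = MagicStrings()

row = {"species": "Gentoo", "island": "Biscoe"}
# Equivalent to (row['species'], row['island']), without the quotes.
key = (row[cols.species], row[cols.island])
```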
| rtaylorgarlock wrote:
| Upvoted for pipes at the beginning
| medstrom wrote:
| IIRC, putting pipe operator `|>` at end of line prevents the
| expression from terminating early. Otherwise the newline
| would terminate it.
| tmtvl wrote:
| The right way is putting the pipe operator at the beginning
| of the expression.
|
|     (-> (gather-some-data)
|         (map 'Vector #'some-functor)
|         (filter #'some-predicate)
|         (reduce #'some-gatherer))
|
| Or for those who have an irrational fear of brackets:
|
|     -> gather-some-data
|        map 'Vector #'some-functor
|        filter #'some-predicate
|        reduce #'some-gatherer
| evolighting wrote:
| R is more of a statistical software than a programming
| language. So, if you are a so-called "statistician," then R
| will feel familiar to you
| UniverseHacker wrote:
| No, R is a serious general purpose programming language that
| is great for building almost any type of complex scientific
| software with. Projects like Bioconductor are a good example.
| evolighting wrote:
| Perhaps in the context of a comparison with Python?
|
| In my limited experience, using R feels like using JavaScript
| in the browser: it's a platform heavily focused on advanced,
| feature-rich objects (such as DataFrames and specialized plot
| objects), but you could also just build almost anything with
| it.
| blubber wrote:
| No, it's not. Even established packages have bugs caused by
| R weirdness. I like it nevertheless.
| Cosi1125 wrote:
| Care to give some examples?
| northlondoner wrote:
| Yes, R is a proper general purpose programming language.
| Turing complete, functional, procedural, object
| oriented...
| steine65 wrote:
| Just in case someone reads this far and sees blubber's
| confident "No." Blubber is definitely wrong here. I used
| to do all of my programming in R. Throw the question into
| an LLM if you're wondering if R has a package like ___ in
| python.
| getnormality wrote:
| It's not about Python, it's about how R lets you do something
| Python can't?
| isolli wrote:
| Or seaborn. It was built exactly for this purpose: abstracting
| some of the annoying kinks of matplotlib while still offering a
| rich set of features.
|
| https://seaborn.pydata.org/tutorial/introduction.html
| jampekka wrote:
| I wonder what the last example of "logistics without libraries"
| would look like in R. Based on my experience of having to do
| "low-level" R, it's gonna be a true horror show.
|
| In R, things for which there are ready-made libraries and
| recipes are often easy, but when those don't exist, things
| become extremely hard. And the usual approach is that if
| something is not easy with a library recipe, it just is not
| done.
| m000 wrote:
| The way you describe it, can we say that R was AI-first
| without even knowing?
| nerdponx wrote:
| R is overtly and heavily inspired by Lisp which was a big
| deal in AI at one point. They knew what they were doing.
| debtta wrote:
| Python: easy things are easy, hard things are hard.
|
| R: easy things are hard, hard things are easy.
| blubber wrote:
| "The reason you can use this simpler syntax in R is because
| its non-standard evaluation ..."
|
| So it actually is about Python vs R.
|
| That said, while this kind of non-standard evaluation is nice
| when working interactively on the command line, I don't think
| it's that relevant when writing code for more elaborated
| analyses. In that context, I'd actually see this as a
| disadvantage of R because you suddenly have to jump through
| hoops to make trivial things work with that non-standard
| evaluation.
| _Wintermute wrote:
| The increasing prevalence of non-standard evaluation in R
| packages was one of the major reasons I switched from R to
| python for my work. The amount of ceremony and constant API
| changes just to have something as an argument in a function
| drove me mad.
| disgruntledphd2 wrote:
| > and constant API changes
|
| Yeah, this was so very very painful. I once ended up
| maintaining a library that basically used all the different
| NSE approaches, which was not very much fun at all.
| drnick1 wrote:
| I suppose it depends on what exactly is meant by "data science."
| I find that for stochastic simulations, C++ and the Eigen
| library are unbeatable. You get the readability of high-level
| code with the performance of low-level code thanks to the "zero-
| cost abstractions" of Eigen.
|
| If by data science you mean loading data to memory and running
| canned routines for regression, classification and other
| problems, then Python is great and mostly calls C/FORTRAN
| binaries under the hood, so Python itself has relatively little
| overhead.
| KaiserPro wrote:
| The observation I make here is in that first python example with
| the penguins, what the fuck is that?
|
| It makes it look like perl, on a bad day, or worse autogenerated
| javascript.
|
| Why on earth is it so many levels deep in objects?
| johnea wrote:
| They could have just left the last three words off of that title
| 8-/
|
| Python is not a great language
|
| First, the white space requirements are a bad flashback to 1970s
| fortran.
|
| Second, it is the language that is least compatible with itself.
| taeric wrote:
| I'm heavily inclined to agree with the general thought, but I
| balk at the low level code showing why a language is bad at
| something. In this specific case, without the tidyverse, R isn't
| exactly peaches and cream.
|
| As annoying as it is to admit it, python is a great language for
| data science almost strictly because it has so many people doing
| data science with it. The popularity is, itself, a benefit.
| progval wrote:
| The pure Python code in the last example is more verbose than it
| needs to be.
|
|     groups = {}
|     for row in filtered:
|         key = (row['species'], row['island'])
|         if key not in groups:
|             groups[key] = []
|         groups[key].append(row['body_mass_g'])
|
| can be rewritten as:
|
|     groups = collections.defaultdict(list)
|     for row in filtered:
|         groups[(row['species'], row['island'])].append(row['body_mass_g'])
|
| and
|
|     variance = sum((x - mean) ** 2 for x in values) / (n - 1)
|     std_dev = math.sqrt(variance)
|
| as:
|
|     std_dev = statistics.stdev(values)
| ashdev wrote:
| Disagree.
|
| In the first instance, the original code is readable and tells
| me exactly what's what. In your example, you're sacrificing
| readability for being clever.
|
| Clear code(even if verbose) is better than being clever.
| billyoyo wrote:
| Using a very common utility in the standard library to avoid
| reinventing the wheel is not "clean code"?
|
| defaultdict is ubiquitous in modern python, and is far from a
| complicated concept to grasp.
| ux266478 wrote:
| I don't think that's the right metaphor to use here, it
| exists at a different level than what I would consider
| "reinventing the wheel". That to me is more some attempt to
| make a novel outward-facing facet of the program when
| there's not much reason to do so. For example,
| reimplementing shared memory using a custom kernel driver
| as your IPC mechanism, despite it not doing anything that
| shared memory doesn't already do.
|
| The difference between the examples is so trivial I'm not
| really sure why the parent comment felt compelled to
| complain.
| MarsIronPI wrote:
| I think code clarity is subjective. I find the second easier
| to read because I have to look at less code. When I read
| code, I instinctively take it apart and see how it fits
| together, so I have no problem with the second approach.
| Whereas the first approach is twice as long so it takes me
| roughly twice as long to read.
| explodes wrote:
| The 2nd version is the most idiomatic.
| pphysch wrote:
| I would keep the explicit key= assignment since it's more
| than just a single literal but otherwise the second version
| is more idiomatic and readable.
| ashdev wrote:
| Interesting! Thanks for the responses. I'm not python native
| and haven't worked as extensively with python as some of you
| here.
|
| That said, I'll change my mind here and agree on using std
| library, but I'd still have separate 'key' assignment here
| for more clarity.
| freehorse wrote:
| Imo, if you read such code the first time, you may prefer the
| first. If you read it for the 20th time, you may prefer the
| second. Once you understand what you are doing, often one
| prefers more concise syntax that helps in handling complexity
| within a larger project. But it can seem a bit "too clever"
| in the beginning.
| rkomorn wrote:
| This happened to me with comprehensions in python, and with
| JS' love for anonymous/arrow functions.
|
| Once you get used to a language's "quirks" (so long as
| they're considered idiomatic), they no longer feel quirky,
| and it's usually pretty quick.
| freehorse wrote:
| You get to the same point with non-considered idiomatic
| syntax also, the only problem being that it will be only
| you who understands it.
| rkomorn wrote:
| Only so long as you keep the habit going.
|
| I've definitely written some things that I came back to
| much later and had to relearn (which is somewhere between
| embarrassing and humbling).
| roadside_picnic wrote:
| > (n - 1)
|
| It's also funny that one would write their own standard
| deviation function and _include_ Bessel's correction. Usually
| if I'm manually re-implementing a standard deviation function
| it's because I'm afraid the implementors blindly applied the
| correction without considering whether or not it's actually
| meaningful for the given analysis. At the very least, the
| correct name for what's implemented there should really be
| `sample_std_dev`.
| m55au wrote:
| It is sadly really inconsistent. The stdlib statistics has
| two separate functions, stdev for sample and pstdev for
| population. Numpy and pandas both have .std() with ddof
| (delta degrees of freedom) as a parameter, but numpy defaults
| to 0 (population) and pandas to 1 (sample).
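|
| A small sketch of the stdlib half of that distinction, with
| made-up values chosen so the population result is a round
| number. The numpy and pandas defaults are described above and
| not exercised here.

```python
from statistics import pstdev, stdev

# Made-up data: mean is 5, squared deviations sum to 32.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Sample standard deviation: divides by n - 1 (Bessel's correction).
sample_sd = stdev(values)

# Population standard deviation: divides by n.
population_sd = pstdev(values)

print(f"sample: {sample_sd:.4f}  population: {population_sd:.4f}")
```

| With the same data, sample_sd is always the larger of the two,
| so mixing up the defaults changes results silently rather than
| raising an error.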
| gcbirzan wrote:
| There's also itertools.groupby, maybe not much shorter (need to
| define the keyfunc, sort, then iterate), but it does make the
| intent obvious.
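|
| A minimal sketch of that approach, with made-up rows standing in
| for the penguins data. Note that groupby only merges adjacent
| items, which is why the sort by the same key comes first.

```python
from itertools import groupby
from statistics import fmean

# Made-up rows for illustration only.
rows = [
    {"species": "Adelie", "island": "Dream", "body_mass_g": 3700.0},
    {"species": "Gentoo", "island": "Biscoe", "body_mass_g": 5000.0},
    {"species": "Adelie", "island": "Dream", "body_mass_g": 3900.0},
]


def key_func(row):
    """Group and sort by (species, island)."""
    return row["species"], row["island"]


means = {
    key: fmean([r["body_mass_g"] for r in group])
    for key, group in groupby(sorted(rows, key=key_func), key=key_func)
}
```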
| shevy-java wrote:
| > I think people way over-index Python as the language for data
| science. It has limitations that I think are quite noteworthy.
| There are many data-science tasks I'd much rather do in R than in
| Python.
|
| R is kind of a super-specialized language. Python is much more
| general purpose.
|
| R failed to evolve, let's be honest. Python won via jupyter - I
| see this used ALL the time in universities. R is used too, but
| mostly for statistics related courses only, give or take.
|
| Perhaps R is better for its niche, but Python has more momentum
| and thus dominates over R. That's simply the reality of the
| situation. It is like the bulldozer moving forward, at a fast
| speed.
|
| > I say "This is great, but could you quickly plot the data in
| this other way?"
|
| Ok so ... he would have to adjust R code too, right? And finding
| good info on that is simply harder. He says he has experience
| with universities. Well, I do too, and my experience is that
| people are WAY better with python than with R. You simply see
| that more students will drop out from R than from python. That's
| also simply the reality of the situation.
|
| > They appear to be sufficiently cumbersome or confusing that
| requests that I think should be trivial frequently are not.
|
| I am sure the reverse also applies. Pick some python library, do
| something awesome, then tell the R students to do the same. I bet
| he will have the same problems.
|
| > So many times, I felt that things that would be just a few
| lines of simple R code turned out to be quite a bit longer and
| fairly convoluted.
|
| Ok, so here he is trolling. Flat out - I said it.
|
| I wrote a LOT of python and quite a bit of R. There is no way in
| life that the R code is more succinct than the python code for
| about 90% of the use cases out there. Sorry, that's simply not
| the case. R is more verbose.
|
| > Here is the relevant code in R, using the tidyverse approach:
|
|     penguins |>
|       filter(!is.na(body_mass_g)) |>
|       group_by(species, island) |>
|       summarize(
|
| This is like perl. They also don't adapt. R is going to lose
| ground.
|
| This professor just hasn't realised that he is slowly becoming a
| fossil himself, by being unable to see that x is better than y.
| oivey wrote:
| > R failed to evolve, let's be honest. Python won via jupyter
|
| Ju = Julia, Pyt = Python, Er = R
|
| R is not only supported in Jupyter, it was there from the
| start. I've never written a single line of R. It is bizarre how
| little people know about their tools.
| aragilar wrote:
| But it used to be IPython (and the notebook interface did
| come out when it was still IPython).
| oivey wrote:
| Yeah. The extra language support is partially why they
| renamed it.
| rob_c wrote:
| Refuses to learn tool so tool is broken... There is no problem
| with python for this. If you hate boilerplate, join the club: get
| llms to generate it for you and move on to doing real work (or
| get involved in improving the language or libraries directly)
| hekkle wrote:
| For those who thought the article was TL;DR, the author argues:
|
| - A General programming language like Python is good enough for
| data science but isn't specifically designed for it.
|
| - A language that is specifically designed for Data Science like
| R is better at Data Science.
|
| Who would have thought?
| fnord77 wrote:
| Python is not a great language
| hekkle wrote:
| Not great at what?
|
| I agree that Python is not great at anything specifically, but
| it is good at almost everything, and that's what makes it
| great.
| psunavy03 wrote:
| I'm not sure what that last example is meant to be other than an
| anti-Python caricature. If you're implementing calculating things
| like standard deviations by hand, that's not real-world coding,
| that's the undergraduate harassment package which should end with
| a STEM bachelor's.
|
| Of course there's a bunch of loops and things; you're exposing
| what has to happen in both R and Python under the hood of all
| those packages.
| roadside_picnic wrote:
| > that's not real-world coding
|
| It's pretty clear the post is focused on the context of work
| being done in an academic research lab. In that context I think
| most of the points are pretty valid, but most of the real world
| benefit I've experienced from using Python is being able to work
| more closely with engineering (even on non-Python teams).
|
| I shipped R code to a production environment once over my
| career and it felt incredibly fragile.
|
| R is _great_ for EDA, but really doesn't work well for
| iteratively building larger software projects. R has a great
| package system, but it's not so great when you need abstraction
| in between.
| SubiculumCode wrote:
| Yeah, to me, R has never really been a language I'd choose to
| program with... it's a statistical powerhouse to analyze
| datasets with great packages / SOTA statistical methods, etc,
| not a production tool.
| aussieguy1234 wrote:
| I felt forced to use python when I gave langgraph agents a go.
|
| Worked quite well, but the TS/JS langgraph version is way behind.
| React agents are just a few lines of code, compared to 50 odd
| lines for the same thing in JS/TS.
|
| Better to use a different language, even one I'm not familiar
| with, to be able to maintain a few lines of code vs 50 lines.
| yodsanklai wrote:
| Maybe R is fine for people who use it all the time? But as an SWE
| who occasionally needs to do some data analysis, I find it much
| easier to rely on tools I know rather than R. R is pretty
| convoluted as a language.
| dragonwriter wrote:
| The bare python/stdlib example used (as well as bare python and
| avoiding add-on data science oriented libraries not being the way
| most people would use python for data science) is just...bad?
| (And, by bad here I mean showing signs of deliberately avoiding
| stdlib features in order to increase the appearance of the things
| the author then complains about.)
|
| A better stdlib-only version would be:
|
|     from palmerpenguins import load_penguins
|     import math
|     from itertools import groupby
|     from statistics import fmean, stdev
|
|     penguins = load_penguins()
|
|     # Convert DataFrame to list of dictionaries
|     penguins_list = penguins.to_dict('records')
|
|     # create key function for grouping/sorting by species/island
|     def key_func(x):
|         return x['species'], x['island']
|
|     # Filter out rows where body_mass_g is missing and sort by
|     # species and island
|     filtered = sorted(
|         (row for row in penguins_list
|          if not math.isnan(row['body_mass_g'])),
|         key=key_func)
|
|     # Group by species and island
|     groups = groupby(filtered, key=key_func)
|
|     # Calculate mean and standard deviation for each group
|     results = []
|     for (species, island), group in groups:
|         values = [row['body_mass_g'] for row in group]
|         mean_value = fmean(values)
|         sd_value = stdev(values, xbar=mean_value)
|         results.append({
|             'species': species,
|             'island': island,
|             'body_weight_mean': mean_value,
|             'body_weight_sd': sd_value
|         })
| rossdavidh wrote:
| Speaking as a python programmer who has occasionally done work in
| R: yes, of course. Python is not a great language for anything;
| it's a pretty good language for just about anything. That is, and
| always has been, its strength.
|
| If you're doing data science all day, you should learn R, even if
| it's so weird at first (for somebody coming from a C-style
| language) that it seems way harder; R is made for the way
| statisticians work and think, not the way computer programmers
| work and think. If you're doing data science all day, you should
| start thinking and working like a statistician and working in R,
| and the fact that it seems to bend your mind is probably at least
| in part good, because a statistician needs to think differently
| than a programmer.
|
| I work in python, though, almost all of the time.
| ZhiqiangWang wrote:
| Agree, the argument is well made in sklearn API design paper
| https://arxiv.org/abs/1309.0238
| actuallyalys wrote:
| As much as I like Python and personally prefer it to R, I don't
| really disagree. But I'm not sure R is a _great_ language for
| data science either--it has its own weaknesses, e.g., writing
| custom loops (or functional equivalents with map or reduce) was
| pretty clunky last I tried it.
|
| The other thing is that a lot of R's strengths are really the
| tidyverse's. Some of that is to R's credit as an extensible
| language that enables a skilled API designer to really shine of
| course, but I think there's no reason Python the language
| couldn't have similar libraries. In fact it has, in plotnine. (I
| haven't tried Polars yet but it does at least seem to have a more
| consistent API.)
| dcreater wrote:
| Fixed title: Python is not a great language for data science if
| pandas/polars/ibis did not exist
| mike_ivanov wrote:
| Please read the article. It literally shows pandas code as an
| example.
| roadside_picnic wrote:
| So I've been writing Python for around 20 years now, and doing
| data science/ML work for around 15. Despite being a Python
| programmer first I spent a good 5 years using R exclusively.
| There's a lot of things I genuinely love about R and I strongly
| believe that R is unfairly maligned by devs... but there's a good
| reason I have written exclusively Python for DS work for the last
| 5 years.
|
| > Python is pretty good for deep learning. There's a reason
| PyTorch is the industry standard. When I'm talking about data
| science here, I'm specifically excluding deep learning.
|
| I've written very little deep learning code over my career, but
| made very frequent use of the GPU and differentiable programming
| for non-deep learning specific tasks. In general Python is much
| easier to write quantitative programs that make use of the
| hardware, and you have a lot more options when your problem
| doesn't fit into RAM.
|
| > I have been running a research lab in computational biology for
| over two decades.
|
| I've been working nearly exclusively in industry for these two
| decades and a _major_ reason I find Python just better is it's
| much, much easier to interface with other parts of engineering
| when you're using a truly general purpose PL. I've actually never
| worked for a pure Python shop, but it's generally much easier to
| get production ML/DS solutions into prod when working with
| Python.
|
| > Data science as I define it here involves a lot of interactive
| exploration of data and quick one-off analyses or experiments
|
| This re-iterates the previous difference. In my experience I
| would call this "step one" in all my DS related work. The first
| step is to understand the problem and de-risk. But the vast
| majority of code and work is related to delivering a scalable
| product.
|
| You can say that's not part of "data science", but if you did
| you'd have a hard time finding a job on most of the teams I've
| worked on.
|
| All that said, my R vs Python experience has boiled down to: If
| your end result is a PDF report, R is superior. If your end
| result is shipping a product, then Python is superior. And my
| experience has been that, outside of university labs, there
| aren't a lot of jobs out there for DS folks who only want to
| deliver PDFs.
| UniverseHacker wrote:
| Doing computational biology for several decades in about a dozen
| languages, I do think R is a much better language for data
| science, but in practice I end up using Python almost every time
| because it has more libraries, and it's easier to find software
| engineers and collaborators to work on Python. However, R makes
| for much simpler cleaner code, less silent errors, and the 1
| indexing makes dealing with biological sequences much less
| hassle.
| 3eb7988a1663 wrote:
| Pardon? Less silent errors? R has quite a few foot guns around
| permissively parsing user intention. Which does make it handy
| for exploratory analysis, but a lot more fragile when you want
| production code.
|
| Just a simple one that can get you, R is 1-indexed. Yet if you
| have a vector, accessing myvec[0] is not an error.
| Alternatively, if you had say, a vector length of 3 and do
| myvec[10] that gets NA (an otherwise legal value). Or you could
| make an assignment past the end of the vector, myvec[15] <- 3.14,
| which will silently extend the array, inserting NAs.
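|
| For contrast, a minimal Python sketch of the same three accesses
| on a plain list (not part of the original comment; the values and
| the helper names read_at and write_at are made up for
| illustration). Out-of-range reads and writes raise IndexError,
| although negative indices are accepted without complaint, which
| is its own kind of silent behaviour:

```python
# Illustrative values only; `myvec` mirrors the name used in the comment above.
myvec = [1.0, 2.0, 3.0]


def read_at(values, index):
    """Return values[index], or the exception's class name if the access fails."""
    try:
        return values[index]
    except IndexError as exc:
        return type(exc).__name__


def write_at(values, index, item):
    """Try values[index] = item and report whether the assignment succeeded."""
    try:
        values[index] = item
        return "ok"
    except IndexError as exc:
        return type(exc).__name__


out_of_range_read = read_at(myvec, 10)           # IndexError, no placeholder value
out_of_range_write = write_at(myvec, 15, 3.14)   # IndexError, the list is not extended
negative_read = read_at(myvec, -1)               # no error: counts from the end

print(out_of_range_read, out_of_range_write, negative_read, len(myvec))
```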
| _Wintermute wrote:
| In my experience R is king of happily chugging along spitting
| out nonsense results when it should have errored 100 lines ago.
| drtournier wrote:
| JavaScript is not a great language for web development either,
| yet...
| janalsncm wrote:
| Python is versatile which is what makes it popular. You can load
| back and forth from a GPU using well-tested libraries. You can
| memmap things if you need to. If your loops are too slow you can
| rewrite the hot loops in rust or C. You can read and write from
| most file formats in a couple of lines.
| plaidfuji wrote:
| Python is a pretty bad language for tabular data analysis and
| plotting, which seems to be the actual topic of this post. R is
| certainly better, hell Tableau, Matlab, JMP, Prism and even Excel
| are all better in many cases. Pandas+seaborn has done a lot, but
| seaborn still has frustrating limits. And pandas is essentially a
| separate programming language.
|
| If your data is already in a table, and you're using Python,
| you're doing it because you want to learn Python for your next
| job. Not because it's the best tool for your current job. The one
| thing Python has on all those other options is $$$. You will be
| far more employable than if you stick to R.
|
| And the reason for that is because Python is one of the _best_
| languages for data and ML _engineering_ , which is about 80% of
| what a data science job actually entails.
| getnormality wrote:
| ...unless your data engineering job happens on a database, in
| which case R's dbplyr is far better than anything Python has to
| offer.
| jampekka wrote:
| > And pandas is essentially a separate programming language.
|
| I'd say dplyr/tidyverse is a lot more a separate programming
| language to R than pandas is to Python.
| _ZeD_ wrote:
| Sooo... Is this a post about python envy?
| morshu9001 wrote:
| Data science is the one thing I consider Python especially good
| at
| culebron21 wrote:
| This was underwhelming. I work with Python and Pandas, and I can
| show examples of much clumsier workflows I run into. Most
| often, you get dataframe[(dataframe.column1 == something) &
| ~dataframe.column2.isna()] constructs, which show that Python
| syntax falls short here, and isn't suitable for such
| manipulations. Unfortunately, there's no alternative, and I don't
| see R as much easier, there are plenty of ugly things as well
| there.
|
| There's Julia -- it has serious drawbacks, like slow cold start
| if you launch a Julia script from the shell, which makes it
| unsuitable for CLI workflows.
|
| Otherwise you have to switch to compiled languages, with their
| tradeoffs.
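|
| A small sketch of the construct described above, assuming pandas
| is installed; the toy data and the value of `something` are
| hypothetical. It places the boolean-mask form next to
| DataFrame.query, which is one way to cut the repetition of the
| dataframe name. engine="python" is chosen here so that the
| .notna() call inside the expression is evaluated by pandas
| itself:

```python
import pandas as pd

# Hypothetical toy data; column names follow the comment above.
dataframe = pd.DataFrame(
    {
        "column1": ["a", "b", "a", "a"],
        "column2": [1.0, 2.0, None, 4.0],
    }
)
something = "a"

# The boolean-mask construct quoted in the comment.
masked = dataframe[(dataframe.column1 == something) & ~dataframe.column2.isna()]

# The same filter as a query string; @something refers to the local variable.
queried = dataframe.query(
    "column1 == @something and column2.notna()", engine="python"
)

print(masked.equals(queried))
```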
| markkitti wrote:
| > Unfortunately, there's no alternative, and I don't see R as
| much easier, there are plenty of ugly things as well there.
|
| Have you tried Polars? It really discourages the inefficient
| creation of intermediate boolean arrays such as in the code
| that you are showing.
|
| > There's Julia -- it has serious drawbacks, like slow cold
| start if you launch a Julia script from the shell, which makes
| it unsuitable for CLI workflows.
|
| Julia has gotten significantly better over time with regard to
| startup, especially with regard to plotting. There is
| definitely a preference for REPL or notebook based development
| to spread the costs of compilation over many executions.
| Compilation is increasingly modular with package based
| precompilation as well as ahead-of-time compilation modes. I do
| appreciate that typical compilation is an implicit step making
| the workflow much more similar to a scripting language than a
| traditionally compiled language.
|
| I also do appreciate that traditional ahead-of-time static
| compilation to binary executable is also available now for
| deployment.
|
| After a day of development in R or Python, I usually start
| regretting that I am not using Julia because I know yesterday's
| code could be executing much faster if I did. The question
| really becomes do I want to pay with time today or over the
| lifetime of the project.
| jampekka wrote:
| > Have you tried Polars? It really discourages the
| inefficient creation of intermediate boolean arrays such as
| in the code that you are showing.
|
| The problem is not usually inefficiency, but syntactic noise.
| Polars does remove that in some cases, but in general gets
| even more verbose (apparently by design), which gets annoying
| fast when doing explorative data analysis.
| slowhadoken wrote:
| Sounds like a skill issue
| gyulai wrote:
| I think, the lesson learned from > Python v. R < is that people
| prefer doing data science in a _general purpose_ language that is
| also okay-ish for data science over a language that's purpose-
| built for data science but suffers from diseconomies.
| Specifically: Imagine a new database or something like that has
| just come out. Now, the audience that wants to wire it into
| applications and the audience that wants to tap it to extract
| data for analytics put their weight together to create the demand
| for the Python library. The economies for that work out better
| than if you had to create two different libraries in two
| different languages to satisfy those two groups of demand.
| LanceH wrote:
| You mention a good point of using Python to put out the
| results.
|
| I think munging the input into a clean enough data set that you
| can work on is another place Python excels compared to analysis
| specific tools like R.
| Surac wrote:
| At the moment I am trying to learn Python as a hobby language. I
| use C, C++ and C# to earn my money. My biggest problem is finding
| good examples that are up to date. I spent a whole day learning
| that there are four (I think) ways to format strings. This
| "bloat" in syntax makes even a simple print very heavy to
| digest. I don't even bother with Python 2, only Python 3. Also,
| using whitespace to block things together sounds appealing, but in
| reality you need to use editors that can indent and unindent
| whole blocks or I never get it right.
| lenkite wrote:
| 15 years ago, Python programmers used to mock Perl by quoting
| the Zen of Python: "There should be one - and preferably only
| one - obvious way to do it.". This was in stark contrast to
| Perl's TIMTOWTDI motto: "There Is More Than One Way To Do It."
|
| The Zen of Python is sadly now an absolute lie.
| gbacon wrote:
| Rather than an absolute lie, I'm more inclined to
| characterize it as naive or black-and-white thinking, outside
| of CRUD apps and undergraduate intro projects.
| Stratoscope wrote:
| You seem to be making things more difficult for yourself than
| they need to be.
|
| For the strings, just use f-strings and forget all the others.
| You can even do things like this for debugging:
|
|     >>> class User:
|     ...     pass
|     ... user = User()
|     ... user.name = "Surac"
|     ...
|     >>> print(f"{user.name=}")
|     user.name='Surac'
|     >>>
|
| For the block indenting, what editor are you using? Pretty much
| every modern editor lets you select a block and indent/unindent
| with Tab/Shift+Tab.
|
| VS Code and PyCharm are both free and are great for Python
| coding. They each have a full debugger, which is invaluable
| when you are learning a language.
| louistsi wrote:
| I think their point is that it's not clear to someone with 0%
| Python experience which of the /many/ different ways of doing
| things (like string interpolation) is the "correct" /
| idiomatic way.
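|
| For reference, a short sketch of the four formatting styles
| discussed in this subthread, side by side. The name and score
| values are made up for illustration; all four lines render the
| same text, and the f-string is the form recommended above for
| new code:

```python
from string import Template

name = "Surac"    # illustrative values
score = 0.91234

# 1. printf-style formatting with the % operator
percent_style = "%s scored %.2f" % (name, score)

# 2. str.format()
format_method = "{} scored {:.2f}".format(name, score)

# 3. f-string (Python 3.6+)
f_string = f"{name} scored {score:.2f}"

# 4. string.Template, which only substitutes names and does no number formatting
template = Template("$name scored $score").substitute(
    name=name, score=f"{score:.2f}"
)

for rendered in (percent_style, format_method, f_string, template):
    print(rendered)
```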
| IshKebab wrote:
| > but in reality you need to use editors that can indent and
| unindent whole blocks or I never get it right
|
| What editor are you using that can't do that? Notepad?
| Havoc wrote:
| Realistically it's winning because it's accessible rather than
| perfectly suited
| willvarfar wrote:
| My experience was that data science was doable but clunky and
| ugly with pandas. It got slightly better with polars. Only really
| slightly better. Then, for me at least, it jumped lightyears
| ahead with duckdb.
|
| These days I run some big query on an OLAP database and download
| the results to parquet stored on the local disk of a cloud
| notebook VM and then mine it to bits with duckdb reading straight
| from these parquet files.
|
| The notebooks end up with very clear SQL queries and results
| (most notebook servers support SQL cells with highlighting and
| completion etc), and small pockets of python cells for doing
| those corner case things that an imperative language makes
| easier.
|
| So when I get to the bottom of the article where it shows the
| difference between Python and R, I'm screaming "wouldn't that
| look better in SQL?!" :)
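|
| A rough sketch of what that SQL version could look like. To keep
| it dependency-free it uses the standard library's sqlite3 as a
| stand-in for duckdb, and a handful of made-up rows instead of the
| real penguins data. SQLite has no built-in standard deviation
| aggregate, so only the count and mean are computed here:

```python
import sqlite3

# Hypothetical rows (species, island, body_mass_g); not the real penguins data.
rows = [
    ("Adelie", "Torgersen", 3750.0),
    ("Adelie", "Torgersen", 3800.0),
    ("Adelie", "Biscoe", None),   # missing mass, excluded by the WHERE clause
    ("Gentoo", "Biscoe", 5000.0),
    ("Gentoo", "Biscoe", 5200.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE penguins (species TEXT, island TEXT, body_mass_g REAL)")
conn.executemany("INSERT INTO penguins VALUES (?, ?, ?)", rows)

# Drop missing masses, then aggregate per (species, island) group.
summary = conn.execute(
    """
    SELECT species, island, COUNT(*) AS n, AVG(body_mass_g) AS body_weight_mean
    FROM penguins
    WHERE body_mass_g IS NOT NULL
    GROUP BY species, island
    ORDER BY species, island
    """
).fetchall()
conn.close()

for row in summary:
    print(row)
```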
| mettamage wrote:
| Huh, as a frequent polars user, I'll try duckdb.
| knorke wrote:
| well, duckdb works very well with pandas, too
| goatlover wrote:
| So you're saying you prefer SQL to dataframes. I prefer
| dataframes and staying in the native language.
| willvarfar wrote:
| Duckdb can see and manipulate dataframes too. Duckdb has its
| own storage, but other table storage - e.g. the parquet files
| I mentioned or even csv files or even dataframes from pandas
| and polars - are first-class citizens. Duckdb lets you query
| them quickly and expressively.
| sheepscreek wrote:
| I really didn't understand the author's grievances. The only
| concrete example they illustrated was one where they concluded
| that Python without Pandas is verbose and ugly to achieve the
| same outcome, hence Python is not great for Data Science.
|
| That's a bad argument or a naive and obvious one; depending on
| how you look at it.
|
| Python wasn't designed for Data Science. It is not a DSL for it.
| MATLAB was arguably designed for scientific computing, and yet
| it's the most disliked language in the StackOverflow
| liked/disliked index.
|
| Here's a different way to look at it. A good programming language
| is like the weather in a city. I would love to live somewhere
| where it's 72F/23C all year round. But if it's in the middle of
| nowhere and I've got no friends to hang out with, would I? I
| don't think so.
|
| FWIW, Python is like Sweden or Finland, with shitty weather for 6
| months of the year yet thriving against all odds.
|
| PS: I think the article's topic is a bit click-baity (not a
| particularly useful discussion) because it's polarizing and no
| one will be 100% right about it. It's perhaps best thought of as
| an opinion piece.
| moi2388 wrote:
| You had me at "Python is not a great language"
| neuropacabra wrote:
| I expected the author to rightfully complain about the tooling,
| including linters, formatters and package managers. Things
| improved drastically over the years with Astral's ruff, uv and
| alpha stage ty.
|
| But the article says that very exotic syntax is more readable. I
| think this is mostly about the libraries, where honestly I
| equally don't like matplotlib and R's ggplot. But I would not
| think it's language problem.
|
| I was hoping to find some performance benchmarks or something
| more than feelings about certain block of code. Don't get me
| wrong I am also not a die hard fan of Python although I have
| written a lot of production code in it. Mentioning bloated,
| boilerplate code... I am afraid the author should look at Java or
| any modern JavaScript project.
| BiteCode_dev wrote:
| Notice how the article's load_penguins() example starts neatly
| after all the messy parts of data science are done and stops
| right before the next pain starts.
|
| It lives in a sterile, idealized world.
|
| Python is a great language for data science in practice because
| it turns out data science is also:
|
| - gluing a lot of data sources
| - cleaning up a ton of terribly shaped data
| - validation and error handling
| - I/O, networking, and format conversion
| - onboarding non-programmers into programming
| - wrapping a lot of compiled languages' libs or plugging system
| - prototyping stuff and exposing that prototype to some people
| - turning prototypes into more permanent projects
|
| And it turns out Python and its ecosystem are good at those while
| remaining decent at the other things.
|
| There are other languages excellent at some of those, or some of
| the other things, but rarely good at most. And because humanity
| is vast, diverse, and constantly renewing, being the second best
| at those is eventually always winning.
|
| Because whoever you are, you will be annoyed at not having the
| best experience at task X. But you would be mortified if you had
| the worst experience at doing task Y and Z. And task X, Y, and Z
| change depending on who you ask.
|
| And you want to get things done, while days have 24 hours.
|
| As usual, to understand the Python phenomenon, you have to see
| the whole picture. Not your little corner of the bubble. Not the
| ideal world in your head either. Life is not a maths problem with
| a clearly laid out premise and an elegant answer.
|
| That's the same debate about why PHP won the web in 2000 no
| matter the size of the spaghetti plate, why Windows stayed used
| for so long despite it being terrible, why people keep using
| iphones after all the abuses, etc. There is more to it than the
| use case you have every day. People have needs you haven't
| thought about.
|
| So it's not "let the language war begin". It's, "dude, get more
| experience, go work with accountants, ngos, govs and logistic
| chains, go work in china, africa and south america, go from a
| startup to schools to corporate, satisfy the geeks, the artists
| and the business people, then we'll talk".
| codeptualize wrote:
| Wait, so there is one example, which shows the R and Python
| equivalents are pretty much the same..
|
| I was all hyped up, ready to see the amazing examples and
| arguments that would convince me to pick up R, and it gave me
| absolutely nothing (except quotes and brackets..).
|
| Disappointing.
| jakubmazanec wrote:
| I wish people used Julia more. A few years ago I reimplemented some
| MATLAB code for a novel algorithm [1] I wanted to use in my
| dissertation about psychometrics, and Julia was a great language to
| work with - and also the code ran for 20 minutes instead of 60.
|
| [1] https://link.springer.com/article/10.1007/s11336-017-9581-x
| maratc wrote:
| How important was this saving of 40 minutes for the whole
| timeline of the project of writing your dissertation about
| psychometrics?
| jakubmazanec wrote:
| Very important. This was only a simulated dataset, the final
| analysis would be done on a much larger one (sadly, in the
| end I didn't finish it, for unrelated reasons). Also,
| the rewrite didn't take long; the final Julia code was small,
| a few hundred, or maybe a thousand lines.
| mfld wrote:
| This really calls for an A/B speed programming test of Python vs.
| R practitioners.
| HelloNurse wrote:
| Guess what, doing a relatively complex but standard task
| (filtering and aggregating example penguins) with a specialized
| and ossified library (Pandas) is better than doing it
| "bare-handed" with basic lists and dicts.
|
| More terse, more efficient, less error prone, hopefully more
| numerically accurate, as if Python had an ecosystem of well
| designed libraries on par with R.
| sarusso wrote:
| The main flaw of this article is comparing a general-purpose
| language built with production systems in mind (Python) with a
| domain-specific language designed for interactive analysis (R)...
| Beware of comparing apples and oranges, because productizing R
| code typically requires rewriting it in another language.
| poulpy123 wrote:
| But python is a great language for data science. As the anglos
| say: the proof is in the pudding, and the fact it is massively
| used for data science proves it is great at data science.
|
| You will say that not everything that is successful is great, and
| you will be right, but the success of python came organically,
| and not because of advertisement, de facto monopoly, politics,
| money, or first-arrived-advantage.
|
| Although there is one cause that isn't intrinsic to python but
| from the people who built numpy. The fact there is a single
| numerical library, extremely easy to use, fast and extensive in
| the whole ecosystem was very very huge
| Pinegulf wrote:
| Once the data is clean and neatly in standard format this becomes
| a matter of preference.
|
| Work experience says that 90% of work is gathering, cleaning and
| transforming data from different sources. In this capacity Python
| has more options available.
| markkitti wrote:
| I tried this in Julia with TidierData.jl, and it looks quite
| similar to the R version.
|
|     using TidierData, DataFrames
|     using PalmerPenguins: load
|
|     penguins = load()
|     @chain penguins begin
|         DataFrame
|         @drop_missing(body_mass_g)
|         @group_by(species, island)
|         @summarize(
|             body_weight_mean = mean(body_mass_g),
|             body_weight_std = std(body_mass_g)
|         )
|         show(_, allrows=true)
|     end
| ebonnafoux wrote:
| In the article
|
| > Contrast this with equivalent code that is full of logistics,
| where I'm using only basic Python language features and no
| special data wrangling package:
|
|     n = len(values)
|     # Calculate mean
|     mean = sum(values) / n
|     # Calculate standard deviation
|     variance = sum((x - mean) ** 2 for x in values) / (n - 1)
|     std_dev = math.sqrt(variance)
|
| He doesn't know about the statistics module in the standard
| library of Python
| (https://docs.python.org/3/library/statistics.html). Of course,
| if you do not know how to use Python, you will have a lot of
| boilerplate.
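|
| A minimal sketch of that point, using made-up numbers: the
| hand-rolled mean and sample standard deviation from the quoted
| snippet next to statistics.mean and statistics.stdev from the
| standard library. stdev() divides by n - 1, so it matches the
| manual formula:

```python
import math
import statistics

values = [3750.0, 3800.0, 3250.0, 3450.0]  # illustrative numbers only

# Hand-rolled version, as in the quoted snippet.
n = len(values)
mean = sum(values) / n
variance = sum((x - mean) ** 2 for x in values) / (n - 1)
std_dev = math.sqrt(variance)

# Standard-library equivalents; stdev() is the sample standard deviation.
lib_mean = statistics.mean(values)
lib_std_dev = statistics.stdev(values)

print(math.isclose(mean, lib_mean), math.isclose(std_dev, lib_std_dev))
```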
| analog31 wrote:
| >>> Without fail, from the students that use Python, the response
| is: "This will take me a bit. Let me sit down at my desk and
| figure it out and then I'll be back."
|
| This is completely aside, but I wouldn't hold this against the
| students or Python. The students may be following an age-old rule
| of office politics: "Never troubleshoot in front of an audience."
| And why this is more prevalent among the students who use Python,
| well... sample size of 30.
| Vaslo wrote:
| My team has all moved slowly from R to Python. There was no
| pressure to do so. R has a clunky feel with a bunch of modules
| that can be a challenge to automate. Python's general purpose use
| beats whatever superior modules R has all day. If someone wants
| the same package on Python from R it's probably out there.
|
| While plotting may be clunky, I just don't see R as much better.
| Plus in 2025 I can just provide a sample of data and what plot I
| want in an LLM and I get zero shot code of the plot I want.
|
| Author sounds very academic to me.
| orochimaaru wrote:
| It's not. Julia is better, much better. But Julia came too late.
|
| A lot of data science code is already in Python. That's where
| it's going to stay because rewriting code is time consuming. My
| guess is we will continue to improve Python gradually and keep
| refactoring the code.
| 1gn15 wrote:
| > It's not. Julia is better, much better. But Julia came too
| late.
|
| Sounds a lot like "worse is better". Python is the worse
| option, incomplete and inelegant, but is much more practical
| due to being there first and receiving the bulk of the
| attention.
| fithisux wrote:
| Julia macros are a game changer.
|
| You do not need a DSL.
| hmokiguess wrote:
| My issue with Python is that it makes it too easy to do things
| wrong, it accepts all and anyone. It's too inclusive and
| permissive, which is great for expression and creativity but bad
| for exact sciences and rigid disciplines. In certain matters
| opinions and cargo cult programming are often a detriment for
| science. Unfortunately for high level abstractions it's not that
| simple to do it right without sacrificing speed, so the industry
| forces the hand of the community in a lot of ways.
| skeeter2020 wrote:
| This is not about "Python is not a great language for data
| science" but the author's expertise and affection for R. I guess
| that title wouldn't get as many clicks.
| fithisux wrote:
| Personally I use R for the occasional script or some tidyverse
| quick processing.
|
| But the language has many rough edges
|
| 1. non standard eval is very weird, rlang fixes these
|    shortcomings
| 2. unintuitive names or functions not belonging to packages,
|    base has a mix of functions
| 3. S3 mixes with naming, no problem personally with S3 and S7 is
|    even better, but mixing S3 names with ordinary names is
|    unintuitive, keep snake case
| 4. data.frames are unintuitive, tidyverse fixes this
| 5. f(a=) seriously? or working with unintuitive functions in body
|    for discrete ranges of function arguments?
| 6. no imports per file in packages, I can live with this .. still
|    ...
| 7. AST functions are unintuitive
|
| R has some excellent parts:
|
| non-standard evaluation, AST in the base language, lazy
| evaluation
|
| but it is being killed by the bad parts
|
| I think all the external fixes and sanity in names should go into
| base
|
| but it will take a lot of time if it ever happens due to legacy.
|
| Julia fixes many of these, not as elegantly as R, but its
| pragmatic approach is too attractive.
| nyrikki wrote:
| > Contrast this with equivalent code that is full of logistics,
| where I'm using only basic Python language features and no
| special data wrangling package
|
| While I am not a python cheerleader, but a user because the
| reality is that it is a pretty good glue language, the above is a
| bit of a problem.
|
| Duckdb, pandas, numpy etc.. is what makes python nice.
|
| About a decade ago I worked at a major BI software company and
| ran into another silly problem when trying to evangelize R: wikis,
| KBs and search engines don't like single-letter search terms.
|
| So it didn't matter how much better R was at the time, people
| found learning it more difficult than it should have been.
| another_twist wrote:
| I think TypeScript will shine here. Especially for data output
| pipelines so we can emit strongly typed datasets.
|
| Also add to the fact that TS based exploratory code can
| potentially plot SVG via d3 and maybe even exported to a webpage.
| Decabytes wrote:
| Python pays the bills. If it was up to me I'd use a different
| language, but there is no denying that it's got a strong story in
| just about every field now. As I've gotten older, I've come to
| realize that programming languages are vehicles for solving
| computer based problems, and I've learned to find joy in solving
| those problems in whatever language my company/project is using.
|
| But in my personal projects, my favorite language to use is Dart.
| prepend wrote:
| Doesn't need to be great, just needs to be good enough.
| northlondoner wrote:
| There is a similar thread, regarding lifetime of projects, such
| as which ecosystem is better for long-term maintainability:
| https://news.ycombinator.com/item?id=46055463
| northlondoner wrote:
| Has anybody else noticed how much Python took from Scala for type
| hints? I was using Scala around 2015 and when I see type hints,
| I immediately recognise their similarity to Scala's approach.
| CephalopodMD wrote:
| Python is the 2nd best language for almost everything
| iLemming wrote:
| Yup, paradoxically, it's also the 2nd worst language for almost
| everything.
| knorke wrote:
| okay, click bait worked on me. but the claims are weak. basically
| "Python is not a great language... because it's not as much of a
| domain language as R"
|
| mediocre!
___________________________________________________________________
(page generated 2025-11-26 23:01 UTC)