[HN Gopher] Why I'm still using Python
___________________________________________________________________
Why I'm still using Python
Author : japhyr
Score : 110 points
Date : 2022-12-30 16:06 UTC (6 hours ago)
(HTM) web link (mostlypython.substack.com)
(TXT) w3m dump (mostlypython.substack.com)
| password4321 wrote:
| I continue to reach for Python first for my personal scripting,
| especially gluing together existing libraries.
|
| Building tools multiple people will be using regularly? Might be
| worth speeding up at least the slow parts with another language.
| (I don't know it well enough for it to be my first choice here).
|
| Also: I appreciate typed languages more when working with a team.
| I'm not familiar with the options Python offers to keep teams on
| the same page.
| worik wrote:
| I had to install Mailman 3 a while ago.
|
| It would not run on a $5 VPS from DO, because to deploy it
| required more memory (two gigabytes I think... meh).
|
| Mailman 1 ran in 40K. It was written in Perl. Clunky, not flash,
| and functional.
|
| Mailman 3 is written in Python using Python tooling - Django I
| believe.
|
| Clunky, flash, and functional. And cost me $5 a month extra.
|
| I do not hate Python, I forgive the language for people misusing
| it. But it is a _scripting_ language. That is what it is good
| for. Calling other systems. Not building the tight loops
| [deleted]
| KyeRussell wrote:
| It's easy (and common) to point to Instagram et al as examples
| of people using Python and even Django at scale. Beyond those
| big hitters though, Python is obviously a very popular
| language, and Django is a very popular web framework. The
| people making these tooling decisions aren't just noobs. Python
| and Django obviously meet the functional and performance
| characteristics desired by a lot of people.
|
| I have Django projects, some of which I am going to assume are
| more complex than Mailman, using far less than two gigabytes in
| production. Not through any extreme optimisation efforts. It's
| just...what's happened. In my experience, the resource
| footprint you experienced with Mailman is not representative.
|
| Notwithstanding the fact that we've seen amazing performance
| improvements in Python as recently as 6(?) months ago, if speed
| were my primary concern, I am obviously not picking Python.
| This doesn't mean that it should be relegated to the frankly at
| this point outdated bucket of "scripting languages". The
| language's features, performance profile, and package ecosystem
| all surpass these use cases. The conventional wisdom, which I
| completely agree with, is that these days things are good
| enough that speed need not be a primary concern for most people
| most of the time. Sure, oldschool Mailman was fast, but did
| anyone want to or know how to actually work on it anymore?
| Evidently not.
| Simran-B wrote:
| How popular would Python be without the scientific users and the
| machine learning ecosystem?
|
| I have the feeling that its user base is pretty lopsided, but I
| might underestimate its importance in other areas, like teaching
| programming. I'd love to see a breakdown of its popularity by
| field.
| baq wrote:
| > How popular would Python be without the scientific users and
| the machine learning ecosystem?
|
| That's the tail wagging the dog. 'Why did Python become the
| scientific computing language?' is the right question.
| college_physics wrote:
| This. Somebody should actually trace the milestones that led
| to the current state as it is not by any means obvious why it
| happened. At some point a winner-takes-all dynamic obscured
| some direct competitors like R or julia but prior to that all
| bets were open. It could also have been c++, since libraries
| like armadillo and eigen make it easy to write script-like
| code. Or even java that is everywhere in enterprise
| shanebellone wrote:
| All the top tech companies use Python for product.
|
| Google
|
| Facebook + Instagram (largest deployment of Django)
|
| Spotify (millions of lines of Python)
|
| Netflix
|
| Reddit
|
| Dropbox (millions of lines of Python)
|
| YouTube
|
| Plenty more examples.
| theCrowing wrote:
| It's also the primary language used for everything regarding
| VFX. Think Pixar, ILM, WETA, Disney etc. it's mostly inhouse
| tools so nobody ever sees them but the code bases are huge
| and are growing as fast as the demand for more and more high
| quality VFX from streaming services and studios.
| shanebellone wrote:
| That's interesting. I wasn't aware of this. Thanks for the
| share.
| mihaaly wrote:
| All those use dozens (hundreds?) of other tech for product
| too.
|
| How is this an answer to the question?
| shanebellone wrote:
| "How popular would Python be without the scientific users
| and the machine learning ecosystem?"
|
| ... do I really need to explain this?
| rrdharan wrote:
| Google / YouTube no longer use Python "for product", at
| least, not in the serving path. Dropbox has also been moving
| away and towards Golang.
|
| Instagram (Cinder), Google (Unladen Swallow) and Dropbox
| (Pyston) have also all experimented with heavy engineering
| investment trying to improve Python performance and only one
| of them (Cinder) has outcompeted the "just rewrite it in a
| faster language" strategy.
| shanebellone wrote:
| Also...
|
| https://devblogs.microsoft.com/python/python-311-faster-
| cpyt...
| solarkraft wrote:
| I think it's a great pattern to start with Python and do
| something once performance becomes a problem. You can
| afford it when you're successful thanks to the code you
| wrote in Python, or it won't be a problem anyway.
| TillE wrote:
| Very. Unless you're a diehard JavaScript head, it's still the
| language of choice for smallish projects like a Discord bot, or
| for any kind of sophisticated script.
| buffalobuffalo wrote:
| As of late, my "language" of choice is a mashup of python and
| nim. I use python for the ecosystem and nim for parts that need
| to run fast. It's the easiest interoperability I've ever worked
| with.
| marsupialtail_2 wrote:
| I just find it funny how my post on why I wrote a SQL engine in
| pure Python is right next to this one on the front page
| atoav wrote:
| One of _the_ single language features that makes me stay with
| python is list comprehensions. So maybe something like that:
| stuff = [cleanup(s) for s in stuff if s.unprocessed]
|
| This, the way you can handle/manipulate strings and the powerful
| included libraries make python a good language.
|
| Downsides? Dependency management, tooling, speed
| cortesoft wrote:
| I feel like this is a lot cleaner and more clear in Ruby:
|
| stuff = stuff.map {|s| cleanup(s) if s.unprocessed }.compact
| nequo wrote:
| Only knowing Python and not Ruby, your snippet looks
| confusing to me.
|
| 1. What does the anonymous function return if s.unprocessed
| is false?
|
| 2. If it returns nothing, do I get a missing value element in
| the iterable that results from map?
|
| 3. And what does it mean to compact the result of the map
| call?
|
| The Python snippet raises none of these questions in me. But
| it could be because I am familiar with Python's quirks.
| cortesoft wrote:
| The compact gets rid of all the null values returned when
| unprocessed is false.
|
| Interestingly I wasn't exactly sure what the python code
| would do, which is why I used the compact. In the python
| code, does 'stuff' end up only having the result of the
| cleanup function for items that are unprocessed? Or will it
| have all of the items, and the ones where unprocessed
| returns false just end up being returned unmodified?
|
| My Ruby version returns the processed subset of items that
| were unprocessed. The python version was not clear to me,
| since I don't know what happens if s.unprocessed returns
| false.
|
| So basically, it sounds like we were both confused by the
| same parts of the language we are unfamiliar with... what
| happens to elements where s.unprocessed is false? It was
| clear to me that in Ruby it returns a nil, but I have no
| idea in Python... what happens?
| nequo wrote:
| Huh, our mutual confusion is interesting indeed! Thank
| you for teaching me something about Ruby. What compact
| does makes sense.
|
| Python's list comprehension combines map and filter.
|
| [process(s) for s in stuff if s.unprocessed]
|
| is equivalent to
|
| map process (filter isUnprocessed stuff)
|
| in Haskell.
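|
| A minimal Python sketch of the equivalence described above. The
| `Item` class and the `cleanup` function are hypothetical stand-ins
| invented for illustration, not something from the thread:

```python
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical record type, used only for illustration.
    name: str
    unprocessed: bool


def cleanup(item: Item) -> str:
    # Hypothetical cleanup step: trim whitespace and lowercase the name.
    return item.name.strip().lower()


stuff = [Item(" A ", True), Item("b", False), Item(" C", True)]

# Comprehension: filter and map in a single pass.
via_comprehension = [cleanup(s) for s in stuff if s.unprocessed]

# The same result spelled out with filter() and map().
via_map_filter = list(map(cleanup, filter(lambda s: s.unprocessed, stuff)))

# The same result as an explicit loop (for + if + append).
via_loop = []
for s in stuff:
    if s.unprocessed:
        via_loop.append(cleanup(s))
```

| Items whose `unprocessed` attribute is false are skipped entirely:
| they do not show up in the result at all, not even as `None`, so
| nothing like Ruby's `compact` is needed afterwards.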
| baq wrote:
| Nothing, literally. This is map+filter optimized (or a
| for+if+append but using dedicated opcodes for efficiency)
| so you don't have to do multiple passes and/or create
| temporary lists with superfluous elements.
| snowpid wrote:
| List Comprehension is just syntax for functional coding
| (filter, map) but a very good one!
| ogogmad wrote:
| Being pedantic: This isn't true in Haskell, because list
| comprehensions can use pattern matching, while map+filter
| can't.
|
| heads :: [[a]] -> [a]
| heads xs = [y | (y:ys) <- xs]
| -- heads [[1],[],[2,3]] == [1,2]
|
| Doing the above using maps and filters looks very different:
|
| heads' = map head . filter (not . null)
|
| Python doesn't really support Haskell-like pattern matching,
| so the above doesn't apply.
| ulucs wrote:
| The thing you're looking for is easily found with a Hoogle
| search for type
|
| (a -> Maybe b) -> [a] -> [b]
|
| Which yields mapMaybe: https://hackage.haskell.org/package/
| base-4.17.0.0/docs/Data-.... so you end up with
|
| heads = mapMaybe safeHead
| pdonis wrote:
| _> Python doesn't really support Haskell-like pattern
| matching_
|
| It does as of 3.10:
|
| https://peps.python.org/pep-0622/
|
| Not as sophisticated as what Haskell can do, I'm sure, but
| I think it's a good addition to Python.
| von_lohengramm wrote:
| I may be the outlier here, but I have a very hard time
| _comprehending_ list comprehensions. They've always seemed
| like a SQL query compressed into a single line (or two if
| you're lucky!): readable when it's simple, Lovecraftian when
| it's not. I've always found the functional pipeline style
| much more readable and much easier to explain to others,
| especially with IDE type hints.
| maxbond wrote:
| IMHO the key to legible comprehensions is to use them only
| to express composition and to put the logic into verbosely
| named functions.
|
| Chained pipelines (`foo.filter(...).map(...)`) are fine to
| me, certainly more verbose than comprehensions for better &
| worse, but I get lost in pipelines expressed as nested
| functions (`map(filter(a, ...), ...)`).
| dragonwriter wrote:
| > they've always seemed like a SQL query compressed into a
| single line (or two if you're lucky!):
|
| If it's more than a very simple expression, with one simple
| for, and no if or very simple if, I try to use at least
| (more, with parens, if necessary): one line for the result
| expression, one line for each for clause, and one line for
| each if clause.
|
| But, while the boundary is subjective, it doesn't take much
| before functional pipeline style is clearer (Python lambdas
| being more limited and syntactically cumbersome, functional
| pipeline style isn't as clean in Python as it is in other
| languages, though.)
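|
| The layout described above (one line for the result expression,
| one per `for` clause, one per `if` clause) could look like the
| following sketch. The `orders` data is made up for illustration:

```python
# Hypothetical sample data: each order has a customer and (name, qty) items.
orders = [
    {"customer": "ann", "items": [("pen", 2), ("ink", 0)]},
    {"customer": "bob", "items": [("pad", 1)]},
]

# One line for the result expression, one per for clause, one per if clause.
lines = [
    f"{order['customer']}: {qty} x {name}"
    for order in orders
    for name, qty in order["items"]
    if qty > 0
]
```

| Read top to bottom after the first line, the clauses run in the
| same order as the equivalent nested for/if statements would.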
| int_19h wrote:
| SQL queries _are_ sequence comprehensions!
|
| Python comprehensions in particular are unfortunate because
| the flow of the query, like in SQL, is not linear - the
| projection part is written first, but happens last. Thus,
| reading a non-trivial query requires moving back and forth
| to resolve the references.
|
| Compare that to C# LINQ, which is also a kind of sequence
| comprehension. That one not only has linear syntax where
| the data flow in the query is strictly left to right, but
| it's explicitly defined by mapping it to functional
| pipeline style. Thus, it ends up being a strict improvement
| on the latter.
|
| (Another example of "good" sequence comprehensions in this
| sense is XQuery FLWOR)
| pharmakom wrote:
| Sorry, but Python list comprehensions are not very good
| compared to other languages.
| curiousgal wrote:
| C#'s LINQ is far more powerful in my opinion.
| matsemann wrote:
| I have a completely opposite reaction, one of the things I hate
| the most with Python is its list-incomprehensions. They quickly
| become hard to read and very imperative, instead of
| declarative. Something like
| stuff.filter(it.unprocessed).map(cleanup(it)) conveys the
| intention behind each statement a hundred times better. And
| list comprehensions become impossible to reason about once
| they involve flatmapping or other relatively _simple_ functional
| expressions.
| MrVandemar wrote:
| Definite agreement. I started a long time ago with Python2,
| and what attracted me was how simple and readable the syntax
| was.
|
| With Python3 there is a move towards "idiomatic" Python code,
| which to me means moving away from clear and concise to the
| old programmer trap/readability nightmare of "look how many
| things I can get it to do with one line of code!"
| sodapopcan wrote:
| I agree very much so. List comprehensions can be nice and
| concise and readable in many situations, but due to the whole
| "one true way" thing in Python, they're used in every
| situation, even when a map/reduce/filter etc would be better.
| Although since there are no pipes, I sorta get it.
| spicyusername wrote:
| I almost always prefer to read nested loops and conditionals
| than list comprehensions.
|
| I find the syntax of list comprehensions ugly and inscrutable.
| xvedejas wrote:
| Additionally I find that it's easier to add complexity to the
| loops; once you want to start doing nontrivial things in a
| list comprehension, you often need to rewrite it into a loop
| first anyway.
| sodapopcan wrote:
| List comprehensions are great but certainly not unique to
| Python.
| solarkraft wrote:
| Python is pretty decent for small to medium sized stuff where
| performance isn't a big issue (setup stuff, simple backends,
| ...). A lot of stuff falls into that category.
|
| I especially enjoy the types. Coming from Typescript and being
| quite a fan of the "dynamic language with optional typing"
| pattern it was quite cool to find that Python just has that.
|
| Sadly quite a few pieces of software are likely going to stick
| with Python 3.7, so we are probably going to have to stick with
| the old syntax (explicit imports from typing and uppercase types)
| for a long time.
| college_physics wrote:
| Python has only one really serious drawback for the kind of tool
| it is and it's not remotely its own fault: it's not a first class
| resident on mobile.
|
| In a different universe its sweet easygoing manner would have
| created enormous value by enabling all sorts of casual
| programmers to have some control over this most ubiquitous of
| computing devices.
|
| Alas we don't live in an era where tech aims to empower users.
|
| NB: There are a number of projects that allow some of this (kivy,
| beeware) but they need to swim hard against the current.
| pharmakom wrote:
| The main drawback of Python is the limited set of language
| features.
| KyeRussell wrote:
| This is how we end up with another Swift.
| ugh123 wrote:
| and those are....?
| college_physics wrote:
| Most developers need to have at least two languages at their
| fingertips, so none of them needs to be perfect. What is
| important is to have a spanning set for all the use cases that are
| relevant to you.
|
| I think if we switch from endless arguments about best
| languages to the optimal _pairs_ of languages we'd book huge
| gains (and less online noise :-)
| sigmonsays wrote:
| all the reasons i switched away from python (ranked most
| important to least)
|
| 1. packaging python is awful. Turning a large python code base
| into a deb is awful. I mean really, how many _different_
| setuptools or pip do you really need?
|
| 2. not strictly typed. Really impedes large code refactors
|
| 3. not concurrent. The concurrency primitives are a joke compared
| to any other language. The GIL kills everything.
|
| 4. slow. Python gains all its performance by doing the important
| parts in C
|
| python is not a good fit for anything I do these days
| baq wrote:
| Good decision to move away. Tools are supposed to help, not get
| in the way. Though types are there, opt-in, and who cares if C
| modules make it fast if the end result is fast?
| submeta wrote:
| Been using Python for over 15 years, and although there are many
| programming languages that I find way more attractive (Lisps and
| functional programming languages), I keep sticking with Python,
| because you can learn it very quickly, you have a vast amount of
| resources and a large community that'll help you, and there is
| nothing you cannot do with it: From Webdev to data analysis to
| machine learning, there's a module for almost everything you need
| to solve.
|
| (Edited ;)
| 9dev wrote:
| ...those ChatGPT responses are getting old real fast by now.
| flashfaffe2 wrote:
| Exactly what I thought reading the comment.
| [deleted]
| xwowsersx wrote:
| Good post. I think this is the mature and somewhat inevitable
| lens through which to assess which programming language you're
| using. I have been using primarily Python for the last ~4 years
| or so. It's probably the most boring language I have ever used
| (it probably doesn't help that I've used Scala, Haskell, Rust,
| etc).
|
| Obviously, some languages are better than others for specific
| use-cases and one should think through the strengths and
| weaknesses of each before beginning a new project (assuming you
| expect the project to have some longevity). Similarly, it
| definitely _is_ possible to make the wrong choice about which
| language to use. But even as a PL "enthusiast", I've learned
| over ~15 years of leading successful projects and teams, that
| evaluating languages is always team/org/business-dependent and
| the considerations that factor into those contexts are distinctly
| different from the kind of considerations that I, as a PL
| enthusiast (and I suspect many others in the HN crowd) would
| prioritize. You have to consider things like:
|
| 1. What stage is the business in, i.e. early stage where you may
| simply want to get something, _anything_ working and it's viable
| to do so in whatever language will get you there fastest (warts
| and all, tech debt be damned, etc) vs the critical juncture where
| you're beyond finding product-market fit and you now have a
| business that is working and you need to increasingly focus on
| reliability, scalability, etc?
|
| 2. What does the pool of engineering labor in the market you have
| access to look like? For example, it may be the case that you'll
| have difficulty, for one reason or another, finding developers
| who work with language X, but could easily find many who can work
| with Y (of course, good devs should theoretically be able to ramp
| up on any language, but whether you'd want them to do so depends
| on how crucial using that particular language is to your venture,
| whether you can afford the ramp-up time, etc.).
|
| 3. Also somewhat related to the quality of engineers you have
| access to, the community/packages/ecosystem surrounding the
| language may matter a lot to such a degree that even if some
| other language were a better fit from a purely technical
| perspective, it may leave you to have to fend for yourself in
| ways that your team simply isn't up for.
|
| .. and many more considerations.
|
| Beyond all of this, I resonate with the author's orientation of
| wanting the language to just get out of the way. Sometimes you're
| looking to geek out on a stimulating and interesting language.
| Other times, that's the last thing you're thinking about. Bonus
| points if the language you find most compelling is also the one
| that, for reasons like the ones mentioned above, is the one that
| makes the most sense for you and your team. All in all, the point
| is that assessing a language in isolation, without any regard for
| what you're actually trying to accomplish, what your team looks
| like, etc. doesn't make much sense.
|
| Side note: I do wish more Substack authors would allow comments
| even from non-paid subscribers. I could understand restricting it
| if you already have an established paid subscriber base and a
| chatty comments section. But if comments are non-existent, it
| seems to me like the author is potentially losing out on valuable
| feedback/commentary by restricting commenting to paid
| subscribers.
| pugio wrote:
| I've been thinking (and researching) heavily in preparation for a
| new (offline, desktop) app I want to build. The app has a lot of
| data wrangling, and probably a decent amount of ML. Python seems
| like the logical choice, but...
|
| I've written a lot of Python, taught classes in it, deployed
| production code... and I still feel a semi-conscious urge to
| reach for something else whenever I contemplate starting a new
| project. Something about its approach, syntax, common idioms,
| always feels just a tad clunkier to me than I'd prefer. The
| whitespace makes it hard to paste lines into my repl, and
| (probably the biggest thing) comprehensions ARE NOT SUFFICIENT
| replacements for easy anonymous inline functions (JS () => , or
| Ruby's blocks).
|
| I like dynamic languages, I really like the advances in tooling
| with type checking and LSP support. I like dynamic notebooks
| (either inline in VS Code #%%, or straight Jupyter), the massive
| package ecosystem (but obviously hate the actual packaging
| tools), cool tools like Rich... but it just doesn't make me as
| happy to use as other languages.
|
| I'm still trying to articulate why. Maybe I was just ruined by
| learning Ruby first.
| pphysch wrote:
| Unless you have a clear specification for what you are going to
| build (as in, you have most of your data schemas and processes
| sketched out), stick with Python and benefit from the
| development velocity as you figure things out.
|
| Then rewrite it later when you need to scale.
| nnadams wrote:
| I have embraced this workflow completely. I used to be more
| concerned about which language, but now I find it much more
| useful to just start immediately in Python. I spend most of
| the time working out the kinks and edge cases, instead of
| memory management or other logistics. Maybe ~75% of the time,
| that's it, no need for further improvement.
|
| Recently I chose to rewrite several thousands of lines of
| Python in Go, because we needed more speed and improved
| concurrency. Already having a working program and tests in
| Python was great. After figuring out a few Go-isms, it was a
| quick couple of days to port it all for big improvements.
| alar44 wrote:
| This is what I do. Hack it together in Python to get
| something running, then rewrite everything in C++.
| JonChesterfield wrote:
| This works especially well if you test the python version
| as you go, then run the same python tests against the C++
| replacement. The up front bother of pybind or similar is
| minor relative to not having to recompile the C++ to
| iterate on test scripts.
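|
| A rough sketch of that workflow: one set of tests, run against
| every implementation. Everything here is hypothetical. In
| particular `mean_fast` is plain Python standing in for a function
| that, in a real port, would be imported from a compiled extension
| module (built with pybind or similar):

```python
import unittest


def mean_py(values):
    """Reference implementation, written first in plain Python."""
    return sum(values) / len(values)


def mean_fast(values):
    """Stand-in for the compiled replacement (kept in Python so this runs)."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


# Every implementation that must satisfy the same contract.
IMPLEMENTATIONS = {"python": mean_py, "fast": mean_fast}


class MeanContract(unittest.TestCase):
    """The same assertions are applied to each implementation in turn."""

    def test_simple_mean(self):
        for name, mean in IMPLEMENTATIONS.items():
            with self.subTest(implementation=name):
                self.assertEqual(mean([1, 2, 3]), 2.0)

    def test_empty_input_raises(self):
        for name, mean in IMPLEMENTATIONS.items():
            with self.subTest(implementation=name):
                with self.assertRaises(ZeroDivisionError):
                    mean([])
```

| Swapping the stand-in for the real binding only changes the
| `IMPLEMENTATIONS` table; the test bodies stay as they were.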
| devoutsalsa wrote:
| Elixir is nice if you are looking for something new to try.
| jamal-kumar wrote:
| I recently discovered it. It's pretty amazing how incredibly
| easy it is to build distributed applications with it; that is
| the part which impressed me the most. It handles concurrency
| well for sure, but running the same program seamlessly across
| multiple computers is just awesome. I know it's basically just
| taking this functionality from being built on top of erlang
| already, but it's still very impressive. Plus, like crystal, its
| syntax is based on ruby's, so if you already know ruby it's an
| easy thing to pick up if you can grasp the functional aspects
| too.
| yason wrote:
| Python is a good language and I use it a lot. But I can only say
| Python2 was fun but Python3 is serious. I could live with the
| quirks and shortcomings of Python2 and still enjoy programming.
| But even after a decade of history I still find many of the
| things unenjoyable that Python3 decided to introduce. I know why
| they did it, it's not an intellectual complaint. It just
| describes the feeling of starting a new Python program. Used to
| be great fun. Nowadays, merely a technological compromise.
| analog31 wrote:
| I'm strictly a "scientific" programmer. I've used multiple
| languages in 40+ years, but only two lasted more than a decade:
| Turbo Pascal and Python. As Python has passed the decade mark for
| me, I sometimes wonder why I'm still using it and what would make
| me get sick of it.
|
| Some of my reasons for choosing Python in the first place have
| been mooted. Python is free, and multi-platform, but so is
| everything else nowadays. I think that battle is over.
|
| For my use, packaging has not been a big obstacle, maybe because
| I tend to use a relatively small handful of established packages.
| I don't remember needing a package that I couldn't install with
| pip.
|
| Easy to learn? Yes and no. I think a beginner can ease their way
| into Python and get productive, if they start with basic
| scripting and progressively adding a few language features. On
| the other hand, some features are simply over the heads of
| beginners, making the code found in packages practically
| unreadable. But I've never found any language to be better in
| this regard.
|
| Fast? Sure, with numpy in its domain of use. Or writing your own
| C libraries. However, I think a beginner using numpy will run
| into roadblocks that require help from an experienced programmer,
| who can visualize what the array operations are doing under the
| hood. So, writing fast code isn't easier to learn in Python than
| in any other language.
|
| The editor should have a final option, that turns your code brown
| if it's just so bad that nobody should ever have to read it,
| including yourself.
| 6gvONxR4sf7o wrote:
| Another big one that many languages treat as table stakes these
| days is the whole 'batteries included' thing. Whenever a new
| language comes out and I have to make my own (or install) core
| capabilities, it feels like a huge waste of time.
| analog31 wrote:
| Having to piece together a Python installation can be an
| annoyance for a beginner. I always offer to help them set
| things up the first time. Since most of my workplace is on
| Windows, it's easy to have people install WinPython.
|
| This is especially true if they've already tried setting
| things up, and have made a mess of it. Because WinPython is
| self contained, it can work on a computer that already has a
| working or non-working Python installation.
| donkeybeer wrote:
| What do you mean by piecing together a python install? Most
| linux should have one pre-installed or easily available via
| the package manager. After that venvs and pip seem to be
| generally enough for most tasks.
| ghshephard wrote:
| I've been using Python for about 6 years, my last company
| wrote its core commercial product exclusively in python.
| I've never used venvs, but I've yet to run into a
| situation where virtualenv (maybe with virtualenvwrapper as
| sugar), and pip + requirements.txt haven't been able to
| handle even fairly complex install situations.
| analog31 wrote:
| My workplace is a Windows shop. I'm multi platform, but
| by and large the folks using Linux don't need my help.
| ;-) So when I help people it's almost always on Windows.
| blensor wrote:
| I think one big factor is easy entry but almost no limit:
|
| Like BASIC back in the day it's a language that allows
| newcomers to do things long before they are comfortable with
| the added syntax requirements of C based languages
|
| Despite that the sky is the limit, almost all problems a
| developer will encounter can be tackled with python. Yes for
| many problems there are languages that are better suited for
| the task, but in most cases a project never even reaches a
| state where that will matter.
|
| And it's installed by default in many Linux distributions.
| abdullahkhalids wrote:
| I now use python, but Python is not even half the language
| Mathematica is for scientific computing.
|
| If only Mathematica was free, I think it would become the
| de facto language for scientists.
|
| The most important feature of Mathematica is that it is
| functional and not object-oriented. Mathematics is functional,
| not object-oriented. So thinking in Mathematica is as easy as
| thinking in maths. Thinking in python is to force yourself to
| think in an unnatural way.
| klyrs wrote:
| > Mathematics is functional, not object-oriented.
|
| I beg to differ. If by "math" you mean arithmetic and
| calculus, then sure. Combinatorics, graph theory; probably
| most of discrete math is very much object-oriented.
| Heston wrote:
| How does the speed of Mathematica compare to python or C?
| pharmakom wrote:
| There is Octave and Julia
| baq wrote:
| Mathematica is so much more it's hard to even describe.
| It's free on a raspberry pi, so if you have time and access
| to hardware, take a look.
| jrumbut wrote:
| I think the so much more/hard to even describe is sort of
| a problem?
|
| There is a lot to it, and in my experience it's a very
| long road to getting your first problem solved.
|
| There are a lot of very cool pieces but I want it to be
| something I can pull off the shelf once in a while and be
| productive rather than something that I need to be an
| expert in before I can get started.
| cycomanic wrote:
| I think you're regarding Mathematica from a purely
| "theory/analytics" point of view, however I think you're
| missing the fact that for a lot of scientific computing you want
| to interact with other things apart from numerics/analytics
| etc.. I believe that is really where Python shines and why
| it's also eating Matlab's lunch (more so than Mathematica's).
| It's incredibly easy to write a gui to interface with e.g. a
| scientific instrument, do some reasonably fast analytics with
| numpy and possibly even produce publication ready plots. This
| is not really possible with Mathematica and while you can do
| this in Matlab, you need several toolboxes which are
| expensive, but moreover are of extremely varying quality
| (let's not even talk about the horror of writing GUIs in
| Matlab). With Python, you have all of that for free, with a
| single tool (remember most scientists don't want to learn
| lots of programming languages).
| prepend wrote:
| > If only Mathematica was free, I think it would become the
| de facto language for scientists.
|
| Of course. It's one of the few languages that I think are
| worth it.
|
| I've been waiting for Wolfram to open source it for 10 years
| as I can't afford it, or really justify why my org should use
| it.
| pdonis wrote:
| _> The most important feature of Mathematica is that it is
| functional and not object-oriented._
|
| Python has object-oriented features but there is nothing that
| requires you to use them; you certainly don't have to express
| every Python program in terms of classes and methods, the way
| you do in Java. Doing functional programming in Python is
| common. Is there a particular aspect of doing functional
| programming in Python that you find to be a roadblock?
| deafpolygon wrote:
| Is JavaScript worth looking at, as an alternative to Python?
| baq wrote:
| As an alternative for web apps, yes. In other domains, not
| really, but it'll work, it just doesn't solve any problem
| better except it isn't Python.
| jeremyjacob wrote:
| You might look at Deno[0] which provides that scripting
| language experience, similar to python.
|
| [0] https://deno.land
| rkwertz wrote:
| This language seems to require at least one positive front page
| post per day so people keep using it. Compare with the anti-
| golang posts today.
| candrewlee14 wrote:
| Python does a great job at putting together powerful scripts
| quickly. But I have a really hard time opening up a big Python
| codebase and making sense of it quickly. No static types can make
| data pretty hard for me to follow. This may just be something I
| need to improve at, but I have a distinctly better feeling
| working with unfamiliar code with static types. Python makes me
| feel I'm at the will of the documenter to be informed, whereas
| the types self-document in many cases in other langs.
| cycomanic wrote:
| I think this is actually a big difference between SWEs and
| programmers who are scientists/data scientists/analysts etc.
| first. In my experience, people with more formal education in
| programming find type systems to be very helpful, while people
| from other backgrounds find them confusing and a nuisance.
| Barrin92 wrote:
| I think that's a take in the category of 'you don't want types,
| you want better names'.
|
| Static types are very useful for compilers but looking at a
| function and seeing int -> int -> int -> string -> bool -> int,
| says very little about the semantics of a program. It's
| _always_ the names and documentation that tell human beings
| how to make sense of a program.
|
| When we put things in record types, the sense-making value
| isn't in the static analysis but in the fact that our vague
| collection of parameters now has a proper name that indicates
| what it's all about.
| dragonwriter wrote:
| > Static types are very useful for compilers but looking at a
| function and seeing int -> int -> int -> string -> bool ->
| int, says very little about the semantics of a program.
|
| Sure, but on the other hand a type of APIProxyEvent ->
| LambdaContext -> APIGatewayProxyResponse says quite a bit more
| about the semantics.
|
| Unless a function is highly abstract, int -> int -> int ->
| string -> bool -> int is probably an underspecific type
| signature.
|
| EDIT: To be clear, I generally find that the thesis "you
| don't want types, you want better names" comes from assuming
| _bad_ types, and suggesting replacing them with _good names_.
| And, for casual inspection, yes, good names may be superior
| to bad types. On the other hand, I can't statically check
| good names; I can statically check types (good types or bad-
| because-underspecific types, but good types, as well as
| telling me more as a _reader_, will also statically catch
| more possible errors). Ultimately, what I want is good types
| _and_ good names.
| ruuda wrote:
| Python has optional type annotations. Most new code tends to
| have them, and they help tremendously navigating and
| understanding large codebases.
| lordgroff wrote:
| It's a far cry from an actually typed language. I like Python
| and deal with its code bases for a living, but frankly it has
| gotten too big for its britches.
| The_Colonel wrote:
| It's honestly not very good compared to a "real" type system
| (platform). I've used it about a year ago for a couple of
| projects, and it was a painful and ultimately not very useful
| exercise.
| depressedpanda wrote:
| While I agree that type hinting by its very own nature
| feels a bit bolted on, I vastly prefer going into code
| bases which include type hinting. I personally always add
| type hinting to the code I write, as I actually consider it
| quite useful.
|
| In what ways do you think it's painful?
| The_Colonel wrote:
| I remember mypy being slow and buggy. I remember one mypy
| upgrade broke all our builds because they changed some
| type of resolution thing. IIRC after some outcry they
| backtracked and started providing some migration path.
|
| The other thing which rubbed me the wrong way was that
| Python was happy to run the code with completely
| wrong type hints.
|
| I guess I went into it with wrong expectations, even
| though it says right in the name - it's "type _hints_ ".
| The whole experience felt more like a formalized
| documentation with partially working optional
| verification (which can't really be relied upon).
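The point that the interpreter runs code with wrong hints can be seen in a few lines. This is a minimal sketch; `double` is a hypothetical function, not something from the thread. CPython stores the annotations but does not check them, so a static checker such as mypy has to be run as a separate step to flag the bad call.

```python
def double(x: int) -> int:
    """Annotated as int -> int, but nothing enforces this at runtime."""
    return x * 2


# The interpreter ignores the annotations: a str goes through without error.
result = double("ab")

# The hints are still stored as metadata that tools and libraries can inspect.
hints = double.__annotations__

print(result)
print(hints)
```

Libraries that do validate at runtime read this same metadata and perform the checks themselves.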
| icedchai wrote:
| Type hints are basically for documentation and metadata.
| You also find a bunch of third party libraries and
| frameworks, like pydantic, fastapi, etc. that make use
| of them.
| baq wrote:
| It's as real as typescript... not useful for code
| generation or optimization, but very helpful for
| correctness.
| ruuda wrote:
| I've used Mypy since 2016 on big as well as small
| codebases, and it has been extremely useful for me. It
| caught numerous bugs at typecheck time. The benefits of
| better readability and e.g. better jump to definition and
| autocomplete are harder to quantify, but subjectively they
| feel substantial. If you are gradually annotating a big
| codebase, it does require a "critical mass" before types
| start to pay off, I agree that adding them later can be
| painful.
|
| Mypy's type system is quite advanced compared to some
| statically typed languages like C and Go; it has generics,
| union types, inference, and it can prove properties of
| types from control flow. I work mostly in Rust, Haskell,
| and Python, and I rarely find Mypy limiting.
|
| You do have to embrace types though; if you have a dictly
| typed program where everything is a Dict[str, Any], then
| putting that in annotations isn't going to be very helpful;
| converting the dicts to named tuples or dataclasses is.
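The contrast between a `Dict[str, Any]` and a dataclass, and the narrowing of a union type from control flow, can be sketched as follows. The `Order` class and both functions are hypothetical names chosen for illustration; the comments describe what a checker such as mypy would be expected to infer, not output from an actual run.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


# "Dictly typed": a checker knows nothing about the keys or the value types.
def total_loose(order: Dict[str, Any]) -> float:
    return order["price"] * order["quantity"]


@dataclass
class Order:
    price: float
    quantity: int
    coupon: Optional[float] = None  # a union type: float or None


def total(order: Order) -> float:
    subtotal = order.price * order.quantity
    if order.coupon is None:
        return subtotal
    # After the None check, a checker can treat order.coupon as a float here.
    return subtotal - order.coupon


print(total_loose({"price": 2.5, "quantity": 4}))
print(total(Order(price=2.5, quantity=4)))
print(total(Order(price=2.5, quantity=4, coupon=1.0)))
```

With the dataclass, a misspelled field or a missing None check becomes something a checker can report before the code runs; with the dict, both mistakes only surface at runtime.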
| dragonwriter wrote:
| > Python has optional type annotations
|
| Moreover, it has static type checkers with varying degrees of
| support for type inference.
| dang wrote:
| This was also discussed in the last day:
|
| _Breaking up with Python_ -
| https://news.ycombinator.com/item?id=34177703 - Dec 2022 (172
| comments)
| velcrovan wrote:
| So, the reasons are:
|
| * I've been using it for a long time
|
| * I have lots of friends that use it
|
| * Even though I'm interested in other languages I don't have time
| to learn them
|
| These are all perfectly valid reasons! But, they say very little
| about Python.
| stabbles wrote:
| They should put energy labels on programming languages. It's
| convenient to use Python, but a waste of cpu cycles
| [deleted]
| rahen wrote:
| If so, there would only be room for C and assembler.
|
| Give me at least ksh and awk, they don't use many resources and
| are infinitely handy.
| duckmysick wrote:
| Or to go one step further, phase out (outlaw) energy-
| inefficient programming languages like we do with high-emission
| cars.
|
| I'm not advocating for it, but it's an interesting thought
| experiment.
| nnadams wrote:
| A group of researchers published a paper on this topic, "Energy
| Efficiency across Programming Languages" [0]. They have tables
| listing the relative energy usage and memory consumption.
|
| It's just one paper of course, but the energy results weren't
| too surprising. C was the baseline. C++, Rust, and Java close
| behind. Languages like C# and Go near the middle, and then
| Python, Ruby, and Perl at the bottom for most energy used.
|
| [0] https://greenlab.di.uminho.pt/wp-
| content/uploads/2017/10/sle...
| awjlogan wrote:
| This got a reasonable amount of attention, but it's not
| particularly relevant. The example solutions would never be
| written in pure interpreted Python - you would import numpy
| and use the numeric functions from there, effectively using C
| rather than Python.
| heresjohnny wrote:
| You seem to favor human brain cycles over CPU cycles.
| Conversely, time spent on memory management and pointer
| arithmetic is time that could've been used to add value to the
| world.
| alar44 wrote:
| You don't "manage memory" unless you're writing C, and if you
| think "pointer arithmetic" is hard, you're a beginner level
| programmer.
| jnsaff2 wrote:
| Well for code that gets executed billions of times spending
| more human cycles seems a fair trade off. For boring old
| corner shop crud, maybe not.
| heresjohnny wrote:
| Most code isn't fire and forget though. Bugs occur,
| requirements change. The human cycles are a perpetual
| investment. And Python is simply very, very maintainable.
|
| I do agree that there is a point where the scale tips. I
| wonder where that is, and what kind of application we're
| talking about then. Many applications are at their basic
| level old corner shop crud, really.
| lylejantzi3rd wrote:
| Why hasn't anybody created a more efficient version of Python?
| Slap some static types on that baby, get rid of the heavy
| runtime, and you've got a good start toward something special.
| adsharma wrote:
| https://github.com/py2many/py2many/blob/main/doc/langspec.md
|
| Reimplement a large enough, commonly used subset of python
| stdlib using this dialect and we may be in the business of
| writing cross platform apps (perhaps start with android and
| Ubuntu/Gnome)
| doodlesdev wrote:
| There are more efficient interpreters and compilers, problem
| is they don't work with the existing libraries. If you mean
| another language, there is Lua for scripting purposes and
| Julia for math stuff. JavaScript also exists, which, despite
| its reputation, is actually MUCH faster than Python, and you can
| even run it in a runtime such as bun.js and get even more out
| of it.
|
| IMO the biggest problem is that the Python libraries
| ecosystem is just too good to miss out on. I absolutely hate
| Python but even I have to admit it is much easier to do some
| stuff in Python compared to other languages simply because of
| the libraries available to do basically anything you could
| dream of.
| ravenstine wrote:
| Wouldn't surprise me if a subset of Python compiled to
| asm.js would run faster in V8 than cpython.
| JonChesterfield wrote:
| Probably doesn't have to be a subset. Call back to
| cpython for the really dynamic madness, much like a jit
| tiers back out of machine code when assumptions don't
| hold.
| bombolo wrote:
| But would this "subset of python" be actually useful?
|
| Once you break compatibility stuff doesn't work... and a
| fast python that can't run my software is less useful
| than a slow python that runs the software.
| dragonwriter wrote:
| > Wouldn't surprise me if a subset of Python compiled to
| asm.js would run faster in V8 than cpython.
|
| I'm certain that there exists _some_ subset of python for
| which this is true, but so what?
| biomcgary wrote:
| Python performance, both single core and concurrent, seems
| to be its primary Achilles heel. However, Python is
| unmatched for developer speed in compute heavy AI / ML
| since it uses C, etc., under the hood. I don't really like
| Python but it's impossible to avoid in data science because
| you don't have to re-invent the wheel.
|
| Several languages have persisted beyond their natural shelf
| life due to their library ecosystems. For a period of time,
| CPAN was one of the best arguments for using Perl. More
| recently, Java's ecosystem is also a pretty good reason to
| use Java right now, even if you don't like the language.
|
| Languages seem to thrive early due to initial ease of use
| (Python, particularly, but also relative uptake of Go vs
| Rust), but can wane due to a thousand cuts (or sigils in
| Perl's case).
| lylejantzi3rd wrote:
| > However, Python is unmatched for developer speed in
| compute heavy AI / ML since it uses C, etc., under the
| hood.
|
| John Carmack seems to think otherwise.
|
| > "That means that in the time that Python can perform a
| single FLOP, an A100 could have chewed through 9.75
| million FLOPS"
|
| https://twitter.com/id_aa_carmack/status/1503844580474687
| 493...
| cldellow wrote:
| Am I missing something?
|
| The OP's comment is that while Python's native
| interpreter is slow, Python code as a whole can still be
| fast since people will delegate the hot spots to non-
| Python libraries optimized for speed.
|
| To refute that, you linked to John Carmack quoting Horace
| He saying that Python's native interpreter does CPU-based
| math slower than a GPU. But...that's why people writing
| Python use libraries to delegate math-intensive work to a
| GPU, which is the OP's point.
| rahen wrote:
| That's the spot filled by Go, and previously Pascal. Native
| code with good efficiency while still being fairly easy and
| productive.
| pphysch wrote:
| Go and Python have a quite different set of strengths.
| Python heavily leverages operator overloading and other
| metaprogramming features that Go does not support (well).
|
| If you pick up Go expecting it to be a "more efficient
| Python", you will be sorely disappointed.
| bombolo wrote:
| In my experience go's libraries are generally not so high
| quality
| pphysch wrote:
| Hard to judge. In Python it's very easy to create nice
| porcelain APIs for powerful frameworks because you can
| hide complex behaviors behind operators. Imagine Pandas
| or Django without overloaded access operators.
|
| This is not the case in Go.
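A toy illustration of hiding behavior behind operators, in the spirit of the Pandas-style access mentioned above. `Column` is a hypothetical class written for this sketch and is not how Pandas is implemented; it only shows that `>` and `[]` can be given library-defined meanings.

```python
class Column:
    """Hypothetical toy column, loosely imitating pandas-style filtering."""

    def __init__(self, values):
        self.values = list(values)

    def __gt__(self, threshold):
        # The comparison returns a boolean mask instead of a single bool.
        return [v > threshold for v in self.values]

    def __getitem__(self, key):
        # A list of booleans selects matching rows; an int indexes normally.
        if isinstance(key, list):
            return Column(v for v, keep in zip(self.values, key) if keep)
        return self.values[key]


prices = Column([5, 12, 8, 20])
expensive = prices[prices > 10]
print(expensive.values)
```

The caller writes one short expression while the masking and selection logic lives inside the class, which is the kind of API that is harder to express in a language without operator overloading.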
| kuang_eleven wrote:
| Because it honestly doesn't matter most of the time.
|
| In the majority of use cases, your runtime is dominated by
| I/O, and for the remaining use-cases, you either have low-
| level functions written in other languages wrapped in Python
| (numpy, etc.) or genuinely have a case Python is a terrible
| fit for (eg. low-level graphics programming or embedded).
|
| Why bother making a new variant language with limitations and
| no real benefit?
| Simran-B wrote:
| I would add high-volume parsing / text processing to the
| list of bad fits for Python, although I'm not sure if there
| are native extensions for the different use cases?
| lylejantzi3rd wrote:
| This perspective is a common one, but it lacks credibility.
|
| https://twitter.com/id_aa_carmack/status/150384458047468749
| 3...
| baq wrote:
| His code has had 16.6ms to execute since before a lot of
| people here had been born. Of course Python is hopeless
| in his domain. Its creator and development team will be
| the first to admit this.
| KyeRussell wrote:
| You've posted this multiple times in this thread, and not
| once has it been relevant to the point being made. You
| are sticking your fingers in your ears and deferring to a
| contextless tweet by a celebrity.
| pdonis wrote:
| John Carmack is hardly an unbiased source.
|
| In any case, if your program is waiting on network or
| file I/O, who cares whether the CPU could have executed
| one FLOP's worth of bytecode or 9.75 million FLOPs worth
| of native instructions in the meantime?
| cb321 wrote:
| You may be interested in https://en.wikipedia.org/wiki/Cython
| which was a fork of Pyrex which dates back to the very early
| noughties. It is basically as you describe and even has a way
| to "bypass the GIL" in extension modules. There is also an
| easy way to create a "compiled script" using `--embed`. This
| is essentially a gradually typed system [1] like the Common
| Lisp `declare` facility. Cython even has a warning system to
| tell you what you forgot to declare, and an annotated HTML
| generator to highlight "more Python C API heavy" portions of
| your code.
|
| Personally, I think going all the way to Nim [2] is more
| satisfying than a "gradually typed" system, though.
|
| [1] https://en.wikipedia.org/wiki/Gradual_typing
|
| [2] https://nim-lang.org/
| nnadams wrote:
| Nim [0] might be one similar example. Similar syntax to
| Python and trying to be just as extendable as Lisp.
|
| [0] https://nim-lang.org/
| tlavoie wrote:
| Also languages like OCaml, which while less known, are still
| getting improvements all the time. This was an interesting
| blog post series from a few years back:
| https://roscidus.com/blog/blog/2014/06/06/python-to-ocaml-
| re...
| spijdar wrote:
| There are several "close to python" languages like this,
| actually, I remember one popping up on HN semi-recently.
|
| If you expand into just "kind of Python-like", you have
| languages like Nim that approximately fill the hole of
| "efficient Python".
|
| My impression, though, is a lot of the popular
| frameworks/libraries for Python lean heavily on runtime
| reflection in Python (e.g. Django). So people end up stuck to
| the full Python interpreter/runtime.
| Simran-B wrote:
| I consider Ruby to be quite similar to Python. However,
| it's kind of like Python's weird brother, with optional
| parentheses for calls without arguments, auto-returning the
| result of the last expression, and some odd-ball syntaxes.
| It manages to be even less efficient, though.
| quesera wrote:
| As someone who probably has more experience with Ruby
| than you do, and probably less with Python, I'd
| characterize the relationship as inverted -- Ruby is more
| consistent than Python.
|
| In Ruby, parens are optional for all methods, and only
| required if you need to force non-default precedence.
|
| Returning the value of the last expression seems
| reasonable to me (just like a UNIX shell).
|
| In Python, I'm bothered by the little inconsistencies.
| Why is str.len (or length, or size, or count, etc) not a
| method? Why len(str) instead?
|
| And Python has plenty of oddball syntax: significant
| whitespace, triple quotes?? You get used to it of course.
|
| Ruby and Python are about equally efficient -- if you're
| optimizing for code efficiency, both are bad choices. In
| many domains, that's not a critical metric, so we talk
| about developer efficiency. I'll argue that developers
| are more efficient when they are more happy, and for me
| that selects Ruby by a mile over Python.
|
| But I won't pretend that it's more than a matter of
| taste. :)
| wood-porch wrote:
| I feel very similarly about Python vs. Ruby
| inconsistencies! I remember starting a job and using
| Python for the first time, coming from Ruby, and being
| extremely disappointed in the inconsistencies in the
| standard library and the sheer... lack of features in the
| Stdlib compared to Ruby. I've yet to use a language with
| such a magnificent stdlib and documentation. I main
| python nowadays, and I find the ecosystem and typing to
| be better than Ruby, but Ruby is the best language I've
| ever used for many reasons
| cb321 wrote:
| Nim has impressive compile-time reflection which with its
| syntax macro system can (some of the time) get you similar
| powers to Python's dynamic reflection. E.g.,
| https://github.com/c-blake/cligen instead of `autocommand`
| or `click` or `argh` or you name the automatic CLI
| generator, only with full C-like speed for your programs
| and a very low effort FFI to C. People have done web
| frameworks, too, but I do not know them well enough to
| compare to Django. Perhaps someone else could follow-up.
|
| Compile times are not the 25-50 ms of a Python interpreter
| startup, but they are similar to the 250-500 ms of a Go or
| D compile, at least for small programs that are light on
| metaprogramming loops, and with, say, the TinyC tcc
| backend.
|
| Of course, big frameworks take a long time to import (or to
| compile). So, your comparison mileage may vary. And once
| "ready to go" and finished, a Nim program can start up in
| sub-millisecond time like any C program (esp. if statically
| linked).
| mrkeen wrote:
| There are plenty of statically-typed languages which are a
| bit further along than 'a good start'.
| [deleted]
| [deleted]
| pdonis wrote:
| You might take a look at Nim.
| [deleted]
| rr808 wrote:
| I really struggle with this because I'm a C++/Java dev who is
| 4/10 with Python. I don't understand why anyone would choose a
| single threaded language, but it's just everywhere; I can't avoid
| it.
| baq wrote:
| If your problems need threads and raw CPU, you're doing well,
| since that obviously isn't Python's niche. The thing is, most
| problems aren't like that and if some parts are, Python
| probably has a C extension for it, so it doesn't matter nearly
| as much as you think.
| piyh wrote:
| Do you write a lot of multithreaded code that can't use a
| shared nothing model?
| vb-8448 wrote:
| CPU-bound tasks are limited to a single core due to the GIL but,
| for sure, it's not a single threaded language!
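A small sketch of the distinction: threads in CPython do overlap while they are blocked, because the GIL is released during blocking calls. Here `time.sleep` stands in for a network or disk wait, and `fake_io` is a hypothetical helper; CPU-bound work in pure Python would not speed up this way.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(seconds: float) -> float:
    # time.sleep stands in for a blocking network or disk call;
    # CPython releases the GIL while a thread is blocked like this.
    time.sleep(seconds)
    return seconds


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_io, [0.2] * 4))
elapsed = time.perf_counter() - start

# The four 0.2 s waits overlap, so the total is much closer to 0.2 s than 0.8 s.
print(f"{len(results)} tasks in {elapsed:.2f}s")
```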
| synergy20 wrote:
| so is javascript, single threaded language. somehow these two
| are super popular.
|
| both can do async, both have libs for multithread workloads.
|
| I wonder how critical the language itself is as far as multi-
| threaded goes.
| kmac_ wrote:
| I'm C#/Java dev who works with Python and ML devs. I'm amazed
| how fast they can put things together and solve problems. We've
| embedded Jupyter into our product recently and, from what I can tell,
| that's the way. Even the integration team started to use it to
| glue things together. I'm becoming a huge fan, it does the job.
| stevesimmons wrote:
| Python's been my main language for 22 years now.
|
| Whenever I think I should switch to a faster, more efficient
| language (C#/Go/Rust/...), I am repeatedly blown away by what
| Python lets me do, with an incredible ecosystem of packages, and
| all the ergonomics of incremental development from a REPL.
|
| Case in point is my current project: building a parser for PDF
| bank statements from all UK banks. I can match a 10-page bank
| statement against a library of 100 templates, and then extract
| its data in a standardised format, in around 50ms per statement.
|
| That's broadly comparable to the time to load the source file
| from cloud storage.
|
| And that 50ms is processing pages sequentially, using a single
| core on my little laptop. Plenty of scope to parallelise if I
| wanted to.
|
| The reason why I'm still using Python, to borrow a slogan from
| Bruce Eckel long ago, "Python fits [my] brain".
| roflyear wrote:
| How are you matching a file to a template like that?
| shanebellone wrote:
| I have Python analytics capturing HTTP Headers, processing
| them, and finally storing them in less than 8ms on a single
| core dev instance. Python is plenty fast for many things.
|
| Also, I love the syntax. It reads like a sentence.
| Someone wrote:
| > in less than 8ms on a single core dev instance. Python is
| plenty fast for many things.
|
| What's that, nowadays? 25 million cycles, 75 million
| instructions?
|
| Python isn't plenty fast, modern hardware is.
| KyeRussell wrote:
| The person you're replying to is saying that Python meets
| their needs. Judging by what they've said about their use
| case, I doubt it's a meaningful affirm to their overall
| pipeline.
|
| Your counterpoint is that it'd be slower if it were running
| on a Pentium 4. Do you honestly think that they don't know
| this?
|
| Yes, this is precisely the point. Things are at a point
| where Python's benefits are worth more than what another
| language would bring in speed improvements. I imagine that
| there would be less of a benefit to busses if we had to
| Flintstone them everywhere instead of using a combustion
| engine. That's not the current reality though.
| pdonis wrote:
| More precisely, modern CPUs are so much faster than network
| or file I/O that if your program has to wait on I/O, the
| CPU is going to sit idle most of the time anyway, so trying
| to optimize CPU time is pointless. The GP's use case is an
| example: he's capturing HTTP headers, so the bottleneck is
| network I/O, not CPU. So Python for his use case is, as he
| says, plenty fast.
|
| For use cases where CPU is the bottleneck, such as numeric
| simulations, Python users reach for something like numpy,
| where all the CPU intensive stuff is being done in C
| extensions, not Python bytecode.
| elabajaba wrote:
| My problem with Python, especially when reading someone else's
| code, or my own old bad code, is that I find that Python
| doesn't fit [my] brain.
|
| The lack of typing (in my experience you always end up with a
| bunch of libraries that don't have type hints) means I spend
| most of my time just figuring out what I'm supposed to pass to
| a function (I hate the django docs. Give me something like
| docs.rs where I can quickly scan what functions something
| implements, and see what they accept and what they return
| instead of forcing me to read an 800 page novel that STILL
| doesn't give you that basic info), and writing tests and trying
| to guard against passing the wrong thing around.
| dehrmann wrote:
| > The lack of typing
|
| It's gotten a lot better with type hints and typeshed, but I
| agree, not knowing what type an arg should be and hoping duck
| typing works out isn't great.
| stevesimmons wrote:
| In my own code, I always use typehints and write good
| docstrings (nearly) everywhere.
|
| Visual Studio Code's type checker saves so much time and has
| improved my code quality to no end. It is especially powerful
| on polymorphic inputs, making sure the different code paths
| operate on the input type you expect.
| shanebellone wrote:
| PyCharm is good for this too.
|
| I've recently started using Type Hints by default. At the
| very least, it makes your code more readable. Nothing bad
| comes from that.
| mharig wrote:
| I do not see the benefit of type hints. Good docstrings
| (and naming of the function and arguments) are superior, at
| least for the human. Type hints are too much clutter.
| baq wrote:
| There's a certain program size from which they start to
| make sense and further still the benefits are obvious -
| you have a computer to help solve puzzles for you, so why
| not use it if the puzzle is 'will this peg fit into that
| hole?' when you have thousands of pegs and holes.
| Ultimatt wrote:
| 50ms doesn't sound especially fast. If you had a million of
| them to process that's 4 hours, maybe an hour with some
| parallelism? A 100x speedup from going native then becomes
| quite welcome. Python has its place, but the fact you even
| think the tens of ms domain is "fast" or even fast enough on
| modern CPUs shows the real strength of Python, which is most
| people don't actually care at all about performance. That's not
| to say it's performant though. Just that no one cares anymore.
| Some rando boss is happy to wait 4 hours for your script to
| chug through a million statements, because no one actually told
| them it can take 2.5 minutes in a native language. If they did
| maybe the boss would suddenly speak about their dream to not
| only process the data but have a near real-time BI panel
| instead of a batch report so they can react within one business
| day. The issue with Python is missed value, not the value it
| can deliver.
| squeaky-clean wrote:
| Why limit parallelism to 4x? Spin up a ton of lambdas and get
| it done in 10 seconds.
|
| You're also forgetting that 50ms time to load each file from
| cloud storage. It's still 4 hours and 2.5 minutes with your
| native code compared to 8 hours with Python. Suddenly not
| such a massive improvement.
|
| Even then I don't see the issue with it taking 4 hours to do
| a million of them. Do you need to do a million per hour? Is
| it even likely they'll need to do a million of them total?
|
| Do bank statements even come in frequently enough to do a
| realtime dashboard with? I get my bank statement every 30
| days.
|
| How much longer does it take to develop the native version?
| How much longer does it take to modify when a bug is found or
| a bank changes their statement layout? How much more do you
| have to pay a native code engineer compared to a Python dev
| and how easy are they going to be to replace eventually?
| blindhippo wrote:
| One dimension to consider here is cost of compute.
|
| Going from 8 hours to 4 hours is a 50% reduction in the
| time to compute and we're assuming this is occurring on the
| same relative hardware/instance size.
|
| At scale, that could translate into hundreds of thousands
| of dollars in savings.
|
| But your points are relevant - as with anything related to
| development, "it depends" rules the day. There isn't a
| clear cut "x is objectively better than y" in general.
| stevesimmons wrote:
| > The issue with Python is missed value, not the value it can
| deliver.
|
| I'm very aware of the business value here! The limiting
| factor in these types of "messy real-world data" problems is
| the developer time to get 100+ templates right on all the
| different variations encountered in the wild. I can iterate
| on each template extremely effectively in a Jupyter notebook
| REPL, and immediately rerun a sample of 100 statements for
| that bank in a few seconds.
|
| While the total corpus of statements I have access to is
| actually around a million, no one cares how quick processing
| them all is if the extraction isn't reliable enough!
| aidos wrote:
| Exactly. The time spent developing a solution for your
| problem in pretty much any other language is going to cost
| magnitudes more than processing millions of documents in
| python. And that's ignoring the ongoing maintenance where
| the hackability of teasing data out of PDFs in ipython is
| going to top any other system.
|
| I have a soft spot for the work you're doing since I've
| spent a good portion of my life now extracting data from
| PDFs and can appreciate the joys of the process more than
| most.
| justinsaccount wrote:
| > If you had a million of them to process thats 4 hours
|
| If you had 20 bank accounts sending you monthly statements
| for 100 years you'd have 24,000 of them. 50ms each would take
| you 20 minutes to process the 100 year backlog.
|
| If you have a million of them to process then you're a bank
| or similar type of institution that can devote resources to
| this, either computational or developer time to optimize
| things. A 128 core c6a.32xlarge would turn the runtime from 4
| hours into 1.9 minutes and cost $5 to run for an hour.
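The back-of-the-envelope figures in this subthread can be checked with a few lines of arithmetic. This is a sketch that assumes perfectly even scaling across workers; `hours` is a hypothetical helper, and the 50 ms, 100x, and 128-core inputs are the ones quoted above. Worked through, one million statements at 50 ms each comes to roughly 13.9 hours on one core rather than 4, so the 4-hour baseline and the times derived from it appear to be low by the same factor, while the 24,000-statement backlog does come to 20 minutes.

```python
def hours(statements: int, ms_each: float, workers: int = 1, speedup: float = 1.0) -> float:
    """Wall-clock hours, assuming work divides evenly across workers."""
    seconds = statements * ms_each / 1000 / speedup / workers
    return seconds / 3600


sequential = hours(1_000_000, 50)                       # about 13.9 hours
native_minutes = hours(1_000_000, 50, speedup=100) * 60  # about 8.3 minutes
cluster_minutes = hours(1_000_000, 50, workers=128) * 60  # about 6.5 minutes
backlog_minutes = hours(24_000, 50) * 60                  # about 20 minutes

print(f"1M statements, one core:   {sequential:.1f} h")
print(f"1M statements, 100x faster: {native_minutes:.1f} min")
print(f"1M statements, 128 cores:   {cluster_minutes:.1f} min")
print(f"24,000 statements:          {backlog_minutes:.1f} min")
```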
| [deleted]
| lowbloodsugar wrote:
| Plenty of languages have a huge package ecosystem. Your problem
| is that you don't know them. I'm not saying the distinction is
| helpful for your situation, but your problem is "I have too
| much time invested in learning Python's ecosystem" and not
| "other languages don't have all the packages I need." For REPL
| that's another mindset. You use a REPL to experiment and
| prototype, I use unit tests. You use python because you already
| learned it. Your brain has been fit to python, not the other
| way around. You can fit it to other languages if you choose. I
| am currently learning my 7th "use in production" language.
| stevesimmons wrote:
| You're partly right. However Python's sweet spot for me is
| when writing code to process complex, messy, poorly
| understood data. Using a Jupyter notebook lets me keep
| partially processed inputs as live objects, examine and
| change their in-memory representations, and quickly prototype
| the next stage of processing. It feels like sculpting with
| playdoh.
|
| Once that's working, the code moves from the Jupyter notebook
| into a proper code module with unit tests. Then the emphasis
| is on building the production system rather than manipulating
| the data.
|
| I get what you mean with unit tests. Just for me it feels
| further away from the data, which is where most of the
| complexity in my business domain lies.
| lowbloodsugar wrote:
| Sure. I do use Jupyter in a decent IDE for some adhoc data
| diving or for a spike prototype before "downcoding" into a
| cluster.
| [deleted]
| MrVandemar wrote:
| >Case in point is my current project: building a parser for PDF
| bank statements from all UK banks. I can match a 10-page bank
| statement against a library of 100 templates, and then extract
| its data in a standardised format, in around 50ms per
| statement.
|
| I'm interested in management of domestic documents and similar
| institutional correspondence, and I'd love to hear more about
| how you approached and solved your problems with the PDF bank
| statements.
|
| I have lots of bank statements ... and other documents like
| utility bills, rates invoices, etc ... and I'd love to just be
| able to feed them to a script and automatically tease out
| salient details.
| welder wrote:
| We ported a client-deployed command line program from Python to
| Go two years ago and couldn't be happier. It's solved all our
| problems around shipping code to customers in different
| environments. It greatly reduced our support workload.
|
| However, we still use Python for APIs and Web Apps. No language
| is better than another, some are just better for one solution
| more than another.
| bjourne wrote:
| It's the ecosystem. All languages have libraries for bottom-up
| parsing. But few are as well-documented, feature-full, and easy
| to use as SLY: https://sly.readthedocs.io/en/latest/ Many
| languages have libraries for ANSI color support and pretty
| printing, but no other language has Rich:
| https://rich.readthedocs.io/en/stable/index.html While other
| languages may have rudimentary datetime support, Python has
| (multiple!) libraries for converting between Hebrew, Hijri,
| Coptic, and Armenian dates.
| revskill wrote:
| I found joy in programming not by importing some libraries and
| using them.
|
| It's the manipulation of data that makes me enjoy programming.
|
| Python ? Not fun at all. Try list comprehension. You read it from
| right to left ?
|
| Or process data, not by left to right (pipeline), but in a messy
| order.
|
| One liner lambda ?
|
| Indentation, self, pip hell.
|
| Too many small quirks which force me not to use Python for "fun".
|
| For serious stuff ? Maybe python is the best choice now.
___________________________________________________________________
(page generated 2022-12-30 23:00 UTC)