[HN Gopher] Python is 1.3x faster by just adjusting some compili...
       ___________________________________________________________________
        
       Python is 1.3x faster by just adjusting some compiling options for
       libpython
        
       Author : chx
       Score  : 72 points
       Date   : 2021-06-12 19:52 UTC (3 hours ago)
        
 (HTM) web link (www.facebook.com)
        
       | jchw wrote:
       | There's nothing wrong with this post factually, but the tone
       | sucks. It has an immensely combative energy for what is not
       | really a charged subject matter.
       | 
       | Like sure. Today, a lot of the historical reasons for things seem
       | silly and irrelevant. At one point, they did not seem silly and
       | irrelevant. For compatibility with stuff sticking around from
       | those days, we get some performance penalties that are not
       | strictly necessary. I don't think anyone is doing that to be an
       | asshole, so the oddly antagonistic tone seems unjustified.
       | 
       | And yes, Windows with a module-level namespace is cleaner in this
       | regard, but Windows design is entirely different and has plenty
       | of its own skeletons. ELF does not, to me, feel significantly
       | more horrible than PE. And I'm not speaking from inexperience; I
       | have written at least a couple of ELF and PE parsers over time,
       | most recently go-winloader[1].
       | 
       | Do we need to override symbols in the same library? Probably
       | not... _kind of_. Your modules may in fact not need this.
       | However, libc probably does. Take a look at what symbols
       | libpthread exports on your system some time.
       | 
       | I hate to be the person to point this out, but please consider
       | not approaching subjects from this position. It feels alienating,
       | and I have no idea why it's necessary to have such a tone.
       | 
       | [1]: https://github.com/jchv/go-winloader
        
         | ineedasername wrote:
         | Agreed. I was trying to find the words for why I was so put
         | off by the article, but you nailed it. The tone made me want to
         | disagree with it just by default. Luckily I 1) recognize that I
         | am not qualified to have an opinion on the technical details
         | and 2) ruthlessly crush instinctual responses until I've
         | thought them through with less emotion. (Most of the time...
         | I'm not a robot, or perfect.)
         | 
         | Someone in a sibling thread said that writing like that is
         | fine for catharsis... I guess to blow off steam or something.
         | But if the method of blowing off steam is belittling
         | other smart people that don't always make perfect decisions
         | then it's probably not a great way to go. If you need to write
         | it for catharsis, go for it, but there's no need to publish it.
         | 
         | Otherwise, my questions on the technical side: Would this
         | performance hit and the alternative option have been obvious at
         | the time? If so, was there a reasonable trade off for why this
         | approach was taken? Or was this choice only wrong in
         | retrospect?
        
         | fpgaminer wrote:
         | Not OP, but I read the "antagonistic" style of the post as just
         | the usual catharsis humor. All in-jest. I've used that style of
         | writing plenty before. It's a good way to blow off the steam of
         | working with these rather absurd, archaic systems that we have
         | to tackle on a daily basis. Programming can feel a bit
         | kafkaesque at times, so a bit of aggressive/dark humor goes a
         | long way.
         | 
         | But I do agree, it felt too thick. Still a very interesting
         | topic regardless.
        
         | zitterbewegung wrote:
         | It actually seems to miss a few points. (I also agree that the
         | post does not have enough levity to balance out the negative
         | tone.)
         | 
         | 1. PEP 445 makes the use case of LD_PRELOAD irrelevant.
         | 
         | 2. A change like this would obviously go through code review
         | and testing to make it into a released version.
         | 
         | 3. The risk of a regression would still exist but that can
         | either be caught by #2 or the existing unit testing already in
         | Python.
         | 
         | (Disclaimer: I have contributed to the Python codebase)
        
         | Lammy wrote:
         | > It has an immensely combative energy for what is not really a
         | charged subject matter.
         | 
         | It becomes a charged subject matter when one works at companies
         | like Google and Facebook and gets used to navigating
         | performance reviews.
        
           | ineedasername wrote:
           | Would this tone of expression be appropriate in navigating
           | performance reviews? I mean the question honestly: My own
           | answer is "no", but I don't know the culture of performance
           | reviews at companies like that.
        
           | jchw wrote:
           | This is interesting. Not saying you are incorrect, but I
           | have worked at Google for a few years and didn't pick up on
           | this; most people seem abundantly polite. But I can just as
           | easily chalk that up to limited experience, since there are
           | clearly quite a lot of different things going on in any large
           | company.
        
         | CalChris wrote:
         | The OP was writing about a 29-year-old design decision, and he
         | wasn't writing about a person. Design decisions don't have
         | feelings. I found that his no-holds-barred clarity about
         | something as obscure as dynamic linking namespaces made for an
         | easier, if still not easy, read.
         | 
         | But that said, I don't think dynamic linking is in the ELF
         | spec. I believe that's a _de facto_ OS + dev tools thing rather
         | than an ELF spec _de jure_ thing. His points are still valid.
        
           | jordigh wrote:
           | > I found his no holds barred clarity
           | 
           | Being right is no excuse for being an asshole.
           | 
           | The attitude will appeal to some. It will rub many others
           | the wrong way and put them on the defensive.
           | 
           | There's no reason to write this way. A concise, well-
           | articulated, non-combative post will appeal to everyone and
           | still convey the same information.
        
       | derefr wrote:
       | Doesn't gVisor require symbol interposition to do its sandboxing
       | thing? (At least, for binaries with static-linked runtimes, like
       | the type Golang produces by default.)
        
         | dathinab wrote:
         | If you have a fully static-linked library, you already don't
         | have symbol interposition.
         | 
         | Furthermore, these options still allow the things you need
         | interposition for: calls from/to external dynamically linked
         | libraries like libc.
         | 
         | But most importantly, gVisor is based around intercepting
         | system calls (oversimplified), for which you don't need symbol
         | interposition.
        
         | falldmg wrote:
         | Symbol interposition? I don't know for sure, but I would guess
         | gVisor is using ptrace or another mechanism, to interpose on
         | syscalls, not library calls. But these flags, I believe, only
         | impact interposition of symbols in the same library, so even if
         | gVisor did use interposition for something, it may not matter.
        
       | codelord wrote:
       | I would just read the linked post:
       | 
       | https://bugs.python.org/issue38980?fbclid=IwAR0cyfahpBywNzbq...
       | 
       | As it contains almost the same info without the rant and with
       | better explanation.
        
       | kevingadd wrote:
       | Anyone got an archive link? I can't read this without making a
       | Facebook account and signing in
        
         | eptcyka wrote:
         | I read it without signing in. The "Not now" link is greyed out
         | and 4 points smaller and not a button. But it's there.
        
           | ineedasername wrote:
           | I did the first time around, but then I closed the tab, and
           | when I wanted to go back to look at something in more detail
           | I was blocked unless I signed in. Luckily someone posted the
           | full text in another comment.
        
           | mct wrote:
           | I'm seeing "You must log in to continue," with no "not now"
           | option.
        
             | qwertox wrote:
             | Probably expects JavaScript enabled or something. I also
             | don't have a "not now" option. No JavaScript, no CSS.
             | 
             | Honestly, how can someone into tech post something like
             | this on FB?
        
               | OJFord wrote:
               | Leaving aside whether or not they should want to post it
               | there, I'm surprised it has an audience.
               | 
               | Someone saw it and shared it to HN; enough read it to
               | upvote it this much.. maybe Facebook's more popular than
               | I thought! (That sounds silly or sarcastic, but 'among HN
               | users and similar' I'm serious.)
        
       | cerved wrote:
       | Right post, wrong platform
        
       | stereo wrote:
       | Text if you don't want to visit Facebook:
       | 
       | Summary: Python is 1.3x faster when compiled in a way that re-
       | examines shitty technical decisions from the 1990s. ELF is the
       | executable and shared library format on Linux and other Unixy
       | systems. It comes to us from 1992's Solaris 2.0, from back before
       | even the first season of the X-Files aired. ELF files (like
       | X-Files) are full of barely-understood horrors described only in
       | dusty old documents that nobody reads. If you don't know anything
       | about symbol visibility, semantic interposition, relocations, the
       | PLT, and the GOT, ELF will eat your program's performance.
       | (Granted, that's better than being eaten by some monster from a
       | secret underground government base.)
       | 
       | ELF kills performance because it tries too hard to make the new-
       | in-1992 world of dynamic linking look and act like the old world
       | of static linking. ELF goes to tremendous lengths to make sure
       | that every reference to a function or a variable throughout a
       | process refers to the same function or variable no matter what
       | shared library contains each reference. Everything is consistent.
       | 
       | This approach is clean, elegant, and wrong: the cost of
       | maintaining this ridiculous bijection between symbol name and
       | symbol address is that each reference to a function or variable
       | needs to go through a table of pointers that the dynamic linker
       | maintains --- even when the reference is one function in a shared
       | library calling another function in the same shared library. Yes,
       | `mylibrary_foo()` in `libmylibrary.so` has to pay for the
       | equivalent of a virtual function call every time it calls
       | `mylibrary_bar()` just in case some other shared library loaded
       | earlier happened to provide a different `mylibrary_bar()`. That
       | basically never happens. (Weak symbols are an exception, but
       | that's a subject for a different rant.)
       | 
       | (Windows took a different approach and got it right. In Windows,
       | it's okay for multiple DLLs to provide the same symbol, and
       | there's no sad and desperate effort to pretend that a single
       | namespace is still cool.)
       | 
       | There's basically one case where anyone actually relies on this
       | ELF table lookup stuff (called "interposition"): `LD_PRELOAD`.
       | `LD_PRELOAD` lets you provide your own implementation of any
       | function in a program by pre-loading a shared library containing
       | that function before a program starts. If your `LD_PRELOAD`ed
       | library provides a `mylibrary_bar()`, the ELF table lookup goo
       | will make sure that `mylibrary_foo()` calls your `LD_PRELOAD`ed
       | `mylibrary_bar()` instead of the one in your program. It's nice
       | and dynamic, right? In exchange for every program on earth being
       | massively slower than it has to be all the time, you, programmer,
       | can replace `mylibrary_bar()` with `printf("XXX calling bar!!!")`
       | by setting an environment variable. Good trade-off, right?
       | 
       | LOL. There is no trade-off. You don't get to choose between
       | performance and flexibility. You don't get to choose one. You get
       | to choose zero things. Interposition has been broken for years: a
       | certain non-GNU upstart compiler starting with "c" has been
       | committing the unforgivable sin of optimizing calls between
       | functions in the same shared library. Clang will inline that call
       | from `mylibrary_foo()` to `mylibrary_bar()`, ELF be damned, and
       | it's right to do so, because interposition is ridiculous and
       | stupid and optimizes for c00l l1inker tr1ckz over the things
       | people buy computers to actually do --- like render 314341 layers
       | of nested iframe.
       | 
       | Still, this Clang thing does mean that `LD_PRELOAD` interposition
       | no longer affects _all_ calls, because Clang, contra the
       | specification, will inline some calls to functions not marked
       | inline --- which breaks some people's c00l l1inker tr1ckz. But
       | we're all still paying the cost of PLT calls and GOT lookups
       | anyway, all to support a feature (`LD_PRELOAD`) that doesn't even
       | work reliably anymore, because, well, why change the defaults?
       | 
       | Eventually, someone working on Python (ironically, of all things)
       | noticed this waste of good performance. "Let's tell the compiler
       | to do what Clang does accidentally, but all the time, and on
       | purpose". Python got 30% faster without having to touch a single
       | line of code in the Python interpreter.
       | 
       | (This state of affairs is clearly evidence in favor of the
       | software industry's assessment of its own intellectual prowess
       | and justifies software people randomly commenting on things
       | outside their alleged expertise.)
       | 
       | All programs should be built with `-Bsymbolic` and
       | `-fno-semantic-interposition`. All symbols should be hidden by
       | default. `LD_PRELOAD` still works in this mode, but only for calls
       | _between_ shared libraries, not calls _inside_ shared libraries.
       | One day, I hope as a profession we learn to change the default
       | settings on our tools.
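
The lookup behaviour described in the post above can be pictured with a toy model. This is only a sketch in Python, not how the dynamic linker is implemented: plain dicts stand in for ELF symbol tables, and `Process`, `make_mylibrary` and `run` are hypothetical names made up for illustration. It contrasts a call that is resolved by name through the process-wide search order with a call that is bound directly inside the library, which is roughly what `-Bsymbolic` and `-fno-semantic-interposition` permit.

```python
# Toy model only: dicts stand in for ELF symbol tables, and a list of dicts
# stands in for the process-wide symbol search order.


class Process:
    """Hypothetical stand-in for a process with a global symbol search order."""

    def __init__(self):
        self.loaded = []  # earlier entries win, like a preloaded library

    def load(self, lib):
        self.loaded.append(lib)

    def resolve(self, name):
        # First library in load order that defines the name wins.
        for lib in self.loaded:
            if name in lib:
                return lib[name]
        raise LookupError(name)


def make_mylibrary(resolve=None):
    """Build a fake libmylibrary exporting mylibrary_foo and mylibrary_bar.

    With resolve given, foo looks bar up by name on every call (interposable).
    With resolve=None, foo is bound directly to its own bar (not interposable).
    """

    def mylibrary_bar():
        return "bar from libmylibrary"

    def mylibrary_foo():
        bar = resolve("mylibrary_bar") if resolve else mylibrary_bar
        return "foo -> " + bar()

    return {"mylibrary_foo": mylibrary_foo, "mylibrary_bar": mylibrary_bar}


def run(direct_binding):
    proc = Process()
    # The "preloaded" library is loaded first and also defines mylibrary_bar.
    proc.load({"mylibrary_bar": lambda: "bar from preload"})
    proc.load(make_mylibrary(None if direct_binding else proc.resolve))
    return proc.resolve("mylibrary_foo")()


if __name__ == "__main__":
    print(run(direct_binding=False))  # internal call goes through the lookup
    print(run(direct_binding=True))   # internal call stays inside the library
```

In this model the call from outside into `mylibrary_foo` is still resolved by name in both cases; only the call inside the library changes, which matches the post's remark that `LD_PRELOAD` keeps working between shared libraries.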
        
         | qwertox wrote:
         | Thank you. This link asked me to sign in on a very broken page
         | (I block most of Facebook's domains), and I am wondering
         | whether this is just someone who posted it on FB or a post from
         | the engineering team at FB.
        
           | aioprisan wrote:
           | Just someone posting on FB
        
         | rocqua wrote:
         | Not sure this is ok copyright-wise.
        
           | ineedasername wrote:
           | I'm not sure Facebook's privacy intrusions are ok
           | ethics-wise. So there are competing value systems at work.
        
         | dathinab wrote:
         | This has interesting parallels with how some languages include
         | the library version in the "symbolic name" (mangled name, fully
         | qualified name etc).
         | 
         | This often allows loading of multiple versions of the same
         | dependency in the same program without ugly hacks. Which is
         | great if you have multiple dependencies which both have the
         | same sub-dependency (each internal only to their dependent) but
         | need different versions.
         | 
         | It's kinda a nightmare if you run into this problem in
         | languages which don't support it.
        
       | Scaevolus wrote:
       | This is true for _libpython_ (the shared library version), which
       | is the default on some distros (RedHat, Fedora, Arch), but many
       | others (Debian, Ubuntu) use statically linked Python and never
       | paid this performance tax.
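
Whether a given interpreter falls into the shared-library group can usually be read from its recorded build configuration. A minimal sketch, assuming the build recorded `Py_ENABLE_SHARED` (typical on Unix-like builds; the variable may be missing elsewhere, in which case the helper returns `None`). The function name `uses_shared_libpython` is made up for this example.

```python
import sysconfig


def uses_shared_libpython():
    """Return True for a shared libpython build, False for a static one,
    or None when the build did not record Py_ENABLE_SHARED."""
    value = sysconfig.get_config_var("Py_ENABLE_SHARED")
    if value is None:
        return None
    try:
        return bool(int(value))
    except (TypeError, ValueError):
        # Unexpected value: report "unknown" rather than guess.
        return None


if __name__ == "__main__":
    print(uses_shared_libpython())
```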
        
         | dheera wrote:
         | I think using PyPy instead of CPython will give you several
         | times the performance boost of any of this.
        
           | dharmab wrote:
           | Pypy is not a drop-in replacement for CPython. It does not
           | support many libraries that rely on C extensions, and targets
           | a slightly older version of the language.
        
           | dathinab wrote:
           | Likely, but it doesn't work with all applications.
        
       | th0ma5 wrote:
       | Not entirely related but I recently started playing with the
       | built in "dis" library and it is fun to see the compiled
       | representation of functions that the runtime executes. Just an
       | FYI if you're ever bored and are looking to get more familiar
       | with assembly, it is a very approachable thing to play with.
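
For anyone who wants to try this, here is a minimal sketch using the standard `dis` module. Note that what it shows is CPython bytecode for the interpreter's virtual machine rather than machine assembly, and that the exact opcode names differ between CPython versions. The functions `add` and `opcode_names` are made up for the example.

```python
import dis


def add(a, b):
    return a + b


def opcode_names(func):
    """Return the names of the bytecode instructions compiled for func."""
    return [instruction.opname for instruction in dis.Bytecode(func)]


if __name__ == "__main__":
    # Human-readable listing of the compiled function.
    dis.dis(add)
    # The same instructions as a plain list of opcode names.
    print(opcode_names(add))
```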
        
       | BurningFrog wrote:
       | I'm confused.
       | 
       | Is this about something I can do to speed up our 3.8 Python code,
       | or about why Python 3.8 is faster than 3.7?
        
         | ineedasername wrote:
         | I think it was addressed by python already:
         | 
         |  _Eventually, someone working on Python (ironically, of all
         | things) noticed this waste of good performance_
         | 
         | But it would be good to know when & what versions.
         | 
         | I'm also not sure why this is "ironic". Who else but the
         | experts on python would be more likely to discover this &
         | resolve the issue? Which basically makes the whole thing a non-
         | issue:
         | 
         | Python creators made a choice when creating python. A while
         | later they realized they could improve performance by
         | revisiting that choice.
         | 
         | The tone of the article makes it sound like this was an
         | embarrassing mistake of massive proportions.
        
       | kzrdude wrote:
       | Completely beside the article - the first specializing, adaptive
       | interpreter (PEP 659) improvements have been merged into CPython
       | in the last few weeks, and hopefully we will see updates about
       | benchmarks and performance sooner or later.
        
       | geofft wrote:
       | Related: https://developers.redhat.com/blog/2020/06/25/red-hat-
       | enterp...
       | 
       | > _This article focuses on one specific performance improvement
       | in the python38 package. As we'll explain, Python 3.8 is built
       | with the GNU Compiler Collection (GCC)'s
       | -fno-semantic-interposition flag. Enabling this flag disables
       | semantic interposition, which can increase run speed by as much
       | as 30%._
       | 
       | (not logged in to FB, so maybe TFA is a reference to this one?)
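
One way to see whether a particular interpreter was built with that flag is to search its recorded build variables. A rough sketch: the list of variables in `BUILD_VARS` is an assumption about where the flag would be recorded, and a distro could pass it somewhere that is not recorded at all, so an empty result is not proof that the flag was absent. The helper name `vars_mentioning` is made up for this example.

```python
import sysconfig

# Assumption: build variables where compiler flags are usually recorded.
BUILD_VARS = (
    "CFLAGS",
    "PY_CFLAGS",
    "CFLAGS_NODIST",
    "PY_CFLAGS_NODIST",
    "CONFIG_ARGS",
)


def vars_mentioning(flag, config=None):
    """Return the names of the build variables whose value contains flag.

    config defaults to this interpreter's recorded build configuration;
    a plain dict can be passed instead to check other data.
    """
    if config is None:
        config = sysconfig.get_config_vars()
    return [name for name in BUILD_VARS if flag in str(config.get(name) or "")]


if __name__ == "__main__":
    print(vars_mentioning("-fno-semantic-interposition"))
```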
        
       ___________________________________________________________________
       (page generated 2021-06-12 23:00 UTC)