[HN Gopher] HPy - A better C API for Python
___________________________________________________________________
HPy - A better C API for Python
Author : gjvc
Score : 228 points
Date : 2024-10-06 06:23 UTC (16 hours ago)
(HTM) web link (hpyproject.org)
(TXT) w3m dump (hpyproject.org)
| murkt wrote:
| Imagine how different the Python ecosystem could be if this had
| been done 20 years ago.
| lifthrasiir wrote:
| Unless it was done at the very beginning, I doubt it would even
| have been possible, because the current C API is a remnant of
| that very first public version.
| foolfoolz wrote:
| Python has one of the most fractured development ecosystems of
| any moderately used language. I'm pretty convinced Python is a
| language that attracts poor development practices and magnifies
| them due to its flexibility. The people who love it don't
| understand the extreme flexibility makes it fragile at scale
| and are willing to put up with its annoyances in an almost
| Stockholm syndrome way.
| bvrmn wrote:
| The reason is popularity, not a technical one. Different
| parties inevitably take an interest in improving different
| parts of the ecosystem.
| Const-me wrote:
| > a language that attracts poor development practices
|
| I agree, but note there's another way to frame it: "python
| can be used by people who aren't professional software
| developers".
| miohtama wrote:
| C/C++ is more fractured.
|
| While Python is fractured, it is nowhere near the problems of
| the C ecosystem.
| rbanffy wrote:
| As anyone who has tried to build multi-platform software
| with C or C++ can easily tell you.
|
| It's almost a relief that AIX, Solaris, and HP-UX are either
| very niche or going the way of the Dodo.
| _fizz_buzz_ wrote:
| It's also fractured because it has such a massive user base
| that uses it for very different applications with very
| different priorities.
| Quothling wrote:
| I think any programming language with a lot of popularity
| attracts poor development practices. Simply because a lot of
| programmers don't actually know the underlying processes of
| what they build. The flip-side of this is that freedom and
| flexibility also gives you a lot of control. Yes, it's very
| easy to write bad Python. In fact it's probably one of
| Python's weaknesses as you point out. If you're going to
| iterate over a bunch of elements, you probably expect your
| language's standard library to do it efficiently, and Python
| doesn't necessarily do that. What you gain from this
| flexibility (and arguably sometimes poor design) is that it's
| also possible to write really good Python and tailor it
| exactly to your needs. I think Python scales rather well, in
| fact. Django is a good example: it's a massive workhorse for a
| lot of the web (Instagram still uses its own version of it, for
| one). It does so rather anonymously, outside the hype circle,
| much like PHP and Ruby do, but it does it.
|
| One of the advantages Python has, even when it's bad, is that
| it's often "good enough". 95% of the software which gets
| written is never really going to need to be extremely
| efficient. I would argue that in 2024 Go is actually the
| perfect combination of the good stuff from both Python and C.
| But those things aren't necessarily easy to get into if
| you're not familiar with things like memory management
| (maybe strict typing?), explicit error handling and the
| differences between interpreted and compiled languages.
|
| Anyway, I don't think Python is any more annoying than any
| other language. The freedom it gives you needs to be reigned
| in, and if it isn't, you'll end up with a mess. A mess
| which is probably perfectly fine.
| gjvc wrote:
| psst "reined in" https://www.merriam-
| webster.com/grammar/do-you-rein-in-or-re...
| trkannr wrote:
| But CPython itself has poor development practices: for
| about 8 years, those in the inner circle have been able to
| modify anything and pose as experts while brutally squashing
| criticism.
| est wrote:
| > most fractured development ecosystems of any moderately
| used language
|
| Can you elaborate? What's done wrong with Python and right
| with other "moderately used language" ?
|
| For a start, C/C++ doesn't even have an official ecosystem. For
| Java or Golang, it looks better only because the "ecosystem"
| does not always include native extensions like cgo or JNI.
| Once you add them, the complexity is no better than Python's.
| rwmj wrote:
| Python .pth files are horrific. Here's an actual .pth file
| I was dealing with the other day (from Google Cloud
| Storage) which completely prevents you from overriding the
| module using PYTHONPATH:
|
|     import sys, types, os;
|     has_mfs = sys.version_info > (3, 5);
|     p = os.path.join(sys._getframe(1).f_locals['sitedir'], *('google',));
|     importlib = has_mfs and __import__('importlib.util');
|     has_mfs and __import__('importlib.machinery');
|     m = has_mfs and sys.modules.setdefault('google',
|         importlib.util.module_from_spec(
|         importlib.machinery.PathFinder.find_spec('google',
|         [os.path.dirname(p)])));
|     m = m or sys.modules.setdefault('google', types.ModuleType('google'));
|     mp = (m or []) and m.__dict__.setdefault('__path__',[]);
|     (p not in mp) and mp.append(p)
| talideon wrote:
| If .pth files are the worst thing you can find to
| complain about, Python's doing pretty well. The blame for the
| horrific .pth file in question is better laid at the feet of
| its creators than at the mechanism itself.
| rwmj wrote:
| The fact they considered allowing executable code in path
| lookups shows a certain attitude.
| oefrha wrote:
| It shows that the language is highly dynamic and you can
| patch anything? The .pth mechanism allows the party
| controlling the Python installation (site) to run some
| init code before any user code, basically an rc
| mechanism. Nothing more, nothing radical. Maybe you're
| unhappy with the dynamism, in which case your complaint
| is misplaced.
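A minimal sketch of the mechanism described above, using only the standard library: `site.addsitedir()` applies the same .pth processing that runs for site-packages at startup. A plain line is appended to `sys.path` if that directory exists, and a line starting with `import` is executed. (In current CPython, the executed line runs inside the site.py function that reads the .pth file, which is presumably how the Google snippet reaches `sitedir` through `sys._getframe(1)`.) The temporary directory and the `PTH_DEMO` variable are made up for the demo.

```python
import os
import site
import sys
import tempfile

# A throwaway "site" directory with one subdirectory and one .pth file.
sitedir = tempfile.mkdtemp()
extra_dir = os.path.join(sitedir, "extra")
os.mkdir(extra_dir)

with open(os.path.join(sitedir, "demo.pth"), "w") as f:
    # Plain line: a directory (relative to sitedir) that is appended
    # to sys.path if it exists.
    f.write("extra\n")
    # Line starting with "import ": executed as Python code.
    f.write("import os; os.environ['PTH_DEMO'] = 'ran'\n")

# Appends sitedir to sys.path and processes every *.pth file in it.
site.addsitedir(sitedir)

print(os.environ.get("PTH_DEMO"))  # ran
print(extra_dir in sys.path)       # True
```

Nothing in a plain path line can run code; only the `import` lines can, which is what gives a site installer its "rc file" hook.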
| rwmj wrote:
| In this case it prevents someone using PYTHONPATH to
| alter or override the order that modules are loaded. Hard
| to justify that.
| est wrote:
| I agree those particular .pth files were horrific.
|
| But Python packages made by Google are notoriously bad.
| Their awfulness dates back to the GAE days.
| crabbone wrote:
| You have Anaconda packaging world vs PyPI. You have
| pyproject.toml for project management, which is not
| supported by Anaconda or the flagship documentation
| generation tool: Sphinx. You have half a dozen package
| installers; none of them work to the full extent, and all have
| different problems. You have plenty of ways to install
| Python, all of them suck. You have plenty of ways to do
| some common tasks, s.a. GUI, Web, automation: and all of
| them suck in different ways, w/o a hint of unifying link.
| Similarly, you have an, allegedly, common relational
| database interface, but most commonly used SQL bindings
| don't use it. And the list goes on.
| est wrote:
| > You have Anaconda packaging world vs PyPI
|
| As I said, it's only because .so extensions were hard. If
| every package were pure Python, I would simply copy paste
| them in my source code `lib` path.
|
| Don't laugh at me; this is called "vendoring" or "static
| linking" in other languages, and "requests" famously
| included a copy of urllib3 for quite a while.
| crabbone wrote:
| Oh, but there's plenty more to Python packaging...
| unfortunately. You can put a ton of random stuff that's
| not Python modules into wheels: scripts, data, headers.
| Anaconda doesn't support most of that.
| Demiurge wrote:
| > You have Anaconda packaging world vs PyPI
|
| There is no fracture or "versus" here. You can pip
| install on top of Anaconda. Anaconda provides a more
| stringent solver and the OS-level packages that some
| pip-level modules depend on; it just solves the
| integration problem. I use both, including a
| requirements.txt in my Anaconda env.yml, all the time.
|
| > You have pyproject.toml for project management, which
| is not supported by Anaconda or the flagship
| documentation generation tool: Sphinx.
|
| Again, Anaconda is not "standard" python thing, it is a
| replacement for building OS-level packages, such as GDAL,
| which is a just a subset of Python modules. Anaconda does
| not need to support standard python tooling, because
| those python tools exist outside of Anaconda.
|
| To simplify: for every Anaconda package, you can likely
| find it in PyPI, but not every PyPI package can be found
| in conda. Anaconda is not a competitor for PyPI; it
| does not need to replicate every PyPI feature.
|
| > You have plenty of ways to install Python, all of them
| suck.
|
| What does this actually mean? You install Python with all
| the major OS installation methods, and absolutely none of
| them suck, any more than installing anything on this OS
| does. The standard ways are Python Setup.exe, apt-get
| install, and brew install. Yes, you have additional
| options such as conda distros, yet what exactly sucks
| about them? Nothing.
|
| > You have plenty of ways to do some common tasks, s.a.
| GUI, Web, automation: and all of them suck in different
| ways, w/o a hint of unifying link.
|
| I think I'm starting to get it. Everything sucks if
| you've been around long enough. Django is a vastly
| prevalent web framework. wxWidgets is standard, and
| there are bindings for most GUI toolkits. There are many
| toolkits; is it Python's fault they all got invented by
| different organizations? Is it an interpreted language's
| responsibility to provide a cross-platform GUI toolkit for
| you?
|
| > Similarly, you have an, allegedly, common relational
| database interface, but most commonly used SQL bindings
| don't use it.
|
| What are you even talking about? Who in the world cares
| about this? People use database specific libraries, in
| every single language, because every database has its own
| set of features.
|
| > And the list goes on.
|
| Your list reeks of someone flinging critiques without
| even knowing what they're talking about--just a lot of
| hot air fueled by emotional baggage, likely from some
| long-dead language you once cherished before it was
| mercifully abandoned.
| crabbone wrote:
| > You can pip install on top of Anaconda.
|
| This is what people believe when they don't know how it
| works: no, you cannot. But this isn't even the point. The
| point is that you have different tools that have no
| interop between them, nothing in common at all: conda-
| build and setuptools (and there's plenty of half-
| implemented Python packaging tools that cannot package
| native extensions).
|
| > Again, Anaconda is not "standard" python thing
|
| Python doesn't have a standard _at all_. Nothing is
| standard about any aspect of Python outside of marginal
| stuff like floating point or XML etc. Anaconda is as
| legitimate as any other tool that works with Python. This
| is how it was intended. You probably wanted to say "not
| as popular as", which would be true, but also Anaconda is
| popular enough for this to be a problem.
|
| > which is a just a subset of Python modules.
|
| Are you sure you know what Anaconda is? You give the
| opposite impression...
|
| > Anaconda is not a competitor for PyPI
|
| Anaconda is a competitor of PyPI. It literally provides
| its own package index (this is what P and I stand for in
| PyPI).
|
| > Everything sucks if you've been around long enough.
|
| Python sucks. Let's not extrapolate this to other things.
| Marriages, for example, usually don't suck if they have lasted
| long enough. I can think of a few more things that get
| better with time.
|
| But Python is not a good language by any metric. It's
| also not unique in that respect, so idk why you would
| draw so much attention to this fact. Good languages
| are rare; good and popular ones I have yet to find.
|
| > Who in the world cares about this?
|
| Parent poster of the post you replied to. But, more
| broadly, common interfaces are important because they
| allow one to avoid vendor lock-in, lower maintenance
| cost, reduce the onboarding time for the new developers.
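The common relational database interface in question is presumably PEP 249 (DB-API 2.0). A small sketch of what it buys and where it leaks: the helper below uses only PEP 249 calls, so it should work with any conforming driver, but SQL placeholder syntax is left to each driver's `paramstyle`. `count_rows` is a hypothetical helper, and the standard library's `sqlite3` stands in for any driver.

```python
import sqlite3

def count_rows(conn, table_name: str) -> int:
    """Hypothetical helper that sticks to PEP 249 calls only:
    conn.cursor(), cursor.execute(), cursor.fetchone(), cursor.close()."""
    cur = conn.cursor()
    try:
        # Identifiers can't be bound as parameters; table_name is assumed trusted.
        cur.execute(f"SELECT COUNT(*) FROM {table_name}")
        return cur.fetchone()[0]
    finally:
        cur.close()

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (name TEXT)")
# Placeholder syntax is driver-specific: sqlite3 declares paramstyle "qmark"
# (?), while e.g. psycopg2 declares "pyformat" (%s / %(name)s).
print(sqlite3.paramstyle)  # qmark
cur.executemany("INSERT INTO users VALUES (?)", [("ada",), ("grace",)])
conn.commit()
print(count_rows(conn, "users"))  # 2
```

So code written against the interface can swap drivers, but any query with parameters still has to be adapted to the driver's placeholder style.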
|
| > Your list reeks of someone flinging critiques without
| even knowing what they're talking about
|
| I don't care to name names. Python is garbage, and I
| never claimed otherwise. As for knowing my stuff... so
| far you seem to be that kind of guy. But, keep going.
| Sometimes the urge to argue may lead you to read about
| the subject of your argument.
| WhereIsTheTruth wrote:
| it's not 'fractured', it's just fragmented, and that's not
| necessarily a bad thing: it gives plenty of room for R&D and
| experimentation
|
| if something doesn't end up working well, you pivot
| jaimebuelta wrote:
| There are only two kinds of languages: the ones people
| complain about and the ones nobody uses.
| poincaredisk wrote:
| >the people who love it don't understand the extreme
| flexibility makes it fragile at scale and are willing to put
| up with its annoyances in an almost stockholm syndrome way
|
| The people who love it understand that its extreme
| flexibility makes it applicable everywhere, while academic
| purity mostly doesn't work in the real world. They also
| prioritize getting things done over petty squabbling, but
| they know how to leverage available tooling where reliability
| is crucial.
|
| (See, I can generalize too)
| redman25 wrote:
| Python with types enforced by CI isn't too bad. Or did you
| have something else in mind?
| analog31 wrote:
| Would some other language have become just as fragmented if
| it had gained the same level of popularity across such a
| broad range of user interests?
| slashdave wrote:
| Perl says "hi"
| amelius wrote:
| It would have taken time to do this and consequently Python
| would have missed the race and some other language would now be
| #1.
| murkt wrote:
| Python missed the race pretty heavily with 2to3 transition
| and still came out on top.
| amelius wrote:
| Survivorship bias. With version 2 they were already at the
| top.
| pkkm wrote:
| > Python would have missed the race
|
| Why do you think that? There's no need for a Python 2->3 like
| transition here, it could have been done while supporting the
| old C API for a while.
| rich_sasha wrote:
| Looks very cool.
|
| How many new extensions are written in C these days? I was under
| the impression it's mostly things like Boost Python, pybind or
| PyO3.
| aragilar wrote:
| There's also Cython.
|
| I would guess also that HPy would replace the includes of
| `Python.h` that pybind11 et al make in order to bind to
| CPython, and so existing extensions should be easier to port?
| masklinn wrote:
| PyO3 is bindings to the C API, so if you're using PyO3 you're
| still using the C API even if you're not actually writing C.
| rich_sasha wrote:
| Yeah, sure, I mean, how many people write C to write an end-
| user Python module. There's stuff that genuinely wraps C
| libraries or predates higher level language wrappers, like
| numpy or matplotlib, but how many new modules are actually
| themselves written in C?
| masklinn wrote:
| The point is that's not relevant, the issue is the API /
| ABI of the modules, its requirements, and its limitations,
| not the language in which the modules are written.
| m_rcin wrote:
| for C++ 11+, pybind11 > Boost.Python
|
| for C++ 17+, nanobind > pybind11 (both created by the same
| developer)
|
| ">" meaning generally better, as described at
| https://nanobind.readthedocs.io/en/latest/why.html
| physicsguy wrote:
| Quite a lot, for things like simulation code
|
| Less so for general programming.
| trkannr wrote:
| A lot. You don't have to write in C, just use the C-API
| functions. pybind etc. introduce a whole new set of problems,
| with new version issues and decreased debuggability.
| koe123 wrote:
| Is my understanding correct that this would provide version
| agnostic python bindings? Currently, I am building a version of
| my bindings separately for each version (e.g. building and
| linking with python 3.7, 3.8, etc.). While automated, it still
| makes CI/CD take quite a long time.
| aragilar wrote:
| I believe so, but it would presumably depend on what features
| you use.
| gjvc wrote:
| _While automated, it still makes CI/CD take quite a long time_
|
| See about using ccache -- https://ccache.dev/
| IshKebab wrote:
| I wouldn't recommend ccache (or sccache) in CI unless you
| _really_ need it. They are not 100% reliable, and any time
| you save from caching will be more than lost debugging the
| weird failures you get when they go wrong.
| gjvc wrote:
| please provide evidence for this assertion.
| IshKebab wrote:
| Why are you so skeptical? Think about how it works and
| then you'll understand that cache invalidation bugs are
| completely inevitable. Hell, cache invalidation is
| notoriously difficult to get right even when you _aren't_
| building it on top of a complex tool that was never
| designed for aggressive caching.
|
| Just search the bugs for "hash":
|
| https://github.com/ccache/ccache/issues?q=is%3Aissue+hash+is...
| imtringued wrote:
| You can't cache based on the file contents alone. You
| will also need to cache based on all OS/compiler
| queries/variables/settings that the preprocessor depends
| on, since the header files might generate completely
| different content based on what ifdef gets triggered.
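A minimal sketch of that idea, assuming a hypothetical `cache_key()` helper (this is not ccache's actual algorithm): hash the source together with the compiler identity, the flags in order, and any environment variables that affect the preprocessor (GCC's `CPATH` is one real example). Header contents are not covered here; a real tool must also hash every included file, or hash the preprocessed output instead.

```python
import hashlib

def cache_key(source: str, compiler: str, flags: list[str], env: dict[str, str]) -> str:
    """Hypothetical compile-cache key: everything that can change what the
    preprocessor emits goes into the hash, not just the source text."""
    payload = repr((
        compiler,             # e.g. compiler name + version string
        source,               # the translation unit itself
        tuple(flags),         # order matters: -I order changes lookups
        sorted(env.items()),  # e.g. CPATH for GCC
    ))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

src = '#include "config.h"\nint main(void) { return VALUE; }\n'
k1 = cache_key(src, "gcc 13.2", ["-O2"], {})
k2 = cache_key(src, "gcc 13.2", ["-O2", "-DNDEBUG"], {})  # may select other #ifdef branches
k3 = cache_key(src, "gcc 13.2", ["-O2"], {"CPATH": "/opt/include"})
print(k1 != k2, k1 != k3)  # True True
```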
| mananaysiempre wrote:
| And that's not impossible, just tedious. One tricky (and
| often unimportant) part is negative dependencies--when
| the build depends on the fact that a header or library
| can _not_ be found in a particular directory on a search
| path (which happens all the time, if you think about it).
| As far as I know, no compilers will cooperate with you on
| this, so build systems that try to get this right have to
| trace the compiler's system calls to be sure (Tup does
| something like this) or completely control and hash
| absolutely everything that the compiler could possibly
| see (Nix and IIUC Bazel).
| zorgmonkey wrote:
| In C++ the __has_include preprocessor expression has been
| standardized since C++17, I'm not certain if C has
| standardized it yet though.
| mananaysiempre wrote:
| It's not about that, that's not relevant to ccache at
| all. (And yes, C23 does have __has_include, though not a
| lot of compilers have C23 yet.) It's about having
| potentially conflicting headers in the source file's
| directory, in your -I directories, and in your
| /usr/include directories.
|
| Suppose a previous compile correctly resolved <libfoo.h>
| to /usr/include/libfoo.h, and that file remains
| unchanged, but since that time you've installed a private
| build of libfoo such that a new compile would instead
| resolve that to ~/.local/include/libfoo.h. What you want
| is to record not just that your compile opened
| /usr/include/libfoo.h ("positive dependencies" you get
| with -MD et al.), but that it tried
| $GITHOME/include/libfoo.h, ~/.local/include/libfoo.h,
| etc. before that and failed ("negative dependencies"), so
| that if any of those appear later you can force a
| recompile.
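The negative-dependency bookkeeping described above can be sketched in a few lines. This is a toy model under stated assumptions: `resolve_header` and `needs_recompile` are hypothetical names, the search order is just a list of directories (a real compiler's `<...>` search has more rules), and content changes to the header that was found are left to ordinary positive-dependency hashing. Temporary directories stand in for ~/.local/include and /usr/include.

```python
import os
import tempfile
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Resolution:
    found: Optional[str]                                      # header used (positive dep)
    probed_missing: List[str] = field(default_factory=list)  # misses (negative deps)

def resolve_header(name: str, search_dirs: List[str]) -> Resolution:
    """Walk the search path in order, recording every miss before the hit."""
    missing: List[str] = []
    for d in search_dirs:
        candidate = os.path.join(d, name)
        if os.path.exists(candidate):
            return Resolution(candidate, missing)
        missing.append(candidate)
    return Resolution(None, missing)

def needs_recompile(prev: Resolution) -> bool:
    """Stale if the header we used disappeared, or if any path that was
    missing last time now exists and would shadow it."""
    if prev.found is not None and not os.path.exists(prev.found):
        return True
    return any(os.path.exists(p) for p in prev.probed_missing)

root = tempfile.mkdtemp()
local_inc = os.path.join(root, "local", "include")  # stands in for ~/.local/include
system_inc = os.path.join(root, "usr", "include")   # stands in for /usr/include
os.makedirs(local_inc)
os.makedirs(system_inc)
open(os.path.join(system_inc, "libfoo.h"), "w").close()

first = resolve_header("libfoo.h", [local_inc, system_inc])
stale_before = needs_recompile(first)
print(first.found == os.path.join(system_inc, "libfoo.h"), stale_before)  # True False

# "Install a private build of libfoo" that now shadows the system header.
open(os.path.join(local_inc, "libfoo.h"), "w").close()
stale_after = needs_recompile(first)
print(stale_after)  # True
```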
| zorgmonkey wrote:
| Oh yeah that can cause lots of weird problems. I've run
| into that sort of issue a lot when cross-compiling, because
| you often have a system copy of a library and a
| different version for the target, which can be a real
| pain.
| amelius wrote:
| Maybe run every build version in their own container?
| gjvc wrote:
| please read the documentation before dispensing
| uninformed advice like this -- it works using the output
| of the preprocessor and optionally, file paths
| kzrdude wrote:
| CPython also has a limited stable ABI, and cp3X-abi3 wheels are
| compatible across multiple versions of Python.
|
| https://docs.python.org/3/c-api/stable.html
| mardifoufs wrote:
| But it is very limited. Understandably so, as they don't want
| to ossify the internal APIs, but it is still so limited that
| you can't actually build anything using just that API, as
| far as I know.
| kzrdude wrote:
| I checked now - polars is built using py abi3 wheels, and
| for me that means that you can build something substantial
| using the stable ABI! :)
|
| See https://pypi.org/project/polars/1.9.0/#files cp38-abi3
| wheels means they are compatible with cpython 3.8 or later.
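A simplified sketch of that compatibility rule, assuming wheel filenames follow the usual `{name}-{version}-{python tag}-{abi tag}-{platform tag}.whl` layout. The function names and the example filename are hypothetical; compressed tag sets (e.g. `cp38.cp39`) and platform tags are ignored, whereas real installers such as pip (via the `packaging` library) handle both.

```python
def parse_wheel_tags(filename: str) -> tuple:
    """Return (python tag, abi tag, platform tag) from a wheel filename.
    Splitting from the right also copes with an optional build tag."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename")
    python_tag, abi_tag, platform_tag = filename[: -len(".whl")].split("-")[-3:]
    return python_tag, abi_tag, platform_tag

def abi3_compatible(python_tag: str, abi_tag: str, interpreter: tuple) -> bool:
    """Simplified rule: a cp3X-abi3 wheel loads on CPython 3.X or newer."""
    if abi_tag != "abi3" or not python_tag.startswith("cp3"):
        return False
    major, minor = interpreter
    return major == 3 and minor >= int(python_tag[3:])

tags = parse_wheel_tags("example-1.0.0-cp38-abi3-manylinux_2_17_x86_64.whl")
print(tags)                                        # ('cp38', 'abi3', 'manylinux_2_17_x86_64')
print(abi3_compatible(tags[0], tags[1], (3, 12)))  # True
print(abi3_compatible(tags[0], tags[1], (3, 7)))   # False
```

Passing `sys.version_info[:2]` as `interpreter` checks against the running CPython. This is why one abi3 wheel can replace a whole row of per-version builds, as long as the extension stays within the limited API.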
| masklinn wrote:
| You can already build a single wheel as long as you only target
| cpython, if your needs fit with the limited / stable abi
| (abi3).
|
| While PyPy and Graal have API support, they don't have ABI /
| abi3 support, so extensions still have to be built for them
| separately (and per version, I think).
| filmor wrote:
| As others have said, this has been supported since the
| limited/stable APIs were introduced. What this adds is a way of
| implementing a Python extension that can be loaded in (not just
| compiled for, which is already an improvement!) different
| Python implementations, namely CPython, PyPy and GraalVM.
| Stem0037 wrote:
| It would be interesting to see benchmarks comparing HPy
| extensions to equivalent Cython/pybind11 implementations in terms
| of performance and development time.
| normanthreep wrote:
| tangentially related question: is there something as simple as
| luajit's ffi for python? as in: give it a c header, load the
| shared library, it simply makes structs usable and functions
| callable.
| lukego wrote:
| Yeah, cffi.
| nly wrote:
| cppyy does this for C++
| pkkm wrote:
| cffi is closest to what you described.
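cffi's `ffi.cdef()` accepts plain C declarations, which is what makes it the closest match to LuaJIT's FFI. For contrast, here is a minimal sketch with the standard library's `ctypes`, which needs no extra install but makes you spell out every signature and struct layout by hand. It assumes a POSIX system where `CDLL(None)` exposes libc symbols; the Windows branch using `msvcrt` is an assumption, and `Point` is a made-up struct.

```python
import ctypes
import sys

# POSIX: CDLL(None) is dlopen(NULL), i.e. the running process, whose symbol
# lookup includes libc. Windows: msvcrt is assumed to provide strlen.
libc = ctypes.CDLL("msvcrt") if sys.platform == "win32" else ctypes.CDLL(None)

# With cffi you could write ffi.cdef("size_t strlen(const char *);");
# with ctypes the signature is declared attribute by attribute.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t
n = libc.strlen(b"hello")
print(n)  # 5

# Structs are Python classes that mirror the C layout field by field.
class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]

p = Point(3, 4)
print(p.x + p.y, ctypes.sizeof(Point))  # 7 8
```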
| actinium226 wrote:
| I'm a little unclear as to how this fits in with libraries like
| PyBind11 or nanobind? It seems like those libraries would need to
| be rewritten (or new libraries with the same goals created) in
| order to use this in the same way?
| gghoop wrote:
| I'm interested in calling go from python, gopy generates python
| bindings to cgo. Maybe HPy<->cgo would have less overhead.
| crabbone wrote:
| It's a no-go at this point, if you want this on MS Windows. CGo
| on MS Windows uses MinGW, while CPython uses MSVC. It's very
| hard to make this work due to name mangling.
|
| I.e. you can do this for Python from MSYS2, for example, but
| not for the one your users will likely have.
| masklinn wrote:
| Use IPC. Go wilfully set itself apart from and against the C
| ABI, it's generally not worth fighting against that.
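A minimal sketch of the IPC route, assuming a hypothetical line-delimited JSON protocol: the parent writes one request per line to the worker's stdin and reads one response per line from its stdout. To keep the example self-contained, the worker is a tiny Python script launched with `sys.executable`; in practice it would be a compiled Go binary speaking the same protocol, with no cgo or shared C ABI involved.

```python
import json
import subprocess
import sys

# Stand-in for a Go worker (hypothetical protocol): one JSON request per
# line on stdin, one JSON response per line on stdout.
WORKER_SRC = r"""
import json, sys
for line in sys.stdin:
    req = json.loads(line)
    print(json.dumps({"id": req["id"], "result": sum(req["args"])}), flush=True)
"""

def call(proc, req_id, args):
    """Send one request and block until its response line arrives."""
    proc.stdin.write(json.dumps({"id": req_id, "args": args}) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())["result"]

proc = subprocess.Popen(
    [sys.executable, "-c", WORKER_SRC],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)
try:
    result = call(proc, 1, [1, 2, 3])
    print(result)  # 6
finally:
    proc.stdin.close()  # EOF ends the worker's read loop
    proc.wait()
```

The cost is a process boundary and serialization per call, so this fits coarse-grained calls better than tight loops.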
| xiaodai wrote:
| Is this thing "official"?
| trkannr wrote:
| After cpyext and cffi, this is the third attempt, largely driven
| by PyPy people, to get a C-API that people want to use.
|
| If they succeed and keep the CPython "leaders" who ruined the
| development experience and social structure of CPython out of
| PyPy, PyPy might get interesting. If they don't keep them out,
| those "leaders" will merrily sink yet another project.
| filmor wrote:
| cffi replaces ctypes, which is a completely different thing.
| cpyext is a reimplementation of the Python C-API, so no attempt
| at improving the API.
|
| HPy on CPython uses the existing C-API under the hood, so there
| is zero need to keep anyone out...
| kagerl wrote:
| cffi is used to wrap c libraries. Only a masochist would use
| ctypes to wrap a whole library. While both are technically
| FFIs, it does not make sense to compare them. From a
| conceptual perspective, cffi was written to replace the C-API
| for C modules.
| VagabundoP wrote:
| Expand here on your use of double quotes and the subtext of
| your comment if you please.
| pkkm wrote:
| Very happy to see that these issues are getting attention now. I
| think that the Python language being so centered on one
| implementation is a long-term threat to its success. Web servers,
| command-line programs, and embedded devices have different
| requirements: high post-warmup throughput, fast startup, low
| memory usage. They aren't necessarily best served by the same
| implementation. If this project succeeds in replacing Python's C
| API with something that doesn't expose implementation details,
| such as whether the implementation uses reference counting, that
| could make it easier both to maintain alternative
| implementations, and to experiment with new techniques in
| CPython.
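To illustrate what "not exposing reference counting" can mean, here is a toy model of opaque handles. The class and method names are hypothetical; `dup` and `close` only mirror the spirit of HPy's `HPy_Dup` and `HPy_Close`, and this is not how HPy is implemented. Extension code holds integers rather than object pointers, so the runtime stays free to manage the objects behind them however it likes, and leaks show up as handles that are still open.

```python
class HandleTable:
    """Toy model of opaque handles (hypothetical, not HPy's actual code).
    Callers see only integers; the table owns the objects."""

    def __init__(self):
        self._objects = {}
        self._next = 1

    def new(self, obj) -> int:
        h = self._next
        self._next += 1
        self._objects[h] = obj
        return h

    def deref(self, h: int):
        return self._objects[h]  # KeyError here means use-after-close

    def dup(self, h: int) -> int:
        # A *new* handle to the same object, closed independently.
        return self.new(self._objects[h])

    def close(self, h: int) -> None:
        # Each handle is released exactly once; double close raises KeyError.
        del self._objects[h]

    def open_handles(self) -> int:
        return len(self._objects)

table = HandleTable()
h1 = table.new([1, 2, 3])
h2 = table.dup(h1)
same = table.deref(h2) is table.deref(h1)
print(h1 != h2, same)  # True True
table.close(h1)
table.close(h2)
print(table.open_handles())  # 0
```

Unlike a `PyObject*` with a visible refcount, nothing here tells the caller how the object is kept alive, which is the property that makes alternative implementations easier.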
| ashvardanian wrote:
| Hey!
|
| First of all, cool to see some activity on this front!
|
| I've written a fair share of pure CPython bindings and regularly
| post about implementing them with minimal overhead
| (<https://ashvardanian.com/posts/discount-on-keyword-
| arguments...>) and would love to share a few recommendations,
| questions, and concerns :)
|
| Just a suggestion to help you grow--I'd restructure the landing
| page (<https://hpyproject.org/>) and the README of the repo
| (<https://github.com/hpyproject/hpy>). It could benefit from some
| examples to clarify the "Nicer API" bullet point. Maybe these
| could be taken from the API documentation page
| (<https://docs.hpyproject.org/en/latest/api.html>). The page
| could also be more convincing with some supporting stats in favor
| of PyPy, GraalPython, and other Python runtimes. A reader like me
| might not be sure if they have enough usage and are stable
| enough.
|
| Avoiding singletons and having encapsulated context objects like
| `HPyContext` is definitely a great thing to have, especially in
| the multi-threaded Python future or in complex environments with
| multiple sub-interpreters. But this doesn't really solve the
| problem if, under the hood, the `HPyContext` still redirects to
| CPython's singleton.
|
| I've also looked at the linked benchmarks
| (<https://pypy.org/posts/2019/12/hpy-kick-off-sprint-
| report-18...>). They are dated from 2019, five years ago, and
| already mention CPython's `METH_FASTCALL` fast calling
| convention, but it seems like they are not compared to it. In
| either case, parsing arguments from one "ll" string specifier is
| hardly a detailed benchmark if the underlying magic isn't
| explained. I occasionally do one-off benchmarks as well, but it's
| better to describe the principle--why the thing is supposed to be
| faster. For example, if you're concerned about performance, you'd
| just parse the arguments directly from the tuple without string
| formatters--like this:
| <https://github.com/ashvardanian/SimSIMD/blob/80cc4bcaddbdee9a0c0e991e13376c234aff3b3f/python/lib.c#L929-L1066>
|
| It's more error-prone, but it would be cool to see if a high-
| level solution can achieve under a 10% latency penalty.
|
| Hope this is useful :)
| fforflo wrote:
| Something I don't see being mentioned in the comments: What's a
| really frustrating part of working with the C API? Setting up the
| compile/link flags!
|
| The python3-config works generally well but is only available at
| the OS level. But you don't want to mess with that (e.g., to
| access pip-installed packages). Beyond that, everything is a
| mess! python3 -m venv doesn't even bother creating such a script.
| anaconda/miniconda? Don't even try!
|
| So every package pollutes its build scripts with many hardcoded
| `python3 -c "import sys; print..."` calls.
|
| I've opened a CPython PR that may help a bit by adding a `--json`
| flag: `python3 -m sysconfig --json` [0]
|
| [0] https://github.com/python/cpython/pull/123318
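Some of those hardcoded `python3 -c` calls can be folded into one query against the interpreter being built for. A minimal sketch using the standard `sysconfig` module (`build_info` is a hypothetical helper; which keys a given build needs will vary, and some config vars such as `LIBDIR` may be `None` on Windows):

```python
import json
import sysconfig

def build_info() -> dict:
    """Hypothetical helper: ask the *running* interpreter (venv, conda env,
    or system Python alike) for values an extension build usually needs."""
    return {
        "version": sysconfig.get_python_version(),             # e.g. "3.12"
        "include": sysconfig.get_paths()["include"],           # dir containing Python.h
        "ext_suffix": sysconfig.get_config_var("EXT_SUFFIX"),  # e.g. ".cpython-312-x86_64-linux-gnu.so"
        "libdir": sysconfig.get_config_var("LIBDIR"),          # may be None on Windows
    }

# One call a build script can shell out to, instead of many `python3 -c` snippets.
print(json.dumps(build_info(), indent=2))
```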
___________________________________________________________________
(page generated 2024-10-06 23:00 UTC)