[HN Gopher] Introducing pgzx: create PostgreSQL extensions using...
___________________________________________________________________
Introducing pgzx: create PostgreSQL extensions using Zig
Author : eatonphil
Score : 107 points
Date : 2024-03-21 14:39 UTC (8 hours ago)
(HTM) web link (xata.io)
(TXT) w3m dump (xata.io)
| canadiantim wrote:
| very cool. More amazing work from xata. Special shoutout to
| pgroll too!
| michelpp wrote:
| This looks very cool and it's nice to see some finicky details
| like PG_GETARG abstracted away.
|
| Are there any plans to make pgzx a Trusted Language so that Zig
| functions could be written with CREATE FUNCTION? Rust recently
| achieved this milestone and AWS RDS now supports PL/Rust.
|
| Being a trusted language would mean it could be used to create
| Trusted Language Extensions (TLEs) and could be installed on
| cloud hosted databases without filesystem access.
| tudorg wrote:
| It would be amazing if we could could use Zig for TLEs, and
| we'll look into it, but I worry it might not be possible
| because Zig can't guarantee memory safety in the same way that
| Rust can.
| zombodb wrote:
| If you build it, they'll come. The only limit is yourself!
| tudorg wrote:
| I understood that reference, zombo :)
| zombodb wrote:
| Y'all have built something cool. I'm excited to see where
| it goes from here.
|
| If there's anything the pgrx team can do to help, just
| let us know.
| bear9628 wrote:
| No TLE support is a not a goal. Languages/extensions in
| Postgres should only be marked as trusted if you can guarantee
| that they do not access Postgres internals or even the file
| filesystem. With Zig and in pgzx in particular we can not
| guarantee this to be the case.
|
| With pgzx we provide you a number of utilities on top of the C
| API. But depending on the extension you are working on you
| might heavily use the Postgres internals and their C APIs as
| is.
|
| Postgres has support for a number of languages. Some having
| their own strenghts or weaknesses. One strength of Zig is its
| closeness to C, while providing a many quality of live
| improvements over C (optional types instead of null pointer,
| generic data structures, ...). This makes it a formidable
| choice for extensions that want to integrate deeply with the
| Postgres internals.
| anarazel wrote:
| If I suddenly had unlimited resources and attention, I'd
| really like to provide infrastructure to write trusted
| functions by running them via a webassembly runtime, with a
| small injected safe surface for accessing postgres
| functionality. That'd make it a lot easier to write trusted
| PLs in a variety of languages. Still probably needs a bit of
| per-language sugar on top, but a lot less than doing it from
| scratch.
|
| Edit: Minor grammar fix
| bear9628 wrote:
| Hehe :)
|
| I also have WASM in mind when thinking about future TLE
| support :)
| michelpp wrote:
| I don't know much about it, but I think this has been
| tried?
|
| https://dylibso.com/blog/pg-extism/
|
| Looking at the repo though it looks like it hasn't been
| updated in a while.
| nilslice wrote:
| we're actually working on a new version of this, which
| _does_ support TLE! if you're interested, please join
| https://extism.org/discord and we can get you a preview.
| bear9628 wrote:
| Oh wow, that is really cool.
| shay_ker wrote:
| I have a dumb, and likely annoying, question: why isn't Rust the
| clear choice? I think it has to do with Zig's "interop with C"
| but this is very abstract for me. Is there a code example that
| shows something that's super easy in Zig but a PITA in Rust?
|
| I've heard that Rust's types and compiler can be frustrating to
| deal with as well. Is Zig a more "joyful" programming language?
| minimaxir wrote:
| There is a separate project to build Postgres extensions with
| Rust: https://github.com/pgcentralfoundation/pgrx
| nextaccountic wrote:
| And lots of interesting extensions use it, like
|
| https://github.com/tembo-io/pgmq
|
| https://github.com/zombodb/zombodb
|
| https://github.com/supabase/pg_jsonschema
| giovannibonetti wrote:
| I've heard that when working with complex data structures with
| Rust you often need to write unsafe code, which is a lot more
| painful to work with in Rust. Once in you are in unsafe land
| anyway, some people say Zig is more convenient.
| ccleve wrote:
| I haven't worked with pgzx, but it's possible that memory
| management is easier. Zig uses memory arenas, and so does
| Postgres. If one can map directly to the other, it would be a
| huge win.
|
| With Rust/pgrx (which I have used, extensively) memory
| integration is more difficult. There's pg memory, and there's
| Rust memory, and they're not the same, and you have to play
| games to pass data between them. Lifetime issues crop up. Rust
| might be able to solve the problem in the future with custom
| allocators, but it's just not there yet.
| bear9628 wrote:
| > I haven't worked with pgzx, but it's possible that memory
| management is easier. Zig uses memory arenas, and so does
| Postgres. If one can map directly to the other, it would be a
| huge win.
|
| Yes. This was indeed a great motivator for using Zig. It was
| quite easy to integrate Zig with the Postgres memory
| management. This way we can use the Zig standard library or
| other Zig libraries without a second thought. Another
| advantage is that in Postgres memory context cleanup and
| error handling are somewhat well integrated with each other.
| This gives us some peace of mind as almost any Postgres
| function you might want to use in your extension is likely to
| raise an exception.
| tudorg wrote:
| It's for sure the question that a lot of people have. Initially
| we had more details about that in the blog post, but we didn't
| want it to become a "Zig vs Rust" blog post, so we kept it to a
| minimum.
|
| I will expand just a little bit more here:
|
| First, I think the fact that Rust can be a trusted language for
| Postgres is a huge advantage, and I am excited about it! I hope
| we will have the chance to use it and contribute to pgrx as
| well.
|
| Postgres is not only written in C, but it has developed its own
| particular style of C code. For example: arena memory allocator
| (memory contexts), exceptions via setjmp/longjmp (ereport),
| single-threaded communicating via shared memory, SPI for
| accessing the data via SQL, etc. Some of these mechanisms kind
| of conflict with the way Rust likes to do things.
|
| Because of the above, pgrx has to do harder work to expose
| functionality. It's possible, just a lot of work. In Zig, we
| can use pretty much anything directly or with small wrappers.
|
| If you need to call into C a lot, you need to use unsafe, and
| that negates some of the memory safety advantages of Rust.
| Also, Rust unsafe code is harder to write.
| weinzierl wrote:
| Here is talk and slides about how to create postgres extensions
| in Rust.
|
| https://rustlab.it/talks/teaching-an-old-dog-new-tricks-exte...
| jedisct1 wrote:
| PostgreSQL is such an amazing piece of software, that keeps being
| more awesome release after release. It has almost become a full-
| featured operating system at that point.
|
| And the ability to write extensions in Zig is very nice.
| slt2021 wrote:
| Postgres creator Stonebraker is working on an actual operating
| system to run database workloads :)
|
| https://en.wikipedia.org/wiki/DBOS
| nop_slide wrote:
| Tangential, first time bumping into Xata. What is the difference
| between Xata and Neon?
|
| Their site says
|
| > Xata is the only serverless data platform for PostgreSQL.
|
| I thought Neon was serverless postgres too?
|
| Edit: This looks really dope actually, going to play around with
| it later. Really dig the free tier and the built in search from
| the get go.
| tudorg wrote:
| We're referring more to the "data platform" part. What we mean
| by a "Postgres data platform" (and arguably it's not an widely
| accepted term, so take it with a grain of salt) is that you get
| not only Postgres, but secondary data stores around it, more
| tightly integrated than in a typical cloud provider:
|
| - Elasticsearch, to which we replicate data automatically via
| CDC
|
| - Blob storage for file attachments, we use S3 to store the
| files and Cloudflare to cache them on CDN
|
| - (in the future) Caching
|
| - (in the future) Time-series databases, again with automatic
| replication from Postgres
|
| That said, you can use just the Postgres part, and we are very
| happy if you do.
| nop_slide wrote:
| Edited my comment above, but this looks really rad. I made an
| account and going to try porting a side project DB to it
| later!
|
| Particularly we need to improve our search and I've been lazy
| about tweaking our ts_vector setup.
| nextaccountic wrote:
| > Elasticsearch
|
| Do you use zombodb?
|
| https://github.com/zombodb/zombodb
| bear9628 wrote:
| You can get an overview of the platform in this Post:
| https://xata.io/blog/serverless-postgres-platform#the-
| platfo...
|
| Zombodb is a really cool project, but no, we don't use it.
| We use logical replication and triggers to also capture
| schema changes with your records into an event stream. The
| event stream is used to send your data to Elasticsearch
| (and create/update the index). See:
| https://xata.io/blog/serverless-postgres-platform#logical-
| re...
|
| Stay tuned, we are planning to open source this component
| as well.
| tristan957 wrote:
| As far as Postgres goes, Xata is just Amazon Aurora under the
| hood if I read their last blog post right.
|
| (I work at Neon)
| anarazel wrote:
| It's completely besides the point, but part of the complexity of
| the char_count example is just due to the C version for some
| reason copying the input strings into newly allocated memory,
| which the zig version doesn't :)
| tudorg wrote:
| This is fair, and it's a bit embarrassing but I copied the C
| version from an older tutorial without looking at it critical
| enough. Still, the Zig version gets the arguments as normal
| function parameters and doesn't need to call `PG_GETARG_*`.
| e12e wrote:
| I notice that the c version looks like it uses memcopy as a
| convoluted typecast from a pointer to _text_ (presumably a
| postgres typedef?) to a pointer to _char_ (null terminated?)?
| Could this actually have a purpose?
|
| Ed: I don't think PG uses c strings? Looks like memcopy might
| be needed for call-by-reference, return-by-value - but not
| really in the simple function that only walks the array?
|
| https://www.postgresql.org/docs/current/xfunc-c.html#XFUNC-C.
| ..
| anarazel wrote:
| I wonder what things you'd like to see exposed by postgres to
| make something like this easier.
|
| E.g. you have code to maintain the module magic and function
| info, for both of which there are some plans to change the
| contents in the next few major versions. If we were to provide
| zig consumable ways of creating the struct from C headers, would
| that help?
| bear9628 wrote:
| One of the reasons why we are using Zig is that we can consume
| the C API mostly as is.
|
| When importing the C headers Zig translates the C headers into
| Zig declarations. Unfortunately this is not always possible for
| C macros, and the reason why we have to maintain those structs.
| And this is where we have to step in with Custom Zig code. But
| most of the time we actually consume the C APIs as is.
|
| > If we were to provide zig consumable ways of creating the
| struct from C headers, would that help?
|
| Yes, that would be awesome. I'm curious how they will look like
| in the future.
|
| We have had the most difficulties with the module magic,
| function info, and varatt/Datum macros. Fortunately you have to
| solve the module magic/function info "only once". The Datum
| conversion and VARATT macros are more troublesome. We have some
| conversion support for a number of common zig types. But
| ideally we would like users to be able to use the C APIs as is,
| while we provide some type directed default conversions for
| convenience.
|
| The main problems we've been facing with C macro translations
| are type conversions/casts in macros, especially if the
| underyling struct heavily uses union (for example VARATT
| macros). In some cases translating inline functions instead of
| macros might work better, due to the translator having more
| type information available. We fixed some of the translations
| manually. You can find them in varatt.zig and datum.zig (where
| we opted to implement the text to cstring translation
| ourselves).
|
| Data structures like lists, slist, dlist, hash tables are quite
| consumable as is. We have some typed wrappers for those and
| provide iterators. Macros with control flow can not be reused,
| but I think this is fair, especially as the foreach macros are
| a very common C patterns. All in all we have had no troubles
| with them.
| anarazel wrote:
| Hi,
|
| > > If we were to provide zig consumable ways of creating the
| struct from C headers, would that help?
|
| > Yes, that would be awesome.
|
| I don't know much about zig and won't have a whole lot of
| time to learn - but if you can outline what is required to
| make C macros [un]usable, it might be possible to improve
| something. Either on its own, or as part of future work.
|
| > I'm curious how they will look like in the future.
|
| There's quite a few things.
|
| For one, I'd like to introduce a faster function calling
| infrastructure for the simple cases (mainly small number of
| arguments, without SRF support). That'll need to be declared
| in the function info struct.
|
| For another, eventually I want to support a different
| encoding for variable length types. Including making it
| reasonably efficient to have variable-length integers.
|
| > We have had the most difficulties with the module magic,
| function info, and varatt/Datum macros. Fortunately you have
| to solve the module magic/function info "only once". The
| Datum conversion and VARATT macros are more troublesome. We
| have some conversion support for a number of common zig
| types. But ideally we would like users to be able to use the
| C APIs as is, while we provide some type directed default
| conversions for convenience.
|
| Ugh, the varatt stuff doesn't look easily maintainable long-
| term. It looks like you just need it for
| getDatumTextSliceZ()?
|
| At the same time, I don't really know why you need it? Most
| of this should be doable via C functions, and the parts that
| are not, you could easily wrap yourself - you already seem to
| have some C code as part of pgzx.
| bear9628 wrote:
| Edit: I wasn't finished before submitting the response by
| accident :)
|
| > > > If we were to provide zig consumable ways of creating
| the struct from C headers, would that help? > > Yes, that
| would be awesome. > I don't know much about zig and won't
| have a whole lot of time to learn - but if you can outline
| what is required to make C macros [un]usable, it might be
| possible to improve something. Either on its own, or as
| part of future work.
|
| Hm. Sometimes it is difficult to tell until you try to use
| a macro. This is because the compiler ignores code (no type
| checking) that is not used. Difficult to explain, but
| assume that you write a program that emits typed code that
| is eventually compiled. This is what enables comptime, and
| "best effort" C header imports.
|
| The toolchain tries to convert macros into inline
| functions. That means any macro that contains some form of
| control flow or opens/closes a code block can't be used.
| Most obvious ones are the foreach loops, PG_TRY and friends
| or the PG_RETURN_X macros (luckily we can just use the
| XGetDatum functions).
|
| Union types are difficult as I said. But maybe this is
| rather a Zig problem.
|
| Sometimes using consts. For example when working with the
| varattrib variants we have bit wise operations and shift
| for example:
|
| ``` #define VARSIZE_4B(PTR) \ ((((varattrib_4b *)
| (PTR))->va_4byte.va_header >> 2) & 0x3FFFFFFF) ```
|
| Now the 2 and the bit pattern might be translated into
| different types (e.g. int), which might not be compatible
| with va_header (which is an uint32). Sometimes the types
| for the constants look ok, sometimes not. Maybe this is
| something that could still improved in Zig, not sure. I
| haven't tried this, but I wonder what would happen if I
| annotate the types for the constatns in the macro (which
| might not make them more readible :) ).
|
| We later decided to allow mixing C with Zig code in case we
| need some kind of "complex" wrapping in C. This might not
| be fully ideal, but fortunately Zig is also a C compiler
| which allows us to fallback to C if we find something to
| complicated.
|
| > For one, I'd like to introduce a faster function calling
| infrastructure for the simple cases (mainly small number of
| arguments, without SRF support). That'll need to be
| declared in the function info struct.
|
| This sounds great. In pgzx we actually allow developers to
| capture the function call info as argument in their
| function implementation (not shown in our examples). For
| example if someone wants to use the collation, do some
| checks on nargs, implement a function with variable number
| of arguments.
|
| But out of the box we try to derive input and return types
| and conversions at compile time. I would have to see how
| the new API looks like, but I think we still would be able
| to continue to automatically derive the conversions to
| extract the arguments into values in Zig.
|
| > Ugh, the varatt stuff doesn't look easily maintainable
| long-term. It looks like you just need it for
| getDatumTextSliceZ()?
|
| > At the same time, I don't really know why you need it?
| Most of this should be doable via C functions, and the
| parts that are not, you could easily wrap yourself - you
| already seem to have some C code as part of pgzx.
|
| True. We introduced C into the code base later in our
| development. The project is still very new and we might
| revisit some choices on the Datum encoding.
|
| The `getDatumTextSliceZ` actually resembles the
| `text_to_cstring` function, which we might want to use in
| the future instead. In `Zig` a string is a slice, which is
| a fat pointer (pointer + length field). The type `[:0]const
| u8` represents a slice with 0 terminator (fun fact, Zig
| gives you a stack traces if you forget to write the
| terminator into your buffer). Initially we implemented this
| function directly so we can directly initialize the fat
| pointer without having to get the string length after doing
| the conversion.
|
| We added C to our code base later in time to allow us to
| wrap simple cases more easily without having to reproduce
| Postgres code in Zig. I guess we should revisit
| `getDatumTextSliceZ` :). Either have a small C wrapper over
| `text_to_cstring` that also returns the length or just bite
| the bullet and do a `strlen` after.
|
| Another motivator to try to fix the VARATT/VARDATA macros
| was to allow developers to use those in their own
| extensions. Looking at some extensions in contrib we find
| e.g. `VARDATA_ANY` or `VARSIZE_ANY_EXHDR` being used quite
| a bit.
| ww520 wrote:
| This is a fantastic news. I got into Zig in the last two weeks
| and it has be a very pleasant language. The comptime is an
| ingenious invention. It enables type safe meta-programming and
| easy generic support, in the same language syntax. The native
| support for sub-byte types like u2, u3, or u7 make packing data
| very easy. The native vector support on SIMD makes SIMD level
| parallel programming like child play. The language is looking
| very good.
___________________________________________________________________
(page generated 2024-03-21 23:01 UTC)