[HN Gopher] Introducing pgzx: create PostgreSQL extensions using...
       ___________________________________________________________________
        
       Introducing pgzx: create PostgreSQL extensions using Zig
        
       Author : eatonphil
       Score  : 107 points
       Date   : 2024-03-21 14:39 UTC (8 hours ago)
        
 (HTM) web link (xata.io)
 (TXT) w3m dump (xata.io)
        
       | canadiantim wrote:
       | very cool. More amazing work from xata. Special shoutout to
       | pgroll too!
        
       | michelpp wrote:
       | This looks very cool and it's nice to see some finicky details
       | like PG_GETARG abstracted away.
       | 
       | Are there any plans to make pgzx a Trusted Language so that Zig
       | functions could be written with CREATE FUNCTION? Rust recently
       | achieved this milestone and AWS RDS now supports PL/Rust.
       | 
       | Being a trusted language would mean it could be used to create
       | Trusted Language Extensions (TLEs) and could be installed on
       | cloud hosted databases without filesystem access.
        
         | tudorg wrote:
         | It would be amazing if we could use Zig for TLEs, and
         | we'll look into it, but I worry it might not be possible
         | because Zig can't guarantee memory safety in the same way that
         | Rust can.
        
           | zombodb wrote:
           | If you build it, they'll come. The only limit is yourself!
        
             | tudorg wrote:
             | I understood that reference, zombo :)
        
               | zombodb wrote:
               | Y'all have built something cool. I'm excited to see where
               | it goes from here.
               | 
               | If there's anything the pgrx team can do to help, just
               | let us know.
        
         | bear9628 wrote:
         | No, TLE support is not a goal. Languages/extensions in
         | Postgres should only be marked as trusted if you can guarantee
         | that they do not access Postgres internals or even the
         | filesystem. With Zig, and pgzx in particular, we cannot
         | guarantee this to be the case.
         | 
         | With pgzx we provide you a number of utilities on top of the C
         | API. But depending on the extension you are working on you
         | might heavily use the Postgres internals and their C APIs as
         | is.
         | 
         | Postgres has support for a number of languages, each with
         | its own strengths and weaknesses. One strength of Zig is its
         | closeness to C, while providing many quality-of-life
         | improvements over C (optional types instead of null pointers,
         | generic data structures, ...). This makes it a formidable
         | choice for extensions that want to integrate deeply with the
         | Postgres internals.
        
           | anarazel wrote:
           | If I suddenly had unlimited resources and attention, I'd
           | really like to provide infrastructure to write trusted
           | functions by running them via a webassembly runtime, with a
           | small injected safe surface for accessing postgres
           | functionality. That'd make it a lot easier to write trusted
           | PLs in a variety of languages. Still probably needs a bit of
           | per-language sugar on top, but a lot less than doing it from
           | scratch.
           | 
           | Edit: Minor grammar fix
        
             | bear9628 wrote:
             | Hehe :)
             | 
             | I also have WASM in mind when thinking about future TLE
             | support :)
        
             | michelpp wrote:
             | I don't know much about it, but I think this has been
             | tried?
             | 
             | https://dylibso.com/blog/pg-extism/
             | 
             | Looking at the repo though it looks like it hasn't been
             | updated in a while.
        
               | nilslice wrote:
               | we're actually working on a new version of this, which
               | _does_ support TLE! if you're interested, please join
               | https://extism.org/discord and we can get you a preview.
        
               | bear9628 wrote:
               | Oh wow, that is really cool.
        
       | shay_ker wrote:
       | I have a dumb, and likely annoying, question: why isn't Rust the
       | clear choice? I think it has to do with Zig's "interop with C"
       | but this is very abstract for me. Is there a code example that
       | shows something that's super easy in Zig but a PITA in Rust?
       | 
       | I've heard that Rust's types and compiler can be frustrating to
       | deal with as well. Is Zig a more "joyful" programming language?
        
         | minimaxir wrote:
         | There is a separate project to build Postgres extensions with
         | Rust: https://github.com/pgcentralfoundation/pgrx
        
           | nextaccountic wrote:
           | And lots of interesting extensions use it, like
           | 
           | https://github.com/tembo-io/pgmq
           | 
           | https://github.com/zombodb/zombodb
           | 
           | https://github.com/supabase/pg_jsonschema
        
         | giovannibonetti wrote:
         | I've heard that when working with complex data structures with
         | Rust you often need to write unsafe code, which is a lot more
         | painful to work with in Rust. Once you are in unsafe land
         | anyway, some people say Zig is more convenient.
        
         | ccleve wrote:
         | I haven't worked with pgzx, but it's possible that memory
         | management is easier. Zig uses memory arenas, and so does
         | Postgres. If one can map directly to the other, it would be a
         | huge win.
         | 
         | With Rust/pgrx (which I have used, extensively) memory
         | integration is more difficult. There's pg memory, and there's
         | Rust memory, and they're not the same, and you have to play
         | games to pass data between them. Lifetime issues crop up. Rust
         | might be able to solve the problem in the future with custom
         | allocators, but it's just not there yet.
        
           | bear9628 wrote:
           | > I haven't worked with pgzx, but it's possible that memory
           | management is easier. Zig uses memory arenas, and so does
           | Postgres. If one can map directly to the other, it would be a
           | huge win.
           | 
           | Yes. This was indeed a great motivator for using Zig. It was
           | quite easy to integrate Zig with the Postgres memory
           | management. This way we can use the Zig standard library or
           | other Zig libraries without a second thought. Another
           | advantage is that in Postgres memory context cleanup and
           | error handling are somewhat well integrated with each other.
           | This gives us some peace of mind as almost any Postgres
           | function you might want to use in your extension is likely to
           | raise an exception.
        
         | tudorg wrote:
         | It's for sure the question that a lot of people have. Initially
         | we had more details about that in the blog post, but we didn't
         | want it to become a "Zig vs Rust" blog post, so we kept it to a
         | minimum.
         | 
         | I will expand just a little bit more here:
         | 
         | First, I think the fact that Rust can be a trusted language for
         | Postgres is a huge advantage, and I am excited about it! I hope
         | we will have the chance to use it and contribute to pgrx as
         | well.
         | 
         | Postgres is not only written in C, but it has developed its own
         | particular style of C code. For example: arena memory allocator
         | (memory contexts), exceptions via setjmp/longjmp (ereport),
         | single-threaded processes communicating via shared memory, SPI for
         | accessing the data via SQL, etc. Some of these mechanisms kind
         | of conflict with the way Rust likes to do things.
         | 
         | Because of the above, pgrx has to do harder work to expose
         | functionality. It's possible, just a lot of work. In Zig, we
         | can use pretty much anything directly or with small wrappers.
         | 
         | If you need to call into C a lot, you need to use unsafe, and
         | that negates some of the memory safety advantages of Rust.
         | Also, Rust unsafe code is harder to write.
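
For readers less familiar with that style of C, a minimal sketch of two of the mechanisms named above, an arena that is freed in one step and errors raised via setjmp/longjmp, might look like the following. Everything prefixed `demo_` is hypothetical and invented for illustration; none of it is the Postgres API, which uses memory contexts and ereport for these purposes.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Toy bump allocator standing in for a memory context
 * (hypothetical, fixed size, no alignment handling). */
typedef struct demo_arena {
    unsigned char buf[1024];
    size_t        used;
} demo_arena;

/* Where demo_raise() jumps back to. */
static jmp_buf demo_error_jmp;

/* Toy stand-in for raising an error: never returns to the caller. */
static void demo_raise(void)
{
    longjmp(demo_error_jmp, 1);
}

/* Bump-allocate n bytes; raises instead of returning NULL when full. */
static void *demo_alloc(demo_arena *a, size_t n)
{
    assert(a != NULL);
    if (n > sizeof a->buf - a->used)
        demo_raise();
    void *p = a->buf + a->used;
    a->used += n;
    return p;
}

/* Drop every allocation made from the arena at once. */
static void demo_arena_reset(demo_arena *a)
{
    a->used = 0;
}

/* Allocates twice; the second request overflows the arena when `big`
 * is nonzero. Returns 0 on success and 1 if an error was caught.
 * Either way the arena is reset, so nothing leaks on the error path. */
static int demo_run(demo_arena *a, int big)
{
    int failed = 0;
    if (setjmp(demo_error_jmp) == 0) {
        (void) demo_alloc(a, 100);
        (void) demo_alloc(a, big ? 2000 : 100);
    } else {
        failed = 1;
    }
    demo_arena_reset(a);
    return failed;
}
```

In this sketch the jump bypasses all code between the raise site and the handler, so cleanup has to be tied to the arena rather than to individual allocations. That is presumably the part that sits awkwardly with per-object destructors.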
        
         | weinzierl wrote:
         | Here is a talk, with slides, about how to create Postgres
         | extensions in Rust.
         | 
         | https://rustlab.it/talks/teaching-an-old-dog-new-tricks-exte...
        
       | jedisct1 wrote:
       | PostgreSQL is such an amazing piece of software, and it keeps
       | getting more awesome release after release. It has almost become
       | a full-featured operating system at this point.
       | 
       | And the ability to write extensions in Zig is very nice.
        
         | slt2021 wrote:
         | Postgres creator Stonebraker is working on an actual operating
         | system to run database workloads :)
         | 
         | https://en.wikipedia.org/wiki/DBOS
        
       | nop_slide wrote:
       | Tangential, first time bumping into Xata. What is the difference
       | between Xata and Neon?
       | 
       | Their site says
       | 
       | > Xata is the only serverless data platform for PostgreSQL.
       | 
       | I thought Neon was serverless postgres too?
       | 
       | Edit: This looks really dope actually, going to play around with
       | it later. Really dig the free tier and the built in search from
       | the get go.
        
         | tudorg wrote:
         | We're referring more to the "data platform" part. What we mean
         | by a "Postgres data platform" (and arguably it's not a widely
         | accepted term, so take it with a grain of salt) is that you get
         | not only Postgres, but secondary data stores around it, more
         | tightly integrated than in a typical cloud provider:
         | 
         | - Elasticsearch, to which we replicate data automatically via
         | CDC
         | 
         | - Blob storage for file attachments, we use S3 to store the
         | files and Cloudflare to cache them on CDN
         | 
         | - (in the future) Caching
         | 
         | - (in the future) Time-series databases, again with automatic
         | replication from Postgres
         | 
         | That said, you can use just the Postgres part, and we are very
         | happy if you do.
        
           | nop_slide wrote:
           | Edited my comment above, but this looks really rad. I made an
           | account and am going to try porting a side project DB to it
           | later!
           | 
           | Particularly we need to improve our search and I've been lazy
           | about tweaking our ts_vector setup.
        
           | nextaccountic wrote:
           | > Elasticsearch
           | 
           | Do you use zombodb?
           | 
           | https://github.com/zombodb/zombodb
        
             | bear9628 wrote:
             | You can get an overview of the platform in this post:
             | https://xata.io/blog/serverless-postgres-platform#the-
             | platfo...
             | 
             | Zombodb is a really cool project, but no, we don't use it.
             | We use logical replication and triggers to also capture
             | schema changes with your records into an event stream. The
             | event stream is used to send your data to Elasticsearch
             | (and create/update the index). See:
             | https://xata.io/blog/serverless-postgres-platform#logical-
             | re...
             | 
             | Stay tuned, we are planning to open source this component
             | as well.
        
         | tristan957 wrote:
         | As far as Postgres goes, Xata is just Amazon Aurora under the
         | hood if I read their last blog post right.
         | 
         | (I work at Neon)
        
       | anarazel wrote:
       | It's completely beside the point, but part of the complexity of
       | the char_count example is just due to the C version for some
       | reason copying the input strings into newly allocated memory,
       | which the zig version doesn't :)
        
         | tudorg wrote:
         | This is fair, and it's a bit embarrassing, but I copied the C
         | version from an older tutorial without looking at it critically
         | enough. Still, the Zig version gets the arguments as normal
         | function parameters and doesn't need to call `PG_GETARG_*`.
        
           | e12e wrote:
           | I notice that the C version looks like it uses memcpy as a
           | convoluted typecast from a pointer to _text_ (presumably a
           | postgres typedef?) to a pointer to _char_ (null terminated?)?
           | Could this actually have a purpose?
           | 
           | Ed: I don't think PG uses C strings? Looks like memcpy might
           | be needed for call-by-reference, return-by-value - but not
           | really in the simple function that only walks the array?
           | 
           | https://www.postgresql.org/docs/current/xfunc-c.html#XFUNC-C.
           | ..
        
       | anarazel wrote:
       | I wonder what things you'd like to see exposed by postgres to
       | make something like this easier.
       | 
       | E.g. you have code to maintain the module magic and function
       | info, for both of which there are some plans to change the
       | contents in the next few major versions. If we were to provide
       | zig consumable ways of creating the struct from C headers, would
       | that help?
        
         | bear9628 wrote:
         | One of the reasons why we are using Zig is that we can consume
         | the C API mostly as is.
         | 
         | When importing the C headers, Zig translates them into
         | Zig declarations. Unfortunately this is not always possible for
         | C macros, which is why we have to maintain those structs
         | ourselves. This is where we have to step in with custom Zig
         | code. But most of the time we actually consume the C APIs as is.
         | 
         | > If we were to provide zig consumable ways of creating the
         | struct from C headers, would that help?
         | 
         | Yes, that would be awesome. I'm curious what they will look like
         | in the future.
         | 
         | We have had the most difficulties with the module magic,
         | function info, and varatt/Datum macros. Fortunately you have to
         | solve the module magic/function info "only once". The Datum
         | conversion and VARATT macros are more troublesome. We have some
         | conversion support for a number of common zig types. But
         | ideally we would like users to be able to use the C APIs as is,
         | while we provide some type directed default conversions for
         | convenience.
         | 
         | The main problems we've been facing with C macro translations
         | are type conversions/casts in macros, especially if the
         | underlying struct heavily uses unions (for example VARATT
         | macros). In some cases translating inline functions instead of
         | macros might work better, due to the translator having more
         | type information available. We fixed some of the translations
         | manually. You can find them in varatt.zig and datum.zig (where
         | we opted to implement the text to cstring translation
         | ourselves).
         | 
         | Data structures like lists, slist, dlist, hash tables are quite
         | consumable as is. We have some typed wrappers for those and
         | provide iterators. Macros with control flow cannot be reused,
         | but I think this is fair, especially as the foreach macros are
         | a very common C pattern. All in all we have had no troubles
         | with them.
        
           | anarazel wrote:
           | Hi,
           | 
           | > > If we were to provide zig consumable ways of creating the
           | struct from C headers, would that help?
           | 
           | > Yes, that would be awesome.
           | 
           | I don't know much about zig and won't have a whole lot of
           | time to learn - but if you can outline what is required to
           | make C macros [un]usable, it might be possible to improve
           | something. Either on its own, or as part of future work.
           | 
           | > I'm curious how they will look like in the future.
           | 
           | There's quite a few things.
           | 
           | For one, I'd like to introduce a faster function calling
           | infrastructure for the simple cases (mainly small number of
           | arguments, without SRF support). That'll need to be declared
           | in the function info struct.
           | 
           | For another, eventually I want to support a different
           | encoding for variable length types. Including making it
           | reasonably efficient to have variable-length integers.
           | 
           | > We have had the most difficulties with the module magic,
           | function info, and varatt/Datum macros. Fortunately you have
           | to solve the module magic/function info "only once". The
           | Datum conversion and VARATT macros are more troublesome. We
           | have some conversion support for a number of common zig
           | types. But ideally we would like users to be able to use the
           | C APIs as is, while we provide some type directed default
           | conversions for convenience.
           | 
           | Ugh, the varatt stuff doesn't look easily maintainable long-
           | term. It looks like you just need it for
           | getDatumTextSliceZ()?
           | 
           | At the same time, I don't really know why you need it? Most
           | of this should be doable via C functions, and the parts that
           | are not, you could easily wrap yourself - you already seem to
           | have some C code as part of pgzx.
        
             | bear9628 wrote:
             | Edit: I wasn't finished before submitting the response by
             | accident :)
             | 
             | > > > If we were to provide zig consumable ways of creating
             | the struct from C headers, would that help?
             | 
             | > > Yes, that would be awesome.
             | 
             | > I don't know much about zig and won't have a whole lot of
             | time to learn - but if you can outline what is required to
             | make C macros [un]usable, it might be possible to improve
             | something. Either on its own, or as part of future work.
             | 
             | Hm. Sometimes it is difficult to tell until you try to use
             | a macro. This is because the compiler ignores code (no type
             | checking) that is not used. Difficult to explain, but
             | assume that you write a program that emits typed code that
             | is eventually compiled. This is what enables comptime, and
             | "best effort" C header imports.
             | 
             | The toolchain tries to convert macros into inline
             | functions. That means any macro that contains some form of
             | control flow or opens/closes a code block can't be used.
             | Most obvious ones are the foreach loops, PG_TRY and friends
             | or the PG_RETURN_X macros (luckily we can just use the
             | XGetDatum functions).
             | 
             | Union types are difficult as I said. But maybe this is
             | rather a Zig problem.
             | 
             | Sometimes constants are the problem. For example, when
             | working with the varattrib variants we have bitwise
             | operations and shifts:
             | 
             | ```
             | #define VARSIZE_4B(PTR) \
             |     ((((varattrib_4b *) (PTR))->va_4byte.va_header >> 2) & 0x3FFFFFFF)
             | ```
             | 
             | Now the 2 and the bit pattern might be translated into
             | different types (e.g. int), which might not be compatible
             | with va_header (which is a uint32). Sometimes the types
             | for the constants look ok, sometimes not. Maybe this is
             | something that could still be improved in Zig, not sure. I
             | haven't tried this, but I wonder what would happen if I
             | annotate the types for the constants in the macro (which
             | might not make them more readable :) ).
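
As an illustration of the "inline functions instead of macros" idea raised upthread, here is a minimal C sketch in which every operand carries an explicit unsigned 32-bit type. The `demo_` names and the simplified header struct are hypothetical stand-ins, not the Postgres definitions, and whether Zig's translator handles this form better is an assumption rather than something tested here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 4-byte varlena header;
 * NOT the real varattrib_4b definition. */
typedef struct demo_varattrib_4b {
    uint32_t va_header;  /* total size stored shifted left by 2 bits */
} demo_varattrib_4b;

/* Same arithmetic as the VARSIZE_4B variant quoted above, written as a
 * typed inline function: the argument, both constants, and the result
 * are all uint32_t, so no implicit int conversions are involved. */
static inline uint32_t demo_varsize_4b(const demo_varattrib_4b *ptr)
{
    assert(ptr != NULL);
    return (ptr->va_header >> UINT32_C(2)) & UINT32_C(0x3FFFFFFF);
}

/* Inverse helper for the sketch: store a total size in the header. */
static inline void demo_set_varsize_4b(demo_varattrib_4b *ptr, uint32_t len)
{
    assert(ptr != NULL);
    ptr->va_header = (len & UINT32_C(0x3FFFFFFF)) << UINT32_C(2);
}
```

Storing 42 with `demo_set_varsize_4b` and reading it back with `demo_varsize_4b` yields 42; the typed signature also gives a translator the pointer and result types that the macro leaves implicit.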
             | 
             | We later decided to allow mixing C with Zig code in case we
             | need some kind of "complex" wrapping in C. This might not
             | be fully ideal, but fortunately Zig is also a C compiler,
             | which allows us to fall back to C if we find something too
             | complicated.
             | 
             | > For one, I'd like to introduce a faster function calling
             | infrastructure for the simple cases (mainly small number of
             | arguments, without SRF support). That'll need to be
             | declared in the function info struct.
             | 
             | This sounds great. In pgzx we actually allow developers to
             | capture the function call info as argument in their
             | function implementation (not shown in our examples). This
             | helps, for example, if someone wants to use the collation,
             | do some checks on nargs, or implement a function with a
             | variable number of arguments.
             | 
             | But out of the box we try to derive input and return types
             | and conversions at compile time. I would have to see what
             | the new API looks like, but I think we still would be able
             | to continue to automatically derive the conversions to
             | extract the arguments into values in Zig.
             | 
             | > Ugh, the varatt stuff doesn't look easily maintainable
             | long-term. It looks like you just need it for
             | getDatumTextSliceZ()?
             | 
             | > At the same time, I don't really know why you need it?
             | Most of this should be doable via C functions, and the
             | parts that are not, you could easily wrap yourself - you
             | already seem to have some C code as part of pgzx.
             | 
             | True. We introduced C into the code base later in our
             | development. The project is still very new and we might
             | revisit some choices on the Datum encoding.
             | 
             | The `getDatumTextSliceZ` actually resembles the
             | `text_to_cstring` function, which we might want to use in
             | the future instead. In `Zig` a string is a slice, which is
             | a fat pointer (pointer + length field). The type `[:0]const
             | u8` represents a slice with 0 terminator (fun fact, Zig
             | gives you a stack trace if you forget to write the
             | terminator into your buffer). Initially we implemented this
             | function directly so we can directly initialize the fat
             | pointer without having to get the string length after doing
             | the conversion.
             | 
             | We added C to our code base later in time to allow us to
             | wrap simple cases more easily without having to reproduce
             | Postgres code in Zig. I guess we should revisit
             | `getDatumTextSliceZ` :). Either have a small C wrapper over
             | `text_to_cstring` that also returns the length or just bite
             | the bullet and do a `strlen` after.
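
A rough sketch of the "small C wrapper that also returns the length" option might look like this. It is self-contained and therefore uses a hypothetical `demo_text` struct and `malloc` instead of the real `text` type, `text_to_cstring`, and `palloc`; the names and the signature are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an already detoasted text value:
 * an explicit length plus payload bytes, with no terminator. */
typedef struct demo_text {
    size_t      len;
    const char *data;
} demo_text;

/* Copies the payload into a fresh NUL-terminated buffer and reports
 * the length through out_len, so the caller can build a pointer+length
 * slice without a second pass over the string. Returns NULL if the
 * allocation fails. A real extension would allocate in a memory
 * context rather than with malloc. */
static char *demo_text_to_cstring_len(const demo_text *t, size_t *out_len)
{
    assert(t != NULL && out_len != NULL);
    char *buf = malloc(t->len + 1);
    if (buf == NULL)
        return NULL;
    if (t->len > 0)
        memcpy(buf, t->data, t->len);
    buf[t->len] = '\0';
    *out_len = t->len;
    return buf;
}
```

The returned pointer and `*out_len` together are what a `[:0]const u8` slice needs, which is the point of returning the length instead of calling `strlen` afterwards.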
             | 
             | Another motivator to try to fix the VARATT/VARDATA macros
             | was to allow developers to use those in their own
             | extensions. Looking at some extensions in contrib we find
             | e.g. `VARDATA_ANY` or `VARSIZE_ANY_EXHDR` being used quite
             | a bit.
        
       | ww520 wrote:
       | This is fantastic news. I got into Zig in the last two weeks
       | and it has been a very pleasant language. The comptime is an
       | ingenious invention. It enables type safe meta-programming and
       | easy generic support, in the same language syntax. The native
       | support for sub-byte types like u2, u3, or u7 makes packing data
       | very easy. The native vector support for SIMD makes SIMD-level
       | parallel programming like child's play. The language is looking
       | very good.
        
       ___________________________________________________________________
       (page generated 2024-03-21 23:01 UTC)