[HN Gopher] Features I'd Like in PostgreSQL
       ___________________________________________________________________
        
       Features I'd Like in PostgreSQL
        
       Author : todsacerdoti
       Score  : 73 points
       Date   : 2023-01-28 18:51 UTC (4 hours ago)
        
 (HTM) web link (gilslotd.com)
 (TXT) w3m dump (gilslotd.com)
        
       | Alir3z4 wrote:
        | I love PostgreSQL. It's the database I go for even on really
        | tiny projects and data needs, even when SQLite could be
        | used. But one thing that annoys me is the upgrading.
        
       | captrb wrote:
        | A built-in timestamp type that stores the UTC time, the
        | originating IANA timezone name, and the timezone offset.
        | Without loss.
       | 
       | "it was 2:01am on November 5th, in America/Los_Angeles and the
       | offset was -7"
        
         | doctor_eval wrote:
          | That's a weird use case but you could just create a UDT to
          | do it. You need the timestamptz value, the timestamp and
          | the timezone. Super expensive but ¯\_(ツ)_/¯
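          | 
          | A minimal sketch of such a UDT (the type and field names
          | here are made up):
          | 
          |     CREATE TYPE ts_with_origin AS (
          |         at_utc     timestamptz,  -- the instant, normalized to UTC
          |         tz_name    text,         -- IANA name, e.g. 'America/Los_Angeles'
          |         utc_offset interval      -- offset in effect at that instant
          |     );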
        
         | crazygringo wrote:
         | Why?
         | 
         | You can do this yourself with two fields. And it would be
         | extremely redundant to be constantly storing timezone together
         | with offset (violates database normalization) -- really you'd
         | want the second field to be a key pointing to an index of
         | timezones with offsets in a separate table.
         | 
          | But most of all it pretty much destroys normal assumptions
          | regarding sorting and collation, and introduces an
          | idiosyncratic "logic" of its own. Is 2:01am in
          | America/Los_Angeles the same moment in time as 2:01am in
          | America/Vancouver, or not? (There are several more edge
          | cases easy to imagine regarding how equality and sorting
          | might be calculated.)
        
         | derefr wrote:
         | 1. What would you use this for?
         | 
         | 2. What is the collation (sorting algorithm) for this datatype?
        
         | svieira wrote:
          | Storing it without loss requires the version of the
          | timezone data too.
         | 
         | "it was 2:01am on November 5th, in America/Los_Angeles and the
         | offset was -7 (retrieved from ICU v97)"
        
           | captrb wrote:
            | Interesting point. I don't think I have that particular
            | use case, but I can definitely imagine it. I'd think it
            | matters especially for dates in the far future, where
            | the timezone offsets may be altered between the time of
            | recording and the time of the value itself?
        
             | giraffe_lady wrote:
             | Individual timezones are pretty stable but a handful change
             | every year in some way, often switching how they observe
             | DST or something similar. If you have a truly global
             | userbase and this actually matters, you'll definitely hit
             | them.
        
       | valzam wrote:
        | I hope Postgres adopts the ability to disable an index
        | instead of deleting it. MySQL has this feature and it's very
        | useful for testing whether an index can be deleted under
        | production load. Transactional DDL is amazing already;
        | disabling indexes would be a great addition for low-risk
        | performance tuning.
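        | 
        | For reference, MySQL does this with invisible indexes;
        | roughly like so (table and index names made up):
        | 
        |     ALTER TABLE orders ALTER INDEX idx_customer INVISIBLE;
        |     -- watch production load, then either drop it for real or:
        |     ALTER TABLE orders ALTER INDEX idx_customer VISIBLE;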
        
       | whitepoplar wrote:
       | I'd love to see:
       | 
        | 1) _Simple_, easy-to-use high-availability features.
       | 
       | 2) Horizontal scalability without having to resort to an
       | extension like Citus.
       | 
       | 3) Built-in connection pooler.
       | 
       | 4) Query pinning.
       | 
       | 5) Good auto-tuning capability.
        
         | seedless-sensat wrote:
          | Points 1-3 exactly describe my pains operating Postgres at
          | scale.
        
         | threeseed wrote:
         | It is kind of ridiculous that the first three haven't been
         | sorted by now.
         | 
          | And over time it will increasingly relegate PostgreSQL to
          | development only, with production use handled by wire-
          | compatible databases, e.g. Aurora.
        
         | ahachete wrote:
         | > 2) Horizontal scalability without having to resort to an
         | extension like Citus.
         | 
          | Just curious: what would having the solution in-core save
          | you? Installation, sure, but that's a one-off, possibly in
          | your deployment code. "CREATE EXTENSION citus" and adding
          | it to postgresql.conf? Sure, but not too much work for me.
          | The rest (the commands to actually create the nodes and do
          | the sharding itself) is something I cannot imagine being
          | different or simpler with an in-core solution.
         | 
         | What am I missing?
        
           | threeseed wrote:
            | It means that when you upgrade PostgreSQL you don't have
            | to worry whether the extensions are compatible and have
            | been fully tested, not just for functionality but for
            | security etc. as well.
           | 
           | And most importantly it means you don't have to worry if that
           | extension will move to a freemium model which (a) often has
           | important features out of your price range and (b) is
           | generally unacceptable in enterprise environments.
        
         | morrbo wrote:
          | Never agreed with something so much in my life. Points 1-3
          | in particular. I understand the logic of not wanting to
          | "pollute" the core product with these features... but we're
          | past the point where you'd expect this stuff to be part of
          | a modern database product (instead of add-ons, hacks, and
          | outdated blog posts).
        
         | LtWorf wrote:
         | What would the proprietary vendors sell then?
        
       | dudeinjapan wrote:
       | Array foreign keys and indexes on arrays/JSON nested fields. When
       | using MongoDB, having this eliminates most need for join tables.
       | (*This may have already been implemented, I haven't used SQL
       | recently.)
        
         | aidos wrote:
         | I think they have indexes on json (or any selectable thing?).
         | 
         | You could probably implement the foreign key using generated
         | columns these days. Ah I guess you mean instead of having the
         | association table? That would definitely be nice in some cases.
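          | 
          | For the index part, GIN already covers arrays and jsonb;
          | something like this works today (table and column names
          | made up):
          | 
          |     CREATE INDEX ON posts USING gin (tags);  -- tags is text[]
          |     CREATE INDEX ON posts USING gin ((meta -> 'labels'));  -- jsonb subtree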
        
       | tezza wrote:
       | 1) "NOTIFY" should be eligible for prepared statements to assist
       | in the case where untrusted values is part of the "channel" or
       | "payload".
       | 
       | I had to write some ugly stored procedures to attempt sanitising
       | myself
       | 
       | 2) Query batching where queries are submitted to a database
       | together and then results are returned. If this could be done
       | without a connection per query that would be awesome. The use
       | case is a global truth database needing 1 write to 20 reads... If
       | the server is in the UK and the client is in Fiji then batching
       | the 20 reads together is a massive latency win.
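        | 
        | For (1), pg_notify() can be parameterised today, since it
        | takes the channel and payload as ordinary text arguments; a
        | sketch:
        | 
        |     PREPARE notify_stmt (text, text) AS
        |         SELECT pg_notify($1, $2);
        |     EXECUTE notify_stmt('events', 'untrusted payload here');
        | 
        | But first-class prepared-statement support for NOTIFY itself
        | would be cleaner.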
        
         | ntarora wrote:
          | Can't you do number 2 with Postgres pipelining? Here's an
          | article from bit.io (serverless Postgres with data repos);
          | their example uses inserts, but it also works for read
          | ops.
         | 
         | https://innerjoin.bit.io/the-distance-dilemma-measuring-and-...
         | https://www.postgresql.org/docs/current/libpq-pipeline-mode....
        
       | petepete wrote:
       | I'd love to see native temporal tables at some point. Being able
       | to step back in time without relying on extra code or a plugin
       | would be great.
       | 
       | https://learn.microsoft.com/en-us/sql/relational-databases/t...
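        | 
        | Per the linked docs, the SQL Server version looks roughly
        | like this (table and column names made up):
        | 
        |     CREATE TABLE dbo.employee (
        |         id         int NOT NULL PRIMARY KEY,
        |         name       nvarchar(100),
        |         valid_from datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
        |         valid_to   datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
        |         PERIOD FOR SYSTEM_TIME (valid_from, valid_to)
        |     ) WITH (SYSTEM_VERSIONING = ON);
        | 
        |     -- stepping back in time is then just:
        |     SELECT * FROM dbo.employee FOR SYSTEM_TIME AS OF '2022-01-01';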
        
         | eliaspro wrote:
          | Temporal tables are IMHO a profound gamechanger. Once you
          | understand how to use them, you'll have a completely
          | different view of your data models/db schemas and your
          | data lifecycle.
          | 
          | Having a temporal relation for every record makes so many
          | things far easier and saves a lot of headaches that would
          | otherwise be dealt with at the application level.
        
         | captrb wrote:
          | Half my job is doing this in reverse, accumulating deltas
          | over streams and materializing the current state in
          | various caching tiers. I'd love this to be a native
          | feature in PG, but for any of my workloads it would need
          | to support a lower-cost archival-grade storage tier a la a
          | blob store.
        
       | cosmotic wrote:
       | I'd like to see easier upgrades between major versions, or at
       | least published docker images that can do the upgrade. As it
       | stands, it's a huge hassle to get both binaries installed on the
       | same system. This is a task almost everyone has to do
       | periodically, it should be easier.
        
         | doctor_eval wrote:
          | Yes, this is a huge problem. I wish PG could contain
          | whatever it needs to do in-place upgrades from supported
          | versions. Statically compiled builds of previous versions
          | of pg_upgrade?
        
       | foreigner wrote:
       | Allow trailing commas in SELECT and IN. Pretty please?
        
         | valzam wrote:
          | Omg yes please. I get that it might be slightly harder to
          | parse, but any language that doesn't support it inevitably
          | annoys me. In pg and JSON you end up with runtime errors,
          | and in programming languages it leads to bigger diffs than
          | necessary (adding an entry to a static list is a 2-line
          | change instead of 1).
        
           | defanor wrote:
           | > programming languages lead to bigger diffs than necessary
           | (adding an entry to a static list is a 2 line change instead
           | of 1 line)
           | 
            | I think in ML-family languages it's fairly common to
            | format like this, which avoids bigger diffs:
            | 
            |     foo = [ 1
            |           , 2
            |           ]
            | 
            | In C-style languages something similar goes as well,
            | though perhaps used less commonly:
            | 
            |     int x[] = { 1
            |               , 2
            |               };
        
             | giraffe_lady wrote:
              | This is how I format SQL code when I have control over
              | it, and have for years. People often recoil in horror;
              | sometimes they ponder for a minute, then switch over
              | themselves.
              | 
              | It works very easily and consistently: center around
              | the space and lead with the comma:
              | 
              |     select id
              |          , name
              |          , address
              |          , ts
              |       from table
              |      where condition
              |        and etc;
        
               | lgas wrote:
               | Doesn't this just move the problem to when you add a
               | longer statement like "INNER JOIN"? Then the entire SQL
               | statement becomes the diff?
        
       | sergiotapia wrote:
        | I wish there were an easy way for Postgres to tell me "hey,
        | you've been doing a ton of queries with this field that has
        | no index."
        
         | thinkingkong wrote:
          | You can use the slow query log, which is very helpful.
          | Lots of logging tools will ingest it properly and you can
          | aggregate queries that are run frequently. Having an index
          | or not is just one optimization.
        
           | ahachete wrote:
            | Many of those queries may run sub-second (the typical
            | lower-end value for log_min_duration_statement), so they
            | won't get logged. Yet, if called at high frequency, they
            | may represent a notable % of your CPU and I/O. The slow
            | query log is not enough in many cases.
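            | 
            | pg_stat_statements catches exactly that case, since it
            | aggregates by query shape regardless of duration;
            | roughly (column names per PG 13+):
            | 
            |     -- requires shared_preload_libraries = 'pg_stat_statements'
            |     CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
            |     SELECT query, calls, total_exec_time
            |       FROM pg_stat_statements
            |      ORDER BY total_exec_time DESC
            |      LIMIT 20;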
        
       | richbell wrote:
        | I would also like to see "friendly" SQL syntax like what
        | DuckDB offers, but I doubt they'd add it without an update
        | to the standard itself.
       | 
       | https://duckdb.org/2022/05/04/friendlier-sql.html
        
         | aftbit wrote:
         | Oh thanks for sharing this! I love all of those, and most seem
         | like they'd be easy sugar over the existing syntax. The biggest
         | missing feature from those that I would really enjoy in data
         | exploration tasks (though not in PROD) would be automatic JOIN
         | ON selection based on foreign keys.
         | 
          | Example:
          | 
          |     SELECT users.id, COUNT(*)
          |       FROM users
          |       JOIN orders ON AUTO
          |      WHERE orders.created_at > NOW() - '7 day'::interval
          |      GROUP BY ALL
         | 
         | This would only work if there was an obvious path to do the
         | join. In this case, I'm imagining that the `orders` table might
         | have a `user_id` column which is a foreign key into the `users`
         | table.
        
       | SigmundA wrote:
       | >Reduce the memory usage of prepared queries
       | 
        | Yes, query plan reuse like every other DB. It still blows me
        | away that PG replans every time unless you explicitly
        | prepare, and even that is per connection.
        | 
        | Better full-text scoring (TF/IDF or BM25) is one that's
        | missing from that list for me; see:
        | https://github.com/postgrespro/rum
        
         | doctor_eval wrote:
         | As I understand it this is supposed to be done in the client
         | libraries rather than in the server. It's not that it doesn't
         | reuse query plans, it just doesn't do it in core.
        
       | thdc wrote:
        | For queries that modify data, I start with BEGIN, run the
        | query, maybe run some checks, then COMMIT or ROLLBACK
        | depending on whether it did what I hoped it would do.
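        | 
        | Concretely, something like this (table made up):
        | 
        |     BEGIN;
        |     DELETE FROM orders WHERE created_at < now() - interval '1 year';
        |     SELECT count(*) FROM orders;  -- sanity check
        |     ROLLBACK;  -- or COMMIT once it looks right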
       | 
        | Are there any downsides to that besides more typing?
       | 
        | Ah, I see the `--i-am-a-dummy` flag also affects queries
        | with large result set sizes.
        
         | silverlyra wrote:
         | Your comment led me to realize that I could run psql with -v
         | AUTOCOMMIT=0 (or add that to ~/.psqlrc) to achieve most of the
         | safety net I've been wanting. My fear has been forgetting the
         | BEGIN.
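          | 
          | That is, in ~/.psqlrc:
          | 
          |     \set AUTOCOMMIT off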
        
         | lordgilman wrote:
          | (Author here) I think the purpose of the large result
          | limiting is to avoid swamping the terminal with output.
          | The whole set of --i-am-a-dummy features is there to give
          | a saving chance to the hackers who are typing away without
          | any forethought into a prompt in autocommit mode. Using
          | transactions, copying and pasting complete queries, having
          | your coworker double-check your work, etc. give you enough
          | foresight that you won't need it.
        
           | doctor_eval wrote:
           | I agree with the GP - rename "commit" to "save" if you have
           | to, disable autocommit, and use a paginator. Voila! Dummy
           | mode.
           | 
           | I'm definitely not a PG dummy but that's what I do anyway.
        
       | x4m wrote:
       | > Reduce the memory usage of prepared queries
       | 
       | FWIW Odyssey supports prepared statements in transaction pooling.
        
       | t-writescode wrote:
       | Loose index scan. Its lack is why your "select distinct" is slow.
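        | 
        | The usual workaround is the recursive-CTE emulation from the
        | Postgres wiki; a sketch (tbl/col made up; col needs a btree
        | index for this to be fast):
        | 
        |     WITH RECURSIVE t AS (
        |         (SELECT col FROM tbl ORDER BY col LIMIT 1)
        |         UNION ALL
        |         SELECT (SELECT col FROM tbl WHERE col > t.col
        |                  ORDER BY col LIMIT 1)
        |           FROM t WHERE t.col IS NOT NULL
        |     )
        |     SELECT col FROM t WHERE col IS NOT NULL;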
        
       | bornfreddy wrote:
       | SHOW DATABASES; and SHOW TABLES;, please. DESC would be nice too.
       | And let it work everywhere, not just in CLI (like \d & co. do).
        
         | leo8 wrote:
          | This, a thousand times. I always have to google that one
          | big query for showing all the databases. (I almost always
          | connect with pgAdmin.)
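          | 
          | For the record, the catalog equivalents (these work from
          | any client, not just psql):
          | 
          |     SELECT datname FROM pg_database WHERE NOT datistemplate;
          | 
          |     SELECT schemaname, tablename FROM pg_tables
          |      WHERE schemaname NOT IN ('pg_catalog', 'information_schema');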
        
       | derefr wrote:
       | > VALIDATE 1:m
       | 
        | Rather than this, why not do more with the existing table
        | REFERENCES metadata? For example, why can't I have a
        | covering index across data from two tables, joined through a
        | foreign-key column with a pre-established REFERENCES
        | constraint against the other table, where the REFERENCES
        | constraint then keeps everything in place to make that work
        | (i.e. ensuring that vacuuming/clustering either table also
        | vacuums/clusters the other together in a transaction, in
        | order to rewrite the LSNs in the combined index correctly,
        | etc.)?
        
         | lordgilman wrote:
          | (Author here) This is also a neat idea, but it wouldn't
          | help with subqueries, views or other computed relations
          | getting joined on each other, which is a major motivation
          | of the validation proposal.
        
           | derefr wrote:
           | I could see that being a motivation, but isn't that exactly
           | the use-case where it'd also be a bad idea, overhead-wise?
           | 
           | With a static DDL constraint on join shape, an assertion
           | about such shape can be pre-validated at insert time, to be
           | always-valid at query time, such that queries can then
           | pass/fail such assertions during their "compilation" (query
           | planning) step.
           | 
           | Without such a static constraint, you have to instead insert
           | a merge node in the query plan (same as what a DISTINCT
           | clause does) in order to normalize the input row-tuple-stream
           | and fail on the first non-normalized row-tuple seen.
           | 
            | You _could_ pre-guarantee success in limited cases (e.g.
            | it's going to be a 1:N join if the LHS of the join has a
            | UNIQUE constraint on the single column being joined
            | against); but you're not going to have those guarantees
            | in most cases of "computed relations."
        
       | jayski wrote:
        | I'd like to have per-database WALs, instead of the WAL being
        | for everything.
       | 
       | I have no idea what this entails or if it's even possible, I only
       | know it would make my life easier.
        
         | forinti wrote:
          | That's interesting. I have a separate cluster set up
          | without archiving because some databases don't need it.
          | It's so simple to run various clusters side-by-side on the
          | same host that I don't think having compartmentalised WALs
          | would be easier to manage.
        
           | derefr wrote:
           | Yeah, but I want cheap in-memory joins between the WAL-
           | isolated datasets. I.e. "multi-world" MVCC concurrency, where
           | a TX is locking in a separate min_xid for dataset A, B, C,
           | etc. for the lifetime of the TX -- but without this being a
           | big distributed-systems vector-clocks problem, because it's
           | all happening in a single process.
           | 
           | Why? Being able to run a physical replica that loads WAL from
           | multiple primaries that each have independent data you want
           | to work with, for one thing. (Yes, logical replication
           | "solves" this problem -- but then you can't offload index
           | building to the primary, which is often half the point of a
           | physical replication setup.)
        
       | bratao wrote:
        | I'm waiting for a compact data structure such as RocksDB's.
        | I store TBs of data, and MySQL with MyRocks is miles ahead
        | in this department compared to Postgres. I prefer to stay
        | away from sharding and distributed solutions for now.
        
         | riku_iki wrote:
          | I am curious: what specifically makes RocksDB compact
          | compared to Postgres? How did you decide it is more
          | compact?
        
         | nightfly wrote:
         | Postgres on top of ZFS with compression on gives about 4x more
         | efficient storage for the numerically/time-series centric
         | database I help manage
         | 
         | Edit: Current on-disk size is 5.34TiB and logical size is
         | 20.3TiB so closer to 3.8x savings really
        
       | masklinn wrote:
       | Join and index hints.
       | 
        | I know the postgres devs don't like them, and that the query
        | planner _should_ be good enough that they're not needed, but
        | it's not, and it regularly fucks up.
        
         | riku_iki wrote:
          | You usually can manage this by dividing your query into
          | subqueries, each creating some temp table, so you have
          | control over how the joining actually happens.
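          | 
          | Roughly like this (a sketch; tables made up), which pins
          | the join order and gives the planner fresh stats:
          | 
          |     CREATE TEMP TABLE recent_orders AS
          |     SELECT * FROM orders
          |      WHERE created_at > now() - interval '7 days';
          |     ANALYZE recent_orders;
          | 
          |     SELECT c.name, count(*)
          |       FROM recent_orders r
          |       JOIN customers c USING (customer_id)
          |      GROUP BY c.name;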
        
           | aeyes wrote:
            | If you do this, be careful what your temp_buffers is set
            | to so that your temp tables don't spill to disk. If you
            | are on a
           | network file system, for example on AWS RDS, writing big (a
           | few hundred MB) temp tables to disk will stall all
           | transactions.
           | 
            | The same is of course true when you have big joins that
            | don't fit in work_mem, but the default size of this will
            | be much larger.
            | 
            | I can usually fix bad plans with CTEs, no need to get
            | much fancier. And the problem is often caused by schema
            | design where you have a mapping table between two tables
            | in the middle, and your join is N-M-M so the planner has
            | no information about the relationship between the two
            | outer tables.
        
           | SigmundA wrote:
           | Or there could just be a hint that says do it in source code
           | order:
           | 
           | https://learn.microsoft.com/en-us/sql/t-sql/queries/hints-
           | tr...
           | 
           | There could also be hints that tell what kind of join to use
           | and which indexes to use!
        
           | derefr wrote:
           | It's a shame, then, that there's no way to define at-first-
           | purely-in-memory tables, which only "spill" to disk if they
           | cause your query to exceed work_mem.
           | 
           | Within PL/pgSQL, CREATE TEMPORARY TABLE is still (sometimes a
           | lot!) slower than just SELECTing an array_agg(...) INTO a
           | variable, and then `SELECT ... FROM unnest(that_variable) AS
           | t` to scan over it. (And CTEs with MATERIALIZED are really
           | just CREATE TEMPORARY TABLE in disguise, so that's no help.)
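            | 
            | The pattern in question, roughly (a sketch; some_table
            | and its flag column are made up):
            | 
            |     CREATE FUNCTION count_flagged() RETURNS bigint
            |     LANGUAGE plpgsql AS $$
            |     DECLARE
            |         rows_v some_table[];
            |         n      bigint;
            |     BEGIN
            |         -- materialize once into a local array variable...
            |         SELECT array_agg(t) INTO rows_v FROM some_table t;
            |         -- ...then scan the in-memory copy
            |         SELECT count(*) INTO n
            |           FROM unnest(rows_v) AS t
            |          WHERE t.flag;
            |         RETURN n;
            |     END $$;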
        
             | taffer wrote:
             | > It's a shame, then, that there's no way to define at-
             | first-purely-in-memory tables, which only "spill" to disk
             | if they cause your query to exceed work_mem.
             | 
              | But isn't this exactly how temp tables work? A temp
              | table lives in memory and only spills to disk if it
              | exceeds _temp_buffers_ [1].
             | 
             | [1] https://www.postgresql.org/docs/current/runtime-config-
             | resou...
        
               | riku_iki wrote:
                | Isn't that one for some access buffers, and not the
                | actual temp tables?
        
               | taffer wrote:
                | No, _temp_buffers_ is for temp tables and
                | materialized CTEs. Maybe you are confusing it with
        
               | riku_iki wrote:
                | I am trying to say that your link describes a buffer
                | for accessing temp tables; it doesn't say the actual
                | temp table is stored in that buffer and never
                | flushed to disk.
        
               | derefr wrote:
               | Huh, I think you're right... but it's still slower! I've
               | definitely measured this effect in practice.
               | 
                | Just spitballing here -- I think the difference
                | _might_ come from where the _metadata_ required to
                | treat the table "as a table" in queries has to be
                | entered, and the overhead (esp. in terms of locking)
                | required to do so.
               | 
               | Or, perhaps, it _might_ come from the serialization
               | overhead of converting  "view" row-tuples (whose contents
               | might be merged together from several actual material
               | tables / function results) into flattened fully-
               | materialized row-tuples... which, presumably, emitting
               | data into an array-typed PL/pgSQL variable might get to
               | skip, since the handles to the constituent data can be
               | held inside the array and thunked later on when needed.
        
               | taffer wrote:
                | Temp tables are per-session; there shouldn't be any
                | locking involved.
        
         | derefr wrote:
         | I would go further: give me a PL/ language (or an equivalent
         | bytecode-abstract-machine abstraction) that lets me program --
         | at least in a read-only capacity -- directly against the
         | access-method handles, such that the DB's table heap-file
         | pages, index B-tree nodes, locks, etc. are "objects" I can
         | manipulate, pass to functions, and navigate graph-wise (i.e.
         | ask a table for its partitions; ask a partition for its access-
         | method; type-assert the access-method as a heap file; ask for
         | the pages as a random-access array; probe some pages for row-
         | tuples; iterate those, de-serializing and de-toasting them as a
         | type-cast; and then implement some efficient search or sampling
         | on those tuples. Basically the code you'd write in a PG
         | extension to interact with the storage engine on that level,
         | but limited to only "safe" actions against the storage, and so
         | able to be directly user-exposed.)
         | 
         | The closest analogy I know of to that, is how you work with ETS
         | tables in Erlang. I want to send the RDBMS code that operates
         | at that level!
         | 
         | Actually, I presume that SQLite would necessarily have some
         | low-level C interface that works like this -- but few people
         | seem to talk about it/be aware of it compared to its high-level
         | SQL-level interface.
        
           | jiggawatts wrote:
           | This is vaguely how the Microsoft "Jet" database engines
           | work. Microsoft Exchange uses this low-level query authoring
           | technique to achieve its scalability and performance goals.
           | This generally makes its performance consistent and
           | predictable.
           | 
           | There were some attempts back in the early 2000s to move
           | Exchange over to use the SQL Server RDBMS engine, and they
           | added a bunch of features to enable this kind of low-level
           | control. Not just join hints: you could _force_ specific
           | query plans by specifying the plan XML document for that
           | query. See: https://learn.microsoft.com/en-
           | us/sql/t-sql/queries/hints-tr...
           | 
           | This wasn't good enough however, and Exchange still uses the
           | Jet database.
           | 
           | Something that might be interesting is an RDBMS "as a
           | library", where instead of poking it with ASCII text queries,
           | you get a full programming API surface where you can do
           | exactly the type of thing you propose: perform arbitrary
           | walks through data structures, develop custom indexes, or
           | whatever.
        
           | cdcarter wrote:
            | SQLite compiles SQL to bytecode, and then executes that
            | bytecode against the database. However, there's no
            | public interface for creating/running bytecode directly,
            | instead of as a result of a compiled statement. You
            | almost certainly COULD do what you're trying to achieve,
            | but the SQLite authors have specifically called it out
            | as a bad idea -
            | https://sqlite.org/forum/info/c695cbe47b955076 - since
            | the bytecode representation can change from release to
            | release in a way that would only matter to the compiler
            | (or to your weird hacked-in interface). Meaning: non-
            | portable.
        
         | SigmundA wrote:
          | +1, the planner is great and all, but sometimes I know
          | better; let me tell it so and have it listen.
         | 
         | Nothing like the planner deciding it knows better at some
         | random time in production because of data changes.
        
       | forinti wrote:
       | I would like to see packages and tools to make migration from
       | Oracle easier.
        
         | alexisread wrote:
          | Good luck with that! Oracle has never made anything they
          | can't extract money from. In this case, look at their
          | attempts to kill LogMiner (which most CDC systems, e.g.
          | Debezium, use) and force you to pay for GoldenGate
          | hub/microservices/cloud.
          | 
          | Not tried it, but
          | https://github.com/bersler/OpenLogReplicator might work.
          | Redpanda is easy to set up (a Kafka-compatible
          | alternative).
        
       | GregDavidson wrote:
        | Inline SQL functions, so I can modularize my code without a
        | massive performance penalty. Foreign-key references to
        | system tables, so I can reliably store e.g. function OIDs.
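        | 
        | Presumably meaning a guarantee that something like this gets
        | expanded into the calling query instead of invoked as an
        | opaque call (a sketch; names made up):
        | 
        |     CREATE FUNCTION active_users() RETURNS SETOF users
        |     LANGUAGE sql STABLE AS $$
        |         SELECT * FROM users WHERE active
        |     $$;
        | 
        |     SELECT * FROM active_users() WHERE signup_date > '2022-01-01';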
        
       | Ralfp wrote:
        | I would love to see more languages in their full-text search
        | by default. But it looks like currently it's an "if you want
        | a language, contribute it" kind of deal, and I've found this
        | to be outside my depth.
        
       | ceeam wrote:
       | Reordering columns.
        
         | muhehe wrote:
          | I don't know. I mean... I'd like that too, but is this
          | really more important than the other features mentioned
          | (in this thread or the article)? What's your reason? (I'm
          | just curious.)
        
       ___________________________________________________________________
       (page generated 2023-01-28 23:00 UTC)