[HN Gopher] Waiting for PostgreSQL 14 - SEARCH and CYCLE clauses
___________________________________________________________________
Waiting for PostgreSQL 14 - SEARCH and CYCLE clauses
Author : todsacerdoti
Score : 263 points
Date : 2021-02-04 12:17 UTC (10 hours ago)
(HTM) web link (www.depesz.com)
(TXT) w3m dump (www.depesz.com)
| truth_seeker wrote:
| Love it.
|
| Also it would be cool if the computation can be pushed down in
| case of using remote db tables using fdw or db link.
| WJW wrote:
| Is it just me or is this a severe case of "if you only have a
| database, everything looks like a SQL query"? Don't get me wrong,
| I think it's incredible that Postgres is flexible enough to do
| this, but there are better data structures for graphs than
| tables.
|
| If you need to do this at relatively small scale, just load the
| data into memory and don't bother with doing it in the database.
| If you need to do this at large scale, get a dedicated graph
| database. I guess I just don't see the niche where doing it in
| Postgres fits in. Anyone with a real world example? I'd love to
| be proven wrong here since it's definitely cool.
| marcosdumay wrote:
| It's a case of "I want to keep all my data on the same place"
| because nobody has the attention span to manage a relational
| database, a document database, an hierarchical database, and a
| key-value database (did I miss any?). All that while you lose
| atomicity, because your data is now distributed.
|
| Also, data often changes in form, and you don't want to rewrite
| everything to get the data from a different place just because
| its type changed.
| kstrauser wrote:
| I've heard it said of Python that it's the second best
| language for doing anything. For one specific task, there's
| probably something more tailored to the purpose. For
| integrating lots of different tasks into a single project,
| there's nothing that's better at doing everything _pretty
| well_.
|
| I see PostgreSQL the same way (although I don't think there's
| a better RDBMS). It's not the best document database, but
| it's pretty good. It's not the best key-value store, but it's
| pretty good. It's not the best graph database, but it's
| pretty good. And here "pretty good" means "enough that you
| can build large, complex projects on top of it and expect
| that it will always work and do the right thing".
|
| Also, the converse is _not_ true: you can 't replace
| PostgreSQL with a K-V store, or a document database, or a
| graph database. It has a million convenient features that
| nothing outside of another ACID RDBMS could fully support
| without massive engineering efforts. Unless I _know_ beyond
| the shadow of doubt that I 'd need, say, a dedicated document
| database for a specific reason, I'd infinitely rather start
| on PostgreSQL and use it until we outgrow it, than start with
| something like MongoDB and find out that we actually need
| some ACID or relational features that don't exist in Mongo.
| PeterisP wrote:
| Thing is, if the primary source of truth is in the database,
| often it's beneficial to do the computation as close to the
| data as possible, as the bottleneck is likely to be I/O, not
| computation.
|
| "just load the data into memory" requires the database to
| provide to the client all the data that might be needed for
| that algorithm, which is more load on the DB server than
| scanning the (much smaller) subset of the data that actually
| gets traversed during the algorithm directly on the database
| itself and providing just the results, as that requires less
| I/O. That's the reason why it's often more efficient (even if
| we care only about the server-side load) to do filtering,
| aggregation and summarization for tabular data in the DB
| instead of loading all the data into a client and calculating
| it there, and the same applies for graph data.
| verelo wrote:
| Yeah i see value but tend to agree it's catering to an
| interesting segment that i don't understand either.
|
| More generally, I get concerned when too much functionality
| ends up in the db as it's often the hardest thing to scale, so
| I'd prefer to spread those cycles out elsewhere. I feel this is
| likely to be something someone relies on and causes a
| production db to get pretty warm at a bad time.
| Keyframe wrote:
| I agree, but PostgreSQL is one of those projects where I'd
| rather have the kitchen sink in and believe it will work,
| almost blindly.
| vvern wrote:
| It's not the hardest thing to scale anymore. Engineering
| teams are pretty hard to scale.
| hans_castorp wrote:
| If you have a mix of "normal" relational data and some
| hierarchical data then it absolutely makes sense to stick with
| a relational database.
|
| And why use a full blown graph database if just want to model
| (and query) a strictly hierarchical structure? e.g. categories
| that can contain categories or similar things.
|
| And it's part of the SQL standard, so it makes sense to add it
| to be more compliant with it.
| baq wrote:
| if i have the data in the DB, why would i bother to load it to
| memory if i can just query it to get what i want...?
|
| it's a generic DB which now started to provide DFS/BFS - you
| could argue that B-trees are way more advanced than this.
| derefr wrote:
| Are you aware of https://pgrouting.org/ ?
|
| As it turns out, Postgres _is_ a very effective /efficient
| graph database, and is used as such quite often.
|
| But to address a more core issue:
|
| > if you only have a database, everything looks like a SQL
| query
|
| Postgres, here, is just implementing SQL standard features. And
| SQL is supposed to be the be-all end-all _Standard_ Query
| Language.
|
| SQL makes no claims about being suited _only_ for querying
| relational databases. (It uses a lot of relational terminology,
| but that's because _relational algebra_ is the strict superset
| of other querying models, in the same sense that a Turing
| machine is the strict superset of other automata. If you want a
| fully-general query language that can be used to query
| _anything_ , it's going to need to have relational algebra
| _somewhere_ in it, at least for the cases where your query is
| _about_ relations and can't be expressed in any more compact
| /abstract way.)
|
| Graph databases could--and probably should!--expose SQL-based
| query parsers for their querying engine, along with the bevy of
| other more graph-focused languages they support. (Yes, SQL
| isn't as efficient of a _syntax_ for making graph queries than
| e.g. Datalog, but SQL can still _express_ said queries; and a
| query parser can turn those SQL-expressed graph queries into
| the very same query plans it'd build from a Datalog query.)
|
| And, if you want to be able to query a graph database using
| SQL, you need features like this in SQL. Or rather, to reverse
| that: given that the SQL committee wants SQL to be "the"
| standard for querying, they're going to add features like this
| to SQL to ensure that it _is_ a viable query language for
| things like graph databases to expose.
|
| That being said, you can think of this move by PostgreSQL less
| as "Postgres trying to be a graph database" and more as
| "Postgres ensuring that, when porting your standard SQL query--
| which you may have very well written originally to talk to an
| SQL-speaking graph database--that you don't have to rewrite the
| query, but can continue to use standard SQL features." Think of
| it like a "compatibility feature" allowing Postgres to "host"
| this foreign model.
|
| (Which is not to say Postgres is not _good_ at being a graph
| database. This "compatibility" is first-class! As I mentioned
| above.)
|
| ----------
|
| P.S.: The deeper issue is that nobody other than database
| maintainers is really aware of what exactly is _in_ the SQL
| standard; probably because it's a proprietary document that you
| have to pay for access to.
|
| On top of that, there's _also_ no secondary source that lays
| out what the SQL standard defines, vs. what various DBMSes
| actually implement. There's no CanIUse.com for SQL features.
|
| Really, there's not even anybody out there making blog-posts
| about newly-standardized not-yet-universally-supported SQL
| features, the way people make blog-posts about newly-
| standardized not-yet-universally-supported JS features.
|
| Instead, all most people know about SQL is the particular
| dialect _their_ DBMS of choice implements; and they only become
| aware that certain features are in SQL at all, when their DBMS-
| of-choice implements those features.
|
| It's all faintly ridiculous, and reeks of the siloed enterprise
| distribution model of the 1980s. (Remember back before POSIX
| was a thing, when shell scripts were often written to assume
| the flavour of Unix the writer was familiar with--usually the
| one created+maintained by the vendor of their mainframes and
| workstations? Somehow, despite having the POSIX equivalent--the
| SQL standard--we're still stuck in that _culture_ in database
| land.)
| robertlagrant wrote:
| > And SQL is supposed to be the be-all end-all Standard Query
| Language.
|
| It's Structured Query Language.
| bmn__ wrote:
| > There's no CanIUse.com for SQL features.
|
| Yes, there is. Read _Modern SQL_ , sections _concept_ and
| _feature_ , subsection _compatibility_. Examples:
|
| * https://modern-sql.com/concept/three-valued-
| logic#compatibil...
|
| * https://modern-sql.com/feature/case#compatibility
| jpalomaki wrote:
| Replicating state to multiple places can add quite much
| complexity to application development and operations.
| traceroute66 wrote:
| Its all well and good to sit there and say "get a dedicated
| graph database".
|
| But there are two problems with that :
|
| (a) In many cases, running a graph database will require
| getting involved with the monstrosity that is managing Java
| (since that is what most of them are written in). Managing Java
| is all well and good if you (or your team) have experience. For
| mere mortals its a nightmare to manage.
|
| (b) In the case of pretty much all graph databases, you only
| get basic features in the free / open-source version. If you
| want access to any genuinely useful features, you have to pay
| for "enterprise" with the associated $$$$$$$$$$.
|
| Meanwhile Postgres gives everyone a solid database, with all
| the features included out of the box, no pay to play.
|
| So what if you need to - shock horror - actually learn how a
| database works and write some proper SQL queries. The reason so
| many people have horror stories about SQL databases is because
| they let some inept developers loose on their RDBMS who treat a
| database like a black box dumping ground with zero thought
| about table design, index design, SQL queries or anything else.
|
| Rant over. ;-)
| pas wrote:
| Not to mention that dumping "big data" amounts of stuff into
| any system and running complex search and sort computations
| on them is still a complexity theoretically hard problem.
|
| SQL or dedicated graph DB won't save anyone from coughing up
| the resources.
|
| Sure, it's easy to look at SQL and see the big ugly hundred
| line queries, but they would be hundred line code for graph
| DBs too.
|
| It's not a big surprise that FB for example for a few years
| just treated everything that stored data as a fast NAND store
| with some compaction algorithm on top plus a fancy API. SQL
| is that. (There was a great talk about this but I haven't
| managed to find it :/ )
| WJW wrote:
| I don't mind writing SQL queries, quite the opposite in fact.
| In all my jobs so far I've been the go-to guy for rewriting
| queries to be more efficient. Don't know what you are ranting
| at tbh.
|
| It's just that the UNION ALL trick to fetch all related rows
| and then join them seems like it would need to rebuild (the
| relevant part of) the graph for each query and that simply
| "looks inefficient" compared to keeping all data in graph
| form all the time. If you are at a scale that you don't have
| to worry yet about CPU efficiency in your database layer, by
| all means just stick it in Postgres.
| matwood wrote:
| There are a few reasons why doing things like this in the RDBMS
| is a good thing for many use cases.
|
| First, there is a ton of already built tooling for dealing with
| an RDBMS. This alone is why I think every project should
| default to using an RDBMS and only move if a good case can be
| made.
|
| Second, like it or not, sql is the defacto language for
| querying the data. After 20+ years of writing software, I've
| used many languages, but sql (of course it has evolved) has
| been a constant across RDBMS. I've seen many service layers and
| front ends change over the years, all while sitting on the same
| RDBMS. Given the speed at which tech moves, it's pretty amazing
| when you think about it.
|
| Finally, for things like graphs or json, it's often the case
| that I only need handle it _in addition_ to my existing
| relational datastore. In some cases it might make sense to
| stand up an entire system for the use case (like a solr server
| for document searching), but many times I just need to store a
| bit of json or I have one or two graphs that need to be
| represented. In these cases, standing up an entire separate
| system adds unneeded complexity.
| hans_castorp wrote:
| > I've seen many service layers and front ends change over
| the years, all while sitting on the same RDBMS
|
| This.
|
| And the youngsters never believe me, when I tell them that
| the application that they are writing is not the only one and
| will not be the last one to access the database.
| halostatue wrote:
| My standard line for decades has been that the database is
| more important from the program. You can reconstruct how
| the program works from the database; you can't reconstruct
| the database from how the program works.
|
| A business might be interrupted if their programs accessing
| the database break. A business will _die_ if their database
| is lost.
| mattbuilds wrote:
| Reminds me of the Fred Brooks quote, "Show me your
| flowchart and conceal your tables, and I shall continue
| to be mystified. Show me your tables, and I won't usually
| need your flowchart; it'll be obvious."
|
| It always starts with the data.
| pmontra wrote:
| I'm currently staring at a PostgreSQL database created by
| a Django app with a lot of inheritance between models.
|
| My favorite /s is the table with a single column
| containing the id of the record itself.
|
| It's basically impossible to start from the database and
| understand what's going on. It will be a huge pain to
| interface to this database from something else.
|
| On the other side, I'm maintaining another Django app for
| the same customer, with saner tables (no inheritance) and
| that's on par with other ORMs.
|
| My take is never use inheritance with Django.
| touisteur wrote:
| Never use inheritance at all? I was thinking you could
| reconstruct part of the app stack by observing and
| clustering the requests and transactions. But then some
| people think join is the devil and you'd be back to
| square one. In a perfect world every app would have only
| views over what they need and prepared requests. But then
| CI/CD and all kind of tests can become very painful...
| kall wrote:
| > there is a ton of already built tooling
|
| Diving more into postgres recently, I'm kind of surprised
| there is not the kind of tooling I know from (frontend)
| development. I would have expected there to be a number of
| tools for: version control, CI/CD, (integration) testing,
| performance monitoring, safe migration, developing stored
| procedures (in vscode, with hot reload), data shuffling
| (snaphshots, rollbacks...). Both open source and startups,
| SaaS services with online dashboards etc.
|
| There are some ok options in these categories but nothing
| like the explosion of open source tooling out of bigcos and
| devtools/SaaS startups you see elsewhere.
|
| There's... aurora, prisma, some pg extensions to get
| performance info?
| underbluewaters wrote:
| Absolutely! I evaluated a bunch of migration management
| tools recently and was very disappointed. I needed
| something to develop and version stored procedures and just
| didn't find much. Ended up using graphile-migrate which has
| been great but also tightly bound to other parts of my
| stack. Would have expected more options from a 30+ year old
| technology.
| FrancoisBosun wrote:
| If you haven't, Sqitch[1] is a great tool.
|
| [1] https://sqitch.org
| ncrmro wrote:
| Just started using graphile migrate after hand rolling
| some shell scripts initially
|
| And working pretty nicely so far
| mdavidn wrote:
| These iterative queries convert hierarchical (not graph) data
| into relational data. That means queries can perform
| aggregations and joins against other relations, which is
| obviously useful in a relational database.
| mojzu wrote:
| I use recursive queries like these at work, for hierarchical
| accounts/groups/locations/equipment/etc. where each level can
| have permissions that cascade down or not. Loading the hundreds
| of rows into memory necessary to calculate what any particular
| user can see is very costly in terms of network time,
| particularly for deeply nested stuff. Much easier to do the
| legwork in SQL and return the relatively tiny result set. There
| are certainly other tools that could handle these relationships
| more gracefully, but they also come with the questions of code
| integration time/backups/redundancy/scaling/etc, which postgres
| already has good answers for
| donaldihunter wrote:
| This inspired me to install --HEAD which I have been considering
| doing for a while.
| emgo wrote:
| Cool feature, however I'm always worried of anything that
| encourages developers to push more of the business logic from the
| code towards database queries.
|
| Of course you have to balance that out with the cost/time of
| retrieving a possibly large amount of data for that computation,
| and computing locally, which is what these SEARCH and CYCLE
| features aim at preventing.
|
| Still, I'd think twice before jumping into using them, to ensure
| that making the queries more complex does bring enough bang for
| the buck.
| carlisle_ wrote:
| I'm always so impressed by Postgres' features and performance. I
| am unfortunately not as impressed with Postgres' operational
| model. As a disclaimer, I'm not highly familiar with very many
| databases, but it seems like Postgres sticks out as a poorly
| suited for robustly automated infrastructure.
|
| Maybe I'm totally wrong on this, looking forward to reading other
| perspectives.
| wdb wrote:
| Sounds interesting. Recently, I have been looking for a quick
| approach to make a clone of a production database for use in
| staging.
|
| The idea to avoid failing migrations by applying against a thin
| clone of production, as somehow migrations fail in production but
| work fine in staging. Anyone aware of a quick way to make a
| clone?
|
| Currently copying a 0.5TB db takes 3 hours while I am dreaming of
| like minutes.
| aeontech wrote:
| Look at pgsync or pgloader
|
| https://news.ycombinator.com/item?id=22676112
|
| There is also interesting work around using filesystem volume
| snapshots or clones for database cloning, but that requires
| infrastructural changes... I think there was a startup around
| that but I can't find the name :)
|
| https://www.sedlakovi.org/blog/2019/03/fast-postgres-snapsho...
| [deleted]
| samgranieri wrote:
| I love checking this site to see the cool stuff that makes it's
| way into postgres each year. Fantastic work to all the postgres
| committers, cheers all around.
|
| This feature looks really cool
| mattashii wrote:
| > Fantastic work to all the postgres committers, cheers all
| around.
|
| Not to forget the postgres contributors: only a portion of the
| people who write code for postgres are 'committers'. I do agree
| that the committers do amazing work and play an essential role
| in postgres' overall ecosystem, but we should not forget that a
| not insignificant part of postgres' code, code reviews and
| bugfixes is provided by non-committers.
| hestefisk wrote:
| Very valuable. Do other DBMSes have this feature? It's quite
| incredible how much you can put the database to work. So much
| less application code to write.
| szarnyasg wrote:
| The authors of paper "Graph Processing in RDBMSs" surveyed
| graph features in 2017 among 4 popular systems:
|
| http://sites.computer.org/debull/A17sept/p6.pdf (page 6, Table
| 1)
|
| They found that only Oracle supported the CYCLE keyword, DB2
| SQL Server, and Postgres (version 9 at the time) didn't. SQL
| Server now has a graph extension so that might have changed.
| MySQL has introduced support for WITH RECURSIVE in 2017 but it
| does not support the CYCLE keyword.
| hans_castorp wrote:
| I know that Oracle supports the CYCLE and the SEARCH clause in
| recursive CTEs
___________________________________________________________________
(page generated 2021-02-04 23:01 UTC)