hngopher.com

       [HN Gopher] Waiting for PostgreSQL 14 - SEARCH and CYCLE clauses
       ___________________________________________________________________
        
       Waiting for PostgreSQL 14 - SEARCH and CYCLE clauses
        
       Author : todsacerdoti
       Score  : 263 points
       Date   : 2021-02-04 12:17 UTC (10 hours ago)
        
 (HTM) web link (www.depesz.com)
 (TXT) w3m dump (www.depesz.com)
        
       | truth_seeker wrote:
       | Love it.
       | 
       | Also it would be cool if the computation can be pushed down in
       | case of using remote db tables using fdw or db link.
        
       | WJW wrote:
       | Is it just me or is this a severe case of "if you only have a
       | database, everything looks like a SQL query"? Don't get me wrong,
       | I think it's incredible that Postgres is flexible enough to do
       | this, but there are better data structures for graphs than
       | tables.
       | 
       | If you need to do this at relatively small scale, just load the
       | data into memory and don't bother with doing it in the database.
       | If you need to do this at large scale, get a dedicated graph
       | database. I guess I just don't see the niche where doing it in
       | Postgres fits in. Anyone with a real world example? I'd love to
       | be proven wrong here since it's definitely cool.
        
         | marcosdumay wrote:
         | It's a case of "I want to keep all my data on the same place"
         | because nobody has the attention span to manage a relational
         | database, a document database, an hierarchical database, and a
         | key-value database (did I miss any?). All that while you lose
         | atomicity, because your data is now distributed.
         | 
         | Also, data often changes in form, and you don't want to rewrite
         | everything to get the data from a different place just because
         | its type changed.
        
           | kstrauser wrote:
           | I've heard it said of Python that it's the second best
           | language for doing anything. For one specific task, there's
           | probably something more tailored to the purpose. For
           | integrating lots of different tasks into a single project,
           | there's nothing that's better at doing everything _pretty
           | well_.
           | 
           | I see PostgreSQL the same way (although I don't think there's
           | a better RDBMS). It's not the best document database, but
           | it's pretty good. It's not the best key-value store, but it's
           | pretty good. It's not the best graph database, but it's
           | pretty good. And here "pretty good" means "enough that you
           | can build large, complex projects on top of it and expect
           | that it will always work and do the right thing".
           | 
           | Also, the converse is _not_ true: you can 't replace
           | PostgreSQL with a K-V store, or a document database, or a
           | graph database. It has a million convenient features that
           | nothing outside of another ACID RDBMS could fully support
           | without massive engineering efforts. Unless I _know_ beyond
           | the shadow of doubt that I 'd need, say, a dedicated document
           | database for a specific reason, I'd infinitely rather start
           | on PostgreSQL and use it until we outgrow it, than start with
           | something like MongoDB and find out that we actually need
           | some ACID or relational features that don't exist in Mongo.
        
         | PeterisP wrote:
         | Thing is, if the primary source of truth is in the database,
         | often it's beneficial to do the computation as close to the
         | data as possible, as the bottleneck is likely to be I/O, not
         | computation.
         | 
         | "just load the data into memory" requires the database to
         | provide to the client all the data that might be needed for
         | that algorithm, which is more load on the DB server than
         | scanning the (much smaller) subset of the data that actually
         | gets traversed during the algorithm directly on the database
         | itself and providing just the results, as that requires less
         | I/O. That's the reason why it's often more efficient (even if
         | we care only about the server-side load) to do filtering,
         | aggregation and summarization for tabular data in the DB
         | instead of loading all the data into a client and calculating
         | it there, and the same applies for graph data.
        
         | verelo wrote:
         | Yeah i see value but tend to agree it's catering to an
         | interesting segment that i don't understand either.
         | 
         | More generally, I get concerned when too much functionality
         | ends up in the db as it's often the hardest thing to scale, so
         | I'd prefer to spread those cycles out elsewhere. I feel this is
         | likely to be something someone relies on and causes a
         | production db to get pretty warm at a bad time.
        
           | Keyframe wrote:
           | I agree, but PostgreSQL is one of those projects where I'd
           | rather have the kitchen sink in and believe it will work,
           | almost blindly.
        
           | vvern wrote:
           | It's not the hardest thing to scale anymore. Engineering
           | teams are pretty hard to scale.
        
         | hans_castorp wrote:
         | If you have a mix of "normal" relational data and some
         | hierarchical data then it absolutely makes sense to stick with
         | a relational database.
         | 
         | And why use a full blown graph database if just want to model
         | (and query) a strictly hierarchical structure? e.g. categories
         | that can contain categories or similar things.
         | 
         | And it's part of the SQL standard, so it makes sense to add it
         | to be more compliant with it.
        
         | baq wrote:
         | if i have the data in the DB, why would i bother to load it to
         | memory if i can just query it to get what i want...?
         | 
         | it's a generic DB which now started to provide DFS/BFS - you
         | could argue that B-trees are way more advanced than this.
        
         | derefr wrote:
         | Are you aware of https://pgrouting.org/ ?
         | 
         | As it turns out, Postgres _is_ a very effective /efficient
         | graph database, and is used as such quite often.
         | 
         | But to address a more core issue:
         | 
         | > if you only have a database, everything looks like a SQL
         | query
         | 
         | Postgres, here, is just implementing SQL standard features. And
         | SQL is supposed to be the be-all end-all _Standard_ Query
         | Language.
         | 
         | SQL makes no claims about being suited _only_ for querying
         | relational databases. (It uses a lot of relational terminology,
         | but that's because _relational algebra_ is the strict superset
         | of other querying models, in the same sense that a Turing
         | machine is the strict superset of other automata. If you want a
         | fully-general query language that can be used to query
         | _anything_ , it's going to need to have relational algebra
         | _somewhere_ in it, at least for the cases where your query is
         | _about_ relations and can't be expressed in any more compact
         | /abstract way.)
         | 
         | Graph databases could--and probably should!--expose SQL-based
         | query parsers for their querying engine, along with the bevy of
         | other more graph-focused languages they support. (Yes, SQL
         | isn't as efficient of a _syntax_ for making graph queries than
         | e.g. Datalog, but SQL can still _express_ said queries; and a
         | query parser can turn those SQL-expressed graph queries into
         | the very same query plans it'd build from a Datalog query.)
         | 
         | And, if you want to be able to query a graph database using
         | SQL, you need features like this in SQL. Or rather, to reverse
         | that: given that the SQL committee wants SQL to be "the"
         | standard for querying, they're going to add features like this
         | to SQL to ensure that it _is_ a viable query language for
         | things like graph databases to expose.
         | 
         | That being said, you can think of this move by PostgreSQL less
         | as "Postgres trying to be a graph database" and more as
         | "Postgres ensuring that, when porting your standard SQL query--
         | which you may have very well written originally to talk to an
         | SQL-speaking graph database--that you don't have to rewrite the
         | query, but can continue to use standard SQL features." Think of
         | it like a "compatibility feature" allowing Postgres to "host"
         | this foreign model.
         | 
         | (Which is not to say Postgres is not _good_ at being a graph
         | database. This "compatibility" is first-class! As I mentioned
         | above.)
         | 
         | ----------
         | 
         | P.S.: The deeper issue is that nobody other than database
         | maintainers is really aware of what exactly is _in_ the SQL
         | standard; probably because it's a proprietary document that you
         | have to pay for access to.
         | 
         | On top of that, there's _also_ no secondary source that lays
         | out what the SQL standard defines, vs. what various DBMSes
         | actually implement. There's no CanIUse.com for SQL features.
         | 
         | Really, there's not even anybody out there making blog-posts
         | about newly-standardized not-yet-universally-supported SQL
         | features, the way people make blog-posts about newly-
         | standardized not-yet-universally-supported JS features.
         | 
         | Instead, all most people know about SQL is the particular
         | dialect _their_ DBMS of choice implements; and they only become
         | aware that certain features are in SQL at all, when their DBMS-
         | of-choice implements those features.
         | 
         | It's all faintly ridiculous, and reeks of the siloed enterprise
         | distribution model of the 1980s. (Remember back before POSIX
         | was a thing, when shell scripts were often written to assume
         | the flavour of Unix the writer was familiar with--usually the
         | one created+maintained by the vendor of their mainframes and
         | workstations? Somehow, despite having the POSIX equivalent--the
         | SQL standard--we're still stuck in that _culture_ in database
         | land.)
        
           | robertlagrant wrote:
           | > And SQL is supposed to be the be-all end-all Standard Query
           | Language.
           | 
           | It's Structured Query Language.
        
           | bmn__ wrote:
           | > There's no CanIUse.com for SQL features.
           | 
           | Yes, there is. Read _Modern SQL_ , sections _concept_ and
           | _feature_ , subsection _compatibility_. Examples:
           | 
           | * https://modern-sql.com/concept/three-valued-
           | logic#compatibil...
           | 
           | * https://modern-sql.com/feature/case#compatibility
        
         | jpalomaki wrote:
         | Replicating state to multiple places can add quite much
         | complexity to application development and operations.
        
         | traceroute66 wrote:
         | Its all well and good to sit there and say "get a dedicated
         | graph database".
         | 
         | But there are two problems with that :
         | 
         | (a) In many cases, running a graph database will require
         | getting involved with the monstrosity that is managing Java
         | (since that is what most of them are written in). Managing Java
         | is all well and good if you (or your team) have experience. For
         | mere mortals its a nightmare to manage.
         | 
         | (b) In the case of pretty much all graph databases, you only
         | get basic features in the free / open-source version. If you
         | want access to any genuinely useful features, you have to pay
         | for "enterprise" with the associated $$$$$$$$$$.
         | 
         | Meanwhile Postgres gives everyone a solid database, with all
         | the features included out of the box, no pay to play.
         | 
         | So what if you need to - shock horror - actually learn how a
         | database works and write some proper SQL queries. The reason so
         | many people have horror stories about SQL databases is because
         | they let some inept developers loose on their RDBMS who treat a
         | database like a black box dumping ground with zero thought
         | about table design, index design, SQL queries or anything else.
         | 
         | Rant over. ;-)
        
           | pas wrote:
           | Not to mention that dumping "big data" amounts of stuff into
           | any system and running complex search and sort computations
           | on them is still a complexity theoretically hard problem.
           | 
           | SQL or dedicated graph DB won't save anyone from coughing up
           | the resources.
           | 
           | Sure, it's easy to look at SQL and see the big ugly hundred
           | line queries, but they would be hundred line code for graph
           | DBs too.
           | 
           | It's not a big surprise that FB for example for a few years
           | just treated everything that stored data as a fast NAND store
           | with some compaction algorithm on top plus a fancy API. SQL
           | is that. (There was a great talk about this but I haven't
           | managed to find it :/ )
        
           | WJW wrote:
           | I don't mind writing SQL queries, quite the opposite in fact.
           | In all my jobs so far I've been the go-to guy for rewriting
           | queries to be more efficient. Don't know what you are ranting
           | at tbh.
           | 
           | It's just that the UNION ALL trick to fetch all related rows
           | and then join them seems like it would need to rebuild (the
           | relevant part of) the graph for each query and that simply
           | "looks inefficient" compared to keeping all data in graph
           | form all the time. If you are at a scale that you don't have
           | to worry yet about CPU efficiency in your database layer, by
           | all means just stick it in Postgres.
        
         | matwood wrote:
         | There are a few reasons why doing things like this in the RDBMS
         | is a good thing for many use cases.
         | 
         | First, there is a ton of already built tooling for dealing with
         | an RDBMS. This alone is why I think every project should
         | default to using an RDBMS and only move if a good case can be
         | made.
         | 
         | Second, like it or not, sql is the defacto language for
         | querying the data. After 20+ years of writing software, I've
         | used many languages, but sql (of course it has evolved) has
         | been a constant across RDBMS. I've seen many service layers and
         | front ends change over the years, all while sitting on the same
         | RDBMS. Given the speed at which tech moves, it's pretty amazing
         | when you think about it.
         | 
         | Finally, for things like graphs or json, it's often the case
         | that I only need handle it _in addition_ to my existing
         | relational datastore. In some cases it might make sense to
         | stand up an entire system for the use case (like a solr server
         | for document searching), but many times I just need to store a
         | bit of json or I have one or two graphs that need to be
         | represented. In these cases, standing up an entire separate
         | system adds unneeded complexity.
        
           | hans_castorp wrote:
           | > I've seen many service layers and front ends change over
           | the years, all while sitting on the same RDBMS
           | 
           | This.
           | 
           | And the youngsters never believe me, when I tell them that
           | the application that they are writing is not the only one and
           | will not be the last one to access the database.
        
             | halostatue wrote:
             | My standard line for decades has been that the database is
             | more important from the program. You can reconstruct how
             | the program works from the database; you can't reconstruct
             | the database from how the program works.
             | 
             | A business might be interrupted if their programs accessing
             | the database break. A business will _die_ if their database
             | is lost.
        
               | mattbuilds wrote:
               | Reminds me of the Fred Brooks quote, "Show me your
               | flowchart and conceal your tables, and I shall continue
               | to be mystified. Show me your tables, and I won't usually
               | need your flowchart; it'll be obvious."
               | 
               | It always starts with the data.
        
               | pmontra wrote:
               | I'm currently staring at a PostgreSQL database created by
               | a Django app with a lot of inheritance between models.
               | 
               | My favorite /s is the table with a single column
               | containing the id of the record itself.
               | 
               | It's basically impossible to start from the database and
               | understand what's going on. It will be a huge pain to
               | interface to this database from something else.
               | 
               | On the other side, I'm maintaining another Django app for
               | the same customer, with saner tables (no inheritance) and
               | that's on par with other ORMs.
               | 
               | My take is never use inheritance with Django.
        
               | touisteur wrote:
               | Never use inheritance at all? I was thinking you could
               | reconstruct part of the app stack by observing and
               | clustering the requests and transactions. But then some
               | people think join is the devil and you'd be back to
               | square one. In a perfect world every app would have only
               | views over what they need and prepared requests. But then
               | CI/CD and all kind of tests can become very painful...
        
           | kall wrote:
           | > there is a ton of already built tooling
           | 
           | Diving more into postgres recently, I'm kind of surprised
           | there is not the kind of tooling I know from (frontend)
           | development. I would have expected there to be a number of
           | tools for: version control, CI/CD, (integration) testing,
           | performance monitoring, safe migration, developing stored
           | procedures (in vscode, with hot reload), data shuffling
           | (snaphshots, rollbacks...). Both open source and startups,
           | SaaS services with online dashboards etc.
           | 
           | There are some ok options in these categories but nothing
           | like the explosion of open source tooling out of bigcos and
           | devtools/SaaS startups you see elsewhere.
           | 
           | There's... aurora, prisma, some pg extensions to get
           | performance info?
        
             | underbluewaters wrote:
             | Absolutely! I evaluated a bunch of migration management
             | tools recently and was very disappointed. I needed
             | something to develop and version stored procedures and just
             | didn't find much. Ended up using graphile-migrate which has
             | been great but also tightly bound to other parts of my
             | stack. Would have expected more options from a 30+ year old
             | technology.
        
               | FrancoisBosun wrote:
               | If you haven't, Sqitch[1] is a great tool.
               | 
               | [1] https://sqitch.org
        
               | ncrmro wrote:
               | Just started using graphile migrate after hand rolling
               | some shell scripts initially
               | 
               | And working pretty nicely so far
        
         | mdavidn wrote:
         | These iterative queries convert hierarchical (not graph) data
         | into relational data. That means queries can perform
         | aggregations and joins against other relations, which is
         | obviously useful in a relational database.
        
         | mojzu wrote:
         | I use recursive queries like these at work, for hierarchical
         | accounts/groups/locations/equipment/etc. where each level can
         | have permissions that cascade down or not. Loading the hundreds
         | of rows into memory necessary to calculate what any particular
         | user can see is very costly in terms of network time,
         | particularly for deeply nested stuff. Much easier to do the
         | legwork in SQL and return the relatively tiny result set. There
         | are certainly other tools that could handle these relationships
         | more gracefully, but they also come with the questions of code
         | integration time/backups/redundancy/scaling/etc, which postgres
         | already has good answers for
        
       | donaldihunter wrote:
       | This inspired me to install --HEAD which I have been considering
       | doing for a while.
        
       | emgo wrote:
       | Cool feature, however I'm always worried of anything that
       | encourages developers to push more of the business logic from the
       | code towards database queries.
       | 
       | Of course you have to balance that out with the cost/time of
       | retrieving a possibly large amount of data for that computation,
       | and computing locally, which is what these SEARCH and CYCLE
       | features aim at preventing.
       | 
       | Still, I'd think twice before jumping into using them, to ensure
       | that making the queries more complex does bring enough bang for
       | the buck.
        
       | carlisle_ wrote:
       | I'm always so impressed by Postgres' features and performance. I
       | am unfortunately not as impressed with Postgres' operational
       | model. As a disclaimer, I'm not highly familiar with very many
       | databases, but it seems like Postgres sticks out as a poorly
       | suited for robustly automated infrastructure.
       | 
       | Maybe I'm totally wrong on this, looking forward to reading other
       | perspectives.
        
       | wdb wrote:
       | Sounds interesting. Recently, I have been looking for a quick
       | approach to make a clone of a production database for use in
       | staging.
       | 
       | The idea to avoid failing migrations by applying against a thin
       | clone of production, as somehow migrations fail in production but
       | work fine in staging. Anyone aware of a quick way to make a
       | clone?
       | 
       | Currently copying a 0.5TB db takes 3 hours while I am dreaming of
       | like minutes.
        
         | aeontech wrote:
         | Look at pgsync or pgloader
         | 
         | https://news.ycombinator.com/item?id=22676112
         | 
         | There is also interesting work around using filesystem volume
         | snapshots or clones for database cloning, but that requires
         | infrastructural changes... I think there was a startup around
         | that but I can't find the name :)
         | 
         | https://www.sedlakovi.org/blog/2019/03/fast-postgres-snapsho...
        
       | [deleted]
        
       | samgranieri wrote:
       | I love checking this site to see the cool stuff that makes it's
       | way into postgres each year. Fantastic work to all the postgres
       | committers, cheers all around.
       | 
       | This feature looks really cool
        
         | mattashii wrote:
         | > Fantastic work to all the postgres committers, cheers all
         | around.
         | 
         | Not to forget the postgres contributors: only a portion of the
         | people who write code for postgres are 'committers'. I do agree
         | that the committers do amazing work and play an essential role
         | in postgres' overall ecosystem, but we should not forget that a
         | not insignificant part of postgres' code, code reviews and
         | bugfixes is provided by non-committers.
        
       | hestefisk wrote:
       | Very valuable. Do other DBMSes have this feature? It's quite
       | incredible how much you can put the database to work. So much
       | less application code to write.
        
         | szarnyasg wrote:
         | The authors of paper "Graph Processing in RDBMSs" surveyed
         | graph features in 2017 among 4 popular systems:
         | 
         | http://sites.computer.org/debull/A17sept/p6.pdf (page 6, Table
         | 1)
         | 
         | They found that only Oracle supported the CYCLE keyword, DB2
         | SQL Server, and Postgres (version 9 at the time) didn't. SQL
         | Server now has a graph extension so that might have changed.
         | MySQL has introduced support for WITH RECURSIVE in 2017 but it
         | does not support the CYCLE keyword.
        
         | hans_castorp wrote:
         | I know that Oracle supports the CYCLE and the SEARCH clause in
         | recursive CTEs
        
       ___________________________________________________________________
       (page generated 2021-02-04 23:01 UTC)