[HN Gopher] Hierarchical Structures in PostgreSQL (2020)
       ___________________________________________________________________
        
       Hierarchical Structures in PostgreSQL (2020)
        
       Author : belter
       Score  : 180 points
       Date   : 2021-06-25 15:10 UTC (7 hours ago)
        
 (HTM) web link (hoverbear.org)
 (TXT) w3m dump (hoverbear.org)
        
       | mjburgess wrote:
       | Is any one aware of a language (or dsl) which will compile to
       | recursive CTEs?
       | 
       | It'd be helpful to see queries at the domain level to understand
       | how they fit into a problem; rather than this weird embedding of
       | some implicit idea.
        
         | handrous wrote:
         | There's the AGE plugin:
         | 
         | https://age.apache.org
         | 
         | Dunno how close it is to being ready for production, but it
         | lets you use Cypher in PostgreSQL, which is a pretty damn nice
         | language for doing many of the things you'd use recursive CTEs
         | for. I don't much care for Neo4j (the "home" database for the
         | Cypher language) but Cypher is good.
        
         | jitl wrote:
         | I spent some time looking for this specifically to translate
         | Datalog or Prolog to SQL. There's theoretical work out there
         | but I didn't find anything plug and play.
        
           | refset wrote:
           | Not exactly plug and play, but for BigQuery there's now
           | Logica https://opensource.googleblog.com/2021/04/logica-
           | organizing-...
        
       | simonw wrote:
       | This is cool - I hadn't seen the materialized view + CTE trick
       | before, or the ltree extension.
       | 
       | I've implemented this in the past using the classic adjacency
       | list, materialized path or nested sets mechanisms - all by using
       | Django Treebeard, which offers a very robust implementation of
       | all three of them: https://django-
       | treebeard.readthedocs.io/en/latest/
        
       | petepete wrote:
       | There's a good chapter on this in SQL Antipatterns, it covers
       | four or five different approaches in depth.
       | 
       | I always just use a parent_id and recursive cte - it's plenty
       | fast even without materialised views.
       | 
       | https://pragprog.com/titles/bksqla/sql-antipatterns/
        
         | dspillett wrote:
         | Careful with CTEs in older postgres. Until the most recent
         | versions they were an optimisation fence (predicate push-down
         | would not pass through, meaning the CTE would often result in
         | an index or table scan, potentially multiple times in the
         | recursive case. This can be a huge problem over large data,
         | though obviously in many circumstances with small enough data
         | it is still efficient enough.
        
           | paulddraper wrote:
           | That concern doesn't really apply to recursive CTEs though.
           | 
           | Or rather, recursive CTEs inherently require that you select
           | only what you need.
           | 
           | A little difference that the non-recursive case of "I'll just
           | use a CTE for same functionality but better readability"
           | which can deceptively bite you prior to PostgreSQL 12.
        
       | vore wrote:
       | This is use case dependent, but one of my favorite methods for
       | implementing e.g. threaded comments is closure tables:
       | create table post_paths (           descendant_id bigint not null
       | references posts (id) on delete cascade on update cascade,
       | ancestor_id bigint not null references posts (id) on delete
       | cascade on update cascade,           depth int not null,
       | primary key (descendant_id, ancestor_id)       );
       | 
       | At the cost of O(n^2) rows per hierarchy, it will allow you to do
       | some pretty nifty things, most notably being able to get a count
       | of all descendants of a parent in O(n) time. Updating entries may
       | be painful, though.
        
       | _boffin_ wrote:
       | If this topic interests people, take a look at Joe Celko's book
       | called Trees and Hierarchies: https://www.amazon.com/Hierarchies-
       | Smarties-Kaufmann-Managem...
        
         | WalterGR wrote:
         | Available on what was formerly O'Reilly Safari:
         | https://learning.oreilly.com/library/view/joe-celkos-trees/9...
        
       | butonic wrote:
       | A great overview of the pros and cons of different approaches is
       | given in https://de.slideshare.net/billkarwin/models-for-
       | hierarchical...
        
         | jasonwatkinspdx wrote:
         | This is a great slide deck and directly follows what I learned
         | about this over the years. In nearly every case I think you
         | want a flattened table of the hierarchy. I hadn't heard the
         | name closure table before so thanks.
        
           | tejtm wrote:
           | Not sure the age on that slide deck but Sqlite has definitely
           | supported recursive table expressions for years now. In some
           | ways it is more permissive with syntax allowed in the
           | recursive portion than Postgres.
        
             | zozbot234 wrote:
             | Yes, many RDBMS's have gotten support for recursive CTE's
             | as of recently. It's definitely not a niche feature
             | anymore.
        
               | jasonwatkinspdx wrote:
               | Yeah, and it's a straightforward standardized way to
               | solve this.
               | 
               | But I still think it's useful to know the flattening
               | idea. I've seen it show up other places like map reduce.
               | It's a useful general idea imo.
        
         | highpost wrote:
         | https://www.youtube.com/watch?v=wuH5OoPC3hA
        
           | belter wrote:
           | As there is no context, FYI for others: The video link is the
           | presentation just above.
        
       | dln_eintr wrote:
       | Most hierarchies are shallow in practice. Keep it simple.
        
         | hu3 wrote:
         | This. Throughout my years it was very common to see juniors
         | trying to model static website dropdown menus as recursive
         | database tables.
         | 
         | In practice if a menu has more than 3 levels the user
         | experience will suffer. I keep them at 2 submenus max, possibly
         | 1 if I can get away with it.
         | 
         | Not to mention most menus are not dynamic, meaning they can be
         | just a JSON file or simple HTML.
        
           | jayknight wrote:
           | This is wise, but in the healthcare field, there are some
           | pretty huge trees of things that you need to deal with
           | sometimes. I've been involved with building out a structure a
           | lot like MeSH[0] and some disease trees similar to ICD. Some
           | of my implementations I would definitely do differently now
           | because both the tools and my experience have improved.
           | MeSH's "addresses" even match the ltree syntax, so it would
           | probably make a lot of sense to use that.
           | 
           | [0] https://www.nlm.nih.gov/mesh/intro_trees.html
        
             | hu3 wrote:
             | I feel your pain. I consulted in a hospital management
             | software and they had trees spanning 6+ levels deep and it
             | had to be dynamic too.
        
         | pphysch wrote:
         | tl;dr: I agree and healthy human organizations should never
         | scale beyond ~5 degrees of hierarchy, which is totally
         | manageable via basic recursive JOINs in a RDBMS without fancy
         | stored procedures or graph theory.
         | 
         | I like to use Dunbar's Number (100-250) to approximate the
         | levels of heirarchy in human organizations. The idea is that
         | these organizations are most efficient when organizational
         | layers don't exceed ~150 elements, due to the implementation
         | details of the human brain.
         | 
         | Basically, you can do log_{150}(N) to get a very rough idea of
         | how complex the organization of N people should be. This works
         | for small startups and entire countries. Of course, startups
         | should probably get comfortable with the idea of teams well
         | before hitting >100 employees. Teams can then scale into
         | departments (with new subteams), and once there are many
         | departments, add regional layer, strategic/executive layer, and
         | so on.
         | 
         | One interesting fact is that the USA population has roughly
         | increased by a multiple of Dunbar's number since its
         | organizational structure was codified in its Constitution.
         | Perhaps time for another look?
        
           | mumblemumble wrote:
           | I would argue that Dunbar's Number is the wrong number to use
           | for this. At least not by just naively dropping it in.
           | 
           | Remember, it represents the _total_ number of stable social
           | relationships a person can maintain. If you 're looking to
           | allow your employees to have personal lives, you'll want to
           | leave ample room for their family and friends.
           | 
           | Maybe an important question to ask is, how much of your
           | employees' social-emotional carrying capacity is it
           | appropriate to consume? If 10%, then 15-25 is your number. If
           | 20%, then 30-50 is your number.
        
             | pphysch wrote:
             | It's definitely a big ballpark, but I think Dunbar's Number
             | is a good place to start. If you have managers spending 10%
             | or less of their lives on management, I don't think the
             | organization will be very healthy. Management should be a
             | high commitment, high compensation role.
             | 
             | It's also definitely a upper limit rather than lower limit.
             | Big bureaucracies with many layers of management and small
             | teams can work well, but no one can _really_ individually
             | manage 1000 subordinates.
        
           | munificent wrote:
           | _> One interesting fact is that the USA population has
           | roughly increased by a multiple of Dunbar 's number since its
           | organizational structure was codified in its Constitution.
           | Perhaps time for another look?_
           | 
           | I have had that exact same idle thought.
           | 
           | In 1813, each of the 182 US Representatives represented on
           | average ~40,000 Americans. Today, each of our 435 Reps stands
           | for about ~760,000 people. That's over an order of magnitude
           | growth. To keep the same rate of representation, we'd need to
           | have over 8,000 Representatives, which is clearly too large a
           | body to get anything done.
           | 
           | So we're probably well beyond the point where we could
           | benefit from a large House of Subrepresentatives and then a
           | smaller House of Superrepresentatives that aggregate them.
        
             | mulmen wrote:
             | Well isn't that why we have limited federal government and
             | defer most decisions to the states?
        
               | beardedetim wrote:
               | Who should then defer to county/parish, who then defer to
               | city, who then defer to neighborhood
        
               | mulmen wrote:
               | Right, that scales better than 8000 representatives in
               | the house.
        
       | st3fan wrote:
       | Pro tip: please don't use comic sans (or derivative) for serious
       | blog posts.
        
       | gnarbarian wrote:
       | This is the kind of thing I thought graph databases were good at.
       | postgres extensions for this would be very interesting.
        
         | setr wrote:
         | Most datasets have relationships forming a graph, but few need
         | to be analyzed as a graph. Graph databases specialize for the
         | latter usecase, with large datasets.
         | 
         | But you could always traverse graphs with recursive SQL -- it's
         | just less pleasant, and perhaps less performant -- but often
         | that's all you need (and often people confuse having a graph
         | relationship with needing a graph-specialized database)
        
         | zozbot234 wrote:
         | Postgres can handle a lot of generic 'graph' use cases. Unless
         | you have _very_ specialized needs, focusing on complex
         | algorithms and very big network data, it 's not at all clear
         | that you'll need more than that.
        
           | makr17 wrote:
           | And even then, it really depends on what you're trying to do.
           | 
           | My use case a few years ago was such that _every_ dedicated
           | graph engine I tried would just OOM on the test set of ~10M
           | nodes, which I attributed to insufficient pruning. Postgres
           | was able to handle it, and still worked on the production set
           | of ~1.5B nodes. The SQL took some tuning, and wasn't
           | _pretty_, but it did _work_ where nothing else did...
        
       | codingclaws wrote:
       | Peaches 'n' Stink [1] uses ltree for nested comments. All the
       | PostgreSQL is here [2]. The first query that uses ltree is on
       | line 452 [3].
       | 
       | It was hard to decide on the ltree "path" format. There's a
       | "path" value for each comment and it is what sets up the
       | hierarchy. What I ended up using was this (root level comments):
       | 
       | 1.aaa
       | 
       | 1.aab
       | 
       | 1.aac
       | 
       | 1.aad
       | 
       | .
       | 
       | .
       | 
       | .
       | 
       | 1.aaz
       | 
       | 1.aba
       | 
       | 1.abb
       | 
       | The "1" is the post ID. Sub-comments under the first comment
       | above would be:
       | 
       | 1.aaa.aaa
       | 
       | 1.aaa.aab
       | 
       | 1.aaa.aac
       | 
       | The reason why I used this "path" format is because if I sort
       | alphabetically then the comments will be in the correct
       | "vertical" order to display chronologically.
       | 
       | [1] https://www.peachesnstink.com
       | 
       | [2] https://github.com/ferg1e/peaches-n-
       | stink/blob/master/db/ind...
       | 
       | [3] https://github.com/ferg1e/peaches-n-
       | stink/blob/master/db/ind...
        
       | I_am_tiberius wrote:
       | I use ltree with UUIDs. What's really annoying is that I needed
       | to replace the '-' characters of the UUIDs to '_' in order to
       | store the hierarchy in the ltree column.
        
       | qxmat wrote:
       | If you use Postgres + Redshift on AWS (via foreign data wrapper
       | or dblink) it's worth noting that AWS has released recursive CTE
       | support in Redshift. But with that said 9 levels of UNION ALL
       | ought to be enough for anyone.
        
       | jawns wrote:
       | This seems a lot cleaner than the suggestions on Stack Overflow!
       | 
       | https://stackoverflow.com/questions/54907495/postgresql-recu...
       | 
       | Edit: Sorry for being unclear. By "this" I meant the ltree
       | solution that's listed second in the blog post.
        
         | [deleted]
        
         | polote wrote:
         | The first suggestion of the post is exactly the same as the one
         | of Stack Overflow.
        
       | Unbeliever69 wrote:
       | I feel that a discussion of ltree is incomplete without
       | discussion of manipulating the tree. e.g. splicing, cutting,
       | moving, deleting branches [1] etc. Also, how ltree uses Gist
       | indexes [2].
       | 
       | Reference: [1] http://patshaughnessy.net/2017/12/14/manipulating-
       | trees-usin... [2] http://patshaughnessy.net/2017/12/15/looking-
       | inside-postgres...
        
       | vyrotek wrote:
       | ltree looks interesting. I've used MSSQL's HierarchyID
       | extensively for this sort of thing. Are they similar or is there
       | something else like HierarchyID for PostgreSQL?
        
       ___________________________________________________________________
       (page generated 2021-06-25 23:00 UTC)