[HN Gopher] Hierarchical Structures in PostgreSQL (2020)
___________________________________________________________________
Hierarchical Structures in PostgreSQL (2020)
Author : belter
Score : 180 points
Date : 2021-06-25 15:10 UTC (7 hours ago)
(HTM) web link (hoverbear.org)
(TXT) w3m dump (hoverbear.org)
| mjburgess wrote:
| Is any one aware of a language (or dsl) which will compile to
| recursive CTEs?
|
| It'd be helpful to see queries at the domain level to understand
| how they fit into a problem; rather than this weird embedding of
| some implicit idea.
| handrous wrote:
| There's the AGE plugin:
|
| https://age.apache.org
|
| Dunno how close it is to being ready for production, but it
| lets you use Cypher in PostgreSQL, which is a pretty damn nice
| language for doing many of the things you'd use recursive CTEs
| for. I don't much care for Neo4j (the "home" database for the
| Cypher language) but Cypher is good.
| jitl wrote:
| I spent some time looking for this specifically to translate
| Datalog or Prolog to SQL. There's theoretical work out there
| but I didn't find anything plug and play.
| refset wrote:
| Not exactly plug and play, but for BigQuery there's now
| Logica https://opensource.googleblog.com/2021/04/logica-
| organizing-...
| simonw wrote:
| This is cool - I hadn't seen the materialized view + CTE trick
| before, or the ltree extension.
|
| I've implemented this in the past using the classic adjacency
| list, materialized path or nested sets mechanisms - all by using
| Django Treebeard, which offers a very robust implementation of
| all three of them: https://django-
| treebeard.readthedocs.io/en/latest/
| petepete wrote:
| There's a good chapter on this in SQL Antipatterns, it covers
| four or five different approaches in depth.
|
| I always just use a parent_id and recursive cte - it's plenty
| fast even without materialised views.
|
| https://pragprog.com/titles/bksqla/sql-antipatterns/
| dspillett wrote:
| Careful with CTEs in older postgres. Until the most recent
| versions they were an optimisation fence (predicate push-down
| would not pass through, meaning the CTE would often result in
| an index or table scan, potentially multiple times in the
| recursive case. This can be a huge problem over large data,
| though obviously in many circumstances with small enough data
| it is still efficient enough.
| paulddraper wrote:
| That concern doesn't really apply to recursive CTEs though.
|
| Or rather, recursive CTEs inherently require that you select
| only what you need.
|
| A little difference that the non-recursive case of "I'll just
| use a CTE for same functionality but better readability"
| which can deceptively bite you prior to PostgreSQL 12.
| vore wrote:
| This is use case dependent, but one of my favorite methods for
| implementing e.g. threaded comments is closure tables:
| create table post_paths ( descendant_id bigint not null
| references posts (id) on delete cascade on update cascade,
| ancestor_id bigint not null references posts (id) on delete
| cascade on update cascade, depth int not null,
| primary key (descendant_id, ancestor_id) );
|
| At the cost of O(n^2) rows per hierarchy, it will allow you to do
| some pretty nifty things, most notably being able to get a count
| of all descendants of a parent in O(n) time. Updating entries may
| be painful, though.
| _boffin_ wrote:
| If this topic interests people, take a look at Joe Celko's book
| called Trees and Hierarchies: https://www.amazon.com/Hierarchies-
| Smarties-Kaufmann-Managem...
| WalterGR wrote:
| Available on what was formerly O'Reilly Safari:
| https://learning.oreilly.com/library/view/joe-celkos-trees/9...
| butonic wrote:
| A great overview of the pros and cons of different approaches is
| given in https://de.slideshare.net/billkarwin/models-for-
| hierarchical...
| jasonwatkinspdx wrote:
| This is a great slide deck and directly follows what I learned
| about this over the years. In nearly every case I think you
| want a flattened table of the hierarchy. I hadn't heard the
| name closure table before so thanks.
| tejtm wrote:
| Not sure the age on that slide deck but Sqlite has definitely
| supported recursive table expressions for years now. In some
| ways it is more permissive with syntax allowed in the
| recursive portion than Postgres.
| zozbot234 wrote:
| Yes, many RDBMS's have gotten support for recursive CTE's
| as of recently. It's definitely not a niche feature
| anymore.
| jasonwatkinspdx wrote:
| Yeah, and it's a straightforward standardized way to
| solve this.
|
| But I still think it's useful to know the flattening
| idea. I've seen it show up other places like map reduce.
| It's a useful general idea imo.
| highpost wrote:
| https://www.youtube.com/watch?v=wuH5OoPC3hA
| belter wrote:
| As there is no context, FYI for others: The video link is the
| presentation just above.
| dln_eintr wrote:
| Most hierarchies are shallow in practice. Keep it simple.
| hu3 wrote:
| This. Throughout my years it was very common to see juniors
| trying to model static website dropdown menus as recursive
| database tables.
|
| In practice if a menu has more than 3 levels the user
| experience will suffer. I keep them at 2 submenus max, possibly
| 1 if I can get away with it.
|
| Not to mention most menus are not dynamic, meaning they can be
| just a JSON file or simple HTML.
| jayknight wrote:
| This is wise, but in the healthcare field, there are some
| pretty huge trees of things that you need to deal with
| sometimes. I've been involved with building out a structure a
| lot like MeSH[0] and some disease trees similar to ICD. Some
| of my implementations I would definitely do differently now
| because both the tools and my experience have improved.
| MeSH's "addresses" even match the ltree syntax, so it would
| probably make a lot of sense to use that.
|
| [0] https://www.nlm.nih.gov/mesh/intro_trees.html
| hu3 wrote:
| I feel your pain. I consulted in a hospital management
| software and they had trees spanning 6+ levels deep and it
| had to be dynamic too.
| pphysch wrote:
| tl;dr: I agree and healthy human organizations should never
| scale beyond ~5 degrees of hierarchy, which is totally
| manageable via basic recursive JOINs in a RDBMS without fancy
| stored procedures or graph theory.
|
| I like to use Dunbar's Number (100-250) to approximate the
| levels of heirarchy in human organizations. The idea is that
| these organizations are most efficient when organizational
| layers don't exceed ~150 elements, due to the implementation
| details of the human brain.
|
| Basically, you can do log_{150}(N) to get a very rough idea of
| how complex the organization of N people should be. This works
| for small startups and entire countries. Of course, startups
| should probably get comfortable with the idea of teams well
| before hitting >100 employees. Teams can then scale into
| departments (with new subteams), and once there are many
| departments, add regional layer, strategic/executive layer, and
| so on.
|
| One interesting fact is that the USA population has roughly
| increased by a multiple of Dunbar's number since its
| organizational structure was codified in its Constitution.
| Perhaps time for another look?
| mumblemumble wrote:
| I would argue that Dunbar's Number is the wrong number to use
| for this. At least not by just naively dropping it in.
|
| Remember, it represents the _total_ number of stable social
| relationships a person can maintain. If you 're looking to
| allow your employees to have personal lives, you'll want to
| leave ample room for their family and friends.
|
| Maybe an important question to ask is, how much of your
| employees' social-emotional carrying capacity is it
| appropriate to consume? If 10%, then 15-25 is your number. If
| 20%, then 30-50 is your number.
| pphysch wrote:
| It's definitely a big ballpark, but I think Dunbar's Number
| is a good place to start. If you have managers spending 10%
| or less of their lives on management, I don't think the
| organization will be very healthy. Management should be a
| high commitment, high compensation role.
|
| It's also definitely a upper limit rather than lower limit.
| Big bureaucracies with many layers of management and small
| teams can work well, but no one can _really_ individually
| manage 1000 subordinates.
| munificent wrote:
| _> One interesting fact is that the USA population has
| roughly increased by a multiple of Dunbar 's number since its
| organizational structure was codified in its Constitution.
| Perhaps time for another look?_
|
| I have had that exact same idle thought.
|
| In 1813, each of the 182 US Representatives represented on
| average ~40,000 Americans. Today, each of our 435 Reps stands
| for about ~760,000 people. That's over an order of magnitude
| growth. To keep the same rate of representation, we'd need to
| have over 8,000 Representatives, which is clearly too large a
| body to get anything done.
|
| So we're probably well beyond the point where we could
| benefit from a large House of Subrepresentatives and then a
| smaller House of Superrepresentatives that aggregate them.
| mulmen wrote:
| Well isn't that why we have limited federal government and
| defer most decisions to the states?
| beardedetim wrote:
| Who should then defer to county/parish, who then defer to
| city, who then defer to neighborhood
| mulmen wrote:
| Right, that scales better than 8000 representatives in
| the house.
| st3fan wrote:
| Pro tip: please don't use comic sans (or derivative) for serious
| blog posts.
| gnarbarian wrote:
| This is the kind of thing I thought graph databases were good at.
| postgres extensions for this would be very interesting.
| setr wrote:
| Most datasets have relationships forming a graph, but few need
| to be analyzed as a graph. Graph databases specialize for the
| latter usecase, with large datasets.
|
| But you could always traverse graphs with recursive SQL -- it's
| just less pleasant, and perhaps less performant -- but often
| that's all you need (and often people confuse having a graph
| relationship with needing a graph-specialized database)
| zozbot234 wrote:
| Postgres can handle a lot of generic 'graph' use cases. Unless
| you have _very_ specialized needs, focusing on complex
| algorithms and very big network data, it 's not at all clear
| that you'll need more than that.
| makr17 wrote:
| And even then, it really depends on what you're trying to do.
|
| My use case a few years ago was such that _every_ dedicated
| graph engine I tried would just OOM on the test set of ~10M
| nodes, which I attributed to insufficient pruning. Postgres
| was able to handle it, and still worked on the production set
| of ~1.5B nodes. The SQL took some tuning, and wasn't
| _pretty_, but it did _work_ where nothing else did...
| codingclaws wrote:
| Peaches 'n' Stink [1] uses ltree for nested comments. All the
| PostgreSQL is here [2]. The first query that uses ltree is on
| line 452 [3].
|
| It was hard to decide on the ltree "path" format. There's a
| "path" value for each comment and it is what sets up the
| hierarchy. What I ended up using was this (root level comments):
|
| 1.aaa
|
| 1.aab
|
| 1.aac
|
| 1.aad
|
| .
|
| .
|
| .
|
| 1.aaz
|
| 1.aba
|
| 1.abb
|
| The "1" is the post ID. Sub-comments under the first comment
| above would be:
|
| 1.aaa.aaa
|
| 1.aaa.aab
|
| 1.aaa.aac
|
| The reason why I used this "path" format is because if I sort
| alphabetically then the comments will be in the correct
| "vertical" order to display chronologically.
|
| [1] https://www.peachesnstink.com
|
| [2] https://github.com/ferg1e/peaches-n-
| stink/blob/master/db/ind...
|
| [3] https://github.com/ferg1e/peaches-n-
| stink/blob/master/db/ind...
| I_am_tiberius wrote:
| I use ltree with UUIDs. What's really annoying is that I needed
| to replace the '-' characters of the UUIDs to '_' in order to
| store the hierarchy in the ltree column.
| qxmat wrote:
| If you use Postgres + Redshift on AWS (via foreign data wrapper
| or dblink) it's worth noting that AWS has released recursive CTE
| support in Redshift. But with that said 9 levels of UNION ALL
| ought to be enough for anyone.
| jawns wrote:
| This seems a lot cleaner than the suggestions on Stack Overflow!
|
| https://stackoverflow.com/questions/54907495/postgresql-recu...
|
| Edit: Sorry for being unclear. By "this" I meant the ltree
| solution that's listed second in the blog post.
| [deleted]
| polote wrote:
| The first suggestion of the post is exactly the same as the one
| of Stack Overflow.
| Unbeliever69 wrote:
| I feel that a discussion of ltree is incomplete without
| discussion of manipulating the tree. e.g. splicing, cutting,
| moving, deleting branches [1] etc. Also, how ltree uses Gist
| indexes [2].
|
| Reference: [1] http://patshaughnessy.net/2017/12/14/manipulating-
| trees-usin... [2] http://patshaughnessy.net/2017/12/15/looking-
| inside-postgres...
| vyrotek wrote:
| ltree looks interesting. I've used MSSQL's HierarchyID
| extensively for this sort of thing. Are they similar or is there
| something else like HierarchyID for PostgreSQL?
___________________________________________________________________
(page generated 2021-06-25 23:00 UTC)