[HN Gopher] PostgreSQL and Its Annoying Crosstab
       ___________________________________________________________________
        
       PostgreSQL and Its Annoying Crosstab
        
       Author : leononame
       Score  : 40 points
       Date   : 2024-04-04 10:24 UTC (2 days ago)
        
 (HTM) web link (blog.aurelianix.com)
 (TXT) w3m dump (blog.aurelianix.com)
        
       | mbork_pl wrote:
       | I admit that I only skimmed the article, but I'm wondering if
       | psql's \crosstabview would help?
        
         | sgarman wrote:
         | OP says it needs to be a function in sql and backslash commands
         | are psql commands not queries. Probably would not suite their
         | case.
        
           | leononame wrote:
           | Yes, that's exactly right. Other requirements around this
           | whole thing force us to do this in a function and not through
           | psql. We've actually been thinking about using
           | pl{perl,python,rust} instead because this approach works, but
           | is annoying.
        
       | nicholasjon wrote:
       | Just ran into the fiddlyness of crosstab this week attempting to
       | extend an existing query that used it. I ended up converting the
       | query to use filters instead, which was a failure on my part to
       | get through that fiddlyness -- but I'm consoling myself by saying
       | it's a little more explicit this way for the next person. :)
        
       | TheRealDunkirk wrote:
       | AND the function is hard-limited to 1600 columns. Don't ask me
       | how I know; you already know.
        
       | account-5 wrote:
       | I don't know postgresql but is it worse that sqlite? MS Access on
       | the other hand is easy!
        
         | to11mtm wrote:
         | IIRC PostgreSQL is the preferred 'reference' for when features
         | are added to SQLite.
         | 
         | It's a different sort of beast than SQLite/Access.
         | 
         | SQLite, Access, and FirebirdDB in 'local' mode are examples of
         | 'file based databases' (Access of course is normally used via
         | the application, but you can absolutely get to it in other
         | ways.)
         | 
         | Postgres, MySQL/MariaDB, Microsoft SQL Server (MSSQL), Oracle,
         | and FirebirdDB in 'client/server' mode, involve having a
         | dedicated server (or cluster of servers) that store the data
         | and run the actual SQL logic.
         | 
         | Hope that helps?
        
           | account-5 wrote:
           | Haha, probably should have worded my previous post better;
           | karma taking a pounding due to that, I think.
           | 
           | I meant in reference to Crosstab query, not the databases
           | themselves.
        
       | mjevans wrote:
       | https://www.postgresql.org/docs/current/tablefunc.html#TABLE...
       | 
       | Given what this function is doing, a limit on the output seems
       | reasonable. Others have mentioned 1600 columns, which seems like
       | a bit of a strange number, but might correlate internally to a
       | maximum tuple of pointers to the source data.
       | 
       | In a quick skim of various searches I don't see any good
       | alternatives to crosstab (AKA pivot) for such monstrously large
       | use cases. Nearly all the toy examples include months, or
       | possibly days where the aggregate inputs collect summary
       | statistics. At 1600+ columns, it's probably better to re-process
       | the data and convert it from tabular to columnar format in
       | whatever way best fits the particular data.
        
       | bionhoward wrote:
       | Maybe you could use column_name, dtype, min, max columns and
       | sidestep the nulls, but you'd have to convert "min" and "max" to
       | text so they're all the same type?
       | 
       | Could also split the table by dtype outside of sql to ditch null
       | values without needing to convert non-null ranges to text, but
       | then you split your data into multiple tables, maybe that's
       | inconvenient to break table schemas into chunks?
        
       | makmanalp wrote:
       | In a similar vein, here is my very similar rant and a teeny tiny
       | sqlalchemy plugin that deals with the grossness a bit if you're
       | using python. Not entirely sure if it still works, was floored to
       | realize I'd written this a decade ago! But the sqlalchemy API
       | tends to be quite stable.
       | 
       | https://github.com/makmanalp/sqlalchemy-crosstab-postgresql
        
         | tommasoamici wrote:
         | Thanks for your work on that!
         | 
         | I recently implemented a pivot table widget at work and it was
         | mostly based on your code.
         | 
         | It was a while ago, but I remember it worked with very few
         | changes.
        
       | wswope wrote:
       | In case it saves anyone a headache: Pivoting with Postgres' JSON
       | features (json_build_object and json_object_agg, specifically) is
       | usually less difficult than crosstab IME.
       | 
       | I've used it for exactly this same sorta metadata logging.
        
         | singingfish wrote:
         | Interesting, thanks! On a similar but unrelated note, I think
         | instead of MERGE INTOs, self-updating CTEs are a better bet for
         | similar reasons.
        
       | kanobi wrote:
       | Whenever I needed to do a pivot in postgres, I used this approach
       | that is described in this stackoverflow anwser:
       | 
       | https://stackoverflow.com/questions/20618323/create-a-pivot-...
       | 
       | So for example when you have a table like described (column_name,
       | meta_key, value), you would create a query like this:
       | SELECT         column_name,         MAX(CASE WHEN
       | meta_key='total_rows' THEN value ELSE NULL END) AS total_rows,
       | MAX(CASE WHEN meta_key='not_null_count' THEN value ELSE NULL END)
       | AS not_null_count,         -- for all other metrics....
       | FROM tall_table       GROUP BY 1
       | 
       | (edit: formatting)
        
         | jasonpbecker wrote:
         | Most of the time I'm using `filter ... where` for cases like
         | these... for example                 select
         | column_name,         MAX(value) FILTER (where
         | meta_key='total_rows') as total_row,         MAX(value) FILTER
         | (where meta_key='not_null_count') as not_null_count,
         | ROUND(SUM (amount_in_cents) FILTER (WHERE EXTRACT(MONTH   FROM
         | TIMESTAMP '2006-01-01 03:04:05) = 1) / 100.0, 2) as
         | 'january_sub_total'       FROM table       GROUP BY column_name
        
       | ttfkam wrote:
       | Someday Postgres will get support for PIVOT/UNPIVOT, but today is
       | sadly not that day.
       | 
       | https://learn.microsoft.com/en-us/sql/t-sql/queries/from-usi...
       | 
       | https://blogs.oracle.com/sql/post/how-to-convert-rows-to-col...
        
       ___________________________________________________________________
       (page generated 2024-04-06 23:00 UTC)