[HN Gopher] What I wish someone told me about Postgres
___________________________________________________________________
What I wish someone told me about Postgres
Author : todsacerdoti
Score : 281 points
Date : 2024-11-12 00:59 UTC (22 hours ago)
(HTM) web link (challahscript.com)
(TXT) w3m dump (challahscript.com)
| onoke wrote:
| Haha, good job. :)
| whiskey-one wrote:
| Really great post! It belongs on a reading list somewhere for
| everyone who is using Postgres independently or as a part of
| their stack.
| andrewstuart wrote:
| Dump your schema, drop the entire thing into ChatGPT or Claude
| and ask it to write your Postgres query.
|
| Then ask it to rewrite the query three different ways and explain
| the pros and cons of each.
|
| Do the same for any existing queries in your system ... drop in
| the schema then the query and ask for an analysis and suggested
| improvements.
| cyanydeez wrote:
| then have corporate knock on your team's window asking why all
| its IP ended up with a Chinese competitor?
| junaru wrote:
| Schema != data.
| PittleyDunkin wrote:
| It's certainly data to the extent that it's copyrightable.
| I doubt this sentiment would stand up in court.
| jrpelkonen wrote:
| "Show me your flowcharts and conceal your tables, and I
| shall continue to be mystified. Show me your tables, and I
| won't usually need your flowcharts; they'll be obvious." -
| Fred Brooks
| josephg wrote:
| Yep, but if you think your database schema is "secret
| sauce", you're fooling yourself. You can almost always
| guess the database schema if you're given 10 minutes to
| play with the application.
| hobs wrote:
| Generally your schema is your code, which IS your IP - your
| data is often NOT your IP (and often might just be a list
| of facts, a thing which is not copyrightable)
| bilekas wrote:
| People have too much faith in LLMs at the moment. This might
| be able to give some insights, but if you're trying to LEARN
| something in detail, such as the pros and cons of a particular
| custom query on your unique-ish schema, the LLM will be prone
| to making stuff up by design - it wasn't trained on YOUR data.
|
| It will give some good basic results and probably some
| functional queries, but it misses the mark with the finer
| points of optimization etc. If you don't know how to properly
| build the queries, go learn it properly with confidence that
| what you're learning is actually reliable.
| cpursley wrote:
| This is similar to what I've done with my SaaS, where most of
| the logic lives in Postgres. I went through much of the logic
| and refactored it with GPT a while back and improved it
| significantly. It's soooo good with SQL. Like insanely good.
| And these days I'm using Claude for all things code, which is
| significantly better than GPT.
| sgarland wrote:
| Agreed that LLMs are quite good at SQL (and relational
| algebra). However, ask yourself this: how many programming
| languages do you know to at least a comfortable level? If N >
| 1, why not add another (SQL)? It's not a difficult language,
| which is in part why LLMs are so good at it - it's easy to
| predict the next token, because there are so few of them.
| cpursley wrote:
| I know SQL pretty well (enough to write complex procedures)
| and prefer it over doing biz logic in application code
| (yuck ORMs). But SQL is just such a wonky syntax that I
| appreciate any help that I can get.
| cyanydeez wrote:
| I'd say I tend to ignore the standard docs because they rarely
| have examples and rely on the arcane procedure of trying to
| decipher the super command options with all its "[OR
| THIS|THAT]".
|
| I assume _someone_ can read this pseudo programming, but it's not
| me.
| Clubber wrote:
| FWIW square brackets indicate an optional parameter and the
| pipe inside separates the available alternatives; but I
| agree, I don't particularly care for that notation and prefer
| actual running SQL examples.
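|
| For example, here's a rough reading of the DROP TABLE synopsis
| from the docs (the table names below are made up):
|
|     DROP TABLE [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ]
|
|     -- both of these are valid expansions of that one line:
|     DROP TABLE widgets;
|     DROP TABLE IF EXISTS widgets, gadgets CASCADE;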
| hobs wrote:
| Railroad-esque diagrams can be weird, but they say a lot in a
| very short space. I would highly recommend spending a little
| extra time working on thinking through them - they are
| everywhere!
| Clubber wrote:
| >Railroad-esque diagrams
|
| Now and forever I will think of railroads when I see these.
| hobs wrote:
| Well, you should! But, I didn't invent anything there,
| that's what they are named :)
|
| Syntax diagrams (or railroad diagrams) are a way to
| represent a context-free grammar. They represent a
| graphical alternative to Backus-Naur form, EBNF,
| Augmented Backus-Naur form, and other text-based grammars
| as metalanguages...
| https://en.wikipedia.org/wiki/Syntax_diagram
| paulryanrogers wrote:
| Sometimes I find it annoying but mostly it works well for me.
| I've come to embrace the find feature and visually scanning
| over any parenthetical stuff.
|
| The alternative is they have to break it up into several quasi
| examples, each with their own optional modifiers.
| PhilippGille wrote:
| I love the diagrams from the SQLite documentation:
| https://www.sqlite.org/syntaxdiagrams.html
| christophilus wrote:
| SQLite diagrams plus examples are pretty much the way I wish
| all of my documentation worked.
| o11c wrote:
| I've been bitten by those before because they are _not_
| generated from the actual syntax-parsing code, and thus are
| sometimes out of sync and wrong (or at least misleading).
| teddyh wrote:
| Those are commonly called "railroad diagrams":
| <https://en.wikipedia.org/w/index.php?title=Syntax_diagram&ol...>
| assbuttbuttass wrote:
| I'm looking at the postgres docs now, and they certainly have
| the arcane formal syntax definitions, but they also have plenty
| of useful examples:
| https://www.postgresql.org/docs/current/sql-altertable.html#...
| smartbit wrote:
| The PostgreSQL Administration Cookbook series served me well
|
| https://www.oreilly.com/library/view/postgresql-16-administr...
| tomcam wrote:
| I wrote this to help beginners:
| https://tomcam.github.io/postgres/
| nodakai wrote:
| That nested JSON query operator chains such as
| json_col->'foo'->'bar'->>'baz' internally return (copy) entire
| sub-objects at each level and can be much slower than
| jsonb_path_query(json_col, '$.foo.bar.baz') for large JSONB data
|
| ... although I haven't had the chance to verify this myself
| hans_castorp wrote:
| You can also use the #>> operator for that:
| json_col #>> '{foo,bar,baz}'
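|
| Roughly, these pull the same nested value (just a sketch - the
| table name t is made up here, and the return types differ: ->>
| and #>> yield text, while jsonb_path_query returns a set of
| jsonb values):
|
|     SELECT json_col->'foo'->'bar'->>'baz'              FROM t;
|     SELECT json_col #>> '{foo,bar,baz}'                FROM t;
|     SELECT jsonb_path_query(json_col, '$.foo.bar.baz') FROM t;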
| mdaniel wrote:
| I got nerd-sniped on this, because I actually hadn't heard that
| before and would be horrified if it were true. It took some
| non-trivial digging to even get down into the "well, what does
| foo->>'bar' even map to in C?" level. I for sure am not
| claiming certainty, but based merely on
| "getIthJsonbValueFromContainer" <https://sourcegraph.com/github
| .com/postgres/postgres@REL_17_...> it does seem that they do
| palloc copies for at least some of the JSONB calls
| anarazel wrote:
| Postgres does have infrastructure to avoid this in cases
| where the result is reused, and that's used in other places,
| e.g. array constructors / accessors. But not for jsonb at the
| moment.
| munk-a wrote:
| While Postgres keywords are indeed case insensitive, writing
| queries with keywords in all caps is an effort to increase
| legibility for visual pattern matching. It absolutely isn't
| needed, but if I'm
| debugging a query of yours I will send it through my prettifier
| so that I can breeze through your definitions without getting
| hung up on minor weird syntax things.
|
| It's like prettification in any other language - visual
| structures that we can quickly recognize (like consistent
| indentation levels) make us waste less time on comprehension of
| the obvious so we can focus on what's important.
|
| The only thing I really object to is
| "actuallyUsingCaseInIdentifiers": I never want to see columns
| that require double quotes for me to inspect on the CLI.
| archsurface wrote:
| My understanding is that the caps were syntax highlighting on
| monochrome screens; no longer needed with colour. Can't provide
| a reference, it's an old memory.
| _Wintermute wrote:
| Most of the SQL I write is within a string of another
| programming language, so it's essentially monochrome unless
| there's some really fancy syntax highlighting going on.
| jaredklewis wrote:
| Aside, but jetbrains IDEs seem to have some way to detect
| embedded sql and highlight it. I don't remember configuring
| anything to get this feature.
| wmfiv wrote:
| More than highlighting, they'll also do schema validation
| against inline SQL strings.
| formerly_proven wrote:
| If you're working with some established framework and
| project structure their IDEs pull that information out of
| that, otherwise you'll need to at least tell it the
| dialect, but if you e.g. configure the database as a data
| source in the IDE you'll get full schema xref.
| tomjen3 wrote:
| Another aside: that is true for a huge range of
| programming languages as well as things like HTML. I
| believe it can automatically add \ to " in strings when
| those strings are marked to the IDE as HTML.
| TRiG_Ireland wrote:
| VS Code does (did?) detect embedded SQL in PHP and
| correctly colour it, but only if it's on a single line.
| Any linebreaks and the colour is turned off. Also, if
| you're using prepared statements and have an @label, and
| that label is at the end of the string (so immediately
| followed by a closing quote), the SQL colouring continues
| into the rest of the PHP beyond the string. So it's
| important that single-line SQL statements ending in a
| @label be edited into multi-line statements to turn off
| the broken SQL colouring. Odd.
| munk-a wrote:
| PHP strings tend to have better syntax highlighting with
| here/now docs (i.e. starting with `<<<TOKEN`). I've found
| SublimeText to have excellent SQL detection when using
| these tokens to delineate queries (and the syntax lends
| itself well to block strings anyways).
| yen223 wrote:
| This is Jetbrain's "language injection" feature if you
| want to look it up. It works with any languages that the
| IDE supports, and like a sibling comment mentioned it
| does more than syntax highlighting.
|
| https://www.jetbrains.com/help/idea/using-language-injection...
| WorldMaker wrote:
| I find that in worst cases I can always copy and paste to a
| quick temporary buffer that is highlighted. I might be
| doing that naturally anyway if I'm trying to debug it, just
| to run it in a Data IDE of my choice, but sometimes even
| just using a scratch VS Code "Untitled" file can be useful
| (its SQL auto-detect is usually good enough, but switching
| to SQL is easy enough if it doesn't auto-detect).
| CoastalCoder wrote:
| I think the "color is all we need" idea makes sense in
| proportion to how many of our tools actually support
| colorization.
|
| E.g., the last time I used the psql program, I don't think it
| had colorization of the SQL, despite running in a color-
| capable terminal emulator.
|
| It probably doesn't help that terminal colors are a bit of a
| mess. E.g., piping colored output through 'less' can result
| in some messy control-character rendering rather than having
| the desired effect.
| dllthomas wrote:
| > piping colored output through 'less' can result in some
| messy control-character rendering rather than having the
| desired effect.
|
| It can, but -G or -R can fix that.
| mixmastamyk wrote:
| Try pgcli for color and completion.
| renewiltord wrote:
| Good tip, thank you.
| vundercind wrote:
| Like sigils in that regard. Perl-type sigils are extremely
| nice... if you're editing in Notepad or some ancient vi
| without syntax highlighting and the ability to ID references
| on request. Little point to them, if you've got more-capable
| tools.
| avg_dev wrote:
| i agree; but i use caps in my codebase, and lowercase when
| testing things out manually, just for ease of typing.
| munk-a wrote:
| Ditto - if I'm throwing out an inspection query just to get a
| sense of what kind of data is in a column I won't bother with
| proper readable syntax (i.e. a `select distinct status from
| widgets`). I only really care about code I'll need to reread.
| emmanuel_1234 wrote:
| Any recommendation for a prettifier / SQL linter?
| gnulinux wrote:
| I'm curious about this for DuckDB [1]. In the last couple
| months or so I've been using DuckDB as a one-step solution to
| all problems I solve. In fact my development environment
| rarely requires anything other than Python and DuckDB (and
| some Rust if native code is necessary). DuckDB is an insanely
| fast and featureful analytic db. It'd be nice to have a
| linter, formatter etc specifically for DuckDB.
|
| There is sqlfluff etc but I'm curious what people use.
|
| [1] DuckDB SQL dialect is very close to Postgres, it's
| compatible in many ways but has some extra QOL features
| related to analytics, and lacks a few features like `vacuum
| full`.
| ptman wrote:
| sqlfluff is better than
| https://github.com/darold/pgFormatter , but it can get
| confused at times.
| homebrewer wrote:
| IDEA if you want to use it for other things (or any other
| JetBrains IDE). Nothing comes close feature-wise.
|
| If you don't:
|
| - https://www.depesz.com/2022/09/21/prettify-sql-queries-from-...
|
| - https://gitlab.com/depesz/pg-sql-prettyprinter
|
| Or https://paste.depesz.com for one-off use.
| jillyboel wrote:
| i think the idea sql prettifier is pretty silly sometimes.
| it really likes indenting stuff to make sure things are
| aligned, which often results in dozens of whitespaces
| homebrewer wrote:
| It makes it easy to distinguish null vs not null columns
| and other similar things, so I personally don't mind.
| wiredfool wrote:
| I find all caps identifiers wind up just looking like
| interchangeable blocks, whereas lowercase has word shapes. So
| all caps just slows down my reading.
| WorldMaker wrote:
| I feel similarly and I also have friends with Dyslexia with
| even stronger opinions on it. All caps in addition to being
| "shouting" to my ancient internet-using brain (and thus
| _rude_ in most cases), creates big similar rectangular blocks
| as word shapes and is such a big speed bump to reading speed
| for everyone (whether or not they notice it). For some of my
| friends with Dyslexia that have a huge tough time with word
| shapes at the best of times, all caps can be a hard stop
| "cannot read" blocker for them. They say it is like trying to
| read a redacted document where someone just made rectangular
| black marker cross outs.
|
| Personally, given SQL's intended similarity to English, I
| find that I like English "sentence case" for it, with the
| opening keyword starting with a capital letter and nearly
| every remaining letter lower case (except for Proper Nouns,
| the truly case-sensitive parts of SQL like table names).
| Sentence case has been helpful to me in the past in spotting
| things like missing semicolons in dialects like Postgres' that
| require them, or near keywords like `Merge` that require them,
| and in visually making sure the `select` under a `Merge` is
| intended as a clause rather than starting a new "sentence".
| akira2501 wrote:
| > I find that I like English "sentence case" for it,
|
| I could go either way, but if you want to go back and
| modify a query, this makes it more difficult for me. I just
| use a block syntax for my queries:
|
|     SELECT *
|     FROM the_table
|     WHERE some_column = 12
|       AND other_column IS NOT NULL
|     ORDER BY order_column;
| andrei_says_ wrote:
| Also sql editors like datagrip color the sql syntax very
| well.
| yen223 wrote:
| I've always found it funny that SQL was designed the way it
| is to be as close to natural English as possible, but then
| they went ahead and made everything all-caps
| paulryanrogers wrote:
| Some old terminals didn't have lower case. Like 1960s-era ones.
| marcosdumay wrote:
| Well, as long as you aren't imposing the noisy syntax on
| everybody by pushing the case-change back into the code...
|
| But getting some editor that highlights the SQL will completely
| solve your issue.
| munk-a wrote:
| I think fighting in PRs over syntax preferences is pretty
| useless so dev shops should generally have code style
| guidelines to help keep things consistent. In my company we
| use all caps casing since we have momentum in that direction
| but I think that decision can be reasonable in either
| direction _as long as it's consistent_ - it's like tabs vs.
| spaces... I've worked in companies with both preferences, I
| just configure my editor to auto-pretty code coming out and
| auto-lint code going in and never worry about it.
| o11c wrote:
| Note that case handling is a place where postgres (which folds
| to lowercase) violates the standard (which folds to uppercase).
|
| This is mostly irrelevant since you really shouldn't be mixing
| quoted with unquoted identifiers, and introspection largely
| isn't standardized.
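|
| Concretely (a small sketch):
|
|     CREATE TABLE Foo (id int);  -- stored as "foo" in Postgres;
|                                 -- the standard would fold to "FOO"
|     SELECT * FROM foo;          -- works
|     SELECT * FROM "Foo";        -- ERROR: relation "Foo" does
|                                 -- not exist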
| yen223 wrote:
| Given that other mainstream RDBMSes let you configure how
| case handling should happen, Postgres is arguably the closest
| to the standard.
|
| Usual caveat of how nobody sticks to the ANSI standard anyway
| applies.
| gwbas1c wrote:
| It's really useful to know this when working with SQL
| interactively.
|
| Specifically, if I'm banging out an ad-hoc query that no one
| will ever see, and I'm going to throw away, I don't worry about
| casing.
|
| Otherwise, for me, all SQL that's checked in gets the commands
| in ALL CAPS.
| jillyboel wrote:
| what is your prettifier of choice for postgres?
| zusammen wrote:
| The all caps syntax also helps queries stand out as distinct
| from typical source languages. It is often helpful, since SQL
| tends to end up in all sorts of applications.
| grahamplace wrote:
| For checked-in SQL queries, we follow:
| https://www.sqlstyle.guide/
|
| The combination of all caps keywords + following "the river"
| whitespace pattern dramatically improves readability in my
| opinion
| MetaWhirledPeas wrote:
| > writing queries with keywords in all caps is an effort to
| increase legibility for visual pattern matching
|
| Considering most programming languages do just fine without ALL
| CAPS KEYWORDS, I'd say it's a _strange_ effort. I wish SQL
| didn't insist on being different this way.
|
| I agree with you on prettification though. As long as the
| repository chooses a prettifier you can view it _your way_ then
| commit it _their way_. So that's my advice: always demand
| prettification for pull requests.
| sensanaty wrote:
| I don't write or read SQL too often, but I prefer the
| ALL_CAPS because it usually lets me know if something is part
| of the SQL syntax itself or a function or whatever, or if
| it's referencing a table/column/etc.
|
| Obviously not very foolproof, but automated
| linters/prettifiers like the one in DataGrip do a good job
| with this for every query I've ever thrown at it.
| mannyv wrote:
| The uppercase is usually there to show people what's SQL vs
| what's custom to your database. In books it's usually set in
| courier.
|
| I thought he was talking about psql's case sensitivity with
| table names, which is incredibly aggravating.
| nasretdinov wrote:
| Nice article! One thing I'd add is that almost all of it applies
| to other MVCC databases like MySQL too. While some details might
| be different, it too suffers from long transactions, holds
| metadata locks during ALTERs, etc. - all the good stuff :).
| avg_dev wrote:
| this is really nice. i am glad the author put it together. i
| didn't know the pg docs were 3200 pages long! i have been using
| it for a while and try to learn as i go. i really do like the
| docs. and i also like to read articles on various particular
| subjects as i find a need to.
|
| i think the author might find it helpful to readers to add a note
| to
| https://challahscript.com/what_i_wish_someone_told_me_about_...
| that if someone is selecting for b alone, then an index on
| columns (b, a) would work fine. i think this is kind of implied
| when they talk about selecting on a alone, but maybe it wouldn't
| hurt to be extra explicit.
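|
| for example (just a sketch; widgets, a, and b are made-up
| names):
|
|     create index widgets_b_a_idx on widgets (b, a);
|
|     -- can use the index (b is the leading column):
|     select * from widgets where b = 1;
|     select * from widgets where b = 1 and a = 2;
|
|     -- generally can't use it efficiently (skips the leading
|     -- column):
|     select * from widgets where a = 2;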
|
| (i didn't spend much time on the json/jsonb part since i rarely
| use that stuff)
| ellisv wrote:
| I really wish developers cared more about normalization and
| stopped shoving everything into a JSON(b) column.
| devin wrote:
| Even if you care about it, you will still often wind up with a
| junk drawer of JSONB. I don't really see it as a problem unless
| people are writing bad queries against it instead of lifting
| values out of the JSONB into their own columns, etc.
| mnahkies wrote:
| Yeah exactly, and I'll take a JSON(B) column over TEXT with
| maybe valid serialised JSON, maybe RON, maybe something
| completely random any day
| randomdata wrote:
| Most developers using these kinds of tools these days are
| actually building their own database management systems, just
| outsourcing the persistence to another DBMS, so there isn't a
| strong imperative to think about good design so long as it
| successfully satisfies the persistence need.
|
| Whether we actually should be building DBMSes _on top_ of
| DBMSes is questionable, but that is the current state of affairs
| regardless.
| tightbookkeeper wrote:
| A previous employer thought that SQL databases didn't
| understand graphs, so they made their own system for
| serializing/deserializing graphs of objects into Postgres.
|
| They never used queries and instead had their own in-memory
| operators for traversing the graph, and had to solve problems
| like deleting an entry and removing all references, partial
| graph updates.
|
| And I still don't think it works.
| theamk wrote:
| This needs a working schema migration process, including the
| ability to undo a schema change if the new column tanks the
| performance or breaks stuff.
|
| If there are CLI tools involved, you also need to ensure you
| can handle some downtime, or do synchronized version update
| across company, or support both old and new schemas for a
| while.
|
| If a database is not part of the team's primary product, all of
| this could be missing.
| abraae wrote:
| Long before databases could even store structured JSON data,
| junior developers used to bikeshed viciously over the correct
| degree of normalization.
|
| More experienced developers knew that the correct answer was to
| duplicate nothing (except for keys obviously) and then to
| denormalize only with extreme reluctance.
|
| Then databases like mongo came along and encouraged those
| juniors by giving them something like a database, but where
| normalization was difficult/irrelevant. The result was a brief
| flowering of horrible database designs and unmaintainable crap
| towers.
|
| Now the pendulum has swung back and people have rediscovered
| the virtues of a normalized database, but JSON columns provide
| an escape hatch where those bad practices can still flower.
| christophilus wrote:
| Eh. JSON has its place. I have some stateful data that is
| fairly transient in nature and which doesn't really matter
| all that much if it gets lost / corrupted. It's the sort of
| thing I'd throw into Redis if we had Redis in our stack. But
| the only storage in my project is S3 and Postgres. Postgres
| allows me to trivially query, update, analyze the usage of
| the feature, etc. Normalization wouldn't buy me much, if
| anything, for my use case, but it would make "save this stuff
| for later" more of a nuisance (a sync across a bunch of rows
| vs a single upsert).
|
| That said, I've worked on projects that had almost no
| normalization, and it was pure hell. I'm certainly not
| arguing against normalizing; just saying that data blobs are
| useful sometimes.
| codr7 wrote:
| Yeah, I'm def not taking any more MongoDB jobs if I can
| avoid it.
|
| I'm fine with using it for simple throw away stuff, but
| deciphering someone else's ball of json is soul killing.
| yen223 wrote:
| There are two reasons to use a jsonb column:
|
| 1. To store JSON. There's a pattern where when your webserver
| calls into some third-party API, you store the raw API response
| in a JSONB column, and then process the response from there.
| This gives you an auditable paper trail if you need to debug
| issues coming from that 3rd-party API.
|
| 2. To store sum types. SQL not supporting sum types is arguably
| the biggest deficiency when modelling data in SQL databases.
| There are several workarounds - one of them being "just chuck
| it in a JSONB column and validate it in the application" - but
| none of the workarounds is particularly great.
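|
| A minimal sketch of the JSONB workaround (all names here are
| made up, and the check only guards the tag - the rest still has
| to be validated in the application):
|
|     CREATE TABLE payment_methods (
|       id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
|       -- tagged union: {"type": "card", ...} or
|       --               {"type": "bank_transfer", ...}
|       payload jsonb NOT NULL,
|       CHECK (payload->>'type' IN ('card', 'bank_transfer'))
|     );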
| zie wrote:
| I would add:
|
| 3. End-user extra fields. Stuff you don't care about, but
| someone somewhere does.
| marcosdumay wrote:
| > Normalize your data unless you have a good reason not to
|
| Ouch. You don't want to just say that and move on.
|
| The author even linked to a page citing 10 different kinds of
| normalization (11 with the "non-normalized"). Most people don't
| even know what those are, and have no use for 7 of those. Do not
| send people on wild-goose chases after those higher normal forms.
| pavel_lishin wrote:
| But the author did have a paragraph explaining, in general,
| what they mean.
|
| And they're right! I've had to fix a few issues like this in a
| project I recently got moved to. There's almost never a reason
| to duplicate data.
| hobs wrote:
| I guess this is targeted towards noobs, but the answer is
| pretty much always 3rd normal form if you are clicking this and
| are not sure.
| makr17 wrote:
| Interesting, I would have said Boyce-Codd unless you have a
| good reason to vary in either direction.
| cryptonector wrote:
| The general rule is to normalize to the max, then denormalize
| till you get the performance that you need.
| jayknight wrote:
| Anyone here use 6NF to avoid nulls? See
| https://36chambers.wordpress.com/2021/10/22/avoiding-null-wi...
| cryptonector wrote:
| Isn't 6NF essentially a flavor of EAV? I think essentially
| it is.
|
| 6NF means having one non-PK column, so that if the value
| would be NULL then the row need not exist, and so the value
| column can be NOT NULL.
|
| But when you need all the columns of what you'd think of as
| the <thing> then you need to go gather them from all those
| rows in all those tables. It's a bit annoying. On the other
| hand a proper EAV schema has its positives (though also its
| negatives).
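|
| A rough sketch of what that looks like (hypothetical tables):
|
|     -- instead of person(id, name, death_date NULL):
|     CREATE TABLE person (id bigint PRIMARY KEY);
|     CREATE TABLE person_name (
|       id   bigint PRIMARY KEY REFERENCES person,
|       name text NOT NULL
|     );
|     CREATE TABLE person_death_date (
|       id         bigint PRIMARY KEY REFERENCES person,
|       death_date date NOT NULL
|     );
|     -- a still-living person simply has no person_death_date
|     -- row, so no NULL is ever stored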
| DanHulton wrote:
| The one exception I'll make from the very start is "add
| tenant identifier to every row, yes, even if it's linked to
| another table that has tenant identifiers."
|
| Sure, this means you will have some "unnecessary" `tenant_id`
| columns in some tables that you could get through a relation,
| but it saves you from _having_ to include that relation just
| to limit by tenant, which you will some day almost be
| guaranteed to want. (Especially if you want to enable row-level
| security [1] someday.)
|
| 1 - https://www.postgresql.org/docs/current/ddl-rowsecurity.html
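|
| A rough sketch of the idea (all names are hypothetical, and the
| policy assumes the application sets app.tenant_id on each
| connection):
|
|     CREATE TABLE tenants (id bigint PRIMARY KEY);
|     CREATE TABLE orders (
|       id        bigint PRIMARY KEY,
|       tenant_id bigint NOT NULL REFERENCES tenants
|     );
|     CREATE TABLE order_items (
|       id        bigint PRIMARY KEY,
|       order_id  bigint NOT NULL REFERENCES orders,
|       -- "redundant", but lets us filter and apply RLS without
|       -- joining through orders
|       tenant_id bigint NOT NULL REFERENCES tenants
|     );
|
|     ALTER TABLE order_items ENABLE ROW LEVEL SECURITY;
|     CREATE POLICY tenant_isolation ON order_items
|       USING (tenant_id = current_setting('app.tenant_id')::bigint);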
| getcrunk wrote:
| Nice one! I'm working on an app that is user/consumer facing
| but will eventually have a teams/business offering.
| Everything has a uid with it because users are the core of the
| app. But yeah, if you're multi-tenant then tenants should be
| treated like that as well.
| magicalhippo wrote:
| We do a variant of this across all our tables. If we have a
| parent-child-child relationship, then all child tables,
| regardless of depth, will have the parent id.
|
| This way we can load up all the data needed to process an
| order or invoice etc easily, without a ton of joins.
|
| We don't do multi-tenant, instead separate databases per
| tenant so far.
| wadadadad wrote:
| How is this working for you so far? Do you ever need to
| report across the multiple tenants, and how do database
| migrations go? I'm starting to look into this, and purely
| for reporting and database migrations I'm leaning towards
| multi-tenant.
| marcosdumay wrote:
| Oh, no, it's absolutely not.
|
| The usual rule is to normalize into third normal form. But
| that's not enough in some cases, it's too much in other cases,
| and the reason it breaks down is performance about as often as
| it's not.
| yen223 wrote:
| "normalise til it hurts, denormalise til it works"
| pavel_lishin wrote:
| A lot of these aren't postgres-specific. (null weirdness, index
| column order, etc.)
|
| For example, how nulls work - _especially how they interact with
| indexes and unique constraints_ - is also non-intuitive in MySQL.
|
| If you have a user table with a non-nullable email column and a
| nullable username column, and a uniqueness constraint on
| something like (email, username), you'll be able to insert
| multiple identical emails with a null username into that table -
| because a null isn't equivalent to another null.
| anarazel wrote:
| > If you have a user table with a non-nullable email column and
| a nullable username column, and a uniqueness constraint on
| something like (email, username), you'll be able to insert
| multiple identical emails with a null username into that table
| - because a null isn't equivalent to another null.
|
| FWIW, since Postgres 15 you can influence that behaviour with
| NULLS [NOT] DISTINCT for constraints and unique indexes.
|
| https://www.postgresql.org/docs/devel/sql-createtable.html#S...
|
| EDIT: Added link
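|
| For example (a sketch - the table and values are made up):
|
|     CREATE TABLE users (
|       email    text NOT NULL,
|       username text,
|       UNIQUE (email, username)   -- default: NULLs are distinct
|     );
|     INSERT INTO users VALUES ('a@example.com', NULL);
|     INSERT INTO users VALUES ('a@example.com', NULL);  -- succeeds!
|
|     -- Postgres 15+:
|     CREATE TABLE users2 (
|       email    text NOT NULL,
|       username text,
|       UNIQUE NULLS NOT DISTINCT (email, username)
|     );
|     INSERT INTO users2 VALUES ('a@example.com', NULL);
|     INSERT INTO users2 VALUES ('a@example.com', NULL);  -- rejected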
| bpicolo wrote:
| I think this is a good pragmatic default. The use case for the
| alternative is much more rare.
| the5avage wrote:
| From my experience with a lot of hilarious SQL stuff I have seen
| in the wild:
|
| It would be a good start to read Codd's paper and try to
| understand what the relational model is. It's only 11 pages
| long, and doing that would reduce the suffering in this world.
| samsartor wrote:
| Instead of psql, I really like https://github.com/dbcli/pgcli
| vivzkestrel wrote:
| since we are on the topic and since your article clearly mentions
| "Normalize your data unless you have a good reason not to" I had
| to ask. I am trying to build a news aggregator and I have many
| websites. Each of them has slightly different format. Even though
| I use feedparser in python, it still doesn't change how some of
| them put html text inside content and some of them break it down
| into a separate media xml attribute while retaining only basic
| textual summary inside a summary attribute. Do you think it
| makes more sense to store a single RSS item as-is in a single
| column in Postgres, or should it be stored after parsing it? I
| can see upsides and
| downsides to both approaches. Store it as XML and you have the
| ability to change your processing logic down the line for each
| stored item but you lose the flexibility of querying metadata and
| you also have to parse it on the fly every single time. Store it
| in multiple columns after processing it and it may require
| different logic for different websites + changing your overall
| python processing logic requires a lot of thinking on how it
| might affect some source. What do you guys recommend?
| cryptonector wrote:
| With _all_ RDBMSes the rule is "normalize to the max, then
| denormalize till you get the performance that you need".
| krab wrote:
| Just shooting from the hip without knowing your real needs -
| take this with a grain of salt.
|
| Store some parsed representation that makes it easier for you
| to work with (probably normalized). Keep an archive of raw data
| somewhere. That may be another column, table or even S3 bucket.
| Don't worry about schema changes but you need to not lose the
| original data. There are some pitfalls to schema migrations.
| But the schema should be the representation that works for you
| _at the moment_, otherwise it'll slow you down.
| sea6ear wrote:
| With the caveat that you probably shouldn't listen to me (or
| anyone else on here) since you are the only one who knows how
| much pain each choice will be ...
|
| You are not really dealing with structured data - you've said
| that different sites have different structures, and I assume
| that even with processing, you may not be able to generate
| identical metadata structures from each entry.
|
| I think I would go for one column of XML, plus maybe another
| column that holds a parsed data structure that represents the
| result of your processing (basically a cache holding the post-
| processed version of each site). Hopefully that could be re-
| evaluated by whatever language (Python?) you are using for your
| application. That way you don't have to do the full parsing
| each time you want to examine the entry, but you have access to
| something that can quickly give you whatever metadata is
| associated with it, but which doesn't tie you to the rigid
| structure of a table based database.
|
| Once you know what you are really doing with the data, then you
| could add additional metadata columns that are more rigid, and
| which can be queried directly in SQL as you identify patterns
| that are useful for performance.
| cryptonector wrote:
| Watch out with row/record values: IS NULL is only true if every
| column in the row/record is NULL, and IS NOT NULL is only true
| if every column is non-NULL, so a partially-NULL row satisfies
| neither test. You want to use IS [NOT] DISTINCT FROM NULL, full
| stop.
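|
| In Postgres that plays out like this (a quick sketch):
|
|     SELECT ROW(1, NULL)    IS NULL;                -- false
|     SELECT ROW(1, NULL)    IS NOT NULL;            -- also false!
|     SELECT ROW(NULL, NULL) IS NULL;                -- true
|     SELECT ROW(1, NULL)    IS DISTINCT FROM NULL;  -- true: the
|                                                    -- row value
|                                                    -- itself is
|                                                    -- not null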
| deadbabe wrote:
| Don't create views that reference other views.
| christophilus wrote:
| I'd never stumbled across the "don't do this" wiki entry[0]
| before. Very handy.
|
| [0] https://wiki.postgresql.org/wiki/Don%27t_Do_This
| datadrivenangel wrote:
| This reminds me of SQL Anti-patterns, which is a book that
| everyone who works with databases should read.
| gwbas1c wrote:
| Why don't they deprecate some of these features? If they're
| such easy stumbling blocks, seems like it makes sense to
| disable things like table inheritance in new schemas, and
| require some kind of arcane setting to re-enable them.
| __loam wrote:
| Changing defaults can screw over existing users.
| SoftTalker wrote:
| Presumably there are rare exceptions where you DO want to do
| the thing.
| mixmastamyk wrote:
| Several of the "broken" features are part of the SQL standard.
| samarthr1 wrote:
| That was a fun read, thanks!
|
| Made me reconsider a few habits I picked up from MySQL land
| teddyh wrote:
| \pset null ''
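|
| That changes how psql displays NULLs, so they stand out from
| empty strings. For example, with a made-up placeholder:
|
|     \pset null '(null)'
|     SELECT NULL AS a, '' AS b;
|     -- column a now shows (null) while the empty string in b
|     -- stays blank, so the two are no longer indistinguishable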
| RedShift1 wrote:
| I can't horizontally scroll on mobile, can't see the full query
| texts...
| hibachrach wrote:
| (author here) hmm it seems to work on my device. Are you
| scrolling within the code block?
| ahachete wrote:
| > Most notably, 'null'::jsonb = 'null'::jsonb is true whereas
| NULL = NULL is NULL
|
| Because 'null' in the JSON spec is a literal value (a constant),
| not SQL's NULL. Nothing to see here.
|
| https://datatracker.ietf.org/doc/html/rfc7159
| hibachrach wrote:
| Right--it makes sense and shouldn't be changed, but it's a bit
| unintuitive for newcomers.
| zmmmmm wrote:
| > It's possible that adding an index will do nothing
|
| This is one of the more perplexing things to me, where Postgres
| ideology is a bit too strong, or at least, the way it works is
| too hard for me to understand (and I've tried - I'm not going to
| claim I'm a genius but I'm also not a moron). I hear there may
| finally be support for some kind of hints in upcoming versions,
| which would be very welcome to me. I've spent way too much time
| trying to divine the voodoo of why a slow query is not using
| indexes when it seems obvious that it should.
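|
| What has helped me most is making the planner show its work
| (the table and column below are made up):
|
|     -- see the chosen plan, row estimates, and actual timings:
|     EXPLAIN (ANALYZE, BUFFERS)
|     SELECT * FROM orders WHERE customer_id = 42;
|
|     -- for experiments only: make sequential scans look very
|     -- expensive in this session, to check whether the index
|     -- *can* be used at all
|     SET enable_seqscan = off;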
| sroerick wrote:
| It's always a relief to read articles like this, realize I know
| 90% of it, and feel like I've deserved the jobs I've had.
|
| Great and super useful notes
| cynicalsecurity wrote:
| People who use PostgreSQL instead of MySQL just want to suffer
| while pretending "they are better than others".
___________________________________________________________________
(page generated 2024-11-12 23:01 UTC)