[HN Gopher] What I wish someone told me about Postgres
___________________________________________________________________
What I wish someone told me about Postgres
Author : todsacerdoti
Score : 281 points
Date : 2024-11-12 00:59 UTC (22 hours ago)
(HTM) web link (challahscript.com)
(TXT) w3m dump (challahscript.com)
| onoke wrote:
| Haha, good job. :)
| whiskey-one wrote:
| Really great post! It belongs on a reading list somewhere for
| everyone who is using Postgres independently or as a part of
| their stack.
| andrewstuart wrote:
| Dump your schema, drop the entire thing into ChatGPT or Claude
| and ask it to write your Postgres query.
|
| Then ask it to rewrite the query three different ways and explain
| the pros and cons of each.
|
| Do the same for any existing queries in your system ... drop in
| the schema then the query and ask for an analysis and suggested
| improvements.
| cyanydeez wrote:
| then have corporate knock on your team's window asking why all
| its IP ended up with a Chinese competitor?
| junaru wrote:
| Schema != data.
| PittleyDunkin wrote:
| It's certainly data to the extent that it's copyrightable.
| I doubt this sentiment would stand up in court.
| jrpelkonen wrote:
| "Show me your flowcharts and conceal your tables, and I
| shall continue to be mystified. Show me your tables, and I
| won't usually need your flowcharts; they'll be obvious." -
| Fred Brooks
| josephg wrote:
| Yep, but if you think your database schema is "secret
| sauce", you're fooling yourself. You can almost always
| guess the database schema if you're given 10 minutes to
| play with the application.
| hobs wrote:
| Generally your schema is your code, which IS your IP - your
| data is often NOT your IP (and often might just be a list
| of facts, a thing which is not copyrightable)
| bilekas wrote:
| People have too much faith in LLMs at the moment. This might
| be able to give some insights, but if you're trying to LEARN
| something in detail, such as the pros and cons of a particular
| custom query on your unique-ish schema, the LLM will be prone
| to making stuff up by design - it wasn't trained on YOUR data.
|
| It will give some good basic results and probably some
| functional queries, but it misses the mark with the finer
| points of optimization etc. If you don't know how to properly
| build the queries, go learn it properly with confidence that
| what you're learning is actually reliable.
| cpursley wrote:
| This is similar to what I've done with my SaaS, where most of
| the logic lives in Postgres. I went through much of the logic
| and refactored it with GPT a while back and improved it
| significantly. It's soooo good with SQL. Like insanely good.
| And these days I'm using Claude for all things code, which is
| significantly better than GPT.
| sgarland wrote:
| Agreed that LLMs are quite good at SQL (and relational
| algebra). However, ask yourself this: how many programming
| languages do you know to at least a comfortable level? If N >
| 1, why not add another (SQL)? It's not a difficult language,
| which is in part why LLMs are so good at it - it's easy to
| predict the next token, because there are so few of them.
| cpursley wrote:
| I know SQL pretty well (enough to write complex procedures)
| and prefer it over doing biz logic in application code
| (yuck ORMs). But SQL is just such a wonky syntax that I
| appreciate any help that I can get.
| cyanydeez wrote:
| I'd say I tend to ignore the standard docs because they rarely
| have examples and rely on the arcane procedure of trying to
| decipher the super command options with all its "[OR
| THIS|THAT]".
|
| I assume _someone_ can read this pseudo programming, but it's not
| me.
| Clubber wrote:
| FWIW square brackets indicate an optional parameter and the
| pipe inside separates the available alternatives; but I
| agree, I don't particularly care for that notation and prefer
| actual running SQL examples.
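|
| For example, here's a rough reading of the DROP TABLE synopsis
| from the docs (the table names below are made up):
|
|     DROP TABLE [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ]
|
|     -- both of these are valid expansions of that one line:
|     DROP TABLE widgets;
|     DROP TABLE IF EXISTS widgets, gadgets CASCADE;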
| hobs wrote:
| Railroad-esque diagrams can be weird, but they say a lot in a
| very short space. I would highly recommend spending a little
| extra time working on thinking through them - they are
| everywhere!
| Clubber wrote:
| >Railroad-esque diagrams
|
| Now and forever I will think of railroads when I see these.
| hobs wrote:
| Well, you should! But, I didn't invent anything there,
| that's what they are named :)
|
| Syntax diagrams (or railroad diagrams) are a way to
| represent a context-free grammar. They represent a
| graphical alternative to Backus-Naur form, EBNF,
| Augmented Backus-Naur form, and other text-based grammars
| as metalanguages...
| https://en.wikipedia.org/wiki/Syntax_diagram
| paulryanrogers wrote:
| Sometimes I find it annoying but mostly it works well for me.
| I've come to embrace the find feature and visually scanning
| over any parenthetical stuff.
|
| The alternative is they have to break it up into several quasi
| examples, each with their own optional modifiers.
| PhilippGille wrote:
| I love the diagrams from the SQLite documentation:
| https://www.sqlite.org/syntaxdiagrams.html
| christophilus wrote:
| SQLite diagrams plus examples are pretty much the way I wish
| all of my documentation worked.
| o11c wrote:
| I've been bitten by those before because they are _not_
| generated from the actual syntax-parsing code, and thus are
| sometimes out of sync and wrong (or at least misleading).
| teddyh wrote:
| Those are commonly called "railroad diagrams":
| <https://en.wikipedia.org/w/index.php?title=Syntax_diagram&ol...>
| assbuttbuttass wrote:
| I'm looking at the postgres docs now, and they certainly have
| the arcane formal syntax definitions, but they also have plenty
| of useful examples:
| https://www.postgresql.org/docs/current/sql-altertable.html#...
| smartbit wrote:
| The PostgreSQL Administration Cookbook series served me well
|
| https://www.oreilly.com/library/view/postgresql-16-administr...
| tomcam wrote:
| I wrote this to help beginners:
| https://tomcam.github.io/postgres/
| nodakai wrote:
| That nested JSON query operator chains such as
| json_col->'foo'->'bar'->>'baz' internally return (copy) entire
| sub-objects at each level and can be much slower than
| jsonb_path_query(json_col, '$.foo.bar.baz') for large JSONB data
|
| ... although I haven't had the chance to verify this myself
| hans_castorp wrote:
| You can also use the #>> operator for that:
| json_col #>> '{foo,bar,baz}'
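|
| Roughly, these pull the same nested value (just a sketch - the
| table name t is made up here, and the return types differ: ->>
| and #>> yield text, while jsonb_path_query returns a set of
| jsonb values):
|
|     SELECT json_col->'foo'->'bar'->>'baz'              FROM t;
|     SELECT json_col #>> '{foo,bar,baz}'                FROM t;
|     SELECT jsonb_path_query(json_col, '$.foo.bar.baz') FROM t;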
| mdaniel wrote:
| I got nerd-sniped on this, because I actually hadn't heard that
| before and would be horrified if it were true. It took some
| non-trivial digging to even get down into the "well, what does
| foo->>'bar' even map to in C?" level. I for sure am not
| claiming certainty, but based merely on
| "getIthJsonbValueFromContainer" <https://sourcegraph.com/github
| .com/postgres/postgres@REL_17_...> it does seem that they do
| palloc copies for at least some of the JSONB calls
| anarazel wrote:
| Postgres does have infrastructure to avoid this in cases
| where the result is reused, and that's used in other places,
| e.g. array constructors / accessors. But not for jsonb at the
| moment.
| munk-a wrote:
| While Postgres keywords are indeed case insensitive, writing
| queries with keywords in all caps is an effort to increase
| legibility for visual pattern matching. It absolutely isn't
| needed, but if I'm
| debugging a query of yours I will send it through my prettifier
| so that I can breeze through your definitions without getting
| hung up on minor weird syntax things.
|
| It's like prettification in any other language - visual
| structures that we can quickly recognize (like consistent
| indentation levels) make us waste less time on comprehension of
| the obvious so we can focus on what's important.
|
| The only thing I really object to is
| "actuallyUsingCaseInIdentifiers": I never want to see columns
| that require double quotes for me to inspect on the CLI.
| archsurface wrote:
| My understanding is that the caps were syntax highlighting on
| monochrome screens; no longer needed with colour. Can't provide
| a reference, it's an old memory.
| _Wintermute wrote:
| Most of the SQL I write is within a string of another
| programming language, so it's essentially monochrome unless
| there's some really fancy syntax highlighting going on.
| jaredklewis wrote:
| Aside, but jetbrains IDEs seem to have some way to detect
| embedded sql and highlight it. I don't remember configuring
| anything to get this feature.
| wmfiv wrote:
| More than highlighting, they'll also do schema validation
| against inline SQL strings.
| formerly_proven wrote:
| If you're working with some established framework and
| project structure their IDEs pull that information out of
| that, otherwise you'll need to at least tell it the
| dialect, but if you e.g. configure the database as a data
| source in the IDE you'll get full schema xref.
| tomjen3 wrote:
| Another aside: that is true for a huge range of
| programming languages as well as things like HTML. I
| believe it can automatically add \ to " in strings when
| those strings are marked to the IDE as HTML.
| TRiG_Ireland wrote:
| VS Code does (did?) detect embedded SQL in PHP and
| correctly colour it, but only if it's on a single line.
| Any linebreaks and the colour is turned off. Also, if
| you're using prepared statements and have an @label, and
| that label is at the end of the string (so immediately
| followed by a closing quote), the SQL colouring continues
| into the rest of the PHP beyond the string. So it's
| important that single-line SQL statements ending in a
| @label be edited into multi-line statements to turn off
| the broken SQL colouring. Odd.
| munk-a wrote:
| PHP strings tend to have better syntax highlighting with
| here/now docs (i.e. starting with `<<<TOKEN`). I've found
| SublimeText to have excellent SQL detection when using
| these tokens to delineate queries (and the syntax lends
| itself well to block strings anyways).
| yen223 wrote:
| This is Jetbrain's "language injection" feature if you
| want to look it up. It works with any languages that the
| IDE supports, and like a sibling comment mentioned it
| does more than syntax highlighting.
|
| https://www.jetbrains.com/help/idea/using-language-injection...
| WorldMaker wrote:
| I find that in worst cases I can always copy and paste to a
| quick temporary buffer that is highlighted. I might be
| doing that naturally anyway if I'm trying to debug it, just
| to run it in a Data IDE of my choice, but sometimes even
| just using a scratch VS Code "Untitled" file can be useful
| (its SQL auto-detect is usually good enough, but switching
| to SQL is easy enough if it doesn't auto-detect).
| CoastalCoder wrote:
| I think the "color is all we need" idea makes sense in
| proportion to how many of our tools actually support
| colorization.
|
| E.g., the last time I used the psql program, I don't think it
| had colorization of the SQL, despite running in a color-
| capable terminal emulator.
|
| It probably doesn't help that terminal colors are a bit of a
| mess. E.g., piping colored output through 'less' can result
| in some messy control-character rendering rather than having
| the desired effect.
| dllthomas wrote:
| > piping colored output through 'less' can result in some
| messy control-character rendering rather than having the
| desired effect.
|
| It can, but -G or -R can fix that.
| mixmastamyk wrote:
| Try pgcli for color and completion.
| renewiltord wrote:
| Good tip, thank you.
| vundercind wrote:
| Like sigils in that regard. Perl-type sigils are extremely
| nice... if you're editing in Notepad or some ancient vi
| without syntax highlighting and the ability to ID references
| on request. Little point to them, if you've got more-capable
| tools.
| avg_dev wrote:
| i agree; but i use caps in my codebase, and lowercase when
| testing things out manually, just for ease of typing.
| munk-a wrote:
| Ditto - if I'm throwing out an inspection query just to get a
| sense of what kind of data is in a column I won't bother with
| proper readable syntax (i.e. a `select distinct status from
| widgets`). I only really care about code I'll need to reread.
| emmanuel_1234 wrote:
| Any recommendation for a prettifier / SQL linter?
| gnulinux wrote:
| I'm curious about this for DuckDB [1]. In the last couple
| months or so I've been using DuckDB as a one-step solution to
| all problems I solve. In fact my development environment
| rarely requires anything other than Python and DuckDB (and
| some Rust if native code is necessary). DuckDB is an insanely
| fast and featureful analytic db. It'd be nice to have a
| linter, formatter etc specifically for DuckDB.
|
| There is sqlfluff etc but I'm curious what people use.
|
| [1] DuckDB SQL dialect is very close to Postgres, it's
| compatible in many ways but has some extra QOL features
| related to analytics, and lacks a few features like `vacuum
| full`.
| ptman wrote:
| sqlfluff is better than
| https://github.com/darold/pgFormatter , but it can get
| confused at times.
| homebrewer wrote:
| IDEA if you want to use it for other things (or any other
| JetBrains IDE). Nothing comes close feature-wise.
|
| If you don't:
|
| - https://www.depesz.com/2022/09/21/prettify-sql-queries-from-...
|
| - https://gitlab.com/depesz/pg-sql-prettyprinter
|
| Or https://paste.depesz.com for one-off use.
| jillyboel wrote:
| i think the idea sql prettifier is pretty silly sometimes.
| it really likes indenting stuff to make sure things are
| aligned, which often results in dozens of whitespaces
| homebrewer wrote:
| It makes it easy to distinguish null vs not null columns
| and other similar things, so I personally don't mind.
| wiredfool wrote:
| I find all caps identifiers wind up just looking like
| interchangeable blocks, whereas lowercase has word shapes. So
| all caps just slows down my reading.
| WorldMaker wrote:
| I feel similarly and I also have friends with Dyslexia with
| even stronger opinions on it. All caps in addition to being
| "shouting" to my ancient internet-using brain (and thus
| _rude_ in most cases), creates big similar rectangular blocks
| as word shapes and is such a big speed bump to reading speed
| for everyone (whether or not they notice it). For some of my
| friends with Dyslexia that have a huge tough time with word
| shapes at the best of times, all caps can be a hard stop
| "cannot read" blocker for them. They say it is like trying to
| read a redacted document where someone just made rectangular
| black marker cross outs.
|
| Personally, given SQL's intended similarity to English, I
| find that I like English "sentence case" for it, with the
| opening keyword starting with a capital letter and nearly
| every remaining letter lower case (except for Proper Nouns,
| the truly case-sensitive parts of SQL like table names).
| Sentence case has been helpful to me in the past in spotting
| things like missing semicolons in dialects like Postgres' that
| require them, or near keywords like `Merge` that require them,
| and in visually making sure the `select` under a `Merge` is
| intended as a clause rather than starting a new "sentence".
| akira2501 wrote:
| > I find that I like English "sentence case" for it,
|
| I could go either way, but if you want to go back and
| modify a query, this makes it more difficult for me. I just
| use a block syntax for my queries:
|
|     SELECT *
|     FROM the_table
|     WHERE some_column = 12
|       AND other_column IS NOT NULL
|     ORDER BY order_column;
| andrei_says_ wrote:
| Also sql editors like datagrip color the sql syntax very
| well.
| yen223 wrote:
| I've always found it funny that SQL was designed the way it
| is to be as close to natural English as possible, but then
| they went ahead and made everything all-caps
| paulryanrogers wrote:
| Some old terminals didn't have lower case. Like 1960s-era ones.
| marcosdumay wrote:
| Well, as long as you aren't imposing the noisy syntax on
| everybody by pushing the case-change back into the code...
|
| But getting some editor that highlights the SQL will completely
| solve your issue.
| munk-a wrote:
| I think fighting in PRs over syntax preferences is pretty
| useless so dev shops should generally have code style
| guidelines to help keep things consistent. In my company we
| use all caps casing since we have momentum in that direction
| but I think that decision can be reasonable in either
| direction _as long as it's consistent_ - it's like tabs vs.
| spaces... I've worked in companies with both preferences, I
| just configure my editor to auto-pretty code coming out and
| auto-lint code going in and never worry about it.
| o11c wrote:
| Note that case handling is a place where postgres (which folds
| to lowercase) violates the standard (which folds to uppercase).
|
| This is mostly irrelevant since you really shouldn't be mixing
| quoted with unquoted identifiers, and introspection largely
| isn't standardized.
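|
| Concretely (a small sketch):
|
|     CREATE TABLE Foo (id int);  -- stored as "foo" in Postgres;
|                                 -- the standard would fold to "FOO"
|     SELECT * FROM foo;          -- works
|     SELECT * FROM "Foo";        -- ERROR: relation "Foo" does
|                                 -- not exist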
| yen223 wrote:
| Given that other mainstream RDBMSes let you configure how
| case handling should happen, Postgres is arguably the closest
| to the standard.
|
| Usual caveat of how nobody sticks to the ANSI standard anyway
| applies.
| gwbas1c wrote:
| It's really useful to know this when working with SQL
| interactively.
|
| Specifically, if I'm banging out an ad-hoc query that no one
| will ever see, and I'm going to throw away, I don't worry about
| casing.
|
| Otherwise, for me, all SQL that's checked in gets the commands
| in ALL CAPS.
| jillyboel wrote:
| what is your prettifier of choice for postgres?
| zusammen wrote:
| The all caps syntax also helps queries stand out as distinct
| from typical source languages. It is often helpful, since SQL
| tends to end up in all sorts of applications.
| grahamplace wrote:
| For checked-in SQL queries, we follow:
| https://www.sqlstyle.guide/
|
| The combination of all caps keywords + following "the river"
| whitespace pattern dramatically improves readability in my
| opinion
| MetaWhirledPeas wrote:
| > writing queries with keywords in all caps is an effort to
| increase legibility for visual pattern matching
|
| Considering most programming languages do just fine without ALL
| CAPS KEYWORDS, I'd say it's a _strange_ effort. I wish SQL
| didn't insist on being different this way.
|
| I agree with you on prettification though. As long as the
| repository chooses a prettifier you can view it _your way_ then
| commit it _their way_. So that's my advice: always demand
| prettification for pull requests.
| sensanaty wrote:
| I don't write or read SQL too often, but I prefer the
| ALL_CAPS because it usually lets me know if something is part
| of the SQL syntax itself or a function or whatever, or if
| it's referencing a table/column/etc.
|
| Obviously not very foolproof, but automated
| linters/prettifiers like the one in DataGrip do a good job
| with this for every query I've ever thrown at it.
| mannyv wrote:
| The uppercase is usually there to show people what's SQL vs
| what's custom to your database. In books it's usually set in
| courier.
|
| I thought he was talking about psql's case sensitivity with
| table names, which is incredibly aggravating.
| nasretdinov wrote:
| Nice article! One thing I'd add is that almost all of it applies
| to other MVCC databases like MySQL too. While some details might
| be different, it too suffers from long transactions, holds
| metadata locks during ALTERs, etc. - all the good stuff :).
| avg_dev wrote:
| this is really nice. i am glad the author put it together. i
| didn't know the pg docs were 3200 pages long! i have been using
| it for a while and try to learn as i go. i really do like the
| docs. and i also like to read articles on various particular
| subjects as i find a need to.
|
| i think the author might find it helpful to readers to add a note
| to
| https://challahscript.com/what_i_wish_someone_told_me_about_...
| that if someone is selecting for b alone, then an index on
| columns (b, a) would work fine. i think this is kind of implied
| when they talk about selecting on a alone, but maybe it wouldn't
| hurt to be extra explicit.
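|
| for example (just a sketch; widgets, a, and b are made-up
| names):
|
|     create index widgets_b_a_idx on widgets (b, a);
|
|     -- can use the index (b is the leading column):
|     select * from widgets where b = 1;
|     select * from widgets where b = 1 and a = 2;
|
|     -- generally can't use it efficiently (skips the leading
|     -- column):
|     select * from widgets where a = 2;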
|
| (i didn't spend much time on the json/jsonb part since i rarely
| use that stuff)
| ellisv wrote:
| I really wish developers cared more about normalization and
| stopped shoving everything into a JSON(b) column.
| devin wrote:
| Even if you care about it, you will still often wind up with a
| junk drawer of JSONB. I don't really see it as a problem unless
| people are writing bad queries against it instead of lifting
| values out of the JSONB into their own columns, etc.
| mnahkies wrote:
| Yeah exactly, and I'll take a JSON(B) column over TEXT with
| maybe valid serialised JSON, maybe RON, maybe something
| completely random any day
| randomdata wrote:
| Most developers using these kinds of tools these days are
| actually building their own database management systems, just
| outsourcing the persistence to another DBMS, so there isn't a
| strong imperative to think about good design so long as it
| successfully satisfies the persistence need.
|
| Whether we actually should be building DBMSes _on top_ of
| DBMSes is questionable, but that is the current state of affairs
| regardless.
| tightbookkeeper wrote:
| A previous employer thought that SQL databases didn't
| understand graphs, so they made their own system for
| serializing/deserializing graphs of objects into Postgres.
|
| They never used queries and instead had their own in-memory
| operators for traversing the graph, and had to solve problems
| like deleting an entry and removing all references, partial
| graph updates.
|
| And I still don't think it works.
| theamk wrote:
| This needs a working schema migration process, including the
| ability to undo a schema change if the new column tanks the
| performance or breaks stuff.
|
| If there are CLI tools involved, you also need to ensure you
| can handle some downtime, or do synchronized version update
| across company, or support both old and new schemas for a
| while.
|
| If a database is not part of the team's primary product, all of
| this could be missing.
| abraae wrote:
| Long before databases could even store structured JSON data,
| junior developers used to bikeshed viciously over the correct
| degree of normalization.
|
| More experienced developers knew that the correct answer was to
| duplicate nothing (except for keys obviously) and then to
| denormalize only with extreme reluctance.
|
| Then databases like mongo came along and encouraged those
| juniors by giving them something like a database, but where
| normalization was difficult/irrelevant. The result was a brief
| flowering of horrible database designs and unmaintainable crap
| towers.
|
| Now the pendulum has swung back and people have rediscovered
| the virtues of a normalized database, but JSON columns provide
| an escape hatch where those bad practices can still flower.
| christophilus wrote:
| Eh. JSON has its place. I have some stateful data that is
| fairly transient in nature and which doesn't really matter
| all that much if it gets lost / corrupted. It's the sort of
| thing I'd throw into Redis if we had Redis in our stack. But
| the only storage in my project is S3 and Postgres. Postgres
| allows me to trivially query, update, analyze the usage of
| the feature, etc. Normalization wouldn't buy me much, if
| anything, for my use case, but it would make "save this stuff
| for later" more of a nuisance (a sync across a bunch of rows
| vs a single upsert).
|
| That said, I've worked on projects that had almost no
| normalization, and it was pure hell. I'm certainly not
| arguing against normalizing; just saying that data blobs are
| useful sometimes.
| codr7 wrote:
| Yeah, I'm def not taking any more MongoDB jobs if I can
| avoid it.
|
| I'm fine with using it for simple throw away stuff, but
| deciphering someone else's ball of json is soul killing.
| yen223 wrote:
| There are two reasons to use a jsonb column:
|
| 1. To store JSON. There's a pattern where when your webserver
| calls into some third-party API, you store the raw API response
| in a JSONB column, and then process the response from there.
| This gives you an auditable paper trail if you need to debug
| issues coming from that 3rd-party API.
|
| 2. To store sum types. SQL not supporting sum types is arguably
| the biggest deficiency when modelling data in SQL databases.
| There are several workarounds - one of them being "just chuck
| it in a JSONB column and validate it in the application" - but
| none of the workarounds is particularly great.
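|
| A minimal sketch of the JSONB workaround (all names here are
| made up, and the check only guards the tag - the rest still has
| to be validated in the application):
|
|     CREATE TABLE payment_methods (
|       id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
|       -- tagged union: {"type": "card", ...} or
|       --               {"type": "bank_transfer", ...}
|       payload jsonb NOT NULL,
|       CHECK (payload->>'type' IN ('card', 'bank_transfer'))
|     );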
| zie wrote:
| I would add:
|
| 3. End-user extra fields. Stuff you don't care about, but
| someone somewhere does.
| marcosdumay wrote:
| > Normalize your data unless you have a good reason not to
|
| Ouch. You don't want to just say that and move on.
|
| The author even linked to a page citing 10 different kinds of
| normalization (11 with the "non-normalized"). Most people don't
| even know what those are, and have no use for 7 of those. Do not
| send people on wild-goose chases after those higher normal forms.
| pavel_lishin wrote:
| But the author did have a paragraph explaining, in general,
| what they mean.
|
| And they're right! I've had to fix a few issues like this in a
| project I recently got moved to. There's almost never a reason
| to duplicate data.
| hobs wrote:
| I guess this is targeted towards noobs, but the answer is
| pretty much always 3rd normal form if you are clicking this and
| are not sure.
| makr17 wrote:
| Interesting, I would have said Boyce-Codd unless you have a
| good reason to vary in either direction.
| cryptonector wrote:
| The general rule is to normalize to the max, then denormalize
| till you get the performance that you need.
| jayknight wrote:
| Anyone here use 6NF to avoid nulls? See
| https://36chambers.wordpress.com/2021/10/22/avoiding-null-wi...
| cryptonector wrote:
| Isn't 6NF essentially a flavor of EAV? I think essentially
| it is.
|
| 6NF means having one non-PK column, so that if the value
| would be NULL then the row need not exist, and so the value
| column can be NOT NULL.
|
| But when you need all the columns of what you'd think of as
| the <thing> then you need to go gather them from all those
| rows in all those tables. It's a bit annoying. On the other
| hand a proper EAV schema has its positives (though also its
| negatives).
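|
| A rough sketch of what that looks like (hypothetical tables):
|
|     -- instead of person(id, name, death_date NULL):
|     CREATE TABLE person (id bigint PRIMARY KEY);
|     CREATE TABLE person_name (
|       id   bigint PRIMARY KEY REFERENCES person,
|       name text NOT NULL
|     );
|     CREATE TABLE person_death_date (
|       id         bigint PRIMARY KEY REFERENCES person,
|       death_date date NOT NULL
|     );
|     -- a still-living person simply has no person_death_date
|     -- row, so no NULL is ever stored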
| DanHulton wrote:
| The one exception I'll make from the very start is "add
| tenant identifier to every row, yes, even if it's linked to
| another table that has tenant identifiers."
|
| Sure, this means you will have some "unnecessary" `tenant_id`
| columns in some tables that you could get through a relation,
| but it saves you from _having_ to include that relation just
| to limit by tenant, which you will some day almost be
| guaranteed to want. (Especially if you want to enable row-level
| security [1] someday.)
|
| 1 - https://www.postgresql.org/docs/current/ddl-rowsecurity.html
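|
| A rough sketch of the idea (all names are hypothetical, and the
| policy assumes the application sets app.tenant_id on each
| connection):
|
|     CREATE TABLE tenants (id bigint PRIMARY KEY);
|     CREATE TABLE orders (
|       id        bigint PRIMARY KEY,
|       tenant_id bigint NOT NULL REFERENCES tenants
|     );
|     CREATE TABLE order_items (
|       id        bigint PRIMARY KEY,
|       order_id  bigint NOT NULL REFERENCES orders,
|       -- "redundant", but lets us filter and apply RLS without
|       -- joining through orders
|       tenant_id bigint NOT NULL REFERENCES tenants
|     );
|
|     ALTER TABLE order_items ENABLE ROW LEVEL SECURITY;
|     CREATE POLICY tenant_isolation ON order_items
|       USING (tenant_id = current_setting('app.tenant_id')::bigint);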
| getcrunk wrote:
| Nice one! I'm working on an app that is user/consumer facing
| but will eventually have a teams/business offering.
| Everything has a uid with it because users are the core of the
| app. But yeah, if you're multi-tenant then tenants should be
| treated like that as well.
| magicalhippo wrote:
| We do a variant of this across all our tables. If we have a
| parent-child-child relationship, then all child tables,
| regardless of depth, will have the parent id.
|
| This way we can load up all the data needed to process an
| order or invoice etc easily, without a ton of joins.
|
| We don't do multi-tenant, instead separate databases per
| tenant so far.
| wadadadad wrote:
| How is this working for you so far? Do you ever need to
| report across the multiple tenants, and how do database
| migrations go? I'm starting to look into this, and purely
| for reporting and database migrations I'm leaning towards
| multi-tenant.
| marcosdumay wrote:
| Oh, no, it's absolutely not.
|
| The usual rule is to normalize into third normal form. But
| that's not enough in some cases, it's too much in other cases,
| and the reason it breaks down is performance about as often as
| it's not.
| yen223 wrote:
| "normalise til it hurts, denormalise til it works"
| pavel_lishin wrote:
| A lot of these aren't postgres-specific. (null weirdness, index
| column order, etc.)
|
| For example, how nulls work - _especially how they interact with
| indexes and unique constraints_ - is also non-intuitive in MySQL.
|
| If you have a user table with a non-nullable email column and a
| nullable username column, and a uniqueness constraint on
| something like (email, username), you'll be able to insert
| multiple identical emails with a null username into that table -
| because a null isn't equivalent to another null.
| anarazel wrote:
| > If you have a user table with a non-nullable email column and
| a nullable username column, and a uniqueness constraint on
| something like (email, username), you'll be able to insert
| multiple identical emails with a null username into that table
| - because a null isn't equivalent to another null.
|
| FWIW, since Postgres 15 you can influence that behaviour with
| NULLS [NOT] DISTINCT for constraints and unique indexes.
|
| https://www.postgresql.org/docs/devel/sql-createtable.html#S...
|
| EDIT: Added link
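|
| For example (a sketch - the table and values are made up):
|
|     CREATE TABLE users (
|       email    text NOT NULL,
|       username text,
|       UNIQUE (email, username)   -- default: NULLs are distinct
|     );
|     INSERT INTO users VALUES ('a@example.com', NULL);
|     INSERT INTO users VALUES ('a@example.com', NULL);  -- succeeds!
|
|     -- Postgres 15+:
|     CREATE TABLE users2 (
|       email    text NOT NULL,
|       username text,
|       UNIQUE NULLS NOT DISTINCT (email, username)
|     );
|     INSERT INTO users2 VALUES ('a@example.com', NULL);
|     INSERT INTO users2 VALUES ('a@example.com', NULL);  -- rejected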
| bpicolo wrote:
| I think this is a good pragmatic default. The use case for the
| alternative is much more rare.
| the5avage wrote:
| From my experience with a lot of hilarious SQL stuff I have seen
| in the wild:
|
| It would be a good start to read Codd's paper and try to
| understand what the relational model is. It's only 11 pages
| long, and doing that would reduce the suffering in this world.
| samsartor wrote:
| Instead of psql, I really like https://github.com/dbcli/pgcli
| vivzkestrel wrote:
| since we are on the topic and since your article clearly mentions
| "Normalize your data unless you have a good reason not to" I had
| to ask. I am trying to build a news aggregator and I have many
| websites. Each of them has slightly different format. Even though
| I use feedparser in python, it still doesn't change how some of
| them put html text inside content and some of them break it down
| into a separate media xml attribute while retaining only basic
| textual summary inside a summary attribute. Do you think it
| makes more sense to store a single RSS item as-is in a single
| column in Postgres, or should it be stored after parsing it? I
| can see upsides and
| downsides to both approaches. Store it as XML and you have the
| ability to change your processing logic down the line for each
| stored item but you lose the flexibility of querying metadata and
| you also have to parse it on the fly every single time. Store it
| in multiple columns after processing it and it may require
| different logic for different websites + changing your overall
| python processing logic requires a lot of thinking on how it
| might affect some source. What do you guys recommend?
| cryptonector wrote:
| With _all_ RDBMSes the rule is "normalize to the max, then
| denormalize till you get the performance that you need".
| krab wrote:
| Just shooting from the hip without knowing your real needs -
| take this with a grain of salt.
|
| Store some parsed representation that makes it easier for you
| to work with (probably normalized). Keep an archive of raw data
| somewhere. That may be another column, table or even S3 bucket.
| Don't worry about schema changes but you need to not lose the
| original data. There are some pitfalls to schema migrations.
| But the schema should be the representation that works for you
| _at the moment_, otherwise it'll slow you down.
| sea6ear wrote:
| With the caveat that you probably shouldn't listen to me (or
| anyone else on here) since you are the only one who knows how
| much pain each choice will be ...
|
| You are not really dealing with structured data - you've said
| that different sites have different structures, and I assume
| that even with processing, you may not be able to generate
| identical metadata structures from each entry.
|
| I think I would go for one column of XML, plus maybe another
| column that holds a parsed data structure that represents the
| result of your processing (basically a cache holding the post-
| processed version of each site). Hopefully that could be re-
| evaluated by whatever language (Python?) you are using for your
| application. That way you don't have to do the full parsing
| each time you want to examine the entry, but you have access to
| something that can quickly give you whatever metadata is
| associated with it, but which doesn't tie you to the rigid
| structure of a table based database.
|
| Once you know what you are really doing with the data, then you
| could add additional metadata columns that are more rigid, and
| which can be queried directly in SQL as you identify patterns
| that are useful for performance.
| cryptonector wrote:
| Watch out with row/record values: IS NULL is only true if every
| column in the row/record is NULL, and IS NOT NULL is only true
| if every column is non-NULL, so a partially-NULL row satisfies
| neither test. You want to use IS [NOT] DISTINCT FROM NULL, full
| stop.
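|
| In Postgres that plays out like this (a quick sketch):
|
|     SELECT ROW(1, NULL)    IS NULL;                -- false
|     SELECT ROW(1, NULL)    IS NOT NULL;            -- also false!
|     SELECT ROW(NULL, NULL) IS NULL;                -- true
|     SELECT ROW(1, NULL)    IS DISTINCT FROM NULL;  -- true: the
|                                                    -- row value
|                                                    -- itself is
|                                                    -- not null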
| deadbabe wrote:
| Don't create views that reference other views.
| christophilus wrote:
| I'd never stumbled across the "don't do this" wiki entry[0]
| before. Very handy.
|
| [0] https://wiki.postgresql.org/wiki/Don%27t_Do_This
| datadrivenangel wrote:
| This reminds me of SQL Anti-patterns, which is a book that
| everyone who works with databases should read.
| gwbas1c wrote:
| Why don't they deprecate some of these features? If they're
| such easy stumbling blocks, seems like it makes sense to
| disable things like table inheritance in new schemas, and
| require some kind of arcane setting to re-enable them.
| __loam wrote:
| Changing defaults can screw over existing users.
| SoftTalker wrote:
| Presumably there are rare exceptions where you DO want to do
| the thing.
| mixmastamyk wrote:
| Several of the "broken" features are part of the SQL standard.
| samarthr1 wrote:
| That was a fun read, thanks!
|
| Made me reconsider a few habits I picked up from MySQL land
| teddyh wrote:
| \pset null ''
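|
| That changes how psql displays NULLs, so they stand out from
| empty strings. For example, with a made-up placeholder:
|
|     \pset null '(null)'
|     SELECT NULL AS a, '' AS b;
|     -- column a now shows (null) while the empty string in b
|     -- stays blank, so the two are no longer indistinguishable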
| RedShift1 wrote:
| I can't horizontally scroll on mobile, can't see the full query
| texts...
| hibachrach wrote:
| (author here) hmm it seems to work on my device. Are you
| scrolling within the code block?
| ahachete wrote:
| > Most notably, 'null'::jsonb = 'null'::jsonb is true whereas
| NULL = NULL is NULL
|
| Because 'null' in the JSON spec is a literal value (a constant),
| not SQL's NULL. Nothing to see here.
|
| https://datatracker.ietf.org/doc/html/rfc7159
| hibachrach wrote:
| Right--it makes sense and shouldn't be changed, but it's a bit
| unintuitive for newcomers.
| zmmmmm wrote:
| > It's possible that adding an index will do nothing
|
| This is one of the more perplexing things to me, where Postgres
| ideology is a bit too strong, or at least, the way it works is
| too hard for me to understand (and I've tried - I'm not going to
| claim I'm a genius but I'm also not a moron). I hear there may
| finally be support for some kind of hints in upcoming versions,
| which would be very welcome to me. I've spent way too much time
| trying to divine the voodoo of why a slow query is not using
| indexes when it seems obvious that it should.
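|
| What has helped me most is making the planner show its work
| (the table and column below are made up):
|
|     -- see the chosen plan, row estimates, and actual timings:
|     EXPLAIN (ANALYZE, BUFFERS)
|     SELECT * FROM orders WHERE customer_id = 42;
|
|     -- for experiments only: make sequential scans look very
|     -- expensive in this session, to check whether the index
|     -- *can* be used at all
|     SET enable_seqscan = off;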
| sroerick wrote:
| It's always a relief to read articles like this, realize I know
| 90% of it, and feel like I've deserved the jobs I've had.
|
| Great and super useful notes
| cynicalsecurity wrote:
| People who use PostgreSQL instead of MySQL just want to suffer
| while pretending "they are better than others".
___________________________________________________________________
(page generated 2024-11-12 23:01 UTC)