[HN Gopher] Nullable but not null
___________________________________________________________________
Nullable but not null
Author : efeoge
Score : 55 points
Date : 2025-07-25 15:22 UTC (7 hours ago)
(HTM) web link (efe.me)
(TXT) w3m dump (efe.me)
| lblume wrote:
| > But a field that is nullable in the schema and never null in
| practice is a silent lie.
|
| This seems to be the central claim. But just as lies require
| intent, so does database design to some degree.
|
| A column that is nullable but never null does not conclusively
| say anything, really. That's like saying a birthday column that
| is never before 1970 in the current data should be restricted to
| years after that date.
|
| A nullable column signals that the data may be left empty, which
| is entirely different from that column actually having to be
| empty with the current data. Is-ought distinction, the
| potentiality of null ("nullability") is not to be confused with
| the actuality of current vacancy ("null values"). The programmer
| extracting data from the database must check for null if the data
| is nullable, not just if is null now as a principle of robust
| fault-tolerant code.
| matsemann wrote:
| Problem is you end up other places with the assumption thar
| it's never null. So in the future when you actually set it to
| null somewhere it will blow up.
| tuyiown wrote:
| People that assume that a (nullable) value is never null
| because the sample at hand does not contain a null value
| _will_ learn their lesson the hard way, the one you describe.
|
| But even non-nullable does not always resist to time, I'd
| argue that use cases where the field _has_ to be null
| eventually emerges and somehow have to be mitigated. There is
| no easy solution to safely workaround that without either
| tons work that duplicates lots of things or taking risks by
| adapting the code base.
| darkwater wrote:
| I think their point is that for all intents, the column IS not
| nullable. It's nullable as an artifact of making live schema
| migration easier, with no blocking/downtime. But as the data
| model is concerned, it should not be nullable.
| lblume wrote:
| Sure, if one just leaves a column nullable due to negligence,
| one should check the actual requirements and enforce them to
| make invalid states unrepresentable. The author still makes
| too strong of a claim that becomes detached from the
| migration aspect, insinuating that one can just empirically
| test the database to check whether this is the case, to which
| I disagree.
| jerf wrote:
| I don't think the author is talking generally about fields that
| could be NULL but just happen to never be so in the production
| DB. The piece is specifically in the context of a new database
| field that is fully intended by its designer to be NOT NULL,
| which was NULL only for migration purposes, and which was never
| updated to be NOT NULL once the migration is complete. The
| point was not meant to be extended beyond that.
|
| One could write a separate piece about maybe using that as a
| clue that the field could be NOT NULL'd in the future but
| that's not what this post is.
| lblume wrote:
| In that case we read it entirely differently -- the last
| paragraph explicitly mentions inferring "Nullable fields that
| never hold nulls are often the result of incomplete
| migrations or forgotten cleanup steps". The author also
| proposes a script "to identify cases where the field could
| safely be changed to non-nullable". But alas, we cannot do
| that with full generality, due to there being a big
| difference in intent that cannot be captured purely by
| counting nulls, and surely not by only calculating
| percentages.
| dataflow wrote:
| > That's like saying a birthday column that is never before
| 1970 in the current data should be restricted to years after
| that date.
|
| No it's not, because they specifically started with the premise
| that the field was initially intentionally non-null and was
| _only_ temporarily made nullable for migration purposes. That
| is _obviously_ not the situation you are describing, right?
| codingdave wrote:
| How would the database know whether the other app layers depend
| on that value or not? You could absolutely have an app that does
| not require data in a specific field to function, yet all records
| happen to have data. This is actually fairly common in single-
| tenant apps, where some tenants populate a field and others do
| not. You need to look at how the data is used across the entire
| stack to know whether or not it should be nullable, not whatever
| the current data happens to be.
| cerved wrote:
| It doesn't. That's why it's the responsibility of the
| application layer to correctly implement the data model of the
| database and not the other way around
| danbruc wrote:
| A column that is nullable but never null might indicate that it
| should be non-nullable but does not necessarily imply so. Say you
| have an optional comment field, it might just happen by accident
| that a comment was recorded for each row, but that of course
| becomes increasingly unlikely with each additional row in the
| table. There is probably no harm in checking your database for
| such columns, especially in tables with many rows, but in the end
| it is up to the semantics of the data model whether a column
| should be nullable or not. The absence of NULLs is an indicator
| but not a determiner.
| Szpadel wrote:
| there is also other scenario, field might obviously looks like
| it should have value, and in check it might event always have
| it, but it might be "lazy" value.
|
| eg. you might have some bug CSV uploaded and your have number
| of rows in it, your app could insert record without this number
| and async process would fill that later.
|
| there might be even some corner case where null value is
| possible
|
| I believe solution here isn't to check what fields do not use
| null, but to improve process of creating such migration. either
| you should create second ticket for next release to update db
| or commit new migration to some kind of next branch.
| marcosdumay wrote:
| > There is probably no harm in checking your database for such
| columns
|
| The harm is the same as any other unreliable linter rule.
|
| Each one such rule is almost harmless. And on most places that
| use that kind of rule, they are extremely harmful.
| danbruc wrote:
| With that I agree, once you start to treat the output of any
| linter as the truth, you are actively lowering the quality of
| your codebase. At the very least if it has reasonably good
| quality, if it is in bad shape, blindly following linters
| might still increase code quality, you will just not reach
| the peak.
| mdavid626 wrote:
| I completely agree. Unfortunately, this is one of those things
| that's hard to convince people of. You often hear: 'It doesn't
| matter,' 'It works fine without it,' or 'Why overcomplicate
| things?'--but that mindset can be limiting.
| deepsun wrote:
| I've seen worse. Some teams use JSON for their data. Not only
| each field can be missing (aka NULL), it can also be "null". Or a
| different type.
|
| I envy your team who's only mistake is to forget setting
| NULLABLE. Rainbows and unicorns ;)
| thomas-st wrote:
| That's just the JSON equivalent of "we have data, and it's
| null" vs "data is missing", and consistency could be enforced
| using a constraint or by making it non-NULL in the first place.
|
| It's more common in string fields, which in many cases just get
| rendered on a web form that doesn't differentiate blank (empty
| string) from null state, therefore in the database we should in
| most cases set it up as follows:
|
| - Value required: field should be non-NULL + at least length 1
| (via check constraint)
|
| - Value optional: either field is non-NULL, or field is
| nullable + at least length 1
|
| I'm curious if you prefer to store optional strings as non-NULL
| and rely on the length, or as nullable and have a length
| constraint.
| lock1 wrote:
| What if almost everything is NULLABLE? including the supposedly
| primary key and foreign keys of the table?
|
| I've had the firsthand experience building a consumer for that
| kind of DB and it's hell to get anything running correctly
| without literally writing a dedicated layer to sanity check
| everything.
| phplovesong wrote:
| Sounds like a bad domain, and/or poor design.
| gopher_space wrote:
| > without literally writing a dedicated layer to sanity check
| everything
|
| It's not attractive to developers for a variety of reasons,
| but encapsulation is generally cheaper (in every sense) than
| the alternatives by orders of magnitude. If a system is hard
| to alter _and_ other people rely on it to do their jobs,
| thinking about it as set in stone during my planning process
| will save me grief and heartache.
|
| Starting with an independent client in mind makes it easier
| to plan around things like reconstructing tables from
| composed exports or dealing with authoritatively incorrect
| data. It's a good collection point for hard-coded edge case
| handling, and a great location for all of your discovery
| notes.
| Trigg3r wrote:
| Am I missing something here in my (MS) SQL world? if a new field
| is added as null, I do that to the (now) 20 year old system to we
| don't break 100's of stored procs - any (new) code that needs
| that field, has to check for it being null...
| minkeymaniac wrote:
| in sql server, you can simply add a not null check constraint
| with nocheck (see my comment with full code)
|
| ALTER TABLE foo WITH NOCHECK ADD CONSTRAINT CheckNotnull CHECK
| (id IS NOT NULL)
|
| any new values coming in cannot be null but the values already
| in the table with null are fine... then you can update them to
| not null over time
| stux wrote:
| This is interesting! A field being nullable because it's
| legitimately optional in the domain model is one thing, but for
| _new_ fields which shouldn 't be nullable in the domain model,
| unless you can pick a reasonable identity value, you need a
| concept of absence that's different from null. Luckily the
| intersection of "non-nullable fields" and "fields with no
| reasonable identity value" and "fields which didnt exist in v1 of
| the domain model" is _normally_ pretty small, but it 's painful
| when it happens.
|
| This reminds me of frozen/nonfrozen enums in Swift. You can do
| exhaustive case analysis on frozen enums, but case analysis on
| nonfrozen enums requires adding an `@unknown default` case.
|
| https://docs.swift.org/swift-book/documentation/the-swift-pr...
| PeterZaitsev wrote:
| Should not we look for database to be able to do online,
| efficient non locking addition of column with any default value,
| not just NULL rather than application to have a complicated and
| fragile logic ?
| avg_dev wrote:
| I believe PostgreSQL does this since v11, which was released in
| 2018: (current is v17)
|
| > Many other useful performance improvements, including the
| ability to avoid a table rewrite for ALTER TABLE ... ADD COLUMN
| with a non-null column default
|
| https://www.postgresql.org/docs/release/11.0/
|
| I think there is some restriction there, like the default can't
| be "volatile" - I can't remember the precise definition here
| but I think current_timestamp would be volatile, but any static
| value would not.
| tudorg wrote:
| That is correct, for non-volatile default values Postgres is
| quick, which means that it is generally a safe operation.
|
| Also interesting, `now()` is non-volatile because it's
| defined as "start of the transaction". So if you add a column
| with `DEFAULT now()` all rows will get the same value. But
| `timeofday()` is not volatile, so `DEFAULT timeofday()` is
| going to lock the table for a long time. A bit of a subtle
| gotcha.
| avg_dev wrote:
| Thanks for the info. One minor point:
|
| > But `timeofday()` is not volatile, so `DEFAULT
| timeofday()` is going to lock the table for a long time.
|
| Perhaps the "not" was a typo?
| HocusLocus wrote:
| I'm glad to see people discussing [zero,infinitesimals,false,empt
| ystring,isnull,unpopulated,missing] as if each one is a thing.
| They've always been things! We've just been buried in compromises
| and shortcuts all these years.
|
| There should also be a [the-asteroid-has-hit-y'all-are-so-stupid]
| and global data systems should just just pass that around after
| impact until the power goes out for good.
| comrade1234 wrote:
| I've never seen step 5 happen...
| tudorg wrote:
| For a tooling solution for this problem, and many others, pgroll
| (https://github.com/xataio/pgroll) automates the steps from the
| blog post in a single higher-level operation. It can do things
| like adding a hidden column, backfill it with data, then adds the
| constraint, and only then expose it in the new schema.
| jonny_eh wrote:
| I recommend using checklists for schema changes like this.
| minkeymaniac wrote:
| If you don't care for old data having null , you could add a
| check contraint with nocheck (this is sql server fwiw)
|
| for example
|
| create table foo(id int) insert foo values (1), (2), (3)
|
| insert foo values (null)
|
| select * from foo
|
| id
|
| 1
|
| 2
|
| 3
|
| NULL
|
| ALTER TABLE foo with nocheck ADD CONSTRAINT CheckNotnull check
| (id IS NOT NULL)
|
| insert foo values (null)
|
| Msg 547, Level 16, State 0, Line 13 The INSERT statement
| conflicted with the CHECK constraint "CheckNotnull". The conflict
| occurred in database tempdb", table "dbo.foo", column 'id'. The
| statement has been terminated.
|
| However be aware that if you update an existing value to NULL,
| you will still get the error
|
| update foo set id = null where id = 2
|
| Msg 547, Level 16, State 0, Line 20 The UPDATE statement
| conflicted with the CHECK constraint "CheckNotnull". The conflict
| occurred in database "tempdb", table "dbo.foo", column 'id'.
___________________________________________________________________
(page generated 2025-07-25 23:01 UTC)