[HN Gopher] Nullable but not null
       ___________________________________________________________________
        
       Nullable but not null
        
       Author : efeoge
       Score  : 55 points
       Date   : 2025-07-25 15:22 UTC (7 hours ago)
        
 (HTM) web link (efe.me)
 (TXT) w3m dump (efe.me)
        
       | lblume wrote:
       | > But a field that is nullable in the schema and never null in
       | practice is a silent lie.
       | 
       | This seems to be the central claim. But just as lies require
       | intent, so does database design to some degree.
       | 
       | A column that is nullable but never null does not conclusively
       | say anything, really. That's like saying a birthday column that
       | is never before 1970 in the current data should be restricted to
       | years after that date.
       | 
       | A nullable column signals that the data may be left empty, which
       | is entirely different from that column actually having to be
       | empty with the current data. Is-ought distinction, the
       | potentiality of null ("nullability") is not to be confused with
       | the actuality of current vacancy ("null values"). The programmer
       | extracting data from the database must check for null if the data
       | is nullable, not just if is null now as a principle of robust
       | fault-tolerant code.
        
         | matsemann wrote:
         | Problem is you end up other places with the assumption thar
         | it's never null. So in the future when you actually set it to
         | null somewhere it will blow up.
        
           | tuyiown wrote:
           | People that assume that a (nullable) value is never null
           | because the sample at hand does not contain a null value
           | _will_ learn their lesson the hard way, the one you describe.
           | 
           | But even non-nullable does not always resist to time, I'd
           | argue that use cases where the field _has_ to be null
           | eventually emerges and somehow have to be mitigated. There is
           | no easy solution to safely workaround that without either
           | tons work that duplicates lots of things or taking risks by
           | adapting the code base.
        
         | darkwater wrote:
         | I think their point is that for all intents, the column IS not
         | nullable. It's nullable as an artifact of making live schema
         | migration easier, with no blocking/downtime. But as the data
         | model is concerned, it should not be nullable.
        
           | lblume wrote:
           | Sure, if one just leaves a column nullable due to negligence,
           | one should check the actual requirements and enforce them to
           | make invalid states unrepresentable. The author still makes
           | too strong of a claim that becomes detached from the
           | migration aspect, insinuating that one can just empirically
           | test the database to check whether this is the case, to which
           | I disagree.
        
         | jerf wrote:
         | I don't think the author is talking generally about fields that
         | could be NULL but just happen to never be so in the production
         | DB. The piece is specifically in the context of a new database
         | field that is fully intended by its designer to be NOT NULL,
         | which was NULL only for migration purposes, and which was never
         | updated to be NOT NULL once the migration is complete. The
         | point was not meant to be extended beyond that.
         | 
         | One could write a separate piece about maybe using that as a
         | clue that the field could be NOT NULL'd in the future but
         | that's not what this post is.
        
           | lblume wrote:
           | In that case we read it entirely differently -- the last
           | paragraph explicitly mentions inferring "Nullable fields that
           | never hold nulls are often the result of incomplete
           | migrations or forgotten cleanup steps". The author also
           | proposes a script "to identify cases where the field could
           | safely be changed to non-nullable". But alas, we cannot do
           | that with full generality, due to there being a big
           | difference in intent that cannot be captured purely by
           | counting nulls, and surely not by only calculating
           | percentages.
        
         | dataflow wrote:
         | > That's like saying a birthday column that is never before
         | 1970 in the current data should be restricted to years after
         | that date.
         | 
         | No it's not, because they specifically started with the premise
         | that the field was initially intentionally non-null and was
         | _only_ temporarily made nullable for migration purposes. That
         | is _obviously_ not the situation you are describing, right?
        
       | codingdave wrote:
       | How would the database know whether the other app layers depend
       | on that value or not? You could absolutely have an app that does
       | not require data in a specific field to function, yet all records
       | happen to have data. This is actually fairly common in single-
       | tenant apps, where some tenants populate a field and others do
       | not. You need to look at how the data is used across the entire
       | stack to know whether or not it should be nullable, not whatever
       | the current data happens to be.
        
         | cerved wrote:
         | It doesn't. That's why it's the responsibility of the
         | application layer to correctly implement the data model of the
         | database and not the other way around
        
       | danbruc wrote:
       | A column that is nullable but never null might indicate that it
       | should be non-nullable but does not necessarily imply so. Say you
       | have an optional comment field, it might just happen by accident
       | that a comment was recorded for each row, but that of course
       | becomes increasingly unlikely with each additional row in the
       | table. There is probably no harm in checking your database for
       | such columns, especially in tables with many rows, but in the end
       | it is up to the semantics of the data model whether a column
       | should be nullable or not. The absence of NULLs is an indicator
       | but not a determiner.
        
         | Szpadel wrote:
         | there is also other scenario, field might obviously looks like
         | it should have value, and in check it might event always have
         | it, but it might be "lazy" value.
         | 
         | eg. you might have some bug CSV uploaded and your have number
         | of rows in it, your app could insert record without this number
         | and async process would fill that later.
         | 
         | there might be even some corner case where null value is
         | possible
         | 
         | I believe solution here isn't to check what fields do not use
         | null, but to improve process of creating such migration. either
         | you should create second ticket for next release to update db
         | or commit new migration to some kind of next branch.
        
         | marcosdumay wrote:
         | > There is probably no harm in checking your database for such
         | columns
         | 
         | The harm is the same as any other unreliable linter rule.
         | 
         | Each one such rule is almost harmless. And on most places that
         | use that kind of rule, they are extremely harmful.
        
           | danbruc wrote:
           | With that I agree, once you start to treat the output of any
           | linter as the truth, you are actively lowering the quality of
           | your codebase. At the very least if it has reasonably good
           | quality, if it is in bad shape, blindly following linters
           | might still increase code quality, you will just not reach
           | the peak.
        
       | mdavid626 wrote:
       | I completely agree. Unfortunately, this is one of those things
       | that's hard to convince people of. You often hear: 'It doesn't
       | matter,' 'It works fine without it,' or 'Why overcomplicate
       | things?'--but that mindset can be limiting.
        
       | deepsun wrote:
       | I've seen worse. Some teams use JSON for their data. Not only
       | each field can be missing (aka NULL), it can also be "null". Or a
       | different type.
       | 
       | I envy your team who's only mistake is to forget setting
       | NULLABLE. Rainbows and unicorns ;)
        
         | thomas-st wrote:
         | That's just the JSON equivalent of "we have data, and it's
         | null" vs "data is missing", and consistency could be enforced
         | using a constraint or by making it non-NULL in the first place.
         | 
         | It's more common in string fields, which in many cases just get
         | rendered on a web form that doesn't differentiate blank (empty
         | string) from null state, therefore in the database we should in
         | most cases set it up as follows:
         | 
         | - Value required: field should be non-NULL + at least length 1
         | (via check constraint)
         | 
         | - Value optional: either field is non-NULL, or field is
         | nullable + at least length 1
         | 
         | I'm curious if you prefer to store optional strings as non-NULL
         | and rely on the length, or as nullable and have a length
         | constraint.
        
         | lock1 wrote:
         | What if almost everything is NULLABLE? including the supposedly
         | primary key and foreign keys of the table?
         | 
         | I've had the firsthand experience building a consumer for that
         | kind of DB and it's hell to get anything running correctly
         | without literally writing a dedicated layer to sanity check
         | everything.
        
           | phplovesong wrote:
           | Sounds like a bad domain, and/or poor design.
        
           | gopher_space wrote:
           | > without literally writing a dedicated layer to sanity check
           | everything
           | 
           | It's not attractive to developers for a variety of reasons,
           | but encapsulation is generally cheaper (in every sense) than
           | the alternatives by orders of magnitude. If a system is hard
           | to alter _and_ other people rely on it to do their jobs,
           | thinking about it as set in stone during my planning process
           | will save me grief and heartache.
           | 
           | Starting with an independent client in mind makes it easier
           | to plan around things like reconstructing tables from
           | composed exports or dealing with authoritatively incorrect
           | data. It's a good collection point for hard-coded edge case
           | handling, and a great location for all of your discovery
           | notes.
        
       | Trigg3r wrote:
       | Am I missing something here in my (MS) SQL world? if a new field
       | is added as null, I do that to the (now) 20 year old system to we
       | don't break 100's of stored procs - any (new) code that needs
       | that field, has to check for it being null...
        
         | minkeymaniac wrote:
         | in sql server, you can simply add a not null check constraint
         | with nocheck (see my comment with full code)
         | 
         | ALTER TABLE foo WITH NOCHECK ADD CONSTRAINT CheckNotnull CHECK
         | (id IS NOT NULL)
         | 
         | any new values coming in cannot be null but the values already
         | in the table with null are fine... then you can update them to
         | not null over time
        
       | stux wrote:
       | This is interesting! A field being nullable because it's
       | legitimately optional in the domain model is one thing, but for
       | _new_ fields which shouldn 't be nullable in the domain model,
       | unless you can pick a reasonable identity value, you need a
       | concept of absence that's different from null. Luckily the
       | intersection of "non-nullable fields" and "fields with no
       | reasonable identity value" and "fields which didnt exist in v1 of
       | the domain model" is _normally_ pretty small, but it 's painful
       | when it happens.
       | 
       | This reminds me of frozen/nonfrozen enums in Swift. You can do
       | exhaustive case analysis on frozen enums, but case analysis on
       | nonfrozen enums requires adding an `@unknown default` case.
       | 
       | https://docs.swift.org/swift-book/documentation/the-swift-pr...
        
       | PeterZaitsev wrote:
       | Should not we look for database to be able to do online,
       | efficient non locking addition of column with any default value,
       | not just NULL rather than application to have a complicated and
       | fragile logic ?
        
         | avg_dev wrote:
         | I believe PostgreSQL does this since v11, which was released in
         | 2018: (current is v17)
         | 
         | > Many other useful performance improvements, including the
         | ability to avoid a table rewrite for ALTER TABLE ... ADD COLUMN
         | with a non-null column default
         | 
         | https://www.postgresql.org/docs/release/11.0/
         | 
         | I think there is some restriction there, like the default can't
         | be "volatile" - I can't remember the precise definition here
         | but I think current_timestamp would be volatile, but any static
         | value would not.
        
           | tudorg wrote:
           | That is correct, for non-volatile default values Postgres is
           | quick, which means that it is generally a safe operation.
           | 
           | Also interesting, `now()` is non-volatile because it's
           | defined as "start of the transaction". So if you add a column
           | with `DEFAULT now()` all rows will get the same value. But
           | `timeofday()` is not volatile, so `DEFAULT timeofday()` is
           | going to lock the table for a long time. A bit of a subtle
           | gotcha.
        
             | avg_dev wrote:
             | Thanks for the info. One minor point:
             | 
             | > But `timeofday()` is not volatile, so `DEFAULT
             | timeofday()` is going to lock the table for a long time.
             | 
             | Perhaps the "not" was a typo?
        
       | HocusLocus wrote:
       | I'm glad to see people discussing [zero,infinitesimals,false,empt
       | ystring,isnull,unpopulated,missing] as if each one is a thing.
       | They've always been things! We've just been buried in compromises
       | and shortcuts all these years.
       | 
       | There should also be a [the-asteroid-has-hit-y'all-are-so-stupid]
       | and global data systems should just just pass that around after
       | impact until the power goes out for good.
        
       | comrade1234 wrote:
       | I've never seen step 5 happen...
        
       | tudorg wrote:
       | For a tooling solution for this problem, and many others, pgroll
       | (https://github.com/xataio/pgroll) automates the steps from the
       | blog post in a single higher-level operation. It can do things
       | like adding a hidden column, backfill it with data, then adds the
       | constraint, and only then expose it in the new schema.
        
       | jonny_eh wrote:
       | I recommend using checklists for schema changes like this.
        
       | minkeymaniac wrote:
       | If you don't care for old data having null , you could add a
       | check contraint with nocheck (this is sql server fwiw)
       | 
       | for example
       | 
       | create table foo(id int) insert foo values (1), (2), (3)
       | 
       | insert foo values (null)
       | 
       | select * from foo
       | 
       | id
       | 
       | 1
       | 
       | 2
       | 
       | 3
       | 
       | NULL
       | 
       | ALTER TABLE foo with nocheck ADD CONSTRAINT CheckNotnull check
       | (id IS NOT NULL)
       | 
       | insert foo values (null)
       | 
       | Msg 547, Level 16, State 0, Line 13 The INSERT statement
       | conflicted with the CHECK constraint "CheckNotnull". The conflict
       | occurred in database tempdb", table "dbo.foo", column 'id'. The
       | statement has been terminated.
       | 
       | However be aware that if you update an existing value to NULL,
       | you will still get the error
       | 
       | update foo set id = null where id = 2
       | 
       | Msg 547, Level 16, State 0, Line 20 The UPDATE statement
       | conflicted with the CHECK constraint "CheckNotnull". The conflict
       | occurred in database "tempdb", table "dbo.foo", column 'id'.
        
       ___________________________________________________________________
       (page generated 2025-07-25 23:01 UTC)