[HN Gopher] Postgres 15 improves UNIQUE and NULL
       ___________________________________________________________________
        
       Postgres 15 improves UNIQUE and NULL
        
       Author : Twisell
       Score  : 391 points
       Date   : 2022-07-11 07:29 UTC (15 hours ago)
        
 (HTM) web link (blog.rustprooflabs.com)
 (TXT) w3m dump (blog.rustprooflabs.com)
        
       | daigoba66 wrote:
       | Coming from an MS SQL Server background, this behavior surprised
       | me.
       | 
       | While SQL Server generally does treat null values as not equal in
       | query predicates, for the purpose of unique constraints/indexes
       | it'll treat them as the same value. I guess this does go against
       | the spec? But it makes sense as a human when looking at a tuple
       | and deciding if it is "unique" or not.
       | 
       | On that note, now I wonder what SQL Server does with nulls in a
       | DISTINCT or GROUP BY. I suspect that it'll treat all nulls as the
       | same.
        
         | hans_castorp wrote:
         | > Coming from an MS SQL Server background, this behavior
         | surprised me.
         | 
         | Oracle behaves the same as Postgres with regards to NULL values
         | in unique indexes.
         | 
         | > I guess this does go against the spec?
         | 
         | The first comment (by the blog author)
         | https://news.ycombinator.com/item?id=32054151 explained this:
         | 
         | > The background here is that the SQL standard was ambiguous
         | about which of the two ways an implementation should behave. So
         | in the upcoming SQL:202x, this was addressed by making the
         | behavior implementation-defined and adding this NULLS [NOT]
         | DISTINCT option to pick the other behavior.
        
         | hotsauceror wrote:
         | An interesting question. They may be treated differently
         | depending on whether the nullable field is one of the grouping
         | fields, or is one of the selected aggregate fields.
         | 
         | https://blog.sqlauthority.com/2015/02/13/sql-server-warning-...
        
       | rwmj wrote:
       | I think the main lesson from this is you should use a NOT NULL
       | constraint on just about every column of your database, since the
       | behaviour of NULLs is weird and likely to indicate a problem with
       | your data.
        
         | nicoburns wrote:
         | Except that this change means that the behaviour of nulls is
         | not weird anymore!
        
           | masklinn wrote:
           | Null is still null, sql null is always "weird" wherever it
           | exists.
        
             | gigatexal wrote:
             | Why is null weird? Null means unknown. It's perfectly
             | valid.
        
               | vasachi wrote:
               | It is kinda weird, if you don't know how it works. For
               | instance:
               | 
               | Say you have a unique constraint on two columns
               | (column_a, column_b). column_a is not nullable, column_b
               | IS nullable. Obviously these values are unique:
               | 
               | (1, 2) (1, 3) (2, 2)
               | 
               | But these values ARE ALSO UNIQUE:
               | 
               | (1, NULL) (1, NULL)
               | 
               | It is obvious in hindsight (NULL isn't equal to NULL),
               | but can be quite a surprise.
               | 
               | ...
               | 
               | Oh wait, postgres 15 deals with just this situation. Huh.
        
               | jhugo wrote:
               | Honestly it's what you want way more often than not. A
               | unique index on a nullable column is usually representing
               | some optional identifier. You wouldn't usually want only
               | one row to be able to omit the identifier.
        
               | vasachi wrote:
               | Oh yeah, totally. But not always. I'm very glad that we
               | now have an option to have both behaviours.
        
               | Sankozi wrote:
               | It makes lots of queries (like find duplicate queries)
               | much more complex without any benefit. Also it introduces
               | completely unnecessary three value logic in SQL.
        
               | masklinn wrote:
               | "Valid" and "weird" are orthogonal concepts.
               | 
               | That sql null is valid doesn't make its behaviour of not
               | being equal to itself any less weird, because it works
               | that way essentially nowhere else, and thus is highly
               | unintuitive to developers who don't live and breathe sql.
               | 
               | The closest thing in most langages is nan and developers
               | also find nan weird.
        
               | gigatexal wrote:
               | Think of nulls like different infinities. Two infinities
               | may not be equal.
               | 
               | I find it quite useful to know when something is null
               | though I have a hard time explaining it.
        
               | nicoburns wrote:
               | > The closest thing in most langages is nan and
               | developers also find nan weird.
               | 
               | I'd argue the closest thing in most languages is the null
               | propagation operator (usually using a ? symbol) that at
               | least JavaScript and C# have.
               | 
               | YMMV, but personally the unique index behaviour was the
               | main thing I found weird about NULLs. So now that I can
               | turn that off I'm pretty happy with them.
        
               | masklinn wrote:
               | > I'd argue the closest thing in most languages is the
               | null propagation operator (usually using a ? symbol) that
               | at least JavaScript and C# have.
               | 
               | In what sense? null-coalescing operators don't change how
               | nulls behave or relate to one another, they only provide
               | convenience operators for conditioning operation upon
               | them, like sql's COALESCE.
        
               | JUNGLEISMASSIVE wrote:
        
               | ryanbrunner wrote:
               | With keys it very often means "definitively not present"
               | in practice, and there's no other value that you can use
               | to represent that if you want referential integrity.
        
               | roenxi wrote:
               | SQL uses [0] 3 valued logic, and most people seem to
               | reason using 2-valued logic. So most people fund the
               | behaviour of NULL, the 3rd logical value, to be weird.
               | 
               | > It's perfectly valid.
               | 
               | :o You'll get jumped on, saying this to a forum of
               | software people. Lots of things that are valid are also
               | weird.
               | 
               | [0] https://www.postgresql.org/docs/14/functions-
               | logical.html
        
         | p49k wrote:
         | Be careful with this in Postgres: using a zero, blank space,
         | empty array or similar as a default value instead of NULL can
         | lead to enormous bloat in terms of disk space. There are
         | distinct advantages of NULL that have to be considered.
        
           | SnowHill9902 wrote:
           | He most probably means creating a data structure that does
           | not allow NULLs (usually increasing your normal form works)
           | not replacing them by a naive synonym.
        
           | idoubtit wrote:
           | Your statement is wrong in the general case. It probably
           | depends on the DB engine and several parameters, but, at
           | least with MariaDB/MySQL, allowing NULL _increases_ the disk
           | usage. See https://stackoverflow.com/questions/13207838/disk-
           | space-impl...
        
             | masklinn wrote:
             | Their statement is not "in the general case" though, it's
             | specifically for postgres.
             | 
             | Postgres uses a fixed-size null bitmap and variable-size
             | rows, so a NULL value takes one bit in the bitmap (and
             | additional nullable columns may require a wider bitmap),
             | but they are skipped in the row itself.
        
             | fabian2k wrote:
             | Postgres uses 1 bit per NULL value, though the reality is
             | more complex as everything is padded for alignment. So
             | fewer than 8 nullable columns are free, and above that you
             | pay probably 8 bytes at once (I'm not totally sure on the
             | exact numbers here).
             | 
             | So for Postgres it is generally true that storing NULLs is
             | very cheap in most cases.
        
         | Pxtl wrote:
         | A person with alopecia's hair color isn't "unknown" it's non-
         | existent. They do not have hair.
         | 
         | Would you suggest that a database of appearance information
         | about a person should have a separate subtable for "hair" to
         | properly model this feature?
         | 
         | Either way, in the end, for displaying 99% of the time you're
         | going to be using a VIEW that LEFT JOINs all these cute
         | normalized tables back together again, and then you're going to
         | want to filter those views dynamically on a nice grid that the
         | user is viewing, and the fact that X != X is going to bite you
         | in the ass all over again.
         | 
         | Creating more tables is just moving the problem around, not
         | solving it.
        
         | papito wrote:
         | Also, do not order a vanity plate that says "NULL" - you are
         | likely to have a very bad time.
         | 
         | https://www.wired.com/story/null-license-plate-landed-one-ha...
        
         | RedShift1 wrote:
         | What do you use instead of null then? For example let's take a
         | purchase table, that keeps track when the purchase has been
         | submitted to the supplier, let's call it sent_to_supplier. What
         | do you use for sent_to_supplier in case the purchase has not
         | been sent to the supplier yet?
        
           | Avamander wrote:
           | A boolean column?
        
             | RedShift1 wrote:
             | It keeps track of _when_, so timestamp.
        
               | vosper wrote:
               | What's the null-replacing value in a timestamp field?
        
               | konha wrote:
               | If you care about 'when' you probably also care about
               | resubmissions. Or metadata like the supplier's response.
               | Or who approved the submission etc. A separate table
               | seems like a good solution.
        
           | rwmj wrote:
           | It's not a hard and fast rule, but in your example you can
           | solve this in the model without needing NULLs. A table called
           | ORDERS records the orders, SUPPLIERS lists your suppliers,
           | and a table SENT_TO_SUPPLIER links ORDERS to SUPPLIERS, and
           | you don't add a row into that table until the order has been
           | sent.
        
             | RedShift1 wrote:
             | What about this, let's say I have two electricity meters.
             | They have bunch of fields (active power, reactive power,
             | voltage, current, and all that times 3 (one such
             | measurement for each phase)), but one kind of meter can
             | measure distortions, the other can not.
             | 
             | I would make the distortion columns nullable for the
             | pratical reason that you either have to duplicate all the
             | work for querying the same properties from the meters (for
             | example a dashboard that shows voltages only). If you want
             | to support both types of meters in the same dashboard you
             | would have to do a UNION type query (and not ever forget to
             | do that!) if you store the measurements in a different
             | table.
        
               | cwillu wrote:
               | Rather than putting the two types of meters in two
               | different tables and doing an error-prone UNION, you
               | would put the distortion measurements in a different
               | table joined to the first, and the meters that don't
               | support it would simply not include entries in that
               | table.
        
               | RedShift1 wrote:
               | But that would still get you nulls in the result set,
               | which is what we're trying to avoid?
        
               | rocqua wrote:
               | It's possible to handle these kinds of fields (at most
               | one value) with separate tables and foreign keys. It has
               | advantages around 'proving correctness'. But without
               | extra tooling it has significant usability, and perhaps
               | performance disadvantages. I wonder if you can get around
               | some of this using virtual tables.
               | 
               | I come at this with little experience with SQL, and
               | having worked a bit on ampersand[1], a tool where you
               | declare the structure of your data, and some invariants,
               | and the tool will create a schema for your database, and
               | some automatic checking to ensure your invariants are
               | upheld.
               | 
               | [1] https://ampersandtarski.gitbook.io/documentation/
        
               | Cthulhu_ wrote:
               | Wouldn't a meter value that doesn't have distortions just
               | have a value of 0 (that is, not null but just zero); the
               | properties of a meter in this case belong somewhere else,
               | either in a "meter properties" table if you have many
               | meters with many properties that can change, or just
               | hardcoded.
               | 
               | Alternatively, a key / value table, e.g. `meter_id,
               | property, value`. But that isn't very optimal, works
               | better for things like a shop product with many different
               | properties per product.
        
               | RedShift1 wrote:
               | It doesn't mean there are no distortions, it just means
               | they are not measured. So I vastly prefer knowing there's
               | no data than some placeholder that "simulates" no data
        
               | synthmeat wrote:
               | Boolean capabilities of meter per every entry? Shouldn't
               | impact size that much, easy to index. As a bonus, you can
               | track failure of a sensor per each meter, and know when
               | it first failed.
        
               | jve wrote:
               | Oh yeah, solve complexity by introducing complexity
               | squared.
        
           | fabian2k wrote:
           | The submission process might deserve its own table with more
           | detail than sent/not sent. Otherwise it's a straigtforward
           | boolean, I don't see any use for NULL here.
           | 
           | Edit: Since it's a date, there are valid use cases for
           | nullable timestamps. But only if there is no additional
           | information attached to the event you're saving in the
           | timestamp. Another complication with nullable timestamps is
           | sorting by them, which you often do with timestamps. With
           | nullable ones this can get messy.
        
           | pprotas wrote:
           | You could create a new table that keeps track of the
           | logistics of the product. If the product is sent to supplier,
           | a new row appears in this table with the date. If it hasn't
           | been sent yet, no row is present.
        
             | therein wrote:
             | Handles the cases where an order had to be resent to the
             | supplier better as well.
        
               | formercoder wrote:
               | And when a supplier returns a shipment, and when new
               | things happen that you haven't thought of so a new column
               | isn't needed.
        
           | mjevans wrote:
           | More careful thought about the processes and data that are to
           | be captured.
           | 
           | In a simplistic view maybe NULL is really the correct choice;
           | but is it? Does NULL represent the desired properties?
           | 
           | Another common pattern is the use of some sort of sentinel
           | value, E.G. the largest possible value for a date field, or
           | the smallest, might be used to indicate an unknown maximum or
           | minimum that propagates.
           | 
           | A related pattern might be some sort of orders_suppliers
           | table which would have a foreign key value; that might be
           | NULL or it could use a sentinel value with a dummy supplier
           | to indicate a special condition and an arbitrary number of
           | inband subsets which can be their own distinct matches.
        
             | stdbrouw wrote:
             | These all seem like fine examples of solutions that are
             | worse than the problem they were meant to solve. A separate
             | table will be harder and slower to query and will enable
             | M2M relationships even when you don't want those. Sentinel
             | values must be dealt with _at every query_ to ensure they
             | are always filtered out.
        
               | mjevans wrote:
               | Thank you for pointing that out.
               | 
               | A better relationship would be if the order_suppliers
               | table was the one with a relation to this table and there
               | wasn't even a column.
        
         | fabian2k wrote:
         | NOT NULL really should be the default. There are of course
         | valid reasons for allowing NULLs, but NOT NULL is much easier
         | to handle for most cases. I don't think nullable columns
         | indicate a problem with the data, but nullable columns that
         | don't have a good justifications are certainly trouble.
        
           | masklinn wrote:
           | The fun bit is this mistake gets replicated in every
           | definition langage, sometimes made ever worse (e.g. openapi
           | where fields can be omitted by default, and you can
           | additionally make them nullable).
        
             | JUNGLEISMASSIVE wrote:
        
             | KptMarchewa wrote:
             | Different null types are the worst :(
             | 
             | One of the worst offenders was one API I was working with
             | Optional<Something> could be null.
        
               | CipherThrowaway wrote:
               | What is your solution for representing the difference
               | between
               | 
               | {"foo": null} and {}
               | 
               | Because they are different, right?
        
               | ekidd wrote:
               | In Rust, I have occasionally seen things like:
               | Option<Option<String>>
               | 
               | Here, "None" means "no value specified", and "Some(None)"
               | means "a value is specified, and that value is null." And
               | "Some(Some(my_string))" means the value is "my_string".
               | 
               | There are other ways to represent this in Rust, some of
               | which might be clearer. This representation seems to be
               | used most often if you have a type "MyType", and you want
               | to automatically derive a type "MyTypePatch" using a
               | macro.
        
               | CipherThrowaway wrote:
               | That's also how I've seen it done. But that's not so
               | different from the nullable optional approach. It's still
               | effectively two flavors of null.
               | 
               | What I'm wondering is how people who have an issue with
               | different representations of absence would approach (or
               | avoid?) situations requiring them.
        
               | setr wrote:
               | Larger enums to represent and discriminate. Null is just
               | syntax sugar for the one-kind of absence case, and
               | multiple-null-likes eliminates all benefits of that
               | syntax sugar, so you might as well revert to a real
               | discriminator.
               | 
               | Ideally you can force the use of the discriminator... but
               | that depends on your type system
        
               | CipherThrowaway wrote:
               | The problem with flattening the two absence cases into an
               | enum is composability and generality because it forces
               | the "outer null" case to intrude into concrete types. Say
               | you have a key value mapping Map<String, T> and you
               | represent changes to this map as Map<String, Option<T>>.
               | One day T itself is Option<Int> so you end up Map<String,
               | Option<Option<Int>>. If you want to use e.g. Option2 for
               | that latter case you lose generality.
               | 
               | Where the "double absence" issue comes up in practice,
               | it's usually in a context where it does make sense to
               | represent and handle the first type of absence separately
               | from the second type.
        
               | tinco wrote:
               | Not parent, and not necessarily agree, but one should
               | throw a parser/typecheck error and the other should not.
        
               | CipherThrowaway wrote:
               | In cases where the field is required I would agree. But
               | in the context of an update request, {"foo":null} and {}
               | might both be valid with different semantics. "Update
               | this field to null" vs "don't update this field"
        
               | dbrgn wrote:
               | A similar issue is {}, {"mylist": null} and {"mylist":
               | []}. Three ways to represent a missing or empty list.
        
           | AmericanChopper wrote:
           | I don't think default check constraints should be a part of
           | any serious RDBMS. Developers are free to default to any
           | behaviour they prefer.
        
         | GuB-42 wrote:
         | NULL is totally fine, and happen naturally if you make an outer
         | join. But if you are making a UNIQUE constraint one a nullable
         | column, then you _may_ have a problem.
         | 
         | The idea with NULL is that it is not a value, it is the absence
         | of value, like a field you didn't fill in a form. For example,
         | if you ask two people their personal information, and neither
         | specified their email address (email=NULL), you can't assume
         | they have the same email address. And if you put a UNIQUE
         | constraint on that email field, you probably don't mean that
         | only one person is allowed to leave it blank: either you make
         | it mandatory (NOT NULL), or you let everyone leave it blank.
         | 
         | The reason nullable and unique are rarely seen together is that
         | unique is typically for keys, and generally, you don't want
         | rows that have no key value. Also, putting a unique constraint
         | on something that is not a key may not be the best idea. For
         | example, if you don't intend to use email to uniquely identify
         | people, what's the problem with two people having the same
         | email?
        
         | knorker wrote:
         | You can't get away from having to reason about NULL in SQL even
         | if you remove all NULLs from the database.
         | 
         | You can still get NULLs at query time, and they'll have
         | semantically valid meanings in views and subqueries.
        
         | usrbinbash wrote:
         | The problem is: NULL is a valid datatype in many situations.
         | There are non-boolean fields that can just be empty; My goto
         | example:                   -- Table definition for employee
         | name, surname, date_entry , date_exit
         | 
         | Everyone has names, and if hes an employee, he probably has an
         | entry date...but until he leaves, what's the exit date?
         | 
         | Other than soothsaying, my choices here are: NULL, some magic
         | value, or an additional field telling me whether the
         | application should evaluate the field.
         | 
         | The latter just makes the code more complicated.
         | 
         | Magic values may not be portable, and lead to a whole range of
         | other problems if applications forget to check for them; I
         | count myself lucky if the result is something as harmless as an
         | auto-email congratulating an employee to his 7000s-something
         | birthday.
         | 
         | That leaves me with NULL. Yes, it causes problems. Many
         | problems. But at least it causes _predictable_ problems like
         | apps crashing when some python code tries to shove a NULL into
         | datetime.datetime.
        
         | evv wrote:
         | I have struggled to implement a tree structure in PG with
         | nullable unique values.
         | 
         | Consider a "docs" table where each doc has a unique name, under
         | a given parent doc. A null parent would be a top-level doc, and
         | top-level docs just have a unique name. This didn't work
         | before, and would hopefully be addressed by PG15.
         | 
         | I'm not sure if null parents really represent a "problem with
         | my data", or if the tree structure was too exotic for PG to
         | support properly.
         | 
         | How I got around it: hardcode a certain root doc ID, then the
         | parent column can be NOT NULL. But this felt janky because the
         | app has to ensure the root node exists before it can interact
         | with top-level docs. Plus there were edge cases introduced with
         | the root doc.
        
           | rocqua wrote:
           | Thanks for this example! As I was reading the post, I was
           | thinking "cool, and feels like a thing people would expect,
           | but does it have a real world usecase?".
           | 
           | This feels like an actual real-world thing people might want
           | to do where indeed you'd want to have a single NULL value.
        
           | barrkel wrote:
           | When I store tree-structured data in a relational database I
           | generally add a 'path' column which is a denormalized string
           | containing the names of all the parents with a path separator
           | between.
           | 
           | The biggest reason is it makes finding all descendants very
           | fast with a prefix search (foo/bar/%) on the path column when
           | it's indexed. It's not unusual to want to find or update all
           | descendants because descendants are usually related. If you
           | don't have a path column, then you need to write a recursive
           | CTE, which is Slow, or recurse in your application, which is
           | normally even slower. The reason they're slow is they require
           | a number of seeks which is exponential in the depth of the
           | tree.
           | 
           | It also makes lookup of a node from the path fast, and
           | producing a qualified path to a node fast, but these costs
           | are linear in path length.
           | 
           | Anyway, this path column is also a good place to put your
           | unique constraint.
           | 
           | If you don't want to restrict names from containing the path
           | separator, you can escape application-side. For example, if
           | using '/' as a path separator, consider '::' to escape ':'
           | and ':s' to escape '/' - don't use your path separator in the
           | escape or it'll muck up prefix searches.
        
             | SnowHill9902 wrote:
             | Doesn't feel natural in Postgres. I'd check an auxiliary
             | table nodes_nodes (integer, integer) having an index of all
             | ancestor-descendants.
        
               | evv wrote:
               | Agree its a bit unnatural, but it seems workable. My
               | problem with the auxiliary table is, I wouldn't know how
               | to implement the unique constraint.
        
               | SnowHill9902 wrote:
               | In your current implementation you either have no unique
               | constraint or it is a more particular case of
               | UNIQUE(ancestor, descendant).
        
           | radiospiel wrote:
           | You might want to look at a unique partial index: "CREATE
           | UNIQUE INDEX foo ON pages(name, parent_id) WHERE parent_id IS
           | NOT NULL"
        
           | pdenton wrote:
           | The way I'd do this, is by separating concerns into separate
           | tables. If you have a table with (id, name) and a table with
           | (id, parent_id), any doc with a parent will have a
           | corresponding record in the second table.
        
             | evv wrote:
             | Interesting! But where can you implement the UNIQUE
             | constraint for doc names under a given parent? I guess your
             | application code would need to handle that
        
         | tgv wrote:
         | That's just plain wrong. You need some value to populate
         | optional fields, and NULL is a pretty decent choice to indicate
         | that a string, date or number has not been filled in. If you
         | try to fix that by having special string or numeric constants,
         | you're just making life harder, especially when you export the
         | data. The poor souls that import it will think "hey, this
         | timestamp says 1-1-1970, so this probably happened at Jan. 1,
         | 1970."
        
       | zeroimpl wrote:
       | I think this is a great change. I wish NULLS NOT DISTINCT was the
       | default already - if I wanted the constraint to ignore null
       | values I'd have used a partial index to only index non-null
       | values. Doing the opposite is much more difficult.
        
       | petereisentraut wrote:
       | I am the author of this feature. The background here is that the
       | SQL standard was ambiguous about which of the two ways an
       | implementation should behave. So in the upcoming SQL:202x, this
       | was addressed by making the behavior implementation-defined and
       | adding this NULLS [NOT] DISTINCT option to pick the other
       | behavior.
       | 
       | Personally, I think that the existing PostgreSQL behavior (NULLS
       | DISTINCT) is the "right" one, and the other option was mainly
       | intended for compatibility with other SQL implementations. But
       | I'm glad that people are also finding other uses for it.
        
         | zeugmasyllepsis wrote:
         | Playing around in a docker image of the beta build, it looks
         | like this allows you to add a `unique nulls not distinct`
         | constraint to a composite set of fields, but still does _not_
         | allow you to specify those same fields a primary key. For
         | example                   -- This works         alter table
         | announcement_destinations             add constraint
         | announcement_destinations_pk unique nulls not distinct (sender,
         | receiving_group, optional_filter);
         | 
         | but                   -- Still does not work         alter
         | table announcement_destinations             add constraint
         | announcement_destinations_pk2 primary key (sender,
         | receiving_group, optional_filter);
         | 
         | fails with a message like "ERROR: column "optional_filter" of
         | relation "announcement_destinations" contains null values". Is
         | there a motivation for this distinction?
        
         | [deleted]
        
         | zigzag312 wrote:
         | > Personally, I think that the existing PostgreSQL behavior
         | (NULLS DISTINCT) is the "right" one, and the other option was
         | mainly intended for compatibility with other SQL
         | implementations.
         | 
         | Care to explain why you think NULLS DISTINCT is the "right"
         | default behavior? What problems does it solve to warrant
         | additional complexity by default?
        
           | Pxtl wrote:
           | Once you've accepted that X=X doesn't return TRUE when
           | X=NULL, NULLS DISTINCT is consistent with that. ANSI NULL
           | three value boolean algebra is insane, but it's insane in a
           | pretty consistent way.
        
           | woevdbz wrote:
           | An example where NULLS DISTINCT might make sense in a
           | constraint is when a table is used to store an analytical
           | cube roll-up (eg output of GROUP BY ROLLUP), where NULL in a
           | dimension column has special meaning (it indicates
           | subtotals). Having multiple rows with the same NULL key
           | could, in some of those cases, be an error leading to double-
           | counting.
        
           | petereisentraut wrote:
           | > Care to explain why you think NULLS DISTINCT is the "right"
           | default behavior? What problems does it solve to warrant
           | additional complexity by default?
           | 
           | It's the most consistent with the equality behavior of null
           | values elsewhere.
        
             | jeff-davis wrote:
             | I generally agree that, in most cases, you want the NULLS
             | DISTINCT behavior. But thank you for providing such a
             | developer-friendly feature that allows flexibility here!
             | 
             | NULL in SQL is not terribly consistent overall. Sometimes
             | NULL is treated like "unknown" and sometimes more like
             | "n/a". And the interaction with non-scalar types (like
             | records) is pretty strange. Also, it's common for ORMs to
             | map the app language's NULL (or None/Nil/whatever) to SQL
             | NULL, which adds its own nuance. So I can see this being a
             | useful feature.
        
           | dx034 wrote:
           | I'm not the person you're responding to, but also think NULLS
           | DISTINCT makes sense in many cases. NULLs often represent
           | missing data. Imagine storing a customer's address and the
           | street name is NULL. If several customers have a street=null,
           | it doesn't mean that they have the same street. So from a
           | data perspective, it makes sense to treat these unknown
           | values as distinct.
           | 
           | For filtering and aggregation I still welcome the change, as
           | it makes sense in many cases to treat them as not distinct.
        
             | zigzag312 wrote:
             | Thanks. It seems that the issue arises from value equality
             | vs optional value state equality.
             | 
             | To me, a more natural way to treat NULLs is to think of
             | NULL not as a value, but as a state. Several customers with
             | a street=null all have street property in an equal state.
             | However, an equal state doesn't mean the value is also
             | equal, as there is no value in that state. Option type in
             | functional languages models this perfectly:
             | 'a option =          |None       // no value here
             | |Some of 'a // value of type 'a
             | 
             | So, when checking, if customers have the same street we
             | need to check the value. So, comparision is only valid if
             | street has value defined. Adding null check in addition to
             | equality check to ensure that the actual values are equal
             | feels most natural to me (as this is how it's done in most
             | languages).
             | 
             | Unique constraints should not be a problem. Unique
             | constraints are about values. So if state is NULL there is
             | no value and so constraint does not treat NULL states as
             | equal values. Adding a keyword to change constraint
             | behavior when needed, would be best IMO, as it would need
             | to be rarely used.
        
               | masklinn wrote:
               | > So, when checking, if customers have the same street we
               | need to check the value. So, comparision is only valid if
               | street has value defined. Adding null check in addition
               | to equality check to ensure that the actual values are
               | equal feels most natural to me (as this is how it's done
               | in most languages).
               | 
               | Is it tho?
               | 
               | Languages with ubiquitous nullability will accept both
               | being null or both being non-null and the same value.
               | Languages with option types do the same (as long as the
               | value is equatable obviously), this includes OCaml:
               | https://v2.ocaml.org/api/Option.html#preds.
               | 
               | The only languages I can think of which seriously diverge
               | would be C and C++ where this is UB.
               | 
               | Maybe if you way stretch it Java because it doesn't have
               | operator overloading, but even there there's
               | Object.equals which is null-safe... and returns `true` if
               | both parameters are `null`.
        
               | zigzag312 wrote:
               | I completely agree. I tried to explain basically the same
               | thing as you, but it seems there is some ambiguity in my
               | example I didn't intend.
               | 
               | Comparing two nullable/option values when both are in a
               | "null state" will return true in most languages. Which,
               | to me, is the most logical result as they have the same
               | state. So, if you want to check if two people actually
               | live on the same street, it's not enough to check
               | p1.street == p2.street, but you need to also check that
               | at least one of the streets is not null. Two nulls would
               | return true, but if equality returns true and we know
               | that one of the streets isn't null, we know we are
               | comparing actual street values (not null states). Sadly,
               | in SQL, NULL=NULL result is not true, but NULL.
        
           | knorker wrote:
           | Not parent commenter, but to me it seems "obviously correct".
           | 
           | As explained in the article NULL means "unknown". Let's say I
           | have two people, and a "tax identifier" column. Let's say two
           | people can't have the same tax id. Obviously "don't have one"
           | or "unknown" fit very well in NULL, and two people can both
           | be missing a tax id, even if those who have it need it to be
           | unique.
        
             | chris_wot wrote:
             | Except that's the ambiguity of NULL in the standard. It
             | doesn't just mean "unknown". It can also mean "no value".
             | And you can clearly compare two elements with no value.
             | 
             | Which is why nulls are so incredibly confusing.
        
               | knorker wrote:
               | Yeah that's true. All the database can say is "I don't
               | have a value here". Not "there exists no value for this
               | column".
               | 
               | But this doesn't mean that two NULLs are the same. Two
               | people with NULL tax IDs are not the same (one turns out
               | to be unknown but existing, the other is a toddler
               | without a tax ID). Ten people with NULL address don't
               | live in the same house. Half declined to give address,
               | the other half are homeless. Either way we can't send
               | mail to them.
               | 
               | If you use the database to have a UNIQUE constraint of
               | "only one purchase per household" then it makes no sense
               | to allow NULL addresses, but only allow the first NULL
               | address purchaser to buy the thing.
               | 
               | Or "sorry, in this hotel we only allow one guest at a
               | time without a car, and someone else already doesn't have
               | a car".
               | 
               | Does that guest without a car actually have a car? I
               | don't think that's something that the database can solve.
               | 
               | Should databases have two separate values for unknown or
               | no value? That sounds like a world of hurt, with two
               | types of null.
        
               | jl6 wrote:
               | > Should databases have two separate values for unknown
               | or no value? That sounds like a world of hurt, with two
               | types of null.
               | 
               | It's also a non-problem in practice, because if an
               | application needs to distinguish between multiple types
               | of nulls, it can very easily just use an extra column
               | holding the information needed to disambiguate.
        
               | jstanley wrote:
               | Similarly, the need for non-boolean columns is a non-
               | problem in practice, because an application can just make
               | an extra column for every bit it needs to store. If it
               | needs strings or arbitrary-width numbers it can just join
               | on other tables that store those bits.
               | 
               | (Existence of a workaround doesn't turn a problem into a
               | non-problem)
        
               | jl6 wrote:
               | I stand by it being a _non-problem in practice_ because
               | the workaround is trivial, and the supposed solution adds
               | complexity everywhere.
               | 
               | There isn't even a universally agreed methodology for how
               | to handle nulls in the relational model in theory.
               | 
               | It's a semantic problem that can only be solved with
               | reference to the requirements of a particular use case,
               | and when you do that using explicit extra columns, the
               | problem evaporates.
        
               | fendy3002 wrote:
               | Which happen in world of javascript. It has null and
               | undefined. Null variable is a variable / property that's
               | defined and set with null, while undefined is a variable
               | / property that's not defined yet (or maybe not
               | initialized, I forgot).
               | 
               | I hope at least SQL will give another comparison
               | operation, such as the current equal sign (=) means
               | a.prop is not null && b.prop is not null && a.prop equal
               | b.prop (for not defined yet).
               | 
               | Let's say another sign like a.prop *= b.prop, meaning
               | a.prop is null && b.prop is null || a.prop = b.prop (for
               | our intention is both value are empty).
        
               | pdw wrote:
               | Such a comparison operator exists: IS (NOT) DISTINCT
               | FROM.
        
               | jhgb wrote:
               | > It doesn't just mean "unknown". It can also mean "no
               | value".
               | 
               | Isn't the canonical representation of _known_ -no-value
               | an absence of a tuple? Like as opposed to saying "There
               | exists an employee X who works in department NULL", you
               | simply don't make any claim about employee X working in a
               | department? After all, when enumerating the members of a
               | set, you're also omitting the enumeration of _non_
               | -members of a set, and the law of excluded middle
               | applies.
        
               | Tostino wrote:
               | Nulls will reappear in your queries due to joins even if
               | none exists in your data model.
        
               | jhgb wrote:
               | Ah yes, they're the curse of SQL. Still doesn't seem to
               | be a reason to not squash them out as much as feasible
               | within the constraints of SQL.
        
               | [deleted]
        
           | richdougherty wrote:
           | It's probably a good default because it's more consistent
           | with how NULL equality is handled in SQL generally. From the
           | article:
           | 
           | > In Postgres 14 and prior, unique constraints treated NULL
           | values as not equal to other NULL values.
           | 
           | ...
           | 
           | > This is consistent with the SQL Standard handling of NULL
           | in general, where NULL is unknown. It is impossible to
           | determine if one unknown is equal to another unknown. Because
           | NULL values are of unknown equality to one another, they do
           | not violate UNIQUE constraints.
        
             | zeroimpl wrote:
             | Your last statement would be better written as: Because
             | NULL values are of unknown equality, they may or may not
             | violate UNIQUE constraints.
             | 
             | The safest implementation would assume that they do violate
             | the constraint, rather than the current behavior assuming
             | they don't.
             | 
             | In practice I don't think I've ever added a unique index on
             | a nullable column where null might imply unknown. I have
             | used it in cases where null meant "none".
        
           | dtech wrote:
           | Not OP, but this constraint means "the data must either be
           | empty or unique", which is an extremely common constraint. In
           | contrast, I've never encountered "only 1 entry is allowed not
           | to have the data, all others must be unique".
        
             | enkrs wrote:
             | "only 1 entry is allowed not to have the data, all others
             | must be unique" makes sense in larger schemas as the
             | database grows. It replaces is_default_address,
             | is_primary_child etc. fields and constraints on relational
             | tables. In the perfect world all the data would be
             | normalized not to have such columns and cases, but in real
             | life it just grows that way. So for some cases NULLS NOT
             | DISTINCT will be a welcome addition.
        
           | chris_wot wrote:
           | Well, if NULL is most often meant to signify an unknown
           | value. You can't compare one unknown value with another and
           | say they are the same.
           | 
           | The standard is, I believe, ambiguous about NULL because it
           | can also mean that it is an absence of data. In other words,
           | it is "this is not yet determined" (aka a CAR table has a
           | COLOR table, but the paint is applied as the final operation
           | so NULL could be used as the car is not at this stage yet and
           | the color will be chosen later). In this case, you can
           | compare NULLs against other NULLs, because you can have two
           | cars that are in an indeterminant state. In that case, it is
           | useful to compare to NULL values as it means the same thing.
        
           | undecisive wrote:
           | I think it's worth pointing out what the opposite means - to
           | say that "this field might have a value, or it might be null
           | - but only one tuple/row can set this field to be null"
           | implies that you have given the null option some kind of
           | real-world value.
           | 
           | That is to say, you are using null to encode some kind of
           | meaning, when really this is not what null is supposed to be
           | used for.
           | 
           | That's not to say I think we should be morally disapproving
           | of people who do that - I use things for their unintended
           | purpose all the time, and it bugs me when people get on a
           | high horse. Use what makes sense to you - and I love this
           | change for exactly that reason.
           | 
           | But the general theoretical approach is that if you want to
           | care about the value in a field, you need to give it a value.
           | Null is for the valueless, and if that isn't an allowed
           | state, you should simply set the field as not-nullable.
           | Theory is fine in theory. For a practical database that
           | exists in the real world, this option is a good addition.
        
             | nightski wrote:
             | I think a lot of the trouble stems from the fact that
             | databases do not have sum types and there is no way to
             | encode a "None" type without hacks. NULL is the only
             | reasonable option in a lot of cases.
        
               | wongarsu wrote:
               | Yeah, in hindsight the world would be a slightly better
               | place if SQL didn't include NULL, but instead "UNKNOWN"
               | and "NONE", with "NONE = NONE" being true, and "UNKNOWN =
               | UNKNOWN" being unknown.
        
               | kzrdude wrote:
               | When would NONE show up and when would UNKNOWN show up?
        
               | wtetzner wrote:
               | NONE would mean "there is no value", and UNKNOWN "there
               | might be a value, but I don't know what it is".
        
               | stouset wrote:
               | Or have an Option (or Maybe) type. This is how modern
               | programming languages solve the nil issue, I don't see a
               | reason why a database couldn't take the same approach.
        
             | eins1234 wrote:
             | Fauna has an interesting approach to null that I've grown
             | to like:
             | 
             | Null is a real value that can be compared intuitively like
             | any other value (`Equals(null, null)` returns true). [1]
             | 
             | However, in indexes, any term that evaluates to null is
             | simply not stored in the index. So we can create a unique
             | index with multiple null values because they simply won't
             | exist in the index, so won't violate the unique constraint.
             | 
             | If we do care about null values in a particular index, we
             | can handle that by mapping the null value into another
             | value that will get stored in indexes using a "binding",
             | and then use that index to query for or constrain
             | uniqueness by that mapped value [2].
             | 
             | This is not the most convenient thing in the world, but at
             | the end of the day, it feels like an edge case, so I'm
             | happy with having to jump through some hoops to handle it
             | in exchange for making the much more common cases (null
             | comparisons and unique indexes with nullable values) more
             | intuitive and less error prone.
             | 
             | [1] https://fauna.com/blog/understanding-nothing-or-null-
             | in-faun...
             | 
             | [2] https://docs.fauna.com/fauna/current/learn/cookbook/fql
             | /sear...
        
         | thijsvandien wrote:
         | > Personally, I think that the existing PostgreSQL behavior
         | (NULLS DISTINCT) is the "right" one
         | 
         | I find this to be at odds with IS DISTINCT FROM, which is false
         | for two NULLs.
        
         | rst wrote:
         | So, we have a "standard" which says that both behaviors should
         | be supported, and there should be two different ways to express
         | them, but explicitly _declines_ to say which does what?
         | 
         | I'm genuinely not sure what purpose this particular standard
         | serves, but it looks like it's not allowing people to write
         | portable code...
        
           | anamexis wrote:
           | I can't imagine the standard does not specify that NULLS
           | DISTINCT should treat nulls as distinct and NULLS NOT
           | DISTINCT should treat nulls as not distinct.
        
             | rst wrote:
             | Ah. So there are _three_ ways to write it, and explicit
             | ways to call for either behavior. That 's sane -- just
             | wasn't what I got from the original writeup.
        
           | mjw1007 wrote:
           | No, it's not as bad as that.
           | 
           | The thing that's implementation defined is what happens if
           | you don't explictly select either option.
        
         | woevdbz wrote:
         | > Personally, I think that the existing PostgreSQL behavior
         | (NULLS DISTINCT) is the "right" one
         | 
         | SQL is old enough and this debate so unsettled still that I
         | think it should be clear there isn't a categorically right
         | behavior here anymore than there is a clear winner between tabs
         | and spaces.
         | 
         | As an example of the irreconciliable weirdness of NULL,
         | consider that "NULL = NULL" is false, and so is "NULL != NULL",
         | while rows with NULL still group together in GROUP BY.
         | 
         | I appreciate you giving folks the option.
        
           | bobbylarrybobby wrote:
           | Er, sorry, but I thought that the answer to tabs vs spaces
           | was use tabs for semantic/logical indent, use spaces for
           | physical/alignment indent.
        
             | jon_richards wrote:
             | Yes, mixing invisible characters will solve all our
             | problems.
        
           | anamexis wrote:
           | > As an example of the irreconciliable weirdness of NULL,
           | consider that "NULL = NULL" is false, and so is "NULL != NULL
           | 
           | This isn't quite true - these comparisons don't evaluate to
           | false, they evaluate to NULL.
           | 
           | I intuitively think of NULL as "unable to compute," a
           | generalized NaN.
        
             | Pxtl wrote:
             | > This isn't quite true - these comparisons don't evaluate
             | to false, they evaluate to NULL.
             | 
             | Which large parts of SQL treat as equivalent to false.
             | WITH vars AS (             SELECT CAST(null AS INT) as X
             | )         SELECT 'yes this is a tautology' as "is this is a
             | tautology"         FROM vars         WHERE X = X
             | 
             | The above query returns nothing, not an error because the
             | value of X is unknowable. In a sane language, X=X and so it
             | would yield "yes this is a tautology". SQL standard wants
             | it both ways - it wants to say "boolean null isn't false"
             | except when for its own convenience it decides that yes,
             | null is false.
             | 
             | The truth, imho, is that SQL has an archaic type system. In
             | this day and age it should be offering algebraic types
             | where we can implement our own "empty" values and the like
             | that may not use three-value logic as appropriate, so that
             | I can say that for example End Date is _blank_ instead of
             | either unknown or Date.MaxValue.
        
               | anamexis wrote:
               | > SQL standard wants it both ways - it wants to say
               | "boolean null isn't false" except when for its own
               | convenience it decides that yes, null is false.
               | 
               | I don't see it that way. Null cast to boolean is _always_
               | false. Null equality is undefined. There is no
               | contradiction there.
        
               | saltcured wrote:
               | I think that this topic gets tripped up with people
               | wanting to assume boolean means two possible states...
               | NULL adds an extra state to every type much like the
               | "bottom" type/value in some functional programming
               | languages or type systems.
               | 
               | It's not the `=` operator that is special or significant
               | here. It is the `WHERE` clause semantics, which are
               | defined as returning rows when the boolean expression is
               | true. This treats both false and null cases as the other
               | possible outcome, when rows are not returned. Not
               | understanding this distinction is also where people write
               | buggy queries using WHERE NOT ..., because like equality
               | the NOT operator will return NULL for a NULL input (so
               | NULL is not like false and does not negate to true).
               | 
               | SELECT 'hidden by null result' WHERE CAST(NULL AS
               | boolean);
               | 
               | SELECT 'also hidden by null result' WHERE NOT CAST(NULL
               | AS boolean);
               | 
               | SELECT 'shown by true result' WHERE CAST(NULL AS boolean)
               | IS NULL;
               | 
               | SELECT 'hidden by false result' WHERE NOT (CAST(NULL AS
               | boolean) IS NULL);
               | 
               | The existing GROUP BY behavior is different than WHERE
               | because it does not have a special rule for handling true
               | values. It just returns a group for each distinct value.
               | 
               | The prior (new default) UNIQUE constraint behavior made
               | sense when you think of NULL as this bottom value and
               | meaning "unknown", while also thinking that a uniqueness
               | constraint is only applied to complete keys. The check is
               | effectively deferred, and the partial or incomplete tuple
               | might become valid and unique once its missing part(s)
               | are filled in with real values. The new optional behavior
               | makes more sense if you think that the uniqueness
               | constraint enforces distinct records even with partial
               | keys, just as the GROUP BY clause will determine distinct
               | groups.
               | 
               | There is also a similar issue with FOREIGN KEY
               | constraints. The default MATCH SIMPLE behavior is
               | consistent with the default UNIQUE constraint behavior.
               | An incomplete foreign key is essentially ignored and not
               | subject to integrity checks. The MATCH FULL behavior is
               | almost a style checker to say that a compound foreign key
               | should be all null or all non-null, while still ignoring
               | incomplete foreign keys. As far as I know, the MATCH
               | PARTIAL case is not implemented, but I think it might
               | complement this new UNIQUE NULLS NOT DISTINCT behavior?
               | 
               | One can imagine a further branching point in behaviors
               | for foreign keys and I'm not sure how MATCH PARTIAL will
               | be defined. One mode might be easily implementable with
               | the new index method, if we want partial foreign keys to
               | be able to reference partial keys in the referred table,
               | i.e. the partial key with NULLs is considered an actual
               | row key. But we might also imagine another partial
               | matching mode where the the non-null components of a
               | compound foreign key have to all exist in at least one
               | key in the referred table, while ignoring the null fields
               | as incompletely formed references. This would be an
               | improvement on MATCH SIMPLE which allows the non-null
               | columns in a foreign key to hold values which could not
               | possibly match the referring table no matter which values
               | are filled in for the remaining null fields of the
               | foreign key. However, this enforcement would not be
               | easily implemented without a powerset of indexes for all
               | partial combinations allowed in the referencing tables!
        
             | paulryanrogers wrote:
             | This is a good way to remember the SQL behavior. Though I
             | do hope it'll change in the future. So many other languages
             | interpret null as a comparable value, and it will be less
             | surprising to folks coming from those.
        
             | magicalhippo wrote:
             | > these comparisons don't evaluate to false, they evaluate
             | to NULL
             | 
             | In the databases I've used, such as MSSQL[1], they evaluate
             | to UNKNOWN.
             | 
             | [1]: https://docs.microsoft.com/en-us/sql/t-sql/language-
             | elements...
        
               | bfgoodrich wrote:
               | The document you link uses unknown as a synonym for null.
               | If you inserted the result of such a comparison into a
               | table, the value inserted would be NULL.
        
               | magicalhippo wrote:
               | I'm self-taught, could you show how to insert the result
               | of such a comparison into a table in mssql? I can only
               | think of using CASE or similar, which would mean no
               | direct conversion.
        
               | orf wrote:
               | insert into table (c) values (null != null)
        
               | ratsmack wrote:
               | It's good to see that everyone agrees on the definition
               | of NULL.
        
               | DaiPlusPlus wrote:
               | Yes: a billion-dollar mistake.
        
               | erichocean wrote:
               | And with RDBMSs, a multi-trillion dollar success!
        
               | bazoom42 wrote:
               | This is s misunderstandig of Hoares "billion dollar
               | mistake". The mistake was not the existence of nulls, the
               | mistake was type systems where every reference could be
               | null.
               | 
               | Sql does not have this problem.
        
               | [deleted]
        
           | qez wrote:
           | > anymore than there is a clear winner between tabs and
           | spaces.
           | 
           | But there is a clear winner here. Tabs won, almost all code
           | files use spaces. Of course everyone uses the tab key on
           | their keyboard, and the IDE makes spaces for you; no one is
           | going to manually press space 4 times.
        
             | spacemanmatt wrote:
             | As a savage, I use 4 tabs now.
        
             | tingletech wrote:
             | I hit space 4 times...
        
               | jjoonathan wrote:
               | ...which is clear evidence that hitting space 4 times is
               | the wrong thing to do :P
        
             | bornfreddy wrote:
             | There is no winner, both tabs snd spaces are widely used.
             | However with proliferation of better tools the question
             | thankfully became mostly irrelevant. I don't care if it is
             | spaces or tabs, as long as it is consistent across codebase
             | (which CI enforces with autoformatters and linters) and as
             | long as my editor (which obeys project's editorconfig) uses
             | the same convention.
        
           | masklinn wrote:
           | > SQL is old enough and this debate so unsettled
           | 
           | Is the debate really unsettled?
           | 
           | MSSQL considers nulls to be equal, everyone else considers
           | them to be distinct (aside from a few DBs which just refuse
           | to create UNIQUE indexes on nullable columns), and Oracle is
           | apparently huffling glue in a corner.
           | 
           | > As an example of the irreconciliable weirdness of NULL,
           | consider that "NULL = NULL" is false, and so is "NULL !=
           | NULL", while rows with NULL still group together in GROUP BY.
           | 
           | Group by is the exception here. And arguably it is so because
           | the alternative is never useful, or close enough as to not be
           | worth it.
        
             | feanaro wrote:
             | So of the three major SQL implementations, you have one of
             | them following the results of the settled debate, one of
             | them being an exception and one of them huffling glue
             | (whatever that means). In addition, there's a language
             | exception even in the settled group, which you have to
             | handwave away.
             | 
             | Sounds pretty unsettled to me.
        
               | irrational wrote:
               | Oracle considers null to be equal to empty string.
        
               | mike_hock wrote:
               | That is truly huffing glue.
        
             | irrational wrote:
             | And oracle considers null to be equal to empty string. My
             | understanding is oracle was created before the sql standard
             | existed and they don't want to change because of backwards
             | comparability. I don't understand why they can't add a db
             | parameter to allow you to set how nulls are handled.
        
               | to11mtm wrote:
               | IDK if you ever saw the HN thread where they talked about
               | oracle and change tests from a few years back, but if I
               | had to guess, the pain just isn't worth the overall cost.
               | 
               | Overall I remember running into this but finding it
               | fairly simple to add a couple specific checks around the
               | condition where it came up in our code.
        
               | layer8 wrote:
               | They would need to change their storage format to be able
               | to distinguish between NULL and empty string, and
               | probably a lot of in-memory structures and even more code
               | logic as well. It would be hugely expensive to support
               | both.
               | 
               | Oracle introduced VARCHAR2 in the 90s(?) to mean "Oracle
               | semantics" as opposed to VARCHAR which is documented as
               | "may change to standard SQL semantics in a future
               | version". However I don't think they'll ever materialize
               | that distinction.
        
         | nicoburns wrote:
         | A slight tangent, but do you have any pointers on how one might
         | get involved in either:
         | 
         | - The SQL standardisation process (or whether this is even
         | feasible as someone who isn't involved in the development of a
         | major database engine)
         | 
         | - Postgres development
         | 
         | The feature I am particularly keen to get accepted is trailing
         | commas in SELECT column lists (potentially other places too,
         | but SELECT lists would be a very good start). And there are
         | potentially a bunch of other improvements in the syntax sugar
         | department that I might be interested in spearheading if I
         | could successfully get this accepted (e.g. being able to refer
         | to select list column aliases in GROUP BY clauses).
         | 
         | I don't have much experience with C, but I would potentially be
         | up for actually implementing the change if I could get buy in
         | from the maintainers.
        
           | gurjeet wrote:
           | Postgres project is always looking for contributors. The
           | project has a very nice, detailed, well-defined documentation
           | on how to contribute [1]. The community is very welcoming,
           | and has a process (CommitFest[2]) in place to ensure all
           | submissions get their due attention.
           | 
           | FWIW, I am starting to make an effort (e.g. [3]) towards
           | helping the newcomers get their work in shape to be
           | acceptable by the committers.
           | 
           | Note: I'm biased towards Postgres community, since I've
           | worked with them for many years. So others' opinion _may_
           | differ, but it's highly unlikely.
           | 
           | [1]: https://wiki.postgresql.org/wiki/Development_information
           | [2]: https://wiki.postgresql.org/wiki/Development_information
           | #Com... [3]: https://www.postgresql.org/message-
           | id/CABwTF4Us8WGMffNGTNY1X...
        
           | orthoxerox wrote:
           | > The feature I am particularly keen to get accepted is
           | trailing commas in SELECT column lists
           | 
           | Not "GROUP BY THE OBVIOUS LIST OF EXPRESSIONS, YOU KNOW, THE
           | ONES FROM THE SELECT CLAUSE THAT DON'T CONTAIN AGGREGATE
           | FUNCTIONS"? Aka "GROUP BY ALL" from DuckDB.
        
         | Justsignedup wrote:
         | Currently nulls are considered not distinct in postgres, no?
         | Doesn't this mean anyone upgrading will have to fix up all
         | their table definitions? Or am I wrong about this?
         | 
         | I just care about the backwards compat. To be clear, the second
         | I saw this I thought "finally!!!"
        
       | irrational wrote:
       | This is amazing. We moved from Oracle to Postgres after being on
       | Oracle for 15+ years. That was a huge move that took 2 years. One
       | of the biggest hurdles was rewriting thousands of sql queries.
       | One of the biggest things we had to change was accounting for how
       | oracle handled nulls versus how Postgres handled nulls. However,
       | we missed this nuance about how Postgres treated nulls and
       | unique. It wasn't until about 2 years into using Postgres, and
       | running into so many issues, that we finally stumbled upon this
       | nuance in the documentation. Crap. That required making a bunch
       | of more changes to account for this. At this point I'm firmly
       | convinced that null handling is the hardest part of database work
       | ;-)
        
       | jeff-davis wrote:
       | In any thread involving SQL NULL, I see a lot of not-quite-right
       | explanations of what SQL NULL is, conceptually. I challenge
       | anyone who feels like they understand NULL conceptually to
       | explain the following query:                   -- find orders
       | with a total less than $10000         select order_id, sum(price)
       | from orders o left join order_lineitems l using (order_id)
       | group by order_id having sum(price) < 10000;
       | 
       | This query is actually incorrect. Orders with no line items at
       | all are clearly less than $10000, but they will be excluded
       | because: first, the left outer join produces a NULL for the
       | price; second, the group aggregation with SUM over that NULL will
       | result in NULL; and third, the HAVING clause treats that NULL as
       | false-like and excludes the order from the result.
       | 
       | Of course, we can explain _procedurally_ what 's happening here,
       | and each individual step makes some sense. But the end result has
       | no conceptual integrity.
       | 
       | Extra challenge: explain why using COUNT instead of SUM in the
       | query does correctly return orders with fewer than 4 items:
       | -- find orders with fewer than 4 line items         select
       | order_id, count(price)           from orders o left join
       | order_lineitems l using (order_id)           group by order_id
       | having count(price) < 4;
       | 
       | PS: thank you to the author for a developer-friendly feature that
       | adds flexibility here!
        
       | hagope wrote:
       | This is cool, but wouldn't creating a constraint using a nullable
       | column be considered a poor design decision? In which scenarios
       | would this be a good idea?
        
         | cptn_badass wrote:
         | When an entry can belong to 0 or 1 related object only. Not
         | that I'd put a constraint in such scenario, but I imagine a
         | User can optionally have a Subscription, so subscription_id is
         | either nil or present, and said subscription cannot be
         | associated to any other User.
        
           | hagope wrote:
           | in that scenario why not use foreign key? any advantage to
           | using constraint?
        
             | Pxtl wrote:
             | You do both.
             | 
             | The foreign key only gaurantees that the other entity
             | exists.
             | 
             | The unique constraint ensures that only one pair of
             | entities has this relationship, preventing a one-to-many
             | binding.
             | 
             | The distinctness of NULL allows you to have multiple
             | entities with the same NULL value without violating the
             | above UNIQUE constraint.
             | 
             | The "NULL is empty" vs "NULL is unknown" is a series of
             | trade-offs of labor-saving. Imho, the wrong trade-offs were
             | made, but once the choice is made it makes sense to
             | continue and be consistent with it. I'd rather be
             | consistently wrong than inconsistently right.
        
               | hagope wrote:
               | If the entity doesn't exist, wouldn't it violate the FK
               | and therefore no need for the nullable constraint?
        
       | ahmed_ds wrote:
       | When something that fundamental changes what is the cascading
       | effect? Do other functions refer UNIQUE/NULL or do they have
       | their own versions of UNIQUE/NULL like code?
        
         | masklinn wrote:
         | There is no fundamental change, this is an opt-in on a per-
         | index basis.
         | 
         | It does not change null's identity, it only allows indexes to
         | treat nulls as identical.
        
       | ryanbrunner wrote:
       | This is great! I have multiple places in our code where I've had
       | to define multiple unique indexes (with conditionals) to get
       | around this. It's definitely confusing behaviour the first time
       | you encounter it.
       | 
       | Honestly I think SQL already breaks it's own standard of "NULL =
       | unknown". One example: If you do a LEFT OUTER JOIN and the joined
       | data is missing, you get NULL in the resultset. But in that case,
       | it's not "unknown" whether there are records that can be joined
       | to, the answer is that there are definitively no records there.
       | 
       | NULL is univerally considered by most programming languages to
       | equal "not present" vs "unknown", and in many cases by SQL in
       | practice. The ship has probably sailed on actually changing the
       | default behaviour, but optional flags like these are a good step.
        
         | knorker wrote:
         | NULL means "I don't have this data". It doesn't guarantee that
         | the data doesn't exist anywhere.
         | 
         | It's semantically ambiguous, yes.
         | 
         | But it's not ambiguous that the _database_ doesn 't have the
         | data. It's very clearly NULL.
         | 
         | If "middle name" is NULL, then the database itself can't tell
         | you if a person doesn't have a middle name, or if it's merely
         | not yet been told the middle name. You could differentiate the
         | two by saying empty string for "has no middle name"... except
         | on Oracle VARCHAR2, where NULL is equal to empty string.
        
           | Pxtl wrote:
           | > You could differentiate the two by saying empty string for
           | "has no middle name"... except on Oracle VARCHAR2, where NULL
           | is equal to empty string.
           | 
           | This is what kills me.
           | 
           | This seems like an obvious case where multiple flavors of
           | NULL would make perfect sense. For example
           | Ticket.CancellationDate isn't "unknown", it's _this ticket is
           | not cancelled_. But SQL standard doesn 't have a way to
           | express "empty value" as distinct from "unknown value", so
           | we're stuck working around the strange implications of using
           | this square peg to fill a round hole.
           | 
           | Oracle's bizarre "empty VARCHAR2s are NULL" is strange but is
           | consistent with every other datatype that offers no way to
           | say "empty" but "NULL". 0 is not the same as NULL int and is
           | not the same as empty int. Mindate is not the same as NULL
           | date and is not the same as empty date.
        
             | knorker wrote:
             | Well, for the string case empty string is... the empty
             | string.
             | 
             | Storing the exit code and stdout of a program run Oracle
             | allows the state "it's not finished yet" to have exit code
             | representation of NULL, but you cannot say the same for
             | string.
             | 
             | "Unknown" is not a perfect map to NULL. "I have no answer
             | for you" is better.
             | 
             | Maybe NULL values could use an annotation.
             | 
             | But I think that would make things worse. Like let's say
             | you have some query that joins addresses with coordinates.
             | _You_ know that semantically any address without a
             | coordinate has an  "unknown" coordinate, not that it
             | authoritatively doesn't have coordinates.
             | 
             | But not only do you need to change the data model then, but
             | the query language too, to be able to express the
             | difference where absense means one or the other.
             | 
             | Now absense is NULL, and leaves it up to the application to
             | interpret.
             | 
             | SELECT a.address, b.coord FROM a LEFT OUTER JOIN (SELECT
             | address,coord FROM coords WHERE is_test=FALSE) AS b ON
             | a.address=b.address;
             | 
             | Or maybe some coordinates are missing because their
             | locations no longer exist. The house has been demolished,
             | so it's not "unknown" where the house was, it's just that
             | it no longer has a location. And how would you join that
             | data.
             | 
             | I don't think I've encountered a problem in practice with
             | SQL in differentiating empty from unknown. It's out of
             | scope of what SQL tries to solve, and that's fine.
        
           | ryanbrunner wrote:
           | That's fair - as I said in another comment, it breaks down
           | with keys. I can't think of a great way to model "this row
           | definitively does not have whatever is referenced in another
           | table". Using anything other than NULL for that means you
           | can't use proper foreign keys.
        
         | SnowHill9902 wrote:
         | NULL means that the value is unknown, not that the data is
         | unknown. You indeed have no data but its value may exist. You
         | just don't know it yet. NULL is very powerful when you
         | understand that.
        
           | woevdbz wrote:
           | It is mostly true but not always, eg in the output of GROUP
           | BY ROLLUP, a NULL value is used to indicate rows representing
           | subtotals / partial aggregates
        
           | ryanbrunner wrote:
           | That seems like a fairly big assumption on how I'm modelling
           | the data though. Particularly in the LEFT JOIN example, I'm
           | not sure if I agree with the idea of "a row not being present
           | means that it's unknown whether it exists in the real world"
           | depending on the underlying content being modelled.
           | 
           | For a string column, you do have blanks vs. NULLs, so it's
           | fairly easy to distinguish between "John definitely doesn't
           | have a middle name" vs "It's unknown whether John has a
           | middle name", but that doesn't hold true with most other
           | datatypes.
        
             | SnowHill9902 wrote:
             | The string '' is as arbitrarily blank as 0 is a blank
             | integer.
        
               | wongarsu wrote:
               | There are two "blank" integers: 0 is the identity element
               | under addition, 1 is the identity element under
               | multiplication. Or worded differently 0 is a blank
               | addend, 1 is a blank multiplier.
               | 
               | '' would be the identity element for string
               | concatenation, or any other string manipulation method I
               | can think of, so it has a stronger claim on being blank
               | than a 0 has.
               | 
               | Unless your string is later parsed as something else. A
               | blank JSON value would look differently from a blank
               | phone number.
        
               | SnowHill9902 wrote:
               | Sure, but it's arbitrary from a data structure standpoint
               | and sloppy design.
        
               | ryanbrunner wrote:
               | SQL gives you no other options though, there's no other
               | construct that means "not present", since NULL is unknown
               | vs. not present.
        
               | SnowHill9902 wrote:
               | It depends on your column. If you have a products table
               | and a certain handmade product has no upc, it's correct
               | to use '' as its value rather than NULL because you know
               | for a fact that its non-existent. Some people respect
               | NULL too much.
        
       | tzahifadida wrote:
       | wow thanks!
        
       | pbz wrote:
       | Since we have some PG devs here: Can we please have a way to
       | reorder columns?
       | 
       | Coming from MySQL, this is one of the first missing features one
       | hits. It's like moving to a new house where you're told you can
       | never clean or move the furniture without moving to a new house.
       | 
       | It leaves the unfair impression that this is a "toy" db. "What
       | other basic features is it missing if it doesn't have this?"
       | Please don't think of it as trivial; first impression are very
       | important.
        
         | urthor wrote:
         | There's like 5 different features wants I could add like this.
         | A columnar database engine comes to mind ( _slightly_ more work
         | involved) and poaching a few Sqllite QOL features.
         | 
         | Ultimately Postgres is a community driven product, and people
         | want to work on what they want the work on.
         | 
         | I don't get angry that some kind soul hasn't volunteered their
         | Saturday for free for me.
        
           | pbz wrote:
           | Oh, absolutely; I'm just trying to raise the
           | awareness/importance of this missing feature. Most devs that
           | I know IRL that looked at PG had a very similar reaction to
           | mine.
        
             | [deleted]
        
             | mixmastamyk wrote:
             | I'm receptive to the feature. But wouldn't it require
             | locking the full table as it works? Seems like it would
             | hurt availability of the biggest databases that would
             | benefit most. Perhaps this is why it is thought somewhat
             | impractical and therefore low-priority.
        
           | DangitBobby wrote:
           | That's not exactly true. You can write your own patch for a
           | feature and there's a good chance they'll decide they don't
           | want it. It's their prerogative, but also slightly upsetting
           | when you know they're blocking features you want. Just
           | observe the bike shedding over a trailing commas patch [1].
           | 
           | 1. https://postgrespro.com/list/thread-id/1853280
        
             | TheP1000 wrote:
             | There are lots of good reasons in that thread why the patch
             | was declined.
             | 
             | The patch made things more lenient which makes things less
             | compatible (this whole thread is about ANSI SQL standards
             | and they do matter) and changes what was an error behavior
             | to now silently succeed.
             | 
             | +1 to postgres devs for curating patches.
        
         | SnowHill9902 wrote:
         | What do you gain from that other than soothing your OCD when
         | executing \d t (?)
        
           | iruoy wrote:
           | I like to have my `*_id` columns at the front, and
           | `created_at` etc. at the back. And in the middle all the
           | fields by descending importance. Just a personal habit so I
           | can quickly lookup data in a table.
        
             | pbz wrote:
             | Exactly... You gain so much speed by having everything in
             | the right order.
        
               | sophacles wrote:
               | This seems pretty voodoo to me. Do you have any examples
               | that show it?
        
               | pbz wrote:
               | I'm not sure how I could show if something is
               | easier/harder to work with. Speed in this context refers
               | to human processing speed, not database speed.
        
             | ggregoire wrote:
             | I'm totally with you on this, and would love to be able to
             | directly reorder the columns in the tables.
             | 
             | A possible workaround is to have a view for every table
             | (which I often read is a good practice anyway) and order
             | the columns as you want in the views.
        
           | pbz wrote:
           | Soothing my OCD is important, but it also helps when working
           | in a team and having a shared ordering style. For example, we
           | have certain types of columns that are always at the bottom.
           | One would look there first. With PG you're forced to scan all
           | the columns every time.
        
             | 9dev wrote:
             | This is why I order anything alphabetically, always.
             | Properties, methods, columns... anything. Saves you the
             | trouble of custom conventions.
        
               | pbz wrote:
               | The trouble is when you need to add a new column you can
               | only add it at the end, so you'd lose the alphabetical
               | ordering.
        
           | MichaelApproved wrote:
           | Based on my understanding of DB storage (which is decades old
           | and as I write the comment my explanation seems stupid but
           | it's what I was taught. Please someone correct me).
           | 
           | Image a table with the following:
           | 
           | id (int), title (varchar), created_date (date)
           | 
           | That data is stored in a similar order on disk (is this still
           | true?).
           | 
           | So, if the initial insert has a title of 10 characters, then
           | the date will get placed after those 10 characters on disk:
           | 
           | Int, 10 characters, date
           | 
           | Later, the row is updated with a longer title of 500
           | characters.
           | 
           | Because it's longer, you now have to get more space for the
           | title on another part of the disk.
           | 
           | Since there are columns after the title, it'll either leave
           | dead space where the old 10 character title was or it'll need
           | to move those other columns too.
           | 
           | I can't remember which I was taught. I just remember that
           | columns after variable columns get impacted when the data in
           | a preceding variable column changes.
           | 
           | If the title were at the end of the row, the db could expand
           | the data without needing to move other columns over (if there
           | happened to be available space right there).
           | 
           | If you were updating with a shorter value, it could shorten
           | the row by moving the terminating character to the new
           | shorter location, freeing up the space the longer title was
           | using.
           | 
           | Bottom line is you want to keep variable columns which get
           | updated most frequently towards the end of your table
           | structure.
           | 
           | Knowing which column that would be isn't always clear when
           | creating the table.
           | 
           | Again, this understanding was taught to me over 20 years ago.
           | I'm probably remembering parts of it wrong and DB storage has
           | likely (hopefully) advanced since then.
        
             | jasonwatkinspdx wrote:
             | DB storage is a lot more sophisticated than what you were
             | taught.
             | 
             | Most databases use Slotted Pages to organize storage. Pages
             | are fixed size and numbered by their offset within the
             | database file. The page header contains the number of rows,
             | followed by an array of offsets for individual rows within
             | the page. Rows themselves generally are stored at the end
             | of the page filling downwards. The storage engine can move
             | around rows in arbitrary ways to consolidate free space.
             | 
             | Fundamentally there's no connection between SQL schema
             | order and how table storage is organized on disk. For
             | example in a column store there's often no contiguous row
             | stored anywhere, instead there's just separate indexes per
             | column.
        
               | lsaferite wrote:
               | That would seem like an argument in favor of allowing
               | column reordering.
        
             | jhgb wrote:
             | > That data is stored in a similar order on disk (is this
             | still true?).
             | 
             | There should be no requirement for this. Columns in
             | relations are not conceptually ordered, so it shouldn't
             | matter for the things you're doing with the data anyway,
             | and the database should be able to reorder the data in
             | whatever way it likes, since desire to isolate the user
             | from physical data structures was one of the main reasons
             | for the rise of RDBMS.
        
               | MichaelApproved wrote:
               | > _the database should be able to reorder the data in
               | whatever way it likes_
               | 
               | My point was that you're trying to prevent the DB from
               | reordering the data because there's a performance cost
               | when that happens.
        
               | jhgb wrote:
               | Why would you be trying to _prevent_ the DB from
               | reordering the data? You 're not supposed to have better
               | knowledge of what's good for your use case than an RDBMS
               | that can collect usage statistics on queries and such.
               | Ditto for compilers rearranging structures and such. When
               | you start having hundreds of tables and thousands of
               | queries, I don't see how you can do a better job than an
               | automated system at that point.
        
               | MichaelApproved wrote:
               | > Why would you be trying to prevent the DB from
               | reordering the data?
               | 
               | Sure. "Reduce the need" would be a better word than
               | "prevent".
               | 
               | If I can do a good job organizing the table columns (as
               | described above) it'll lower the need for the DB to
               | reorder data.
               | 
               | Reduced need to reorder, improves performance.
        
               | masklinn wrote:
               | 1. That's not exactly true, AFAIK all RDBMS return
               | columns in table order when order is unspecified (`*`),
               | and while they could reorder on retrieval
               | 
               | 2. postgres definitely does not, and column tetris is
               | absolutely a thing in the same way struct packing is
               | (with the additional complexity of variable-size columns)
        
               | jhgb wrote:
               | But the order of attributes in the presentation of the
               | result relation has nothing to do with physical layout of
               | base relvars -- primarily because any relationship
               | between the two is purely coincidental. The vast majority
               | of useful queries will _not_ reuse the order of
               | attributes in base relvars, so optimizing for the unusual
               | trivial case by prohibiting better rearrangements that
               | could be useful for a much larger number of use cases
               | seems rather pointless.
               | 
               | And what PostgreSQL does is of course an implementation
               | detail of PostgreSQL.
        
           | fabian2k wrote:
           | There is some small benefit to playing column tetris if you
           | have columns with different sizes that waste space due to
           | padding. I'm not convinced this would be worth the complexity
           | of this feature, but in some cases reordering columns might
           | have measurable benefits in reducing the size of the data.
        
             | pbz wrote:
             | I'm not talking about the physical layout of the data, just
             | a thin UI layer that the DB tool could use. Maybe we could
             | have two modes: physical vs UI ordering?
        
               | andruby wrote:
               | As mentioned by another comment: you can use views for
               | that.
        
               | pbz wrote:
               | Views are not a good solution for this. The point is to
               | be able see a specific order in 3rd party apps and when
               | writing quick add-hoc queries (select *). Writing views
               | for every table would just pollute the db.
        
         | Tostino wrote:
         | Ideally, postgres would play column Tetris behind the scenes
         | and store the columns on disk in the most appropriate way,
         | while allowing the representation to be changed at will.
        
           | pbz wrote:
           | Yeah with maybe an option to manually optimize (that would
           | rebuild the table if needed).
        
         | dboreham wrote:
         | > Please don't think of it as trivial
         | 
         | I've worked with relational databases for 20+ years. This is
         | the very first time I heard of this.
        
           | pbz wrote:
           | I've worked with DBs for 20+ years as well. This is a quality
           | of life type of improvement. If you've worked mainly with DBs
           | that don't make this easy it's hard to know what you're
           | missing. Do a search for column reordering for PG and you'll
           | get a ton of hits.
        
         | hans_castorp wrote:
         | > It leaves the unfair impression that this is a "toy" db.
         | 
         | So you consider Oracle, SQL Server and DB2 also to be "toy"
         | databases?
        
           | pbz wrote:
           | With SQL Server the management tool does give you a way to do
           | this. Yes, it does a table rebuild behind the scenes. The
           | point is that it's easy. Don't have experience with the other
           | two, but MySQL is the most popular so it kinda sets the tone
           | whether we like it or not.
        
             | hans_castorp wrote:
             | Well, writing a procedure that rebuilds the complete table
             | in Postgres or Oracle is easy as well. I never needed this,
             | but I am sure, there are some sample implementations out
             | there.
             | 
             | Rebuilding the entire table doesn't seem feasible for large
             | tables to begin with. Especially with a lot of incoming and
             | outgoing foreign keys.
             | 
             | I disagree that MySQL is the most "popular".
             | 
             | It might be the "most used" one because of so many web
             | hosting services included it for ages by default.
        
               | jl6 wrote:
               | Tangential, but there are almost certainly more SQLite
               | databases in existence than every other RDBMS put
               | together, probably by 3 or 4 orders of magnitude.
               | 
               | It doesn't support column reordering either.
        
         | jl6 wrote:
         | Better to channel your displeasure into advocating for a future
         | SQL standard to include this feature. Other DBs that support
         | this do so via proprietary syntax extensions.
        
           | masklinn wrote:
           | Technically it doesn't require syntactic extensions, postgres
           | stores the column index as the `attnum` attribute column of
           | the `pg_catalog.pg_attributes` system table.
           | 
           | So this could be made to work by hooking into the storage
           | system and rewriting the table and all pointers to the table
           | when `attnum` is updated.
           | 
           | Without this rewriting this only works if the table was just
           | created and has no data, and no external metadata referencing
           | the columns themselves e.g. views, fks, indexes, defaults,
           | rules, ...
           | 
           | An alternative would be to add automatic packing (a la rustc)
           | to postgres, decorrelating the "table position" and the
           | "physical position" of the rows, this would also allow free
           | "table position" reordering.
           | 
           | And while it's by far the most complex option, one of the
           | nice bits with it would be that the system columns could be
           | packed as well. Currently there's quite a bit of waste
           | because there are 6 system columns (by default), the first 5
           | are 4 bytes, but the 6th (ctid) is 6 bytes, meaning 2 bytes
           | of padding if your first column is a SERIAL, and 6 if the
           | first column is a BIGSERIAL (or an other double-aligned
           | column).
        
             | anarazel wrote:
             | > And while it's by far the most complex option, one of the
             | nice bits with it would be that the system columns could be
             | packed as well. Currently there's quite a bit of waste
             | because there are 6 system columns (by default), the first
             | 5 are 4 bytes, but the 6th (ctid) is 6 bytes, meaning 2
             | bytes of padding if your first column is a SERIAL, and 6 if
             | the first column is a BIGSERIAL (or an other double-aligned
             | column).
             | 
             | FWIW, system columns aren't stored as normal columns in
             | tuples. Some of them are implied (e.g. tableoid doesn't
             | need to be stored in each tuple, ctid is inferred from
             | position), others are not stored in the way normal columns
             | are stored (e.g. xmin, xmax).
        
             | RedShift1 wrote:
             | Couldn't an ordering column be added to the pg_attributes
             | that determines in which way the columns are sorted when
             | they are displayed? Standard SQL code can then be used to
             | manipulate the display order of the columns.
        
               | [deleted]
        
         | anarazel wrote:
         | The problem is that we hear a lot of different features touted
         | as "the crucial missing one"...
         | 
         | Anyway, there's been work on this in the past, which recently
         | has been picked up again. It's not all that trivial to do well.
        
           | pbz wrote:
           | _The problem is that we hear a lot of different features
           | touted as "the crucial missing one"_ -- I would definitely
           | put this into the polish category, but items in that category
           | are also important; especially for those with a MySQL
           | background.
           | 
           |  _which recently has been picked up again_ -- that 's awesome
           | to hear
        
       | silvestrov wrote:
       | > This fits with my mental model of how I think unique
       | constraints should work with null values.
       | 
       | Agree, this should be the default for unique indexes. Why would
       | you ever want multiple NULLs in an unique index?
        
         | mhaberl wrote:
         | > Why would you ever want multiple NULLs in an unique index?
         | 
         | You would maybe want some value to be unique if it exists.
        
         | daveoc64 wrote:
         | Some kind of secondary identifier, but only for records where
         | it is applicable (e.g. loyalty card number).
        
         | phoe-krk wrote:
         | The rows (1, NULL), (1, NULL), and (1, NULL) - each NULL
         | signifies an absence of a value - these NULLs may actually be
         | later filled into (1, 1), (1, 2), and (1, 3). That's why may
         | not know if any two NULL values are equal to one another.
         | 
         | In some cases the above is actually important, whereas in
         | others you may want to treat NULL as a "value" instead.
        
         | larsnystrom wrote:
         | Multiple NULLs in a unique index are useful when an entity has
         | 0 or 1 of a simple scalar value. For example, you might have a
         | column with an external ID, such as a facebook login id in your
         | users table. Not all users have a facebook id, but those who do
         | should have a unique value in that column.
        
           | radiospiel wrote:
           | You might want to look at a unique partial index: "CREATE
           | UNIQUE INDEX foo ON users(facebook_id) WHERE facebook_id IS
           | NOT NULL".
           | 
           | I am not sure if this really is a great modelling. If a user
           | happens to have two logins in your table (maybe they signed
           | up with two different emails), and then connects one of his
           | accounts with his FB identity, what does it mean that the
           | other entry cannot be connected to the FB identity?
        
         | masklinn wrote:
         | Because it's very common for attributes to be optional but
         | unique if specified e.g. your fleet management might have a
         | 0-1:1 relationship between employees and cars, no car can be
         | associated with 2 employees, an employee can't have two cars,
         | but any number of employee can be car-less, or car employee-
         | less (lending pool, or cars which have not been attributed
         | yet).
         | 
         | This also fits perfectly well with the normal interpretation /
         | treatment of sql null as "not a value".
        
           | faho wrote:
           | But an employee can't be car-less twice!
           | 
           | That's what this change prohibits.
        
             | masklinn wrote:
             | > But an employee can't be car-less twice!
             | 
             | What.
             | 
             | > That's what this change prohibits.
             | 
             | This change prohibits nothing, it allows specifying
             | something (which was a bit complicated to enforce before).
        
               | faho wrote:
               | >This change prohibits nothing, it allows specifying
               | something (which was a bit complicated to enforce
               | before).
               | 
               | It allows specifying a _constraint_ - that you can 't
               | have two rows with the same values even if one of the
               | values is a NULL. That's prohibiting duplicate NULLs. The
               | change allows you to prohibit duplicate NULLs.
               | 
               | Say you have a table (EmployeeName, CarID). You could do
               | a UNIQUE constraint on those two attributes, but that
               | would still allow:
               | 
               | EmployeeName | CarID
               | 
               | Jeff | 2
               | 
               | Kim | NULL
               | 
               | Kim | NULL
               | 
               | Here, "Kim" is car-less (NULL in the CarID field) _twice_
               | , which makes no sense.
               | 
               | Hence the new constraint.
        
               | masklinn wrote:
               | > It allows specifying a constraint - that you can't have
               | two rows with the same values even if one of the values
               | is a NULL. That's prohibiting duplicate NULLs. The change
               | allows you to prohibit duplicate NULLs.
               | 
               | Sure.
               | 
               | > Here, "Kim" is car-less (NULL in the CarID field)
               | twice, which makes no sense.
               | 
               | The relation makes no sense to start with, it didn't need
               | duplicate nulls for that.
        
               | jhugo wrote:
               | Kim being car-less would be represented by Kim not
               | appearing in this table at all. The design you've
               | presented makes no sense.
        
               | [deleted]
        
               | jamesmstone wrote:
               | pre postgres 15 can you solve this with two unique
               | constraints ?
               | 
               | unique (EmployeeName), unique (EmployeeName, CarID)
        
               | masklinn wrote:
               | You'd most likely solve this by making the CarID NOT NULL
               | because there's why would you even have a row with no
               | useful data?
               | 
               | And if this is your employee table, you'd have UNIQUE
               | (EmployeeData) and UNIQUE (CarID), and it would do what
               | you want.
        
               | ryanbrunner wrote:
               | That's not quite the same, since the previous solution of
               | unique (EmployeeName, CarID) allowed for one employee to
               | have 2 separate cars. You have to use conditional indexes
               | to solve this.
        
             | Beltalowda wrote:
             | It's a 0-1:1 relationship, so you'd have columns like
             | "name", "email", "car", etc. on the employees table, and
             | "car" might have a unique index (which is probably an ID
             | referencing a "cars" table).
             | 
             | Since employees can only have one car and since a car can
             | only be used by one employee, an unique index is fine, but
             | there may be 20 people that don't have a car so you don't
             | want NULL values to be unique.
             | 
             | Of course there are always ways to work around this, but
             | this is the general idea.
        
         | seanhunter wrote:
         | One way nullable columns are often used is like an option type.
         | It's easy to see that you might want rows to be unique if they
         | have Some(value) but have multiple rows where the column is
         | None
        
           | almog wrote:
           | I agree that this is often the use case but depending on how
           | the query (as well as how sparse the nulls are, the total
           | data size and partitioning scheme I guess) it's also possible
           | to achieve this through a join-table without introducing NULL
           | values.
        
             | gsk22 wrote:
             | Unless you outer join, in which case NULL values pop up out
             | of thin air...
        
               | almog wrote:
               | Good point. I did try to say that whether that's a viable
               | technique would depend on your query (but omitted some
               | verb that should have made it better resemble a sentence
               | in English :)).
               | 
               | But yes, even if no nulls are present in any table, outer
               | join could (or maybe even should) introduce them.
        
         | jhugo wrote:
         | I can't think of a situation where I'd want "only a single
         | NULL"; that seems much weirder. Another comment described a
         | tree data structure where it could make sense, but I can't
         | think of others. Nullable unique indexes usually describe some
         | kind of optional identifier; if it's present, you want it to be
         | unique, but it's perfectly fine for more than one row to omit
         | it.
        
           | orthoxerox wrote:
           | You might want an index like this on some aggregate table.
           | Like, shoe_sales_total, which is grouped by the id of the
           | shop assistant that helped you with the selection. But
           | customers that have marched straight to the till and asked
           | for shoe polish result in records with a NULL for agent_id.
           | With a regular unique constraint (month, agent_id), your
           | aggregate table can end up with two records for (May-21,
           | NULL).
           | 
           | Of course, the workaround for such cases is simple: have a
           | sentinel value for unassisted sales and make the column not
           | nullable.
        
         | goto11 wrote:
         | Lets say you have a Person table with with a SSN column. The
         | column has a unique constraint because no two people have the
         | same SSN. But the column in nullable, since not everyone has a
         | SSN.
         | 
         | It would be weird to expect that only _one person_ does not
         | have SSN.
        
         | knorker wrote:
         | E.g. if you have a "tax ID" column, and two people in your
         | database don't have tax IDs because they're not <your country>
         | tax payers.
         | 
         | Or "payment vendor processing ID". You may have multiple
         | entries not yet sent to the vendor, so they're NULL.
         | 
         | This shows up literally all the time in SQL schemas.
        
       | hrdwdmrbl wrote:
       | Cool but this was trivially solved previously by excluding NULL
       | from the index using a partial index.
       | 
       | I'm more interested in problems that have no great solution, like
       | taking the first element of a set of partitioned data. Right now
       | you have to partition & rank, and then select the rank 1 elements
       | in a subsequent query.
        
         | masklinn wrote:
         | > Cool but this was trivially solved previously by excluding
         | NULL from the index using a partial index.
         | 
         | That seems like the opposite solution, treating NULLs as
         | different in database systems which treat them as identical by
         | default (which I think only includes MSSQL).
         | 
         | In postgres, and most DBMS, using a partial index excluding
         | NULL allows any number of rows to have a NULL value... which is
         | the same behaviour as if you're not excluding NULLs.
         | 
         | The new opt-in behaviour is treating all nulls as the _same_
         | value, and thus allowing a UNIQUE index to constraint them.
         | 
         | This required two different partial indexes, or for constrained
         | data subset in hacks like COALESCE-ing nulls to a sentinel.
        
         | silon42 wrote:
         | The new behavior is more useful because it allows a single
         | unique null.
         | 
         | For 2, have you looked at "LATERAL JOIN" ?
        
         | _flux wrote:
         | I don't think you understood what this does (or then I
         | misunderstood this) but if you do that, then how are you going
         | to get an index violation from having two NULL values in the
         | index? Because that is what this allows: normally indexing NULL
         | values permitted multiple NULL entries, but with this NULL is
         | comparable to other NULLs, therefore you can have only one NULL
         | value in the index.
         | 
         | Previously I had worked this around with functional index on
         | e.g. COALESCE(value_with_null, -1), but not always you have a
         | good sentinel value easily available. A more complicated index
         | (..CASE WHEN..) would solve that, but it's nice to have this
         | directly expressible, seems efficient as well.
        
         | jhugo wrote:
         | Maybe it's what you meant be "subsequent query", but you don't
         | actually need to do it in a separate query; you can do the
         | partition & rank in a CTE and select from the resulting table,
         | all in a single query. It's quite tidy actually.
        
       | layer8 wrote:
       | > Because NULL values are of unknown equality to one another,
       | they do not violate UNIQUE constraints.
       | 
       | Well, arguably you don't know whether they violate uniqueness, so
       | inserting more than one shouldn't have been allowed.
        
       ___________________________________________________________________
       (page generated 2022-07-11 23:01 UTC)