[HN Gopher] Postgres 15 improves UNIQUE and NULL
___________________________________________________________________
Postgres 15 improves UNIQUE and NULL
Author : Twisell
Score : 391 points
Date : 2022-07-11 07:29 UTC (15 hours ago)
(HTM) web link (blog.rustprooflabs.com)
(TXT) w3m dump (blog.rustprooflabs.com)
| daigoba66 wrote:
| Coming from an MS SQL Server background, this behavior surprised
| me.
|
| While SQL Server generally does treat null values as not equal in
| query predicates, for the purpose of unique constraints/indexes
| it'll treat them as the same value. I guess this does go against
| the spec? But it makes sense as a human when looking at a tuple
| and deciding if it is "unique" or not.
|
| On that note, now I wonder what SQL Server does with nulls in a
| DISTINCT or GROUP BY. I suspect that it'll treat all nulls as the
| same.
| hans_castorp wrote:
| > Coming from an MS SQL Server background, this behavior
| surprised me.
|
| Oracle behaves the same as Postgres with regards to NULL values
| in unique indexes.
|
| > I guess this does go against the spec?
|
| The first comment (by the blog author)
| https://news.ycombinator.com/item?id=32054151 explained this:
|
| > The background here is that the SQL standard was ambiguous
| about which of the two ways an implementation should behave. So
| in the upcoming SQL:202x, this was addressed by making the
| behavior implementation-defined and adding this NULLS [NOT]
| DISTINCT option to pick the other behavior.
| hotsauceror wrote:
| An interesting question. They may be treated differently
| depending on whether the nullable field is one of the grouping
| fields, or is one of the selected aggregate fields.
|
| https://blog.sqlauthority.com/2015/02/13/sql-server-warning-...
| rwmj wrote:
| I think the main lesson from this is you should use a NOT NULL
| constraint on just about every column of your database, since the
| behaviour of NULLs is weird and likely to indicate a problem with
| your data.
| nicoburns wrote:
| Except that this change means that the behaviour of nulls is
| not weird anymore!
| masklinn wrote:
| Null is still null, sql null is always "weird" wherever it
| exists.
| gigatexal wrote:
| Why is null weird? Null means unknown. It's perfectly
| valid.
| vasachi wrote:
| It is kinda weird, if you don't know how it works. For
| instance:
|
| Say you have a unique constraint on two columns
| (column_a, column_b). column_a is not nullable, column_b
| IS nullable. Obviously these values are unique:
|
| (1, 2) (1, 3) (2, 2)
|
| But these values ARE ALSO UNIQUE:
|
| (1, NULL) (1, NULL)
|
| It is obvious in hindsight (NULL isn't equal to NULL),
| but can be quite a surprise.
|
| ...
|
| Oh wait, postgres 15 deals with just this situation. Huh.
| jhugo wrote:
| Honestly it's what you want way more often than not. A
| unique index on a nullable column is usually representing
| some optional identifier. You wouldn't usually want only
| one row to be able to omit the identifier.
| vasachi wrote:
| Oh yeah, totally. But not always. I'm very glad that we
| now have an option to have both behaviours.
| Sankozi wrote:
| It makes lots of queries (like find duplicate queries)
| much more complex without any benefit. Also it introduces
| completely unnecessary three value logic in SQL.
| masklinn wrote:
| "Valid" and "weird" are orthogonal concepts.
|
| That sql null is valid doesn't make its behaviour of not
| being equal to itself any less weird, because it works
| that way essentially nowhere else, and thus is highly
| unintuitive to developers who don't live and breathe sql.
|
| The closest thing in most langages is nan and developers
| also find nan weird.
| gigatexal wrote:
| Think of nulls like different infinities. Two infinities
| may not be equal.
|
| I find it quite useful to know when something is null
| though I have a hard time explaining it.
| nicoburns wrote:
| > The closest thing in most langages is nan and
| developers also find nan weird.
|
| I'd argue the closest thing in most languages is the null
| propagation operator (usually using a ? symbol) that at
| least JavaScript and C# have.
|
| YMMV, but personally the unique index behaviour was the
| main thing I found weird about NULLs. So now that I can
| turn that off I'm pretty happy with them.
| masklinn wrote:
| > I'd argue the closest thing in most languages is the
| null propagation operator (usually using a ? symbol) that
| at least JavaScript and C# have.
|
| In what sense? null-coalescing operators don't change how
| nulls behave or relate to one another, they only provide
| convenience operators for conditioning operation upon
| them, like sql's COALESCE.
| JUNGLEISMASSIVE wrote:
| ryanbrunner wrote:
| With keys it very often means "definitively not present"
| in practice, and there's no other value that you can use
| to represent that if you want referential integrity.
| roenxi wrote:
| SQL uses [0] 3 valued logic, and most people seem to
| reason using 2-valued logic. So most people fund the
| behaviour of NULL, the 3rd logical value, to be weird.
|
| > It's perfectly valid.
|
| :o You'll get jumped on, saying this to a forum of
| software people. Lots of things that are valid are also
| weird.
|
| [0] https://www.postgresql.org/docs/14/functions-
| logical.html
| p49k wrote:
| Be careful with this in Postgres: using a zero, blank space,
| empty array or similar as a default value instead of NULL can
| lead to enormous bloat in terms of disk space. There are
| distinct advantages of NULL that have to be considered.
| SnowHill9902 wrote:
| He most probably means creating a data structure that does
| not allow NULLs (usually increasing your normal form works)
| not replacing them by a naive synonym.
| idoubtit wrote:
| Your statement is wrong in the general case. It probably
| depends on the DB engine and several parameters, but, at
| least with MariaDB/MySQL, allowing NULL _increases_ the disk
| usage. See https://stackoverflow.com/questions/13207838/disk-
| space-impl...
| masklinn wrote:
| Their statement is not "in the general case" though, it's
| specifically for postgres.
|
| Postgres uses a fixed-size null bitmap and variable-size
| rows, so a NULL value takes one bit in the bitmap (and
| additional nullable columns may require a wider bitmap),
| but they are skipped in the row itself.
| fabian2k wrote:
| Postgres uses 1 bit per NULL value, though the reality is
| more complex as everything is padded for alignment. So
| fewer than 8 nullable columns are free, and above that you
| pay probably 8 bytes at once (I'm not totally sure on the
| exact numbers here).
|
| So for Postgres it is generally true that storing NULLs is
| very cheap in most cases.
| Pxtl wrote:
| A person with alopecia's hair color isn't "unknown" it's non-
| existent. They do not have hair.
|
| Would you suggest that a database of appearance information
| about a person should have a separate subtable for "hair" to
| properly model this feature?
|
| Either way, in the end, for displaying 99% of the time you're
| going to be using a VIEW that LEFT JOINs all these cute
| normalized tables back together again, and then you're going to
| want to filter those views dynamically on a nice grid that the
| user is viewing, and the fact that X != X is going to bite you
| in the ass all over again.
|
| Creating more tables is just moving the problem around, not
| solving it.
| papito wrote:
| Also, do not order a vanity plate that says "NULL" - you are
| likely to have a very bad time.
|
| https://www.wired.com/story/null-license-plate-landed-one-ha...
| RedShift1 wrote:
| What do you use instead of null then? For example let's take a
| purchase table, that keeps track when the purchase has been
| submitted to the supplier, let's call it sent_to_supplier. What
| do you use for sent_to_supplier in case the purchase has not
| been sent to the supplier yet?
| Avamander wrote:
| A boolean column?
| RedShift1 wrote:
| It keeps track of _when_, so timestamp.
| vosper wrote:
| What's the null-replacing value in a timestamp field?
| konha wrote:
| If you care about 'when' you probably also care about
| resubmissions. Or metadata like the supplier's response.
| Or who approved the submission etc. A separate table
| seems like a good solution.
| rwmj wrote:
| It's not a hard and fast rule, but in your example you can
| solve this in the model without needing NULLs. A table called
| ORDERS records the orders, SUPPLIERS lists your suppliers,
| and a table SENT_TO_SUPPLIER links ORDERS to SUPPLIERS, and
| you don't add a row into that table until the order has been
| sent.
| RedShift1 wrote:
| What about this, let's say I have two electricity meters.
| They have bunch of fields (active power, reactive power,
| voltage, current, and all that times 3 (one such
| measurement for each phase)), but one kind of meter can
| measure distortions, the other can not.
|
| I would make the distortion columns nullable for the
| pratical reason that you either have to duplicate all the
| work for querying the same properties from the meters (for
| example a dashboard that shows voltages only). If you want
| to support both types of meters in the same dashboard you
| would have to do a UNION type query (and not ever forget to
| do that!) if you store the measurements in a different
| table.
| cwillu wrote:
| Rather than putting the two types of meters in two
| different tables and doing an error-prone UNION, you
| would put the distortion measurements in a different
| table joined to the first, and the meters that don't
| support it would simply not include entries in that
| table.
| RedShift1 wrote:
| But that would still get you nulls in the result set,
| which is what we're trying to avoid?
| rocqua wrote:
| It's possible to handle these kinds of fields (at most
| one value) with separate tables and foreign keys. It has
| advantages around 'proving correctness'. But without
| extra tooling it has significant usability, and perhaps
| performance disadvantages. I wonder if you can get around
| some of this using virtual tables.
|
| I come at this with little experience with SQL, and
| having worked a bit on ampersand[1], a tool where you
| declare the structure of your data, and some invariants,
| and the tool will create a schema for your database, and
| some automatic checking to ensure your invariants are
| upheld.
|
| [1] https://ampersandtarski.gitbook.io/documentation/
| Cthulhu_ wrote:
| Wouldn't a meter value that doesn't have distortions just
| have a value of 0 (that is, not null but just zero); the
| properties of a meter in this case belong somewhere else,
| either in a "meter properties" table if you have many
| meters with many properties that can change, or just
| hardcoded.
|
| Alternatively, a key / value table, e.g. `meter_id,
| property, value`. But that isn't very optimal, works
| better for things like a shop product with many different
| properties per product.
| RedShift1 wrote:
| It doesn't mean there are no distortions, it just means
| they are not measured. So I vastly prefer knowing there's
| no data than some placeholder that "simulates" no data
| synthmeat wrote:
| Boolean capabilities of meter per every entry? Shouldn't
| impact size that much, easy to index. As a bonus, you can
| track failure of a sensor per each meter, and know when
| it first failed.
| jve wrote:
| Oh yeah, solve complexity by introducing complexity
| squared.
| fabian2k wrote:
| The submission process might deserve its own table with more
| detail than sent/not sent. Otherwise it's a straigtforward
| boolean, I don't see any use for NULL here.
|
| Edit: Since it's a date, there are valid use cases for
| nullable timestamps. But only if there is no additional
| information attached to the event you're saving in the
| timestamp. Another complication with nullable timestamps is
| sorting by them, which you often do with timestamps. With
| nullable ones this can get messy.
| pprotas wrote:
| You could create a new table that keeps track of the
| logistics of the product. If the product is sent to supplier,
| a new row appears in this table with the date. If it hasn't
| been sent yet, no row is present.
| therein wrote:
| Handles the cases where an order had to be resent to the
| supplier better as well.
| formercoder wrote:
| And when a supplier returns a shipment, and when new
| things happen that you haven't thought of so a new column
| isn't needed.
| mjevans wrote:
| More careful thought about the processes and data that are to
| be captured.
|
| In a simplistic view maybe NULL is really the correct choice;
| but is it? Does NULL represent the desired properties?
|
| Another common pattern is the use of some sort of sentinel
| value, E.G. the largest possible value for a date field, or
| the smallest, might be used to indicate an unknown maximum or
| minimum that propagates.
|
| A related pattern might be some sort of orders_suppliers
| table which would have a foreign key value; that might be
| NULL or it could use a sentinel value with a dummy supplier
| to indicate a special condition and an arbitrary number of
| inband subsets which can be their own distinct matches.
| stdbrouw wrote:
| These all seem like fine examples of solutions that are
| worse than the problem they were meant to solve. A separate
| table will be harder and slower to query and will enable
| M2M relationships even when you don't want those. Sentinel
| values must be dealt with _at every query_ to ensure they
| are always filtered out.
| mjevans wrote:
| Thank you for pointing that out.
|
| A better relationship would be if the order_suppliers
| table was the one with a relation to this table and there
| wasn't even a column.
| fabian2k wrote:
| NOT NULL really should be the default. There are of course
| valid reasons for allowing NULLs, but NOT NULL is much easier
| to handle for most cases. I don't think nullable columns
| indicate a problem with the data, but nullable columns that
| don't have a good justifications are certainly trouble.
| masklinn wrote:
| The fun bit is this mistake gets replicated in every
| definition langage, sometimes made ever worse (e.g. openapi
| where fields can be omitted by default, and you can
| additionally make them nullable).
| JUNGLEISMASSIVE wrote:
| KptMarchewa wrote:
| Different null types are the worst :(
|
| One of the worst offenders was one API I was working with
| Optional<Something> could be null.
| CipherThrowaway wrote:
| What is your solution for representing the difference
| between
|
| {"foo": null} and {}
|
| Because they are different, right?
| ekidd wrote:
| In Rust, I have occasionally seen things like:
| Option<Option<String>>
|
| Here, "None" means "no value specified", and "Some(None)"
| means "a value is specified, and that value is null." And
| "Some(Some(my_string))" means the value is "my_string".
|
| There are other ways to represent this in Rust, some of
| which might be clearer. This representation seems to be
| used most often if you have a type "MyType", and you want
| to automatically derive a type "MyTypePatch" using a
| macro.
| CipherThrowaway wrote:
| That's also how I've seen it done. But that's not so
| different from the nullable optional approach. It's still
| effectively two flavors of null.
|
| What I'm wondering is how people who have an issue with
| different representations of absence would approach (or
| avoid?) situations requiring them.
| setr wrote:
| Larger enums to represent and discriminate. Null is just
| syntax sugar for the one-kind of absence case, and
| multiple-null-likes eliminates all benefits of that
| syntax sugar, so you might as well revert to a real
| discriminator.
|
| Ideally you can force the use of the discriminator... but
| that depends on your type system
| CipherThrowaway wrote:
| The problem with flattening the two absence cases into an
| enum is composability and generality because it forces
| the "outer null" case to intrude into concrete types. Say
| you have a key value mapping Map<String, T> and you
| represent changes to this map as Map<String, Option<T>>.
| One day T itself is Option<Int> so you end up Map<String,
| Option<Option<Int>>. If you want to use e.g. Option2 for
| that latter case you lose generality.
|
| Where the "double absence" issue comes up in practice,
| it's usually in a context where it does make sense to
| represent and handle the first type of absence separately
| from the second type.
| tinco wrote:
| Not parent, and not necessarily agree, but one should
| throw a parser/typecheck error and the other should not.
| CipherThrowaway wrote:
| In cases where the field is required I would agree. But
| in the context of an update request, {"foo":null} and {}
| might both be valid with different semantics. "Update
| this field to null" vs "don't update this field"
| dbrgn wrote:
| A similar issue is {}, {"mylist": null} and {"mylist":
| []}. Three ways to represent a missing or empty list.
| AmericanChopper wrote:
| I don't think default check constraints should be a part of
| any serious RDBMS. Developers are free to default to any
| behaviour they prefer.
| GuB-42 wrote:
| NULL is totally fine, and happen naturally if you make an outer
| join. But if you are making a UNIQUE constraint one a nullable
| column, then you _may_ have a problem.
|
| The idea with NULL is that it is not a value, it is the absence
| of value, like a field you didn't fill in a form. For example,
| if you ask two people their personal information, and neither
| specified their email address (email=NULL), you can't assume
| they have the same email address. And if you put a UNIQUE
| constraint on that email field, you probably don't mean that
| only one person is allowed to leave it blank: either you make
| it mandatory (NOT NULL), or you let everyone leave it blank.
|
| The reason nullable and unique are rarely seen together is that
| unique is typically for keys, and generally, you don't want
| rows that have no key value. Also, putting a unique constraint
| on something that is not a key may not be the best idea. For
| example, if you don't intend to use email to uniquely identify
| people, what's the problem with two people having the same
| email?
| knorker wrote:
| You can't get away from having to reason about NULL in SQL even
| if you remove all NULLs from the database.
|
| You can still get NULLs at query time, and they'll have
| semantically valid meanings in views and subqueries.
| usrbinbash wrote:
| The problem is: NULL is a valid datatype in many situations.
| There are non-boolean fields that can just be empty; My goto
| example: -- Table definition for employee
| name, surname, date_entry , date_exit
|
| Everyone has names, and if hes an employee, he probably has an
| entry date...but until he leaves, what's the exit date?
|
| Other than soothsaying, my choices here are: NULL, some magic
| value, or an additional field telling me whether the
| application should evaluate the field.
|
| The latter just makes the code more complicated.
|
| Magic values may not be portable, and lead to a whole range of
| other problems if applications forget to check for them; I
| count myself lucky if the result is something as harmless as an
| auto-email congratulating an employee to his 7000s-something
| birthday.
|
| That leaves me with NULL. Yes, it causes problems. Many
| problems. But at least it causes _predictable_ problems like
| apps crashing when some python code tries to shove a NULL into
| datetime.datetime.
| evv wrote:
| I have struggled to implement a tree structure in PG with
| nullable unique values.
|
| Consider a "docs" table where each doc has a unique name, under
| a given parent doc. A null parent would be a top-level doc, and
| top-level docs just have a unique name. This didn't work
| before, and would hopefully be addressed by PG15.
|
| I'm not sure if null parents really represent a "problem with
| my data", or if the tree structure was too exotic for PG to
| support properly.
|
| How I got around it: hardcode a certain root doc ID, then the
| parent column can be NOT NULL. But this felt janky because the
| app has to ensure the root node exists before it can interact
| with top-level docs. Plus there were edge cases introduced with
| the root doc.
| rocqua wrote:
| Thanks for this example! As I was reading the post, I was
| thinking "cool, and feels like a thing people would expect,
| but does it have a real world usecase?".
|
| This feels like an actual real-world thing people might want
| to do where indeed you'd want to have a single NULL value.
| barrkel wrote:
| When I store tree-structured data in a relational database I
| generally add a 'path' column which is a denormalized string
| containing the names of all the parents with a path separator
| between.
|
| The biggest reason is it makes finding all descendants very
| fast with a prefix search (foo/bar/%) on the path column when
| it's indexed. It's not unusual to want to find or update all
| descendants because descendants are usually related. If you
| don't have a path column, then you need to write a recursive
| CTE, which is Slow, or recurse in your application, which is
| normally even slower. The reason they're slow is they require
| a number of seeks which is exponential in the depth of the
| tree.
|
| It also makes lookup of a node from the path fast, and
| producing a qualified path to a node fast, but these costs
| are linear in path length.
|
| Anyway, this path column is also a good place to put your
| unique constraint.
|
| If you don't want to restrict names from containing the path
| separator, you can escape application-side. For example, if
| using '/' as a path separator, consider '::' to escape ':'
| and ':s' to escape '/' - don't use your path separator in the
| escape or it'll muck up prefix searches.
| SnowHill9902 wrote:
| Doesn't feel natural in Postgres. I'd check an auxiliary
| table nodes_nodes (integer, integer) having an index of all
| ancestor-descendants.
| evv wrote:
| Agree its a bit unnatural, but it seems workable. My
| problem with the auxiliary table is, I wouldn't know how
| to implement the unique constraint.
| SnowHill9902 wrote:
| In your current implementation you either have no unique
| constraint or it is a more particular case of
| UNIQUE(ancestor, descendant).
| radiospiel wrote:
| You might want to look at a unique partial index: "CREATE
| UNIQUE INDEX foo ON pages(name, parent_id) WHERE parent_id IS
| NOT NULL"
| pdenton wrote:
| The way I'd do this, is by separating concerns into separate
| tables. If you have a table with (id, name) and a table with
| (id, parent_id), any doc with a parent will have a
| corresponding record in the second table.
| evv wrote:
| Interesting! But where can you implement the UNIQUE
| constraint for doc names under a given parent? I guess your
| application code would need to handle that
| tgv wrote:
| That's just plain wrong. You need some value to populate
| optional fields, and NULL is a pretty decent choice to indicate
| that a string, date or number has not been filled in. If you
| try to fix that by having special string or numeric constants,
| you're just making life harder, especially when you export the
| data. The poor souls that import it will think "hey, this
| timestamp says 1-1-1970, so this probably happened at Jan. 1,
| 1970."
| zeroimpl wrote:
| I think this is a great change. I wish NULLS NOT DISTINCT was the
| default already - if I wanted the constraint to ignore null
| values I'd have used a partial index to only index non-null
| values. Doing the opposite is much more difficult.
| petereisentraut wrote:
| I am the author of this feature. The background here is that the
| SQL standard was ambiguous about which of the two ways an
| implementation should behave. So in the upcoming SQL:202x, this
| was addressed by making the behavior implementation-defined and
| adding this NULLS [NOT] DISTINCT option to pick the other
| behavior.
|
| Personally, I think that the existing PostgreSQL behavior (NULLS
| DISTINCT) is the "right" one, and the other option was mainly
| intended for compatibility with other SQL implementations. But
| I'm glad that people are also finding other uses for it.
| zeugmasyllepsis wrote:
| Playing around in a docker image of the beta build, it looks
| like this allows you to add a `unique nulls not distinct`
| constraint to a composite set of fields, but still does _not_
| allow you to specify those same fields a primary key. For
| example -- This works alter table
| announcement_destinations add constraint
| announcement_destinations_pk unique nulls not distinct (sender,
| receiving_group, optional_filter);
|
| but -- Still does not work alter
| table announcement_destinations add constraint
| announcement_destinations_pk2 primary key (sender,
| receiving_group, optional_filter);
|
| fails with a message like "ERROR: column "optional_filter" of
| relation "announcement_destinations" contains null values". Is
| there a motivation for this distinction?
| [deleted]
| zigzag312 wrote:
| > Personally, I think that the existing PostgreSQL behavior
| (NULLS DISTINCT) is the "right" one, and the other option was
| mainly intended for compatibility with other SQL
| implementations.
|
| Care to explain why you think NULLS DISTINCT is the "right"
| default behavior? What problems does it solve to warrant
| additional complexity by default?
| Pxtl wrote:
| Once you've accepted that X=X doesn't return TRUE when
| X=NULL, NULLS DISTINCT is consistent with that. ANSI NULL
| three value boolean algebra is insane, but it's insane in a
| pretty consistent way.
| woevdbz wrote:
| An example where NULLS DISTINCT might make sense in a
| constraint is when a table is used to store an analytical
| cube roll-up (eg output of GROUP BY ROLLUP), where NULL in a
| dimension column has special meaning (it indicates
| subtotals). Having multiple rows with the same NULL key
| could, in some of those cases, be an error leading to double-
| counting.
| petereisentraut wrote:
| > Care to explain why you think NULLS DISTINCT is the "right"
| default behavior? What problems does it solve to warrant
| additional complexity by default?
|
| It's the most consistent with the equality behavior of null
| values elsewhere.
| jeff-davis wrote:
| I generally agree that, in most cases, you want the NULLS
| DISTINCT behavior. But thank you for providing such a
| developer-friendly feature that allows flexibility here!
|
| NULL in SQL is not terribly consistent overall. Sometimes
| NULL is treated like "unknown" and sometimes more like
| "n/a". And the interaction with non-scalar types (like
| records) is pretty strange. Also, it's common for ORMs to
| map the app language's NULL (or None/Nil/whatever) to SQL
| NULL, which adds its own nuance. So I can see this being a
| useful feature.
| dx034 wrote:
| I'm not the person you're responding to, but also think NULLS
| DISTINCT makes sense in many cases. NULLs often represent
| missing data. Imagine storing a customer's address and the
| street name is NULL. If several customers have a street=null,
| it doesn't mean that they have the same street. So from a
| data perspective, it makes sense to treat these unknown
| values as distinct.
|
| For filtering and aggregation I still welcome the change, as
| it makes sense in many cases to treat them as not distinct.
| zigzag312 wrote:
| Thanks. It seems that the issue arises from value equality
| vs optional value state equality.
|
| To me, a more natural way to treat NULLs is to think of
| NULL not as a value, but as a state. Several customers with
| a street=null all have street property in an equal state.
| However, an equal state doesn't mean the value is also
| equal, as there is no value in that state. Option type in
| functional languages models this perfectly:
| 'a option = |None // no value here
| |Some of 'a // value of type 'a
|
| So, when checking, if customers have the same street we
| need to check the value. So, comparision is only valid if
| street has value defined. Adding null check in addition to
| equality check to ensure that the actual values are equal
| feels most natural to me (as this is how it's done in most
| languages).
|
| Unique constraints should not be a problem. Unique
| constraints are about values. So if state is NULL there is
| no value and so constraint does not treat NULL states as
| equal values. Adding a keyword to change constraint
| behavior when needed, would be best IMO, as it would need
| to be rarely used.
| masklinn wrote:
| > So, when checking, if customers have the same street we
| need to check the value. So, comparision is only valid if
| street has value defined. Adding null check in addition
| to equality check to ensure that the actual values are
| equal feels most natural to me (as this is how it's done
| in most languages).
|
| Is it tho?
|
| Languages with ubiquitous nullability will accept both
| being null or both being non-null and the same value.
| Languages with option types do the same (as long as the
| value is equatable obviously), this includes OCaml:
| https://v2.ocaml.org/api/Option.html#preds.
|
| The only languages I can think of which seriously diverge
| would be C and C++ where this is UB.
|
| Maybe if you way stretch it Java because it doesn't have
| operator overloading, but even there there's
| Object.equals which is null-safe... and returns `true` if
| both parameters are `null`.
| zigzag312 wrote:
| I completely agree. I tried to explain basically the same
| thing as you, but it seems there is some ambiguity in my
| example I didn't intend.
|
| Comparing two nullable/option values when both are in a
| "null state" will return true in most languages. Which,
| to me, is the most logical result as they have the same
| state. So, if you want to check if two people actually
| live on the same street, it's not enough to check
| p1.street == p2.street, but you need to also check that
| at least one of the streets is not null. Two nulls would
| return true, but if equality returns true and we know
| that one of the streets isn't null, we know we are
| comparing actual street values (not null states). Sadly,
| in SQL, NULL=NULL result is not true, but NULL.
| knorker wrote:
| Not parent commenter, but to me it seems "obviously correct".
|
| As explained in the article NULL means "unknown". Let's say I
| have two people, and a "tax identifier" column. Let's say two
| people can't have the same tax id. Obviously "don't have one"
| or "unknown" fit very well in NULL, and two people can both
| be missing a tax id, even if those who have it need it to be
| unique.
| chris_wot wrote:
| Except that's the ambiguity of NULL in the standard. It
| doesn't just mean "unknown". It can also mean "no value".
| And you can clearly compare two elements with no value.
|
| Which is why nulls are so incredibly confusing.
| knorker wrote:
| Yeah that's true. All the database can say is "I don't
| have a value here". Not "there exists no value for this
| column".
|
| But this doesn't mean that two NULLs are the same. Two
| people with NULL tax IDs are not the same (one turns out
| to be unknown but existing, the other is a toddler
| without a tax ID). Ten people with NULL address don't
| live in the same house. Half declined to give address,
| the other half are homeless. Either way we can't send
| mail to them.
|
| If you use the database to have a UNIQUE constraint of
| "only one purchase per household" then it makes no sense
| to allow NULL addresses, but only allow the first NULL
| address purchaser to buy the thing.
|
| Or "sorry, in this hotel we only allow one guest at a
| time without a car, and someone else already doesn't have
| a car".
|
| Does that guest without a car actually have a car? I
| don't think that's something that the database can solve.
|
| Should databases have two separate values for unknown or
| no value? That sounds like a world of hurt, with two
| types of null.
| jl6 wrote:
| > Should databases have two separate values for unknown
| or no value? That sounds like a world of hurt, with two
| types of null.
|
| It's also a non-problem in practice, because if an
| application needs to distinguish between multiple types
| of nulls, it can very easily just use an extra column
| holding the information needed to disambiguate.
| jstanley wrote:
| Similarly, the need for non-boolean columns is a non-
| problem in practice, because an application can just make
| an extra column for every bit it needs to store. If it
| needs strings or arbitrary-width numbers it can just join
| on other tables that store those bits.
|
| (Existence of a workaround doesn't turn a problem into a
| non-problem)
| jl6 wrote:
| I stand by it being a _non-problem in practice_ because
| the workaround is trivial, and the supposed solution adds
| complexity everywhere.
|
| There isn't even a universally agreed methodology for how
| to handle nulls in the relational model in theory.
|
| It's a semantic problem that can only be solved with
| reference to the requirements of a particular use case,
| and when you do that using explicit extra columns, the
| problem evaporates.
| fendy3002 wrote:
| Which happen in world of javascript. It has null and
| undefined. Null variable is a variable / property that's
| defined and set with null, while undefined is a variable
| / property that's not defined yet (or maybe not
| initialized, I forgot).
|
| I hope at least SQL will give another comparison
| operation, such as the current equal sign (=) means
| a.prop is not null && b.prop is not null && a.prop equal
| b.prop (for not defined yet).
|
| Let's say another sign like a.prop *= b.prop, meaning
| a.prop is null && b.prop is null || a.prop = b.prop (for
| our intention is both value are empty).
| pdw wrote:
| Such a comparison operator exists: IS (NOT) DISTINCT
| FROM.
| jhgb wrote:
| > It doesn't just mean "unknown". It can also mean "no
| value".
|
| Isn't the canonical representation of _known_ -no-value
| an absence of a tuple? Like as opposed to saying "There
| exists an employee X who works in department NULL", you
| simply don't make any claim about employee X working in a
| department? After all, when enumerating the members of a
| set, you're also omitting the enumeration of _non_
| -members of a set, and the law of excluded middle
| applies.
| Tostino wrote:
| Nulls will reappear in your queries due to joins even if
| none exists in your data model.
| jhgb wrote:
| Ah yes, they're the curse of SQL. Still doesn't seem to
| be a reason to not squash them out as much as feasible
| within the constraints of SQL.
| [deleted]
| richdougherty wrote:
| It's probably a good default because it's more consistent
| with how NULL equality is handled in SQL generally. From the
| article:
|
| > In Postgres 14 and prior, unique constraints treated NULL
| values as not equal to other NULL values.
|
| ...
|
| > This is consistent with the SQL Standard handling of NULL
| in general, where NULL is unknown. It is impossible to
| determine if one unknown is equal to another unknown. Because
| NULL values are of unknown equality to one another, they do
| not violate UNIQUE constraints.
| zeroimpl wrote:
| Your last statement would be better written as: Because
| NULL values are of unknown equality, they may or may not
| violate UNIQUE constraints.
|
| The safest implementation would assume that they do violate
| the constraint, rather than the current behavior assuming
| they don't.
|
| In practice I don't think I've ever added a unique index on
| a nullable column where null might imply unknown. I have
| used it in cases where null meant "none".
| dtech wrote:
| Not OP, but this constraint means "the data must either be
| empty or unique", which is an extremely common constraint. In
| contrast, I've never encountered "only 1 entry is allowed not
| to have the data, all others must be unique".
| enkrs wrote:
| "only 1 entry is allowed not to have the data, all others
| must be unique" makes sense in larger schemas as the
| database grows. It replaces is_default_address,
| is_primary_child etc. fields and constraints on relational
| tables. In the perfect world all the data would be
| normalized not to have such columns and cases, but in real
| life it just grows that way. So for some cases NULLS NOT
| DISTINCT will be a welcome addition.
| chris_wot wrote:
| Well, if NULL is most often meant to signify an unknown
| value. You can't compare one unknown value with another and
| say they are the same.
|
| The standard is, I believe, ambiguous about NULL because it
| can also mean that it is an absence of data. In other words,
| it is "this is not yet determined" (aka a CAR table has a
| COLOR table, but the paint is applied as the final operation
| so NULL could be used as the car is not at this stage yet and
| the color will be chosen later). In this case, you can
| compare NULLs against other NULLs, because you can have two
| cars that are in an indeterminant state. In that case, it is
| useful to compare to NULL values as it means the same thing.
| undecisive wrote:
| I think it's worth pointing out what the opposite means - to
| say that "this field might have a value, or it might be null
| - but only one tuple/row can set this field to be null"
| implies that you have given the null option some kind of
| real-world value.
|
| That is to say, you are using null to encode some kind of
| meaning, when really this is not what null is supposed to be
| used for.
|
| That's not to say I think we should be morally disapproving
| of people who do that - I use things for their unintended
| purpose all the time, and it bugs me when people get on a
| high horse. Use what makes sense to you - and I love this
| change for exactly that reason.
|
| But the general theoretical approach is that if you want to
| care about the value in a field, you need to give it a value.
| Null is for the valueless, and if that isn't an allowed
| state, you should simply set the field as not-nullable.
| Theory is fine in theory. For a practical database that
| exists in the real world, this option is a good addition.
| nightski wrote:
| I think a lot of the trouble stems from the fact that
| databases do not have sum types and there is no way to
| encode a "None" type without hacks. NULL is the only
| reasonable option in a lot of cases.
| wongarsu wrote:
| Yeah, in hindsight the world would be a slightly better
| place if SQL didn't include NULL, but instead "UNKNOWN"
| and "NONE", with "NONE = NONE" being true, and "UNKNOWN =
| UNKNOWN" being unknown.
| kzrdude wrote:
| When would NONE show up and when would UNKNOWN show up?
| wtetzner wrote:
| NONE would mean "there is no value", and UNKNOWN "there
| might be a value, but I don't know what it is".
| stouset wrote:
| Or have an Option (or Maybe) type. This is how modern
| programming languages solve the nil issue, I don't see a
| reason why a database couldn't take the same approach.
| eins1234 wrote:
| Fauna has an interesting approach to null that I've grown
| to like:
|
| Null is a real value that can be compared intuitively like
| any other value (`Equals(null, null)` returns true). [1]
|
| However, in indexes, any term that evaluates to null is
| simply not stored in the index. So we can create a unique
| index with multiple null values because they simply won't
| exist in the index, so won't violate the unique constraint.
|
| If we do care about null values in a particular index, we
| can handle that by mapping the null value into another
| value that will get stored in indexes using a "binding",
| and then use that index to query for or constrain
| uniqueness by that mapped value [2].
|
| This is not the most convenient thing in the world, but at
| the end of the day, it feels like an edge case, so I'm
| happy with having to jump through some hoops to handle it
| in exchange for making the much more common cases (null
| comparisons and unique indexes with nullable values) more
| intuitive and less error prone.
|
| [1] https://fauna.com/blog/understanding-nothing-or-null-
| in-faun...
|
| [2] https://docs.fauna.com/fauna/current/learn/cookbook/fql
| /sear...
| thijsvandien wrote:
| > Personally, I think that the existing PostgreSQL behavior
| (NULLS DISTINCT) is the "right" one
|
| I find this to be at odds with IS DISTINCT FROM, which is false
| for two NULLs.
| rst wrote:
| So, we have a "standard" which says that both behaviors should
| be supported, and there should be two different ways to express
| them, but explicitly _declines_ to say which does what?
|
| I'm genuinely not sure what purpose this particular standard
| serves, but it looks like it's not allowing people to write
| portable code...
| anamexis wrote:
| I can't imagine the standard does not specify that NULLS
| DISTINCT should treat nulls as distinct and NULLS NOT
| DISTINCT should treat nulls as not distinct.
| rst wrote:
| Ah. So there are _three_ ways to write it, and explicit
| ways to call for either behavior. That 's sane -- just
| wasn't what I got from the original writeup.
| mjw1007 wrote:
| No, it's not as bad as that.
|
| The thing that's implementation defined is what happens if
| you don't explictly select either option.
| woevdbz wrote:
| > Personally, I think that the existing PostgreSQL behavior
| (NULLS DISTINCT) is the "right" one
|
| SQL is old enough and this debate so unsettled still that I
| think it should be clear there isn't a categorically right
| behavior here anymore than there is a clear winner between tabs
| and spaces.
|
| As an example of the irreconciliable weirdness of NULL,
| consider that "NULL = NULL" is false, and so is "NULL != NULL",
| while rows with NULL still group together in GROUP BY.
|
| I appreciate you giving folks the option.
| bobbylarrybobby wrote:
| Er, sorry, but I thought that the answer to tabs vs spaces
| was use tabs for semantic/logical indent, use spaces for
| physical/alignment indent.
| jon_richards wrote:
| Yes, mixing invisible characters will solve all our
| problems.
| anamexis wrote:
| > As an example of the irreconciliable weirdness of NULL,
| consider that "NULL = NULL" is false, and so is "NULL != NULL
|
| This isn't quite true - these comparisons don't evaluate to
| false, they evaluate to NULL.
|
| I intuitively think of NULL as "unable to compute," a
| generalized NaN.
| Pxtl wrote:
| > This isn't quite true - these comparisons don't evaluate
| to false, they evaluate to NULL.
|
| Which large parts of SQL treat as equivalent to false.
| WITH vars AS ( SELECT CAST(null AS INT) as X
| ) SELECT 'yes this is a tautology' as "is this is a
| tautology" FROM vars WHERE X = X
|
| The above query returns nothing, not an error because the
| value of X is unknowable. In a sane language, X=X and so it
| would yield "yes this is a tautology". SQL standard wants
| it both ways - it wants to say "boolean null isn't false"
| except when for its own convenience it decides that yes,
| null is false.
|
| The truth, imho, is that SQL has an archaic type system. In
| this day and age it should be offering algebraic types
| where we can implement our own "empty" values and the like
| that may not use three-value logic as appropriate, so that
| I can say that for example End Date is _blank_ instead of
| either unknown or Date.MaxValue.
| anamexis wrote:
| > SQL standard wants it both ways - it wants to say
| "boolean null isn't false" except when for its own
| convenience it decides that yes, null is false.
|
| I don't see it that way. Null cast to boolean is _always_
| false. Null equality is undefined. There is no
| contradiction there.
| saltcured wrote:
| I think that this topic gets tripped up with people
| wanting to assume boolean means two possible states...
| NULL adds an extra state to every type much like the
| "bottom" type/value in some functional programming
| languages or type systems.
|
| It's not the `=` operator that is special or significant
| here. It is the `WHERE` clause semantics, which are
| defined as returning rows when the boolean expression is
| true. This treats both false and null cases as the other
| possible outcome, when rows are not returned. Not
| understanding this distinction is also where people write
| buggy queries using WHERE NOT ..., because like equality
| the NOT operator will return NULL for a NULL input (so
| NULL is not like false and does not negate to true).
|
| SELECT 'hidden by null result' WHERE CAST(NULL AS
| boolean);
|
| SELECT 'also hidden by null result' WHERE NOT CAST(NULL
| AS boolean);
|
| SELECT 'shown by true result' WHERE CAST(NULL AS boolean)
| IS NULL;
|
| SELECT 'hidden by false result' WHERE NOT (CAST(NULL AS
| boolean) IS NULL);
|
| The existing GROUP BY behavior is different than WHERE
| because it does not have a special rule for handling true
| values. It just returns a group for each distinct value.
|
| The prior (new default) UNIQUE constraint behavior made
| sense when you think of NULL as this bottom value and
| meaning "unknown", while also thinking that a uniqueness
| constraint is only applied to complete keys. The check is
| effectively deferred, and the partial or incomplete tuple
| might become valid and unique once its missing part(s)
| are filled in with real values. The new optional behavior
| makes more sense if you think that the uniqueness
| constraint enforces distinct records even with partial
| keys, just as the GROUP BY clause will determine distinct
| groups.
|
| There is also a similar issue with FOREIGN KEY
| constraints. The default MATCH SIMPLE behavior is
| consistent with the default UNIQUE constraint behavior.
| An incomplete foreign key is essentially ignored and not
| subject to integrity checks. The MATCH FULL behavior is
| almost a style checker to say that a compound foreign key
| should be all null or all non-null, while still ignoring
| incomplete foreign keys. As far as I know, the MATCH
| PARTIAL case is not implemented, but I think it might
| complement this new UNIQUE NULLS NOT DISTINCT behavior?
|
| One can imagine a further branching point in behaviors
| for foreign keys and I'm not sure how MATCH PARTIAL will
| be defined. One mode might be easily implementable with
| the new index method, if we want partial foreign keys to
| be able to reference partial keys in the referred table,
| i.e. the partial key with NULLs is considered an actual
| row key. But we might also imagine another partial
| matching mode where the the non-null components of a
| compound foreign key have to all exist in at least one
| key in the referred table, while ignoring the null fields
| as incompletely formed references. This would be an
| improvement on MATCH SIMPLE which allows the non-null
| columns in a foreign key to hold values which could not
| possibly match the referring table no matter which values
| are filled in for the remaining null fields of the
| foreign key. However, this enforcement would not be
| easily implemented without a powerset of indexes for all
| partial combinations allowed in the referencing tables!
| paulryanrogers wrote:
| This is a good way to remember the SQL behavior. Though I
| do hope it'll change in the future. So many other languages
| interpret null as a comparable value, and it will be less
| surprising to folks coming from those.
| magicalhippo wrote:
| > these comparisons don't evaluate to false, they evaluate
| to NULL
|
| In the databases I've used, such as MSSQL[1], they evaluate
| to UNKNOWN.
|
| [1]: https://docs.microsoft.com/en-us/sql/t-sql/language-
| elements...
| bfgoodrich wrote:
| The document you link uses unknown as a synonym for null.
| If you inserted the result of such a comparison into a
| table, the value inserted would be NULL.
| magicalhippo wrote:
| I'm self-taught, could you show how to insert the result
| of such a comparison into a table in mssql? I can only
| think of using CASE or similar, which would mean no
| direct conversion.
| orf wrote:
| insert into table (c) values (null != null)
| ratsmack wrote:
| It's good to see that everyone agrees on the definition
| of NULL.
| DaiPlusPlus wrote:
| Yes: a billion-dollar mistake.
| erichocean wrote:
| And with RDBMSs, a multi-trillion dollar success!
| bazoom42 wrote:
| This is s misunderstandig of Hoares "billion dollar
| mistake". The mistake was not the existence of nulls, the
| mistake was type systems where every reference could be
| null.
|
| Sql does not have this problem.
| [deleted]
| qez wrote:
| > anymore than there is a clear winner between tabs and
| spaces.
|
| But there is a clear winner here. Tabs won, almost all code
| files use spaces. Of course everyone uses the tab key on
| their keyboard, and the IDE makes spaces for you; no one is
| going to manually press space 4 times.
| spacemanmatt wrote:
| As a savage, I use 4 tabs now.
| tingletech wrote:
| I hit space 4 times...
| jjoonathan wrote:
| ...which is clear evidence that hitting space 4 times is
| the wrong thing to do :P
| bornfreddy wrote:
| There is no winner, both tabs snd spaces are widely used.
| However with proliferation of better tools the question
| thankfully became mostly irrelevant. I don't care if it is
| spaces or tabs, as long as it is consistent across codebase
| (which CI enforces with autoformatters and linters) and as
| long as my editor (which obeys project's editorconfig) uses
| the same convention.
| masklinn wrote:
| > SQL is old enough and this debate so unsettled
|
| Is the debate really unsettled?
|
| MSSQL considers nulls to be equal, everyone else considers
| them to be distinct (aside from a few DBs which just refuse
| to create UNIQUE indexes on nullable columns), and Oracle is
| apparently huffling glue in a corner.
|
| > As an example of the irreconciliable weirdness of NULL,
| consider that "NULL = NULL" is false, and so is "NULL !=
| NULL", while rows with NULL still group together in GROUP BY.
|
| Group by is the exception here. And arguably it is so because
| the alternative is never useful, or close enough as to not be
| worth it.
| feanaro wrote:
| So of the three major SQL implementations, you have one of
| them following the results of the settled debate, one of
| them being an exception and one of them huffling glue
| (whatever that means). In addition, there's a language
| exception even in the settled group, which you have to
| handwave away.
|
| Sounds pretty unsettled to me.
| irrational wrote:
| Oracle considers null to be equal to empty string.
| mike_hock wrote:
| That is truly huffing glue.
| irrational wrote:
| And oracle considers null to be equal to empty string. My
| understanding is oracle was created before the sql standard
| existed and they don't want to change because of backwards
| comparability. I don't understand why they can't add a db
| parameter to allow you to set how nulls are handled.
| to11mtm wrote:
| IDK if you ever saw the HN thread where they talked about
| oracle and change tests from a few years back, but if I
| had to guess, the pain just isn't worth the overall cost.
|
| Overall I remember running into this but finding it
| fairly simple to add a couple specific checks around the
| condition where it came up in our code.
| layer8 wrote:
| They would need to change their storage format to be able
| to distinguish between NULL and empty string, and
| probably a lot of in-memory structures and even more code
| logic as well. It would be hugely expensive to support
| both.
|
| Oracle introduced VARCHAR2 in the 90s(?) to mean "Oracle
| semantics" as opposed to VARCHAR which is documented as
| "may change to standard SQL semantics in a future
| version". However I don't think they'll ever materialize
| that distinction.
| nicoburns wrote:
| A slight tangent, but do you have any pointers on how one might
| get involved in either:
|
| - The SQL standardisation process (or whether this is even
| feasible as someone who isn't involved in the development of a
| major database engine)
|
| - Postgres development
|
| The feature I am particularly keen to get accepted is trailing
| commas in SELECT column lists (potentially other places too,
| but SELECT lists would be a very good start). And there are
| potentially a bunch of other improvements in the syntax sugar
| department that I might be interested in spearheading if I
| could successfully get this accepted (e.g. being able to refer
| to select list column aliases in GROUP BY clauses).
|
| I don't have much experience with C, but I would potentially be
| up for actually implementing the change if I could get buy in
| from the maintainers.
| gurjeet wrote:
| Postgres project is always looking for contributors. The
| project has a very nice, detailed, well-defined documentation
| on how to contribute [1]. The community is very welcoming,
| and has a process (CommitFest[2]) in place to ensure all
| submissions get their due attention.
|
| FWIW, I am starting to make an effort (e.g. [3]) towards
| helping the newcomers get their work in shape to be
| acceptable by the committers.
|
| Note: I'm biased towards Postgres community, since I've
| worked with them for many years. So others' opinion _may_
| differ, but it's highly unlikely.
|
| [1]: https://wiki.postgresql.org/wiki/Development_information
| [2]: https://wiki.postgresql.org/wiki/Development_information
| #Com... [3]: https://www.postgresql.org/message-
| id/CABwTF4Us8WGMffNGTNY1X...
| orthoxerox wrote:
| > The feature I am particularly keen to get accepted is
| trailing commas in SELECT column lists
|
| Not "GROUP BY THE OBVIOUS LIST OF EXPRESSIONS, YOU KNOW, THE
| ONES FROM THE SELECT CLAUSE THAT DON'T CONTAIN AGGREGATE
| FUNCTIONS"? Aka "GROUP BY ALL" from DuckDB.
| Justsignedup wrote:
| Currently nulls are considered not distinct in postgres, no?
| Doesn't this mean anyone upgrading will have to fix up all
| their table definitions? Or am I wrong about this?
|
| I just care about the backwards compat. To be clear, the second
| I saw this I thought "finally!!!"
| irrational wrote:
| This is amazing. We moved from Oracle to Postgres after being on
| Oracle for 15+ years. That was a huge move that took 2 years. One
| of the biggest hurdles was rewriting thousands of sql queries.
| One of the biggest things we had to change was accounting for how
| oracle handled nulls versus how Postgres handled nulls. However,
| we missed this nuance about how Postgres treated nulls and
| unique. It wasn't until about 2 years into using Postgres, and
| running into so many issues, that we finally stumbled upon this
| nuance in the documentation. Crap. That required making a bunch
| of more changes to account for this. At this point I'm firmly
| convinced that null handling is the hardest part of database work
| ;-)
| jeff-davis wrote:
| In any thread involving SQL NULL, I see a lot of not-quite-right
| explanations of what SQL NULL is, conceptually. I challenge
| anyone who feels like they understand NULL conceptually to
| explain the following query: -- find orders
| with a total less than $10000 select order_id, sum(price)
| from orders o left join order_lineitems l using (order_id)
| group by order_id having sum(price) < 10000;
|
| This query is actually incorrect. Orders with no line items at
| all are clearly less than $10000, but they will be excluded
| because: first, the left outer join produces a NULL for the
| price; second, the group aggregation with SUM over that NULL will
| result in NULL; and third, the HAVING clause treats that NULL as
| false-like and excludes the order from the result.
|
| Of course, we can explain _procedurally_ what 's happening here,
| and each individual step makes some sense. But the end result has
| no conceptual integrity.
|
| Extra challenge: explain why using COUNT instead of SUM in the
| query does correctly return orders with fewer than 4 items:
| -- find orders with fewer than 4 line items select
| order_id, count(price) from orders o left join
| order_lineitems l using (order_id) group by order_id
| having count(price) < 4;
|
| PS: thank you to the author for a developer-friendly feature that
| adds flexibility here!
| hagope wrote:
| This is cool, but wouldn't creating a constraint using a nullable
| column be considered a poor design decision? In which scenarios
| would this be a good idea?
| cptn_badass wrote:
| When an entry can belong to 0 or 1 related object only. Not
| that I'd put a constraint in such scenario, but I imagine a
| User can optionally have a Subscription, so subscription_id is
| either nil or present, and said subscription cannot be
| associated to any other User.
| hagope wrote:
| in that scenario why not use foreign key? any advantage to
| using constraint?
| Pxtl wrote:
| You do both.
|
| The foreign key only gaurantees that the other entity
| exists.
|
| The unique constraint ensures that only one pair of
| entities has this relationship, preventing a one-to-many
| binding.
|
| The distinctness of NULL allows you to have multiple
| entities with the same NULL value without violating the
| above UNIQUE constraint.
|
| The "NULL is empty" vs "NULL is unknown" is a series of
| trade-offs of labor-saving. Imho, the wrong trade-offs were
| made, but once the choice is made it makes sense to
| continue and be consistent with it. I'd rather be
| consistently wrong than inconsistently right.
| hagope wrote:
| If the entity doesn't exist, wouldn't it violate the FK
| and therefore no need for the nullable constraint?
| ahmed_ds wrote:
| When something that fundamental changes what is the cascading
| effect? Do other functions refer UNIQUE/NULL or do they have
| their own versions of UNIQUE/NULL like code?
| masklinn wrote:
| There is no fundamental change, this is an opt-in on a per-
| index basis.
|
| It does not change null's identity, it only allows indexes to
| treat nulls as identical.
| ryanbrunner wrote:
| This is great! I have multiple places in our code where I've had
| to define multiple unique indexes (with conditionals) to get
| around this. It's definitely confusing behaviour the first time
| you encounter it.
|
| Honestly I think SQL already breaks it's own standard of "NULL =
| unknown". One example: If you do a LEFT OUTER JOIN and the joined
| data is missing, you get NULL in the resultset. But in that case,
| it's not "unknown" whether there are records that can be joined
| to, the answer is that there are definitively no records there.
|
| NULL is univerally considered by most programming languages to
| equal "not present" vs "unknown", and in many cases by SQL in
| practice. The ship has probably sailed on actually changing the
| default behaviour, but optional flags like these are a good step.
| knorker wrote:
| NULL means "I don't have this data". It doesn't guarantee that
| the data doesn't exist anywhere.
|
| It's semantically ambiguous, yes.
|
| But it's not ambiguous that the _database_ doesn 't have the
| data. It's very clearly NULL.
|
| If "middle name" is NULL, then the database itself can't tell
| you if a person doesn't have a middle name, or if it's merely
| not yet been told the middle name. You could differentiate the
| two by saying empty string for "has no middle name"... except
| on Oracle VARCHAR2, where NULL is equal to empty string.
| Pxtl wrote:
| > You could differentiate the two by saying empty string for
| "has no middle name"... except on Oracle VARCHAR2, where NULL
| is equal to empty string.
|
| This is what kills me.
|
| This seems like an obvious case where multiple flavors of
| NULL would make perfect sense. For example
| Ticket.CancellationDate isn't "unknown", it's _this ticket is
| not cancelled_. But SQL standard doesn 't have a way to
| express "empty value" as distinct from "unknown value", so
| we're stuck working around the strange implications of using
| this square peg to fill a round hole.
|
| Oracle's bizarre "empty VARCHAR2s are NULL" is strange but is
| consistent with every other datatype that offers no way to
| say "empty" but "NULL". 0 is not the same as NULL int and is
| not the same as empty int. Mindate is not the same as NULL
| date and is not the same as empty date.
| knorker wrote:
| Well, for the string case empty string is... the empty
| string.
|
| Storing the exit code and stdout of a program run Oracle
| allows the state "it's not finished yet" to have exit code
| representation of NULL, but you cannot say the same for
| string.
|
| "Unknown" is not a perfect map to NULL. "I have no answer
| for you" is better.
|
| Maybe NULL values could use an annotation.
|
| But I think that would make things worse. Like let's say
| you have some query that joins addresses with coordinates.
| _You_ know that semantically any address without a
| coordinate has an "unknown" coordinate, not that it
| authoritatively doesn't have coordinates.
|
| But not only do you need to change the data model then, but
| the query language too, to be able to express the
| difference where absense means one or the other.
|
| Now absense is NULL, and leaves it up to the application to
| interpret.
|
| SELECT a.address, b.coord FROM a LEFT OUTER JOIN (SELECT
| address,coord FROM coords WHERE is_test=FALSE) AS b ON
| a.address=b.address;
|
| Or maybe some coordinates are missing because their
| locations no longer exist. The house has been demolished,
| so it's not "unknown" where the house was, it's just that
| it no longer has a location. And how would you join that
| data.
|
| I don't think I've encountered a problem in practice with
| SQL in differentiating empty from unknown. It's out of
| scope of what SQL tries to solve, and that's fine.
| ryanbrunner wrote:
| That's fair - as I said in another comment, it breaks down
| with keys. I can't think of a great way to model "this row
| definitively does not have whatever is referenced in another
| table". Using anything other than NULL for that means you
| can't use proper foreign keys.
| SnowHill9902 wrote:
| NULL means that the value is unknown, not that the data is
| unknown. You indeed have no data but its value may exist. You
| just don't know it yet. NULL is very powerful when you
| understand that.
| woevdbz wrote:
| It is mostly true but not always, eg in the output of GROUP
| BY ROLLUP, a NULL value is used to indicate rows representing
| subtotals / partial aggregates
| ryanbrunner wrote:
| That seems like a fairly big assumption on how I'm modelling
| the data though. Particularly in the LEFT JOIN example, I'm
| not sure if I agree with the idea of "a row not being present
| means that it's unknown whether it exists in the real world"
| depending on the underlying content being modelled.
|
| For a string column, you do have blanks vs. NULLs, so it's
| fairly easy to distinguish between "John definitely doesn't
| have a middle name" vs "It's unknown whether John has a
| middle name", but that doesn't hold true with most other
| datatypes.
| SnowHill9902 wrote:
| The string '' is as arbitrarily blank as 0 is a blank
| integer.
| wongarsu wrote:
| There are two "blank" integers: 0 is the identity element
| under addition, 1 is the identity element under
| multiplication. Or worded differently 0 is a blank
| addend, 1 is a blank multiplier.
|
| '' would be the identity element for string
| concatenation, or any other string manipulation method I
| can think of, so it has a stronger claim on being blank
| than a 0 has.
|
| Unless your string is later parsed as something else. A
| blank JSON value would look differently from a blank
| phone number.
| SnowHill9902 wrote:
| Sure, but it's arbitrary from a data structure standpoint
| and sloppy design.
| ryanbrunner wrote:
| SQL gives you no other options though, there's no other
| construct that means "not present", since NULL is unknown
| vs. not present.
| SnowHill9902 wrote:
| It depends on your column. If you have a products table
| and a certain handmade product has no upc, it's correct
| to use '' as its value rather than NULL because you know
| for a fact that its non-existent. Some people respect
| NULL too much.
| tzahifadida wrote:
| wow thanks!
| pbz wrote:
| Since we have some PG devs here: Can we please have a way to
| reorder columns?
|
| Coming from MySQL, this is one of the first missing features one
| hits. It's like moving to a new house where you're told you can
| never clean or move the furniture without moving to a new house.
|
| It leaves the unfair impression that this is a "toy" db. "What
| other basic features is it missing if it doesn't have this?"
| Please don't think of it as trivial; first impression are very
| important.
| urthor wrote:
| There's like 5 different features wants I could add like this.
| A columnar database engine comes to mind ( _slightly_ more work
| involved) and poaching a few Sqllite QOL features.
|
| Ultimately Postgres is a community driven product, and people
| want to work on what they want the work on.
|
| I don't get angry that some kind soul hasn't volunteered their
| Saturday for free for me.
| pbz wrote:
| Oh, absolutely; I'm just trying to raise the
| awareness/importance of this missing feature. Most devs that
| I know IRL that looked at PG had a very similar reaction to
| mine.
| [deleted]
| mixmastamyk wrote:
| I'm receptive to the feature. But wouldn't it require
| locking the full table as it works? Seems like it would
| hurt availability of the biggest databases that would
| benefit most. Perhaps this is why it is thought somewhat
| impractical and therefore low-priority.
| DangitBobby wrote:
| That's not exactly true. You can write your own patch for a
| feature and there's a good chance they'll decide they don't
| want it. It's their prerogative, but also slightly upsetting
| when you know they're blocking features you want. Just
| observe the bike shedding over a trailing commas patch [1].
|
| 1. https://postgrespro.com/list/thread-id/1853280
| TheP1000 wrote:
| There are lots of good reasons in that thread why the patch
| was declined.
|
| The patch made things more lenient which makes things less
| compatible (this whole thread is about ANSI SQL standards
| and they do matter) and changes what was an error behavior
| to now silently succeed.
|
| +1 to postgres devs for curating patches.
| SnowHill9902 wrote:
| What do you gain from that other than soothing your OCD when
| executing \d t (?)
| iruoy wrote:
| I like to have my `*_id` columns at the front, and
| `created_at` etc. at the back. And in the middle all the
| fields by descending importance. Just a personal habit so I
| can quickly lookup data in a table.
| pbz wrote:
| Exactly... You gain so much speed by having everything in
| the right order.
| sophacles wrote:
| This seems pretty voodoo to me. Do you have any examples
| that show it?
| pbz wrote:
| I'm not sure how I could show if something is
| easier/harder to work with. Speed in this context refers
| to human processing speed, not database speed.
| ggregoire wrote:
| I'm totally with you on this, and would love to be able to
| directly reorder the columns in the tables.
|
| A possible workaround is to have a view for every table
| (which I often read is a good practice anyway) and order
| the columns as you want in the views.
| pbz wrote:
| Soothing my OCD is important, but it also helps when working
| in a team and having a shared ordering style. For example, we
| have certain types of columns that are always at the bottom.
| One would look there first. With PG you're forced to scan all
| the columns every time.
| 9dev wrote:
| This is why I order anything alphabetically, always.
| Properties, methods, columns... anything. Saves you the
| trouble of custom conventions.
| pbz wrote:
| The trouble is when you need to add a new column you can
| only add it at the end, so you'd lose the alphabetical
| ordering.
| MichaelApproved wrote:
| Based on my understanding of DB storage (which is decades old
| and as I write the comment my explanation seems stupid but
| it's what I was taught. Please someone correct me).
|
| Image a table with the following:
|
| id (int), title (varchar), created_date (date)
|
| That data is stored in a similar order on disk (is this still
| true?).
|
| So, if the initial insert has a title of 10 characters, then
| the date will get placed after those 10 characters on disk:
|
| Int, 10 characters, date
|
| Later, the row is updated with a longer title of 500
| characters.
|
| Because it's longer, you now have to get more space for the
| title on another part of the disk.
|
| Since there are columns after the title, it'll either leave
| dead space where the old 10 character title was or it'll need
| to move those other columns too.
|
| I can't remember which I was taught. I just remember that
| columns after variable columns get impacted when the data in
| a preceding variable column changes.
|
| If the title were at the end of the row, the db could expand
| the data without needing to move other columns over (if there
| happened to be available space right there).
|
| If you were updating with a shorter value, it could shorten
| the row by moving the terminating character to the new
| shorter location, freeing up the space the longer title was
| using.
|
| Bottom line is you want to keep variable columns which get
| updated most frequently towards the end of your table
| structure.
|
| Knowing which column that would be isn't always clear when
| creating the table.
|
| Again, this understanding was taught to me over 20 years ago.
| I'm probably remembering parts of it wrong and DB storage has
| likely (hopefully) advanced since then.
| jasonwatkinspdx wrote:
| DB storage is a lot more sophisticated than what you were
| taught.
|
| Most databases use Slotted Pages to organize storage. Pages
| are fixed size and numbered by their offset within the
| database file. The page header contains the number of rows,
| followed by an array of offsets for individual rows within
| the page. Rows themselves generally are stored at the end
| of the page filling downwards. The storage engine can move
| around rows in arbitrary ways to consolidate free space.
|
| Fundamentally there's no connection between SQL schema
| order and how table storage is organized on disk. For
| example in a column store there's often no contiguous row
| stored anywhere, instead there's just separate indexes per
| column.
| lsaferite wrote:
| That would seem like an argument in favor of allowing
| column reordering.
| jhgb wrote:
| > That data is stored in a similar order on disk (is this
| still true?).
|
| There should be no requirement for this. Columns in
| relations are not conceptually ordered, so it shouldn't
| matter for the things you're doing with the data anyway,
| and the database should be able to reorder the data in
| whatever way it likes, since desire to isolate the user
| from physical data structures was one of the main reasons
| for the rise of RDBMS.
| MichaelApproved wrote:
| > _the database should be able to reorder the data in
| whatever way it likes_
|
| My point was that you're trying to prevent the DB from
| reordering the data because there's a performance cost
| when that happens.
| jhgb wrote:
| Why would you be trying to _prevent_ the DB from
| reordering the data? You 're not supposed to have better
| knowledge of what's good for your use case than an RDBMS
| that can collect usage statistics on queries and such.
| Ditto for compilers rearranging structures and such. When
| you start having hundreds of tables and thousands of
| queries, I don't see how you can do a better job than an
| automated system at that point.
| MichaelApproved wrote:
| > Why would you be trying to prevent the DB from
| reordering the data?
|
| Sure. "Reduce the need" would be a better word than
| "prevent".
|
| If I can do a good job organizing the table columns (as
| described above) it'll lower the need for the DB to
| reorder data.
|
| Reduced need to reorder, improves performance.
| masklinn wrote:
| 1. That's not exactly true, AFAIK all RDBMS return
| columns in table order when order is unspecified (`*`),
| and while they could reorder on retrieval
|
| 2. postgres definitely does not, and column tetris is
| absolutely a thing in the same way struct packing is
| (with the additional complexity of variable-size columns)
| jhgb wrote:
| But the order of attributes in the presentation of the
| result relation has nothing to do with physical layout of
| base relvars -- primarily because any relationship
| between the two is purely coincidental. The vast majority
| of useful queries will _not_ reuse the order of
| attributes in base relvars, so optimizing for the unusual
| trivial case by prohibiting better rearrangements that
| could be useful for a much larger number of use cases
| seems rather pointless.
|
| And what PostgreSQL does is of course an implementation
| detail of PostgreSQL.
| fabian2k wrote:
| There is some small benefit to playing column tetris if you
| have columns with different sizes that waste space due to
| padding. I'm not convinced this would be worth the complexity
| of this feature, but in some cases reordering columns might
| have measurable benefits in reducing the size of the data.
| pbz wrote:
| I'm not talking about the physical layout of the data, just
| a thin UI layer that the DB tool could use. Maybe we could
| have two modes: physical vs UI ordering?
| andruby wrote:
| As mentioned by another comment: you can use views for
| that.
| pbz wrote:
| Views are not a good solution for this. The point is to
| be able see a specific order in 3rd party apps and when
| writing quick add-hoc queries (select *). Writing views
| for every table would just pollute the db.
| Tostino wrote:
| Ideally, postgres would play column Tetris behind the scenes
| and store the columns on disk in the most appropriate way,
| while allowing the representation to be changed at will.
| pbz wrote:
| Yeah with maybe an option to manually optimize (that would
| rebuild the table if needed).
| dboreham wrote:
| > Please don't think of it as trivial
|
| I've worked with relational databases for 20+ years. This is
| the very first time I heard of this.
| pbz wrote:
| I've worked with DBs for 20+ years as well. This is a quality
| of life type of improvement. If you've worked mainly with DBs
| that don't make this easy it's hard to know what you're
| missing. Do a search for column reordering for PG and you'll
| get a ton of hits.
| hans_castorp wrote:
| > It leaves the unfair impression that this is a "toy" db.
|
| So you consider Oracle, SQL Server and DB2 also to be "toy"
| databases?
| pbz wrote:
| With SQL Server the management tool does give you a way to do
| this. Yes, it does a table rebuild behind the scenes. The
| point is that it's easy. Don't have experience with the other
| two, but MySQL is the most popular so it kinda sets the tone
| whether we like it or not.
| hans_castorp wrote:
| Well, writing a procedure that rebuilds the complete table
| in Postgres or Oracle is easy as well. I never needed this,
| but I am sure, there are some sample implementations out
| there.
|
| Rebuilding the entire table doesn't seem feasible for large
| tables to begin with. Especially with a lot of incoming and
| outgoing foreign keys.
|
| I disagree that MySQL is the most "popular".
|
| It might be the "most used" one because of so many web
| hosting services included it for ages by default.
| jl6 wrote:
| Tangential, but there are almost certainly more SQLite
| databases in existence than every other RDBMS put
| together, probably by 3 or 4 orders of magnitude.
|
| It doesn't support column reordering either.
| jl6 wrote:
| Better to channel your displeasure into advocating for a future
| SQL standard to include this feature. Other DBs that support
| this do so via proprietary syntax extensions.
| masklinn wrote:
| Technically it doesn't require syntactic extensions, postgres
| stores the column index as the `attnum` attribute column of
| the `pg_catalog.pg_attributes` system table.
|
| So this could be made to work by hooking into the storage
| system and rewriting the table and all pointers to the table
| when `attnum` is updated.
|
| Without this rewriting this only works if the table was just
| created and has no data, and no external metadata referencing
| the columns themselves e.g. views, fks, indexes, defaults,
| rules, ...
|
| An alternative would be to add automatic packing (a la rustc)
| to postgres, decorrelating the "table position" and the
| "physical position" of the rows, this would also allow free
| "table position" reordering.
|
| And while it's by far the most complex option, one of the
| nice bits with it would be that the system columns could be
| packed as well. Currently there's quite a bit of waste
| because there are 6 system columns (by default), the first 5
| are 4 bytes, but the 6th (ctid) is 6 bytes, meaning 2 bytes
| of padding if your first column is a SERIAL, and 6 if the
| first column is a BIGSERIAL (or an other double-aligned
| column).
| anarazel wrote:
| > And while it's by far the most complex option, one of the
| nice bits with it would be that the system columns could be
| packed as well. Currently there's quite a bit of waste
| because there are 6 system columns (by default), the first
| 5 are 4 bytes, but the 6th (ctid) is 6 bytes, meaning 2
| bytes of padding if your first column is a SERIAL, and 6 if
| the first column is a BIGSERIAL (or an other double-aligned
| column).
|
| FWIW, system columns aren't stored as normal columns in
| tuples. Some of them are implied (e.g. tableoid doesn't
| need to be stored in each tuple, ctid is inferred from
| position), others are not stored in the way normal columns
| are stored (e.g. xmin, xmax).
| RedShift1 wrote:
| Couldn't an ordering column be added to the pg_attributes
| that determines in which way the columns are sorted when
| they are displayed? Standard SQL code can then be used to
| manipulate the display order of the columns.
| [deleted]
| anarazel wrote:
| The problem is that we hear a lot of different features touted
| as "the crucial missing one"...
|
| Anyway, there's been work on this in the past, which recently
| has been picked up again. It's not all that trivial to do well.
| pbz wrote:
| _The problem is that we hear a lot of different features
| touted as "the crucial missing one"_ -- I would definitely
| put this into the polish category, but items in that category
| are also important; especially for those with a MySQL
| background.
|
| _which recently has been picked up again_ -- that 's awesome
| to hear
| silvestrov wrote:
| > This fits with my mental model of how I think unique
| constraints should work with null values.
|
| Agree, this should be the default for unique indexes. Why would
| you ever want multiple NULLs in an unique index?
| mhaberl wrote:
| > Why would you ever want multiple NULLs in an unique index?
|
| You would maybe want some value to be unique if it exists.
| daveoc64 wrote:
| Some kind of secondary identifier, but only for records where
| it is applicable (e.g. loyalty card number).
| phoe-krk wrote:
| The rows (1, NULL), (1, NULL), and (1, NULL) - each NULL
| signifies an absence of a value - these NULLs may actually be
| later filled into (1, 1), (1, 2), and (1, 3). That's why may
| not know if any two NULL values are equal to one another.
|
| In some cases the above is actually important, whereas in
| others you may want to treat NULL as a "value" instead.
| larsnystrom wrote:
| Multiple NULLs in a unique index are useful when an entity has
| 0 or 1 of a simple scalar value. For example, you might have a
| column with an external ID, such as a facebook login id in your
| users table. Not all users have a facebook id, but those who do
| should have a unique value in that column.
| radiospiel wrote:
| You might want to look at a unique partial index: "CREATE
| UNIQUE INDEX foo ON users(facebook_id) WHERE facebook_id IS
| NOT NULL".
|
| I am not sure if this really is a great modelling. If a user
| happens to have two logins in your table (maybe they signed
| up with two different emails), and then connects one of his
| accounts with his FB identity, what does it mean that the
| other entry cannot be connected to the FB identity?
| masklinn wrote:
| Because it's very common for attributes to be optional but
| unique if specified e.g. your fleet management might have a
| 0-1:1 relationship between employees and cars, no car can be
| associated with 2 employees, an employee can't have two cars,
| but any number of employee can be car-less, or car employee-
| less (lending pool, or cars which have not been attributed
| yet).
|
| This also fits perfectly well with the normal interpretation /
| treatment of sql null as "not a value".
| faho wrote:
| But an employee can't be car-less twice!
|
| That's what this change prohibits.
| masklinn wrote:
| > But an employee can't be car-less twice!
|
| What.
|
| > That's what this change prohibits.
|
| This change prohibits nothing, it allows specifying
| something (which was a bit complicated to enforce before).
| faho wrote:
| >This change prohibits nothing, it allows specifying
| something (which was a bit complicated to enforce
| before).
|
| It allows specifying a _constraint_ - that you can 't
| have two rows with the same values even if one of the
| values is a NULL. That's prohibiting duplicate NULLs. The
| change allows you to prohibit duplicate NULLs.
|
| Say you have a table (EmployeeName, CarID). You could do
| a UNIQUE constraint on those two attributes, but that
| would still allow:
|
| EmployeeName | CarID
|
| Jeff | 2
|
| Kim | NULL
|
| Kim | NULL
|
| Here, "Kim" is car-less (NULL in the CarID field) _twice_
| , which makes no sense.
|
| Hence the new constraint.
| masklinn wrote:
| > It allows specifying a constraint - that you can't have
| two rows with the same values even if one of the values
| is a NULL. That's prohibiting duplicate NULLs. The change
| allows you to prohibit duplicate NULLs.
|
| Sure.
|
| > Here, "Kim" is car-less (NULL in the CarID field)
| twice, which makes no sense.
|
| The relation makes no sense to start with, it didn't need
| duplicate nulls for that.
| jhugo wrote:
| Kim being car-less would be represented by Kim not
| appearing in this table at all. The design you've
| presented makes no sense.
| [deleted]
| jamesmstone wrote:
| pre postgres 15 can you solve this with two unique
| constraints ?
|
| unique (EmployeeName), unique (EmployeeName, CarID)
| masklinn wrote:
| You'd most likely solve this by making the CarID NOT NULL
| because there's why would you even have a row with no
| useful data?
|
| And if this is your employee table, you'd have UNIQUE
| (EmployeeData) and UNIQUE (CarID), and it would do what
| you want.
| ryanbrunner wrote:
| That's not quite the same, since the previous solution of
| unique (EmployeeName, CarID) allowed for one employee to
| have 2 separate cars. You have to use conditional indexes
| to solve this.
| Beltalowda wrote:
| It's a 0-1:1 relationship, so you'd have columns like
| "name", "email", "car", etc. on the employees table, and
| "car" might have a unique index (which is probably an ID
| referencing a "cars" table).
|
| Since employees can only have one car and since a car can
| only be used by one employee, an unique index is fine, but
| there may be 20 people that don't have a car so you don't
| want NULL values to be unique.
|
| Of course there are always ways to work around this, but
| this is the general idea.
| seanhunter wrote:
| One way nullable columns are often used is like an option type.
| It's easy to see that you might want rows to be unique if they
| have Some(value) but have multiple rows where the column is
| None
| almog wrote:
| I agree that this is often the use case but depending on how
| the query (as well as how sparse the nulls are, the total
| data size and partitioning scheme I guess) it's also possible
| to achieve this through a join-table without introducing NULL
| values.
| gsk22 wrote:
| Unless you outer join, in which case NULL values pop up out
| of thin air...
| almog wrote:
| Good point. I did try to say that whether that's a viable
| technique would depend on your query (but omitted some
| verb that should have made it better resemble a sentence
| in English :)).
|
| But yes, even if no nulls are present in any table, outer
| join could (or maybe even should) introduce them.
| jhugo wrote:
| I can't think of a situation where I'd want "only a single
| NULL"; that seems much weirder. Another comment described a
| tree data structure where it could make sense, but I can't
| think of others. Nullable unique indexes usually describe some
| kind of optional identifier; if it's present, you want it to be
| unique, but it's perfectly fine for more than one row to omit
| it.
| orthoxerox wrote:
| You might want an index like this on some aggregate table.
| Like, shoe_sales_total, which is grouped by the id of the
| shop assistant that helped you with the selection. But
| customers that have marched straight to the till and asked
| for shoe polish result in records with a NULL for agent_id.
| With a regular unique constraint (month, agent_id), your
| aggregate table can end up with two records for (May-21,
| NULL).
|
| Of course, the workaround for such cases is simple: have a
| sentinel value for unassisted sales and make the column not
| nullable.
| goto11 wrote:
| Lets say you have a Person table with with a SSN column. The
| column has a unique constraint because no two people have the
| same SSN. But the column in nullable, since not everyone has a
| SSN.
|
| It would be weird to expect that only _one person_ does not
| have SSN.
| knorker wrote:
| E.g. if you have a "tax ID" column, and two people in your
| database don't have tax IDs because they're not <your country>
| tax payers.
|
| Or "payment vendor processing ID". You may have multiple
| entries not yet sent to the vendor, so they're NULL.
|
| This shows up literally all the time in SQL schemas.
| hrdwdmrbl wrote:
| Cool but this was trivially solved previously by excluding NULL
| from the index using a partial index.
|
| I'm more interested in problems that have no great solution, like
| taking the first element of a set of partitioned data. Right now
| you have to partition & rank, and then select the rank 1 elements
| in a subsequent query.
| masklinn wrote:
| > Cool but this was trivially solved previously by excluding
| NULL from the index using a partial index.
|
| That seems like the opposite solution, treating NULLs as
| different in database systems which treat them as identical by
| default (which I think only includes MSSQL).
|
| In postgres, and most DBMS, using a partial index excluding
| NULL allows any number of rows to have a NULL value... which is
| the same behaviour as if you're not excluding NULLs.
|
| The new opt-in behaviour is treating all nulls as the _same_
| value, and thus allowing a UNIQUE index to constraint them.
|
| This required two different partial indexes, or for constrained
| data subset in hacks like COALESCE-ing nulls to a sentinel.
| silon42 wrote:
| The new behavior is more useful because it allows a single
| unique null.
|
| For 2, have you looked at "LATERAL JOIN" ?
| _flux wrote:
| I don't think you understood what this does (or then I
| misunderstood this) but if you do that, then how are you going
| to get an index violation from having two NULL values in the
| index? Because that is what this allows: normally indexing NULL
| values permitted multiple NULL entries, but with this NULL is
| comparable to other NULLs, therefore you can have only one NULL
| value in the index.
|
| Previously I had worked this around with functional index on
| e.g. COALESCE(value_with_null, -1), but not always you have a
| good sentinel value easily available. A more complicated index
| (..CASE WHEN..) would solve that, but it's nice to have this
| directly expressible, seems efficient as well.
| jhugo wrote:
| Maybe it's what you meant be "subsequent query", but you don't
| actually need to do it in a separate query; you can do the
| partition & rank in a CTE and select from the resulting table,
| all in a single query. It's quite tidy actually.
| layer8 wrote:
| > Because NULL values are of unknown equality to one another,
| they do not violate UNIQUE constraints.
|
| Well, arguably you don't know whether they violate uniqueness, so
| inserting more than one shouldn't have been allowed.
___________________________________________________________________
(page generated 2022-07-11 23:01 UTC)