[HN Gopher] Unnatural Keys - Nature doesn't come with identifiers
       ___________________________________________________________________
        
       Unnatural Keys - Nature doesn't come with identifiers
        
       Author : thunderbong
       Score  : 163 points
       Date   : 2023-05-28 05:54 UTC (17 hours ago)
        
 (HTM) web link (matt-schellhas.medium.com)
 (TXT) w3m dump (matt-schellhas.medium.com)
        
       | 36097082 wrote:
       | [flagged]
        
       | cubefox wrote:
       | "To keep reading this story, get the free app or log in."
        
       | praptak wrote:
       | https://en.m.wikipedia.org/wiki/Identity_(philosophy)
       | 
       | For when you want to think real deep about what the identifier
       | identifies.
        
         | fatneckbeard wrote:
         | or whether it identifies
         | 
         | https://en.wikipedia.org/wiki/Philosophical_skepticism#Epist...
        
       | jeff-davis wrote:
       | "There are no real world natural keys -- only someone else's
       | surrogate key... SSN, VIN, email address, UPC, tax numbers,
       | country codes, language codes, URIs... they're all just someone
       | else's made up identifier."
       | 
       | Every time this topic comes up, there are major definitional
       | problems, and it seems like either surrogate keys or natural keys
       | get defined out of existence.
       | 
       | The author is drawing the line such that natural keys are
       | surrogate keys that are out of your control, and surrogate keys
       | are in your control. That doesn't seem to match the definition
       | from wikipedia: https://en.wikipedia.org/wiki/Surrogate_key
       | 
       | I agree with the author that control over the ID is an important
       | distinction, but if you're going to make up a new ID, why not
       | just say so? "Surrogate key" just introduces confusion -- it
       | makes it sound like something only programmers and database
       | admins care about, which is definitely not true. If customers are
       | seeing it, everyone in the organization should care that it's in
       | a recognizable pattern (so you don't mix up different IDs from
       | different places) and memorable and easy to understand when
       | spoken.
        
       | swayvil wrote:
       | one might even say that all form is interpretation. Or that all
       | facts are assertions.
        
       | tingletech wrote:
       | If you are mapping to a series of bits, then a checksum of some
       | sort seems like a fine identifier to a specific file.
       | 
       | But something like "song", minting ARKs[0] for FRBR[1]-esq
       | Work/Expression/Manifestation records would be my first instinct,
       | not a natural key. The Manifestation record could have the
       | checksums of the files.
       | 
       | [0] https://arks.org
       | 
       | [1]
       | https://en.wikipedia.org/wiki/Functional_Requirements_for_Bi...
       | [1]
        
         | 8organicbits wrote:
         | FRBR makes a lot of sense. There's some really good insights
         | from library sciences that folks in computer science aught to
         | spend some time studying.
         | 
         | [edit] removed a statement about an ARK URL not loading, it
         | loads fine now. Temp issue on my end?
        
           | blooalien wrote:
           | I agree with the substance of your comment re; FRBR, but ...
           | typo maybe? [1]
           | 
           | [1] https://grammarist.com/homophones/ought-vs-aught/
        
       | karmakaze wrote:
       | I deal with inventory systems and there is a difference between
       | your system's key and a 'natural' one--the natural one existed
       | first. The good answer is to support both and use either as
       | appropriate. There are many ways of ID'ing an item, usually a SKU
       | (literally stock keeping unit) is used. The owner of the items
       | can choose to use an existing one (manufacturer, UPC, etc) but
       | more commonly assigns their own so that interchangeable items
       | with different manufacturer codes can share a SKU. Ideally a
       | system should support as many methods of ID'ing as the user deems
       | useful.
       | 
       | If you do use surrogate keys everywhere, avoid exposing a one if
       | a pair of existing ones will do. e.g. rows of an association
       | table, unless of course that association has attributes and is a
       | well-known thing to itself.
       | 
       | Making every user of your systems always use system generated
       | identifiers (where external ones already in use commonly exist)
       | is kind of like a file-system that only uses tmp-generated names
       | and the user has to do external book-keeping to remember which is
       | which.
        
       | fwlr wrote:
       | Nature doesn't come with identifiers; that is why Apollo gave us
       | UUIDs. And always remember, the goal of a database is _not_ to
       | model the domain - it is to model your knowledge of the domain.
        
         | karmakaze wrote:
         | Ironically DNA is an identifier and nature does use identifiers
         | like facial recognition, etc--it only happens not to be text-
         | based nor 100% unique/accurate.
        
           | fanf2 wrote:
           | DNA is not an identifier
           | https://en.wikipedia.org/wiki/Chimera_%28genetics%29
        
       | Anduia wrote:
       | https://archive.ph/vuXrb
        
       | praptak wrote:
       | Slap a synthetic key on everything, you probably overestimate how
       | bad it is to have multiple records for "the same thing".
       | 
       | Even if it is indeed bad, it's probably unrealistic to expect to
       | never let it slip on entry (however good the natural keys might
       | be). Maybe it makes more sense to incorporate deduplication into
       | the design right from the start.
        
         | jack_squat wrote:
         | When you use a surrogate key, you should still put a unique
         | constraint on an alternate key. Multiple records for the same
         | thing is a fundamental problem in a database.
        
       | ur-whale wrote:
       | > we want a database of all of the songs in the world so that we
       | can properly identify unknown songs and provide attribution so
       | that folks can get paid appropriately. It is a noble goal
       | 
       | That it is a noble goal is this internet rando's own opinion, and
       | there certainly is no consensus on that.
       | 
       | Inflicting it as a fait-accompli to his readership does not bode
       | well for the rest of the article
        
       | TeMPOraL wrote:
       | This. I'm actually surprised the linked Wikipedia article on
       | "Natural key" only mentions one potential disadvantage, that
       | _somewhen in the future, maybe, at some point, somehow_ , the
       | other system may change, and then your keys are bust. They even
       | give US SSNs as an example of something that may stop working as
       | unique key _in the future_.
       | 
       | There are no "natural keys". They don't exist in nature. They
       | don't exist as a thing _in our physical reality_. And when it
       | comes to human-assigned identifiers, it should be emphasized much
       | more strongly that whatever external thing you _think_ is a good
       | unique ID, it isn 't.
       | 
       | Names? Addresses? Dates? Nope - have you heard of the "Falsehoods
       | programmers believe about..."?
       | 
       | Government IDs? Nope. US SSN was _not designed_ to be used as UID
       | from the start, but even when you have an identifier that was -
       | like e.g. PESEL number in Poland - it 's _still_ not an unique
       | ID, because _people and computers make mistakes_. Multiple people
       | having the same PESEL is something that 's been known to happen.
       | 
       | (Side note: "identification number" really means "opaque string".
       | IDs are not numbers. If adding or multiplying them together
       | doesn't make sense, they're not numbers.)
       | 
       | Anything business adjacent? Don't make me laugh. The article
       | covered a lot of the reasons why any kind of tracking IDs are
       | broken for the purpose - most of them dealing with the messy
       | nature of reality - but businesses are businesses. They cut
       | corners. They lie and cheat. That's why you see the same EAN code
       | covering multiple products a store might even try and price
       | differently. That's why SKUs get reused. That's why these days
       | you often have multiple distinct products under the same SKU -
       | first, quality batch, released to get early positive reviews, and
       | then the main batch made with much cheaper and lower-quality
       | parts. Product names? That's even worse - you'll have variations
       | such as "extra batch of super-low-quality, reduced functionality,
       | made specifically for Black Friday, so it can be sold under the
       | same name as the OG product, seemingly at big discount".
       | 
       | This is not to say "never use other people's IDs". There's a
       | trade-off here, but making a good choice _starts_ with assuming
       | that the external  "unique ID" is neither unique, nor a good ID,
       | and you'll have to deal with it at some point.
        
         | sabas123 wrote:
         | > (Side note: "identification number" really means "opaque
         | string". IDs are not numbers. If adding or multiplying them
         | together doesn't make sense, they're not numbers.)
         | 
         | Eeehhh I'm not sure if this necessarily true. A telephone
         | number is definitely a number but addition doesn't make all too
         | much sense on them.
         | 
         | If you want to specifically talk about objects that require
         | addition and multiplication to be properly defined on them then
         | you can always use Groups and Rings.
        
           | kweingar wrote:
           | A phone number is a number in a colloquial sense (a string of
           | digits) but I would hate to use any computer system where
           | phone numbers are stored or reasoned about as actual
           | mathematical numbers.
        
           | JadedBlueEyes wrote:
           | A phone number is definitely not a number - for example, it
           | can often be written as starting with a +. Not only that, but
           | one number can have multiple valid expressions (eg. a number
           | with and without a country code). That's ignoring the
           | different ways people like to format them (spaces, dashes,
           | letters, etc.)
        
             | fanf2 wrote:
             | Yes, and leading zeroes are significant in phone numbers.
        
           | OJFord wrote:
           | You could construct a telephone numbering system where the
           | sum of two 'real' numbers could be dialed to start a three-
           | way conference call. That would be.. amusing, to some subset
           | of users.
        
             | kroltan wrote:
             | Give everyone prime _numbers_ and a N-way conference can be
             | done with multiplication.
        
         | Throw10987 wrote:
         | I find myself coming back to Data and Reality on a periodic
         | basis to remind me that the model is just the model.
         | 
         | https://www.bkent.net/Doc/darxrp.htm
        
         | bruce511 wrote:
         | If I could go back 25 years and give myself advice, it would be
         | this. A Lot of pain over my career would have been avoided.
         | 
         | I would also tell myself not to use sequential numbers for
         | surrogate keys. They're massively convenient, but a real pain
         | later on (data security, and distributed data being just two
         | cases.)
         | 
         | I know there are folks who like natural keys, but frankly IMO
         | they're a straight-up bug. YMMV.
        
         | EdwardDiego wrote:
         | I was about to suggest just hashing our DNA, but then,
         | remembered identical twins+ exist.
        
           | bruce511 wrote:
           | Oh it's worse than that. There are lots of edge cases with
           | DNA.
           | 
           | Chimera exist [1], plus there are issues with transplant
           | recipients and so on. I expect there are more edge cases I
           | don't know about.
           | 
           | [1] https://www.scientificamerican.com/article/3-human-
           | chimeras-....
        
           | drivers99 wrote:
           | and gene therapy
        
         | amelius wrote:
         | Whatever you do, always let some manager decide about the
         | actual key, then you can always blame them for their mistake :)
        
         | genuine_smiles wrote:
         | > There are no "natural keys". They don't exist in nature. They
         | don't exist as a thing in our physical reality. And when it
         | comes to human-assigned identifiers, it should be emphasized
         | much more strongly that whatever external thing you think is a
         | good unique ID, it isn't.
         | 
         | It seems to me that nature has plenty of natural keys, it's
         | human created constructs that don't.
         | 
         | For example, the periodic table has natural keys.
        
           | dale_glass wrote:
           | Yeah, but you don't want to use that as a primary key for
           | your stock database.
           | 
           | It'll work fine until you find you want to keep track of
           | U-235 and U-238 separately.
           | 
           | What constitutes identity is context-dependent. Yeah, on one
           | level they're the same thing. On another they aren't.
        
             | rzzzt wrote:
             | Make it a composite key of atomic and mass numbers, then.
        
               | xmcqdpt2 wrote:
               | But then you need to have a stored procedure to maintain
               | integrity every time an atom undergoes radioactive decay,
               | that's got to be bad for performance!
        
               | c00lio wrote:
               | But then you'd miss on crystal structure to distinguish
               | your diamonds from your coal.
        
               | OJFord wrote:
               | Which single element are they then?
               | 
               | Surely such a system would have a products table with
               | something like nominal element fk, purity, mass columns
               | anyway. (And stock table with fk to products.)
               | 
               | Then nothing wrong with 'natural' key on the elements,
               | but sure, products are the thing you've (personally) made
               | up, so of course there's no 'someone else's' key to use.
        
               | TRiG_Ireland wrote:
               | Diamonds and coal are both pure carbon (mostly
               | carbon-12). So is graphite, and buckminsterfullerene, and
               | a number of other things. Crystal structure matters a
               | lot.
        
               | OJFord wrote:
               | They're not _pure_ carbon was my point. I wasn 't saying
               | crystal structure not important, I missed that but it'd
               | be in my 'product derived from element x' table.
        
           | MyThoughts100 wrote:
           | The periodic table is a human construct. So is the concept of
           | an identifier.
        
             | jameshart wrote:
             | I'd wager that nonhuman intelligences out there in the
             | universe have analogous tables to the periodic table.
             | 
             | And I suspect it would be possible for us to JOIN our
             | periodic table onto theirs by matching on the atomic number
             | field to create a mapping between, say, our names for the
             | elements and theirs.
             | 
             | That seems unlikely to be something we would be able to do
             | with literally any other dataset that an alien civilization
             | has. Their star catalog will contain many of the same
             | objects ours does - but there won't be an obvious key that
             | lets us join them up.
             | 
             | I guess there's a few mathematical objects that would work
             | - the list of Platonic solids maybe, or the symmetry
             | groups... but in terms of real world physical things?
             | Atomic numbers might be the only truly natural key there
             | is.
        
               | dkarl wrote:
               | The periodic table groups atoms by the number of protons
               | they have and ignores other differences. Uranium 235 and
               | Uranium 238 have the same "natural key" according to the
               | periodic table. The periodic table is useful for solving
               | a lot of problems, and for other problems where it
               | doesn't work to treat U-235 and U-238 as the same, we
               | (and other intelligences) use other ways of categorizing
               | atoms.
        
               | jameshart wrote:
               | Right, but there's a reason we call them 'isotopes of
               | Uranium' rather than saying U-238, Np-238 and Pu-238 are
               | isotopes in the 'family of atoms with atomic mass 238'.
               | 
               | Things with 92 protons in them behave chemically in
               | similar ways. They form the same crystals and molecules.
               | 
               | If you were building a table of isotopes you'd put a
               | column for 'proton count', and you'd find lots of the
               | properties of the isotopes depend on the proton count not
               | the atomic mass. So you could normalize that table
               | structure out and create another table with the atomic
               | number as the key, and put all the properties that are
               | common to all isotopes of a given element in that table.
               | 
               | You would have made a table of elements and it would have
               | the same primary key as the periodic table.
               | 
               | That's what a natural key is. One that emerges naturally
               | when you normalize data you have collected.
               | 
               | I suppose similarly 'electrons in outer shell' might
               | emerge as a natural key by that process.
        
             | rocqua wrote:
             | The concept of an identifier comes with the concept of a
             | bijection (or at least a injection). That is a truly basic
             | concept in Mathematics.
             | 
             | You could argue Mathematics is a human construction. But at
             | that point, isn't every concept a human construction? In
             | that case it is hardly helpful, or meaningful to say a
             | concept is a human construct, since all concepts are.
        
               | imtringued wrote:
               | The problem isn't the concept of a bijective function, it
               | is picking the right function and that is a social
               | problem.
        
             | xmcqdpt2 wrote:
             | Most elements are stable, especially most of the ones we
             | deal with, and they fall in clear periods and rows based on
             | chemical valence. This is true for main group elements and
             | mostly true for transition metals.
             | 
             | I think the periodic table was discovered, not invented. My
             | guess is that whenever we communicate with some advanced
             | alien species, the periodic table is going to be one of
             | those things we share.
        
               | TeMPOraL wrote:
               | Periodic table is recording _multiple_ different patterns
               | in atoms. Some better than others. By nature, its a
               | dimensional reduction. There are many alternative
               | renderings of the periodic table (or spiral, or cube, or
               | ...) emphasizing different regularities. The one you
               | refer to is just the most popular rendering.
        
           | rini17 wrote:
           | Most of the time you don't interact directly with elements,
           | but with molecules. Oh, and ions. And radicals. And and....
        
             | TeMPOraL wrote:
             | And isotopes. And transient, unstable configurations that
             | may exist for a brief moment before decaying into something
             | that slots into the periodic table better.
             | 
             | And also elements aren't a thing, quantum fields are. This
             | is why you sometimes see things that just don't fit the
             | idea of distinct atoms. But then quantum fields also aren't
             | a thing. We don't know what exactly is _the_ thing. We only
             | have increasingly sophisticated working models.
        
               | rini17 wrote:
               | No need to go to such philosophical depths. We're only
               | poor engineers trying to come up with workable model. The
               | periodical table of elements was a true boon and it is
               | unlikely these "foreign keys" will change, but still - it
               | alone isn't fully suitable as data model for chemistry.
        
       | jack_squat wrote:
       | I also tend toward surrogate keys, and would also point out that
       | the overhead concerns frequently weigh in favor of them as well
       | (though not always).
       | 
       | If your natural key is bigger than the appropriately sized
       | surrogate key, and it will be referenced across the database in
       | other tables, the overhead of referencing it everywhere (and
       | indexing it in those places -- potentially in much larger tables)
       | can easily outweigh the overhead of the extra bytes on the
       | original table.
       | 
       | Additionally, surrogate keys can realize much better performance
       | WRT cache locality.
        
       | efitz wrote:
       | Slightly OT but related: one of the problems with identifiers
       | that I have come to accept is that the things underlying the
       | identifiers change over time. Things come and go, businesses
       | launch and fail or are acquired/absorbed into other businesses,
       | humans are born and die, humans change their names or genders,
       | people buy and sell and subdivide or combine real property,
       | people paint things different colors- the only constant is
       | change. And it generally wreaks havoc because people who thought
       | up identifiers for things usually assumed immutability of a
       | mutable thing, and often don't have workable procedures for
       | keeping up with change.
       | 
       | I've come to the conclusion that identifiers should be structured
       | in such a way that they have no relationship to what they
       | identify - for instance, never ever ever use a name as an
       | identifier, no matter how you normalize it. All properties of
       | things should be mutable descriptors. Identifiers should be
       | numeric-ish. Identifiers are not user interface, either - don't
       | try to format them for readability - the audience for identifiers
       | should be automation, not humans. You should give humans a way to
       | serialize them, such as a URL or UUID or whatever, but the point
       | is just to port the thing unambiguously between systems, not to
       | be human readable. You can build UX that looks up the ID and
       | shows you what the descriptive properties are.
        
       | hdudhsyw wrote:
       | [flagged]
        
         | xigoi wrote:
         | Medium has had paywalls for a long time.
        
       | karmakaze wrote:
       | > DALL-E -- "A photograph of meat in the shape of a key on a
       | solid background"
       | 
       | I feel like I'm now reading Matrix code where we can do away with
       | the image itself and just keep the prompt.
        
       | eshnil wrote:
       | I ran into the same problem while building
       | https://learnawesome.org/ . Forget the broad class of "learning
       | resources", even the "books" category doesn't have a usable
       | unique ID. Not everything gets an ISBN for example. There's also
       | the ambiguity between a "work" and an "edition" of a work.
       | 
       | This is probably why OpenLibrary supports mapping of books with
       | 40+ identifiers:
       | https://github.com/internetarchive/openlibrary/blob/master/o...
        
         | makeitdouble wrote:
         | ISBN is an interesting case as the publisher has leeway on how
         | it's handled accross the lifetime of a book. In particular some
         | books will get multiple ISBN for the same version, and some
         | other books can keep the same ISBN through multiple minor
         | revisions.
        
       | TazeTSchnitzel wrote:
       | In Sweden, almost everyone is assigned a number at birth, a
       | _personnummer_ , with the format YYMMDD-NNNN (or YYYYMMDDNNNN in
       | computing, to avoid having a Y2K issue every time someone turns
       | 100).
       | 
       | A lot of computer systems here seem to treat it as a natural key,
       | which is a terrible idea, because:
       | 
       | * Not everyone has one (notably, asylum seekers and temporary
       | residents don't), so if you rely on them, you're excluding those
       | people.
       | 
       | * They can change. They won't for 99.9% of people, but not
       | everyone is so lucky. If you use these as your _primary_ key, you
       | 're going to create a headache for you and your users for that
       | 0.1% of cases.
        
         | schlowmo wrote:
         | Not so long ago I had to deal with the Austrian version of a
         | Social Security Number and it turns out that using birthdates
         | inside such a number isn't a good idea, since
         | 
         | * Some people doesn't have a birthdate with which the Austrian
         | state agrees
         | 
         | * For those people the Austrian state at least assumes a
         | birthyear with a birthday at January 1st or July 1st
         | 
         | * This leads to overflowing the available numbers for those two
         | dates
         | 
         | * If a date overflows, they get SSN with month "13". The 13th
         | month can have as many days as needed (so > 31 is possible).
         | 
         | * Now you have SSNs with a month part of e.g 2023-13-32.
         | 
         | Why even bother with birthdates then?
        
         | BlueTemplar wrote:
         | I'm curious, was there any pushback to its introduction ?
         | 
         | In France, when the government tried to introduce something
         | like this in 1973, it (eventually) caused a scandal (after a
         | scathing Le Monde article), which resulted in the whole project
         | being scrapped, and the French data protection authority CNIL
         | being created :
         | 
         | https://www.lemonde.fr/blog/bugbrother/2010/12/23/safari-et-...
         | (fr)
         | 
         | (CNIL was later defanged in 2004.)
        
         | Phelinofist wrote:
         | In what cases can this number change?
        
           | meithecatte wrote:
           | Poland has a very similar numbering system, the PESEL. It can
           | change if the date of birth was recorded inaccurately and you
           | need to correct it, or, because the PESEL encodes gender,
           | when you get your gender marker legally changed.
        
             | playingalong wrote:
             | PESEL system was bought from Sweden.
        
           | Philpax wrote:
           | The personnummer encodes gender in the last digit (which,
           | IMO, is very silly) - so when that changes, the number has to
           | change too.
        
             | hgsgm wrote:
             | That explains all the fuss of recent years. Transgenderism
             | is an affront against data schematization.
        
         | Pamar wrote:
         | Personally I think this is very bad idea especially because it
         | would be be very _tempting_ to use the id to - for example -
         | trivially  "decide" if you can, for example, smoke, drink
         | alcohol, watch porn.
         | 
         | But this would also make some tricks to extend the scheme to
         | corner cases (e.g.: refugees will get YYMMDD-NNNN where the
         | date part is when they were registered at entry) fail
         | miserably.
         | 
         | At the very least, a single letter code in the mix would
         | mitigate the problems a bit.
        
         | sebastialonso wrote:
         | Any scheme that covers 99.9% of cases is a success, something
         | quite hard to achieve. Calling it a terrible idea is a bit
         | melodramatic, no?
        
         | ttepasse wrote:
         | Should't there be discussion about changing this number scheme?
         | 
         | According to Wikipedia the NNNN part is really NNGC. C is a
         | checksum, giving NNG as a serial number with G even/odd for
         | gender. Sweden seems to have ca. 115000 births per year, making
         | 315 births per day. But those may not be evenly distributed.
         | The available space or serial numbers seems awfully close.
        
         | jackjeff wrote:
         | I think that's true for Social Security numbers in France/US,
         | and National Insurance numbers in UK.
         | 
         | About 15 years ago, after working in Germany I came back to
         | France for a couple of years, the country where I was born /
         | assigned a number at birth / following a similar pattern with
         | the year in the code.
         | 
         | My health insurance from Germany did not transfer according to
         | EU rules or something, at least that's what the regional French
         | National Security told me. The way they got me out of the
         | Kafkaesque situation was to give me a "temporary" social
         | security number, like they would give a foreign immigrant, and
         | have it revert back to my "normal" number after 3 months. At
         | some point I had "two" national security numbers. It was a real
         | mess to fix the whole thing afterwards as you can imagine, and
         | considered I left the country only after a couple of years, I
         | wonder why I even bothered.
        
           | tough wrote:
           | French bureaucracy will kill your soul slowly but surely
        
         | c00lio wrote:
         | Yes, but what would be the alternative?
         | 
         | Of course you can use artificial keys like a random UUID for
         | each of your db objects. Which will work quite well if you only
         | interact with your own system.
         | 
         | But as soon as one of your db objects needs to be linked to
         | some other system, you will need some common ground for a
         | correlation, and something like a personnummer will help
         | immensely and solve 99.9% of your problems. The 0.1% of
         | problematic cases is far less than the headaches that the
         | absence of some common ID scheme will cause you.
         | 
         | Imagine e.g. having to correlate on the usual "Name, First
         | Name, Birth Name, Place of Birth, Date of Birth" dance: My
         | place of birth used to be called Lower Unxton at the time of my
         | birth, now it has been reformed with Upper Unxton and
         | Exampleville into Greater Unxton. Which one do you want? Did I
         | always give the same answer to that question? My parents also
         | divorced and my birth name legally changed. Do you want the old
         | or the new birth name? Btw., my legal date of birth is actually
         | in a non-gregorian calendar. How do I input that into your form
         | field, it doesn't seem to like Showa 50 as a year, not even to
         | speak about the proper characters?
         | 
         | I'd claim any imperfect personnummer is still far superior to
         | all the problems one would have without it.
        
           | cpfohl wrote:
           | They're not suggesting you don't store it... I don't think
           | you're addressing the real concerns here.
           | 
           | How I read the concerns: External connections to your data
           | belong in a field or a separate table to ensure that your
           | data is not so tightly coupled to the key you've chosen that
           | you limit your data model's expressive power. Don't make an
           | external identifier the primary key in your database.
        
             | c00lio wrote:
             | Ah. Thanks for the explanation. With that I agree
             | completely.
        
           | Groxx wrote:
           | You deal with it the same way as you deal with any value that
           | may change and may not be unique: you don't make it your
           | primary key.
           | 
           | Just index it separately, and resolve your links at insertion
           | time to your internal actually-stable-and-unique keys. Then
           | when your system runs into one of these duplicates _you can
           | tell what is happening_ , and you have a chance to fix it,
           | rather than blindly plowing ahead and ruining even more data.
           | 
           | None of this has anything to do with how you represent your
           | API to other systems. Do whatever you need there. And if
           | system X is confused and thinks public-ID-Y is actually Z,
           | well now you can handle it - that's just another
           | discriminator, but you select this one based on the querier.
        
           | vaylian wrote:
           | > Yes, but what would be the alternative?
           | 
           | Not have a nation-wide ID scheme. The problem is that people
           | will rely too much on it being always correct and a lot of
           | infrastructure will grow on top of it so that it is really
           | hard to fix things.
           | 
           | A better approach is to keep the human in the loop. If a
           | government or a company wants to create a connection between
           | databases, let the human whose data is requested, provide the
           | external keys/IDs upon request. That's less automation, but
           | it allows for more flexibility and better personal data
           | protection.
        
             | wolfgang42 wrote:
             | We tried that in the US, and we ended up with an ad-hoc one
             | anyway. It seems to be far too useful of a concept in
             | modern society to avoid this happening whether you want it
             | to or not.
        
             | emodendroket wrote:
             | Yeah the US system of hijacking numbers that were never
             | meant for this purpose and treating them as sufficient to
             | take out credit in your name, and then having fifty
             | redundant DMVs handling identification cards makes much
             | more sense.
        
       | dvdkon wrote:
       | Consistency in one database is important and "natural keys" can
       | help, it's just a bad idea to apply them to messy nature. They're
       | great for derived or linked data, though.
        
       | xipho wrote:
       | Nature + Identifiers is an issue (e.g. see
       | https://github.com/tdwg/tag/issues/36). I've routinely mentioned
       | to others in my field to look at other industries for ways
       | forward, particularly ones like the music industry, so its
       | interesting, if unsurprising to see all the same problems arise
       | there. For those who know, in our field, when Identifiers comes
       | up in conversation at conferences etc., we tiptoe away, somehow
       | people can't learn from the past issues.
       | 
       | There are two issues that keep coming up in my mind: 1) People
       | want Identifiers to _do_ something (like resolve), rather than
       | just be identifiers and 2) People think that there are such
       | things as  "unique" identifiers (one identifier per "thing").
       | Neither, in my mind, are the purpose of identifiers. Identifiers
       | should do one thing, localize you to some concept. By localize I
       | mean that if you can find the digital space (or in physical
       | collections where identifiers are used the physical "printed"
       | identifier) that "contains" the identifier then you should have a
       | reasonable probability of finding the thing/concept that
       | identifier is for. That's all. No certainty, no uniqueness. It's
       | very akin to what we do when we cite something in a publication,
       | we are giving the researcher who reads it a reasonable chance of
       | finding the origin. This isn't to say that we shouldn't try to
       | keep identifiers unique though, it's to say that when it comes
       | down to crunch time we should never assume 1) that they are
       | unique, and 2) that their special properties (e.g. that they
       | resolve) actually work.
       | 
       | I've seen numerous identifier schemes come and go. We have
       | specifically designed a 1-many for things-to-identifiers in our
       | systems (sitting on top our internal IDs, yay, another ID). DOIs?
       | They must be unique, right? Nope. Institutional CODENs? Nope
       | (though the botanists have done it pretty well through community
       | peer-pressure). People IDs? Do you mean wikidata or ORCiD or?
       | 
       | As others have noted, identifiers really are just labels, though
       | things like UUIDs have the game-changing property of reducing the
       | probability that you're looking at a homonymous label.
        
       | IanCal wrote:
       | The really key thing with this is to identify what your
       | database/Id is solving _and what it is not_. What guarantees are
       | you giving and what problems should you raise to people. You can
       | rarely solve all of them.
       | 
       | I helped make GRID which was a database of research institutes in
       | the world (morphed into ROR). But what is one institute? Is it
       | the legal entity? Is a lab an institute? What about a research
       | boat? A hospital? Which is the parent, the hospital or the
       | university?
       | 
       | The answer to lots of that is "maybe, it depends". We had to
       | decide what use cases we were solving, and often it came down to
       | some human fuzzy definition, how do people use this in reality?
       | 
       | We built it because others had made very strict sets of rules or
       | were very open and then were just not _useful_. A curated list of
       | a 1000 institutes set up as people generally used them solved
       | most issues. Scaling it up another 10x solved most others. Not
       | all, not for everyone.
       | 
       | A recording of background noise or silence isn't a song. Unless
       | of course it's 4'33".
       | 
       | > They're all arbitrary, unnatural, human inventions
       | 
       | A really great summary.
        
         | marcosdumay wrote:
         | > The really key thing with this is to identify what your
         | database/Id is solving and what it is not.
         | 
         | On this context, it is very simple and straight-forward. It
         | also generalizes almost completely.
         | 
         | Your database id is solving the problem of creating/managing
         | the relationships within your data. This generalizes to
         | basically everybody because "database id" is a concept created
         | within a mathematical theory exactly and exclusively for that
         | purpose.
         | 
         | If you try to add more purposes for it, you _will_ get
         | conflicting requirements.
        
         | narag wrote:
         | _The really key thing with this is to identify what your
         | database /Id is solving and what it is not._
         | 
         | That's the money quote. If the indirection of long, composite,
         | maybe changing foreign keys could be managed by the system,
         | natural keys would be the way to go. But that doesn't happen.
         | 
         |  _> They're all arbitrary, unnatural, human inventions_
         | 
         | That part from the article is the weaker point. Semantics.
         | "Natural" only means that it comes from the problem domain, it
         | doesn't really matter.
        
         | fauigerzigerk wrote:
         | Which goes to show that we often conflate identification and
         | classification because it's convenient. It's not always the
         | best decision longer term but it may be good enough.
         | 
         | For your particular task, you could have chosen to identify
         | things of interest and assign them an ID. Separately, you could
         | have made (and revised) a decision on whether a particular
         | thing is a lab or a research institute and/or some other kind
         | of entity.
         | 
         | In my experience, organisations are some of the most difficult
         | things to model. The ways in which they can merge, split, be
         | part of one another, be the same and not the same in different
         | jurisdictions or according to different sets of rules are
         | basically infinite.
         | 
         | So when you say "The really key thing with this is to identify
         | what your database/Id is solving and what it is not" I totally
         | agree. But I would add that identification and classification
         | solve overlapping problems that are often deceptively similar
         | but not quite the same.
        
           | IanCal wrote:
           | > For your particular task, you could have chosen to identify
           | things of interest and assign them an ID. Separately, you
           | could have made (and revised) a decision on whether a
           | particular thing is a lab or a research institute and/or some
           | other kind of entity.
           | 
           | I get the point but we couldn't. The very concept of what
           | _one_ thing is, is a fuzzy decision in itself. There 's also
           | not a single hierarchy that makes sense for different uses so
           | you can't just group and then classify - legal ownership,
           | financing and operational control can all be distinct
           | structures. Then how people actually work can be another.
           | 
           | Even _then_ , fundamentally what we needed was to get away
           | from trying to model the world then work out how to use it.
           | Because then you have one Id that maps one way for one user
           | and another for another user and now their work is
           | incompatible - but in a subtle and hidden way.
           | 
           | Edit - this doesn't take away from your point, classification
           | is a distinct problem.
        
         | cwillu wrote:
         | 4'33" isn't a song either. If you call a tail a leg, a dog
         | still has 4 legs.
        
           | drivers99 wrote:
           | So in OP's example of tracking every song, if I want/need
           | 4'33" in my database of all songs, but I need to keep track
           | of whether cwillu think's it's a song for some reason (they
           | seem pretty authoritative, better make note of it), I could
           | create a foreign key for that and set it to values like
           | "cwillu said it's a song", "cwillu said it's NOT a song", and
           | "cwillu has not specified if it's a song yet". That's why
           | natural keys don't exist.
        
           | dgellow wrote:
           | In real life dogs can exist with more or less than 4 legs.
        
             | dpierce9 wrote:
             | Yes! The existence of a five-legged dog makes "all dogs
             | have four legs" false but it doesn't make "dogs have four
             | legs" false. Similarly, the existence of a two-legged dog
             | makes "some dogs have only two legs" true while "dogs have
             | two legs" is still not (unless you just mean the former).
             | 
             | Dog is a categorical abstraction over individuals. The
             | properties of a category tend to have a complex
             | relationship to the members of the category. This is true
             | even if it is indisputable who the members are and what
             | properties the members have.
        
           | IanCal wrote:
           | For a whole bunch of useful definitions and use cases, it's a
           | song. Arguing about the precise delineations of song/not song
           | is missing my point.
           | 
           | > If you call a tail a leg, a dog still has 4 legs.
           | 
           | I can't think of a useful definition of legs that includes
           | the tail. I can for calling a recording of a musical piece a
           | song.
        
             | [deleted]
        
             | [deleted]
        
           | stefncb wrote:
           | Since we're going into this meta-philosophical thing, your
           | statement is false. If I call a tail a leg, dogs definitely
           | have five legs. Unless they were of course born without a
           | leg, or have otherwise lost one. There is no "reality" from a
           | human perspective, it's all sensory input that goes through a
           | lot of very subjective processing.
           | 
           | You can't _really_ categorize things. You can mostly sort of
           | do it, but it never actually works out, because categories
           | are human stuff, and human stuff is subjective and thus
           | different depending on who you ask. Sometimes unclear even to
           | the same person.
           | 
           | I can say dogs have five legs and the only reason most people
           | will think I'm wrong is that society has come to categorize
           | limbs based on function instead of length.
        
             | tough wrote:
             | Getting a little pedantic 2. a large branch of a tree. "the
             | bare limbs of a high tree" Similar: branch
             | 
             | limb (n.2) late 14c., "edge of a quadrant or other
             | instrument," from Latin limbus "ornamental border, hem,
             | fringe, edge," a word of uncertain origin.
             | 
             | Just thought of the limbic system and how that might be
             | related, doesn't seem to be about function but about
             | positioning against a core system
        
               | stefncb wrote:
               | My point was that categorizing limbs (differentiating
               | them from one another) is done by function. Now that I
               | think about it limb isn't what I had in mind anyway, what
               | I meant to say was that we categorize parts of the body
               | in terms of function.
               | 
               | But you're right, limbs are not defined by function.
        
       | wolfgang42 wrote:
       | a "natural key" is frequently just a _really_ foreign key in a
       | database you and your org don 't manage. -- 'wwweston,
       | https://news.ycombinator.com/item?id=27349246
        
       | atoav wrote:
       | As someone who has been thinking about a system to organize my
       | improvisation band's rehearsal recordings I have made many of the
       | same realizations.
       | 
       | Only with us some of the points are even wilder, because for us
       | "a song" is the abstract concept or idea for an improvisation
       | rather than a fixed set of notes played in a fixed way in a fixed
       | order. Sometimes that idea can be a word, sometimes it can be
       | very precise sound settings, high-level structural ideas, a set
       | of lyrics, rhythmic patterns, melodic themes etc.
       | 
       | So any given recording can contain within itself any number of
       | songs (including zero). Those songs can exist or not. And my
       | conclusion for our case was that there is no automated way of
       | deciding which is which, as at times it could be hard even for us
       | ourselves.
        
       | chefandy wrote:
       | I hate to throw OT shade, but as a commercial artist, I'm gonna-
       | the hero photo in this article shows the importance of my
       | profession. Creating images is a lot easier than deciding how
       | best to visually communicate an idea and conceptually manipulate
       | the most relevant components. I probably could have made a vector
       | drawing in 10 minutes, or a photorealistic image in photoshop in
       | about 15 that would have done the job a million times better, and
       | considering how much cheaper artists and designers are than
       | developers, it probably would have been about the same labor cost
       | if money was involved. I get it... It's a blog post... But it's
       | seriously like the clipart revolution all over again, just
       | without the guard rails provided by pre-made imagery.
        
       | ak39 wrote:
       | Always use surrogate primary keys. Use alternate keys (AK) for
       | your "natural keys". Modify the AK business rules as and when
       | needed. There are several advantages to using meaningless random
       | integers, sequential integers or UUIDs as your table's PK. The
       | most important one is that if the PK participates (or is expected
       | to participate) as a foreign key in other tables, it's crucial to
       | us a design where you never ever have to change the PK!
        
         | bruce511 wrote:
         | All this, but I'd steer away from sequential numbers. They
         | suffer from security problems, data-merging problems, data-
         | distribution and replication problems, and more.
         | 
         | Over a long career I've come to favor UUID, the cost (space and
         | performance) are well worth the advantages.
        
           | scrollaway wrote:
           | UUIDv7 has all the advantages of UUID, plus sortability, plus
           | an embedded timestamp which can replace your "created_at"
           | thus reducing the cost aspect. It's also soon becoming an
           | official part of the UUID spec.
           | 
           | In fact, if your created_at is indexed and you can get rid of
           | it via uuid7, there is practically no difference in terms of
           | table size. The size cost is solely on foreign keys.
        
             | evntdrvn wrote:
             | I really hope that I can find someone on the MS SQL Server
             | team to champion implementing native UUIDV7 support, from
             | what I've heard they are really hesitant to add more data
             | types but a person's gotta try :)
        
       | karol wrote:
       | I think in the future the idea of labelling every passing moment
       | and every sounds that anything ever makes will lose prominence.
       | Wondering what will replace them. I think we will either drop
       | this need altogether or the "natural key" will come from "the
       | state of reality".
        
       | SnowHill9902 wrote:
       | This is not true. Given two songs A, B, an informed human can
       | tell whether A and B are the same song or not. Now you've created
       | a UNIQUE constraint on your data, so you have a unique
       | identifier.
        
         | quesera wrote:
         | This is not true. The article gives the great examples of a
         | song through its lifecycle: covered, remixed, remastered,
         | compiled on a greatest hits record, included on a soundtrack,
         | etc. Same song, sometimes. Same recording, sometimes. Different
         | database entries, always.
         | 
         | To the extent that "an informed human" and "the same song" have
         | meaningful definitions of "informed" and "same", you've just
         | recreated the problem.
        
       ___________________________________________________________________
       (page generated 2023-05-28 23:02 UTC)