hngopher.com

       [HN Gopher] API design note: Beware of adding an "Other" enum value
       ___________________________________________________________________
        
       API design note: Beware of adding an "Other" enum value
        
       Author : luu
       Score  : 80 points
       Date   : 2025-02-27 10:47 UTC (3 days ago)
        
 (HTM) web link (devblogs.microsoft.com)
        
       | zdw wrote:
       | I wonder how this aligns with the protobuf best practice of
       | having the first value be UNSPECIFIED:
       | 
       | https://protobuf.dev/best-practices/dos-donts/#unspecified-e...
        
         | jmole wrote:
         | The example code added "other" as the last option, which
         | was the source of the problems he described.
         | 
         | This doesn't happen when you make the first value in the enum
         | unknown/unspecified.
        
           | plorkyeran wrote:
           | No, the problem described in the article is entirely
           | unrelated to where in the enum the Other option is located.
           | There is a different problem where _keeping_ the Other option
           | at the end of the enum changes the value of Other, but that
           | is not the problem that the article is about.
        
             | jmole wrote:
             | Well, it simplifies the logic considerably: if you see an
             | enum value you don't recognize (Mint), you treat it as
             | uninitialized (0).
             | 
             | So any future new flavor will be read back as '0' in older
             | versions.
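A minimal Rust sketch of the fallback described above, assuming a hypothetical `Flavor` enum carried on the wire as a plain `i32` (as protobuf does) and a hypothetical `from_wire` decoder. It models an older reader that has never heard of Mint (assumed here to be value 4):

```rust
/// Hypothetical enum as seen by an *older* client. Value 0 is reserved
/// for "unspecified", following the protobuf convention.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Flavor {
    Unspecified = 0,
    Vanilla = 1,
    Chocolate = 2,
    Strawberry = 3,
}

impl Flavor {
    /// Decode a raw wire value. Anything this version does not recognize
    /// (e.g. a newer "Mint" = 4) is read back as the zero value.
    pub fn from_wire(raw: i32) -> Flavor {
        match raw {
            1 => Flavor::Vanilla,
            2 => Flavor::Chocolate,
            3 => Flavor::Strawberry,
            _ => Flavor::Unspecified,
        }
    }
}
```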
        
         | beart wrote:
         | "Unspecified" is semantically different from "other". The
         | former is more like a default value whereas the latter is
         | actually "specified, but not one of these listed options".
        
           | hamandcheese wrote:
           | Standard practice in protobuf is to never assign semantic
           | meaning to the default value. I think some linters enforce
           | that enum 0 is named "unknown" which is actually more
           | semantically correct than "other" or "unspecified".
        
         | seeknotfind wrote:
         | This is the same as a null pointer. The requirement is deeply
         | tied to how protobuf is used in large distributed systems that
         | always need to handle version mismatch, so this advice doesn't
         | necessarily apply to API design in general.
        
           | eddd-ddde wrote:
           | Even in the simplest web apps you can encounter version
           | mismatch when a client requests a response from a server that
           | just updated.
        
             | hansvm wrote:
             | Hence the advice to make that situation not happen. Update
             | the client and server to support both versions and prefer
             | the new one, then update both to not support the old
             | version. With load balancers and other real-world problems
             | you might have to break that down into 4 coordinated steps.
        
               | Joker_vD wrote:
               | That only really works if you control the clients, or can
               | force them to update.
        
               | LoganDark wrote:
               | > or can force them to update.
               | 
               | I've used a few clients that completely lock me out for
               | every tiniest minor version update. Very top-tier
               | annoying imho.
        
             | seeknotfind wrote:
             | This implies an API where the server has a single shared
             | implementation. Imagine for instance that the server
             | implements a shim for each version of the interface, then
             | there isn't a need for the null in the API. Imagine another
             | alternative, that the same API never adds a field, but you
             | add a new method which takes the new type. Imagine yet
             | again an API where you are able to version the clients in
             | lockstep. So it's how the API is used and evolves that
             | determines whether to encode a null default in the API.
             | However, in a different environment or with different
             | practices, you can avoid the null. Of course the
             | reason to avoid the null is so that you can statically
             | enforce this value is provided in new clients, though this
             | also assumes your client language is typed. So in the end,
             | protobuf teaches us, but it's not always the best in every
             | situation.
        
         | bocahtie wrote:
         | When the deserializing side of a protobuf exchange encounters
         | an unknown value, it gets deserialized as the zero
         | value. When that client updates, it will then be able to
         | deserialize the new value appropriately (in this case, "Mint").
         | The advice on that page also specifies to not make the value
         | semantically meaningful, which I take to mean to never set it
         | to that value explicitly.
        
           | chen_dev wrote:
           | > it gets deserialized as the zero value
           | 
           | It's more complicated:
           | 
           | https://protobuf.dev/programming-guides/enum/
           | 
           | >> What happens when a program parses binary data that
           | contains field 1 with the value 2?
           | 
           | >- Open enums will parse the value 2 and store it directly in
           | the field. Accessor will report the field as being set and
           | will return something that represents 2.
           | 
           | >- Closed enums will parse the value 2 and store it in the
           | message's unknown field set. Accessors will report the field
           | as being unset and will return the enum's default value.
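A toy Rust model of the two behaviors quoted above; it is not how any real protobuf runtime is implemented. It assumes a message with a single enum field numbered 1 whose only defined values are 0 and 1, with 0 as the default. The names `OpenMsg`, `ClosedMsg`, and `parse_field1` are hypothetical:

```rust
use std::collections::HashMap;

const KNOWN: [i32; 2] = [0, 1]; // assumed: only values 0 and 1 are defined
const DEFAULT: i32 = 0; // assumed default value of the enum

/// Open enum: the raw value is stored in the field even if it has no name.
#[derive(Debug, Default)]
pub struct OpenMsg {
    pub flavor: Option<i32>,
}

/// Closed enum: unknown values go to an "unknown fields" set keyed by
/// field number, and the field itself stays unset.
#[derive(Debug, Default)]
pub struct ClosedMsg {
    pub flavor: Option<i32>,
    pub unknown_fields: HashMap<u32, i32>,
}

impl OpenMsg {
    pub fn parse_field1(raw: i32) -> Self {
        OpenMsg { flavor: Some(raw) }
    }
    pub fn has_flavor(&self) -> bool {
        self.flavor.is_some()
    }
    pub fn flavor(&self) -> i32 {
        self.flavor.unwrap_or(DEFAULT)
    }
}

impl ClosedMsg {
    pub fn parse_field1(raw: i32) -> Self {
        let mut msg = ClosedMsg::default();
        if KNOWN.contains(&raw) {
            msg.flavor = Some(raw);
        } else {
            // Preserved so it can be re-serialized, but not visible via the accessor.
            msg.unknown_fields.insert(1, raw);
        }
        msg
    }
    pub fn has_flavor(&self) -> bool {
        self.flavor.is_some()
    }
    pub fn flavor(&self) -> i32 {
        self.flavor.unwrap_or(DEFAULT)
    }
}
```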
        
       | NoboruWataya wrote:
       | > Just document that the enumeration is open-ended, and programs
       | should treat any unrecognized values as if they were "Other".
       | 
       | Possibly just showing my lack of knowledge here, but are
       | open-ended enumerations a common thing? I always thought the
       | whole point of an enum is that it is closed-ended?
        
         | sd9 wrote:
         | I've worked on systems where the set of enum values was
         | fixed at any particular point in time, but could change over
         | time as business requirements changed.
         | 
         | For instance, we had an enum that represented a sport that we
         | supported. Initially we supported some sports (say FOOTBALL and
         | ICE_HOCKEY), and over time we added support for other sports,
         | so the enum had to be expanded.
         | 
         | Unfortunately this always required the entire estate to be
         | redeployed. Thankfully this didn't happen often.
         | 
         | At great expense, we eventually converted this and other enums
         | to "open-ended" enums (essentially Strings with a bit more
         | structure around them, so that you could operate on them as if
         | they were "real" enums). This made upgrades significantly
         | easier.
         | 
         | Now, whether those things should have been enums in the first
         | place is open for debate. But that decision had been made long
         | before I joined the team.
         | 
         | Another example is gender. Initially an enum might represent
         | MALE, FEMALE, UNKNOWN. But over time you might decide you have
         | need for other values: PREFER_NOT_TO_SAY, OTHER, etc.
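A minimal sketch of the kind of "open-ended enum" described above, assuming a hypothetical `Sport` newtype over `String` with associated constants for the values a given build knows about. This is an illustration in Rust, not the commenter's actual implementation:

```rust
use std::fmt;

/// Hypothetical open-ended enum: a thin wrapper around a string, plus
/// named constants for the values this build knows about.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Sport(String);

impl Sport {
    pub const FOOTBALL: &'static str = "FOOTBALL";
    pub const ICE_HOCKEY: &'static str = "ICE_HOCKEY";

    pub fn new(name: &str) -> Self {
        Sport(name.to_string())
    }

    /// True if this build has dedicated handling for the value.
    pub fn is_known(&self) -> bool {
        matches!(self.0.as_str(), Sport::FOOTBALL | Sport::ICE_HOCKEY)
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

impl fmt::Display for Sport {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

/// Code can still "switch" on the value, but always needs a fallback arm,
/// because another service may start sending a sport added later.
pub fn players_per_side(sport: &Sport) -> Option<u32> {
    match sport.as_str() {
        Sport::FOOTBALL => Some(11),
        Sport::ICE_HOCKEY => Some(6),
        _ => None,
    }
}
```

The trade-off is visible in `players_per_side`: the compiler can no longer check exhaustiveness, so every match needs a fallback.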
        
         | XorNot wrote:
         | The first time you have to add a new schema value, you'll
         | realise you needed "unknown" or similar - because during an
         | upgrade your old systems need a way to deal with new values (or
         | during a rollback you need to handle new entries in the
         | database).
        
           | sitkack wrote:
           | Your comment is the only one in the entire discussion that
           | mentions "schema". Having an "other" in a schema is a way to
           | ensure you can run n and n+1 versions at the same time.
           | 
           | It is Data Model design, of which API design is a subset.
           | 
           | You can only ever avoid having an other if 1) your schema is
           | fixed and 2) if it is total over the universe of values.
        
         | hansvm wrote:
         | It's common when mixing many executables over time.
         | 
         | I prefer to interpret those as an optional/nullable _closed_
         | enum (or, situationally, a parse error) if I have to switch on
         | them and let ordinary language conventions guide my code rather
         | than having to understand some sort of pseudo-null without
         | language support.
         | 
         | In something like A/B tests it's not uncommon to have something
         | that's effectively runtime reflection on enum fields too. Your
         | code has one or more enums of experiments you support. The UI
         | for scaling up and down is aware of all of those. Those two
         | executables have to be kept in sync somehow. A common solution
         | is for the UI to treat everything as strings with weights
         | attached and for the parsers/serializers in your application
         | code to handle that via some scheme or another (usually
         | handling it poorly when people scale up experiments that no
         | longer exist in your code). The UI, though, is definitely
         | open-ended as it interprets that enum data, and the only
         | question is how it's represented internally.
        
         | int_19h wrote:
         | Both are valid depending on what you're modelling.
         | 
         | As far as programming languages go, all enums are explicitly
         | open-ended in C, C++, and C#, at least, because casting an
         | integer (of the underlying type) to enum is a valid operation.
        
           | jay_kyburz wrote:
           | My pet hate is when folks start doing math on enums or
           | assuming ranges of values within an enum have meaning.
        
         | tbrownaw wrote:
         | Does a foreign key count as an enum type?
        
         | fweimer wrote:
         | Enumerations are open-ended in C and C++. They are just integer
         | types with some extra support for defining constants (although
         | later C++ versions give more control over the available
         | operations).
        
       | o11c wrote:
       | The approach in the link is fine for _consumers_, but for
       | producers you really do need some way of saying "create a value
       | that's not one of the known values". Still, there's nothing that
       | says this needs to be pretty.
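One not-so-pretty way to give producers that ability, sketched in Rust under the assumption of a hypothetical `Flavor` enum with integer wire values: an extra variant that carries the raw number, so a producer can construct a value this build has no name for, and a consumer can pass it through unchanged.

```rust
/// Hypothetical wire-level enum that preserves unrecognized values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Flavor {
    Vanilla,
    Chocolate,
    Strawberry,
    /// Lets a producer (e.g. a proxy or a test) emit a value this build
    /// has no name for, and lets a consumer forward it untouched.
    Unrecognized(i32),
}

impl Flavor {
    pub fn from_raw(raw: i32) -> Flavor {
        match raw {
            1 => Flavor::Vanilla,
            2 => Flavor::Chocolate,
            3 => Flavor::Strawberry,
            other => Flavor::Unrecognized(other),
        }
    }

    pub fn to_raw(self) -> i32 {
        match self {
            Flavor::Vanilla => 1,
            Flavor::Chocolate => 2,
            Flavor::Strawberry => 3,
            Flavor::Unrecognized(raw) => raw,
        }
    }
}
```

One wart: `Flavor::Unrecognized(2)` is a second spelling of Chocolate; a round trip through `to_raw`/`from_raw` normalizes it.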
        
       | coin wrote:
       | Just call it "unknown" or "unspecified" or better yet use an
       | optional to hold the enum.
        
         | 101011 wrote:
         | This ended up being the preferred pattern we moved into.
         | 
         | If, like us, you were passing the object between two
         | applications, the owning API would serialize the enum value as
         | a String value, then we had a client helper method that would
         | parse the string value into an Optional enum value.
         | 
         | If the original service started transferring a new String
         | object between services, it wouldn't break any downstream
         | clients, because the clients would just end up with an empty
         | Optional.
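A minimal Rust sketch of that client helper, assuming the owning service serializes the enum as an upper-case string. `Flavor`, the string spellings, and `parse_flavor` are hypothetical names:

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Flavor {
    Vanilla,
    Chocolate,
    Strawberry,
}

impl FromStr for Flavor {
    type Err = ();

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "VANILLA" => Ok(Flavor::Vanilla),
            "CHOCOLATE" => Ok(Flavor::Chocolate),
            "STRAWBERRY" => Ok(Flavor::Strawberry),
            _ => Err(()),
        }
    }
}

/// Hypothetical client helper: a string this build doesn't know becomes
/// `None` instead of a hard deserialization error.
pub fn parse_flavor(wire: &str) -> Option<Flavor> {
    wire.parse::<Flavor>().ok()
}
```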
        
           | janci wrote:
           | How does that work when you need to distinguish between "no
           | value provided" and "a value that is not in the list"? In
           | some applications they have different semantics.
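One way to keep those two cases apart, sketched in Rust with hypothetical names: a three-state result instead of `Option`, so "nothing was sent" and "something newer was sent" stay distinct, and the unrecognized raw text is kept for logging or pass-through.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Flavor {
    Vanilla,
    Chocolate,
    Strawberry,
}

/// Hypothetical three-way result that keeps "absent" and "unrecognized" apart.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FieldValue {
    Missing,              // the producer sent no value at all
    Known(Flavor),        // a value this build understands
    Unrecognized(String), // a value from a newer producer, raw text kept
}

pub fn read_flavor(wire: Option<&str>) -> FieldValue {
    match wire {
        None => FieldValue::Missing,
        Some("VANILLA") => FieldValue::Known(Flavor::Vanilla),
        Some("CHOCOLATE") => FieldValue::Known(Flavor::Chocolate),
        Some("STRAWBERRY") => FieldValue::Known(Flavor::Strawberry),
        Some(other) => FieldValue::Unrecognized(other.to_string()),
    }
}
```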
        
       | remram wrote:
       | Rust has the "non_exhaustive" attribute that lets you declare
       | that an enum might get more variants in the future. In practice
       | that means that when you match on such an enum from another
       | crate, you have to add a default case. It's like an "Other"
       | variant in the enum, except you can't reference it directly; you
       | use a default case.
       | 
       | IIRC a secret 'other' variant (or '__non_exhaustive' or
       | something) is actually how we did things before non_exhaustive
       | was introduced.
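A minimal sketch of the attribute with a hypothetical `Flavor` enum. One caveat: `#[non_exhaustive]` only takes effect outside the defining crate, so inside a single file like this the wildcard arm is optional; a downstream crate matching on `Flavor` would be required to include it.

```rust
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Flavor {
    Vanilla,
    Chocolate,
    Strawberry,
}

pub fn describe(flavor: Flavor) -> &'static str {
    match flavor {
        Flavor::Vanilla => "vanilla",
        Flavor::Chocolate => "chocolate",
        // Mandatory in downstream crates: catches any variant the library
        // adds later (and, in this sketch, Strawberry).
        _ => "some other flavor",
    }
}
```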
        
         | airstrike wrote:
         | TIL
         | 
         | https://doc.rust-lang.org/reference/attributes/type_system.h...
        
         | kibwen wrote:
         | Note that the stance of the OP here is broadly in agreement
         | with what Rust does. His main objection is this:
         | 
         |  _> The word "other" means "not mentioned elsewhere", so the
         | presence of an Other logically implies that the enumeration is
         | exhaustive._
         | 
         | In Rust, because all enums are exhaustive by default and
         | exhaustive matching is enforced by the compiler, there is no
         | risk of this sort of confusion. And then the fact that his
         | proposed solution is:
         | 
         |  _> Just document that the enumeration is open-ended_
         | 
         | The non_exhaustive attribute is effectively compiler-enforced
         | documentation; users now cannot forget to treat the enum as
         | open-ended.
         | 
         | Of course, adding non_exhaustive to Rust was not without its
         | own detractors; its usage for any given enum fundamentally means
         | shifting power away from library consumers (who lose the
         | ability to guarantee exhaustive matching) and towards library
         | authors (who gain the ability to evolve their API without
         | causing guaranteed compilation errors in all of their users
         | (which some users desire!)). As such, the guidance is that it
         | should be used sparingly, mostly for things like error types.
         | But that's an argument against open-ended enums in general, not
         | against the mechanisms we use to achieve those (which, as you
         | say, was already possible in Rust via hacks).
        
           | tyre wrote:
           | Maybe there should be a compiler option or function to assert
           | that a match is exhaustive. If the match does not handle a
           | defined case, it blows up.
        
             | aecsocket wrote:
             | Rust already asserts that a match is exhaustive at compile
             | time - if you don't include a branch for each option, it
             | will fail to compile. This extends to integer range
             | matching and string matching as well.
             | 
             | It's just that with #[non_exhaustive], you _must_ specify a
             | default branch (`_ => { .. }`), even if you've already
             | explicitly matched on all the values. The idea is that
             | you've written code which matches on all the values _which
             | exist right now_, but the library author is free to add
             | new variants without breaking your code - since it's now
             | your responsibility as a user of the library to handle the
             | default case.
        
         | hchja wrote:
         | This is why language syntax is so important.
         | 
         | Swift allows a 'default' case when switching on an enum, which
         | is similar to Other, but you should use it with caution.
         | 
         | It's better to not use it unless you're 110% sure that there
         | will not be additional enums added in the future.
         | 
         | Otherwise, in Swift, when you add an additional enum case, the
         | code where you switch on the enum will not compile until you
         | handle the new case at each respective call site.
        
           | layer8 wrote:
           | The better solution is to have two different "default" cases
           | in the language, one that expresses handling "future" values
           | (values that aren't currently defined), and one that
           | expresses "the rest of the currently defined values". The
           | "future" case wouldn't be considered for exhaustiveness
           | checks.
        
             | SkiFire13 wrote:
             | What would the "future" default case actually do though?
             | When you're in the past there's no value for it, and the
             | moment you get to the future the values will become part of
             | the "present" and will still not fall under the "future"
             | case. You would need some kind of versioning support in the
             | enum itself, but that's a much bigger change.
        
               | layer8 wrote:
               | "Future" values only become defined ("present" in your
               | sense) at compile-time, but may occur before that at
               | runtime. Note that this mostly presumes a language with
               | separate compilation, or situations like coding against a
               | remote-API spec, where the server may deploy a newer
               | version but your client remains unchanged. Once you
               | compile against the new spec, you'd get errors/warnings
               | about the new, not explicitly handled values, but your
               | existing binary would nevertheless handle those values
               | under the "future" case.
               | 
               | The issue with traditional "default" cases is that they
               | shadow warnings/errors about unhandled cases, but you'd
               | still want to have some form of default case for forward
               | compatibility.
        
             | mayoff wrote:
             | Swift allows an enum to be marked `@frozen`, which is an
             | API (and ABI) stability guarantee that the enum will never
             | gain more cases. Apple uses this quite sparingly in their
             | APIs.
             | 
             | Swift also has two versions of a `default` case in switch
             | statements, like you described. It has regular `default`
             | and it has `@unknown default`. The `@unknown default` case
             | is specifically for use with non-frozen enums, and gives a
             | warning if you haven't handled all known cases.
             | 
             | So with `@unknown default`, the compiler tells you if you
             | haven't been exhaustive (vs. the current API), but doesn't
             | complain that your `@unknown default` case is unreachable.
        
         | sunshowers wrote:
         | There is currently a missing middle ground in stable Rust,
         | which is to _lint_ on a missing variant rather than fail
         | compilation. There's an unstable option for it, but it would
         | be very useful for non-exhaustive enums where consumers care
         | about matching against every known variant.
         | 
         | You can practically use it today by gating on a nightly-only
         | cfg flag. See
         | https://github.com/guppy-rs/guppy/blob/fa61210b67bea233de52c... and
         | https://github.com/guppy-rs/guppy/blob/fa61210b67bea233de52c...
        
       | esafak wrote:
       | Just add a free-form text field to hold the other value, and
       | revise your enum as necessary, while migrating the data.
        
         | AceJohnny2 wrote:
         | I can't even tell if you're trolling.
        
       | 1oooqooq wrote:
       | jr: add other option
       | 
       | sr: omit other option
       | 
       | illuminated: add other option in front end only and alert when
       | the backend crashes.
        
       | layer8 wrote:
       | Slight counterpoint: Unless there is some guarantee that the
       | respective enum type will never ever be extended with a new
       | value, each and every case distinction on an enum value needs to
       | consider the case of receiving an unexpected value (like Mint in
       | the example). When case distinctions do adhere to that principle,
       | then the problem described doesn't arise.
       | 
       | On the other hand, if the above principle is adhered to as it
       | should, then there is also little benefit in having an Other
       | value. One minor conceivable benefit is that intermediate code
       | can map unsupported values to Other in order to simplify logic in
       | lower-level code. But I agree that it's usually better to not
       | have it.
       | 
       | A somewhat related topic that comes to mind is error codes. There
       | is a common pattern, used for example by the HTTP status codes,
       | where error codes are organized into categories by using
       | different prefixes. For example in a five-digit error code
       | scheme, the first three digits might indicate the category (e.g.
       | 123 for "authentication errors"), and the remaining two digits
       | represent a more specific error condition in that category. In
       | that setup, the all-zeros code in each category represents a
       | generic error for that category (i.e. 12300 would be "generic
       | authentication error").
       | 
       | When implementing code that detects a new error situation not
       | covered by the existing specific error codes, the implementer
       | now has the choice of either introducing a new error code (e.g. 12366
       | -- this is analogous to adding a new enum value), which has to be
       | documented and maybe its message text be localized, or else using
       | the generic error code of the appropriate category.
       | 
       | In any case, when error-processing code receives an unknown --
       | maybe newly assigned -- error code, it can still map it
       | according to the category. For example, if the above 12366 is
       | unknown, it can be handled like 12300 (e.g. for the purpose of
       | mapping it to a corresponding error message). This is quite
       | similar to the case of having an Other enum value, but with a
       | better justification.
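A minimal Rust sketch of the category fallback described above, assuming five-digit codes whose first three digits name the category. The message table is hypothetical; apart from 12300, the codes and texts are invented for illustration:

```rust
/// Hypothetical table of messages this build knows about.
pub fn known_message(code: u32) -> Option<&'static str> {
    match code {
        12300 => Some("Generic authentication error"),
        12301 => Some("Invalid password"),
        12302 => Some("Account locked"),
        _ => None,
    }
}

/// The generic code of a category: its first three digits followed by "00".
pub fn category_code(code: u32) -> u32 {
    code / 100 * 100
}

/// Prefer the specific message; if the code is unknown (perhaps newly
/// assigned), fall back to the generic message of its category.
pub fn error_message(code: u32) -> &'static str {
    known_message(code)
        .or_else(|| known_message(category_code(code)))
        .unwrap_or("Unknown error")
}
```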
        
       | vadim_phystech wrote:
       | ...since the set of all possible behaviour that is not
       | specified is much greater, and denser, than one would initially
       | feel and assume, one might cause lots of possible bad outcomes
       | and success-breaking points by using an "Other" type in their API.
       | Because "Other" is the first thing to look at for vulnerabilities,
       | for attack vectors. Because the spirit of UB the Terrible lurks
       | there! The spirit of UB feeds upon the juices of the "Other"
       | omnimorphic (fel) type! The vile, formless "ANY" type!
       | Depravity and disharmony! Decay and reducing heteromorphisms!
       | decomposition, descriptive semantic matrix rank
       | reduction, richness degradation,
       | devolution...empoorness...scarcity pressure increase...
       | 
       | </shutting_the_fuck_up_my_wetware_machine_whispering_kek>
        
       | akamoonknight wrote:
       | One of the tactics I end up using in Verilog, for better or
       | worse, is to define enums with a '0 value (repeat 0s for the size
       | of the variable) and a '1 value (repeat 1s for the size of the
       | variable).
       | 
       | '0 stays as "null"-like (e.g. INVALID), and '1 (which would be
       | 0xFF in an 8-bit byte, for instance) becomes "something, but I'm
       | not sure what" (e.g. UNKNOWN).
       | 
       | Definitely has the same issues as referenced when needing to grow
       | the variable, and the times where it's useful aren't super
       | common, but I do feel like the general concept of an
       | unknown-but-not-invalid value can help with tracking down errors
       | in processing chains. I definitely do run into the need to
       | "beware" with enums, though.
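A rough Rust analogue of that convention (not Verilog), assuming a hypothetical 8-bit `Opcode` type in which all zeros means INVALID and all ones means UNKNOWN. The decoder maps any unrecognized bit pattern to UNKNOWN, which is one possible choice rather than part of the original description:

```rust
/// Hypothetical 8-bit encoding mirroring the '0 / '1 convention:
/// all zeros = Invalid ("null"-like), all ones = Unknown
/// ("something, but not sure what").
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Opcode {
    Invalid = 0x00,
    Read = 0x01,
    Write = 0x02,
    Unknown = 0xFF,
}

impl Opcode {
    pub fn from_bits(bits: u8) -> Opcode {
        match bits {
            0x00 => Opcode::Invalid,
            0x01 => Opcode::Read,
            0x02 => Opcode::Write,
            // Any other pattern, including 0xFF itself, is treated as
            // "valid but unrecognized" rather than as invalid.
            _ => Opcode::Unknown,
        }
    }
}
```

The growth problem mentioned above shows up directly: widening the field to 16 bits would move UNKNOWN from 0xFF to 0xFFFF, so old and new encodings would disagree.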
        
       | _3u10 wrote:
       | I usually use Unknown / Other as 0.
        
       | bob1029 wrote:
       | Making things into enums that shouldn't be enums is a fun trap to
       | fall into. Much of the time what you really want is a complex
       | type so that you can communicate these additional facts. In this
       | case I'd do something like:
       | 
       |       class Widget {
       |           WidgetFlavor Flavor; // Undefined, Vanilla, Chocolate, Strawberry
       |           string? OtherFlavor;
       |       }
       | 
       | This is easy to work from a consumer standpoint because if you
       | have a deviant flavor to specify, you don't bother setting the
       | Flavor member to anything at all. You just set OtherFlavor. Fewer
       | moving pieces == less chance for bad times.
       | 
       | The first (default) member in an enum should generally be
       | something approximating "Undefined". This also makes working with
       | serializers and databases easier.
        
         | ryanschaefer wrote:
         | > Fewer moving pieces == less chance for bad times.
         | 
         | Is this not a case for explicitly specifying all flavors? Other
         | flavor has essentially introduced infinite moving pieces.
        
         | IshKebab wrote:
         | This is not a good design. You've introduced representable
         | invalid states (Flavor=Vanilla, Other flavor="DarkChocolate").
         | 
         | At the least you want this...
         | 
         |       enum Flavor {
         |           Chocolate,
         |           Banana,
         |           Strawberry,
         |           Other(String),
         |       }
         | 
         | But that's not right either. What you really want is
         | 
         |       #[non_exhaustive]
         |       enum Flavor {
         |           Chocolate,
         |           Banana,
         |           Strawberry,
         |       }
         | 
         |       impl Display for Flavor ...
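A runnable Rust version of the first suggestion above (`Other(String)`), with a `From<&str>` conversion and a `Display` impl added as assumptions, since the thread only shows `impl Display for Flavor ...`. The point is that one value cannot be Banana and also carry a separate other-flavor name:

```rust
use std::fmt;

/// Explicit catch-all that carries the raw name, so the
/// "Flavor=Vanilla, OtherFlavor=DarkChocolate" state cannot be built.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Flavor {
    Chocolate,
    Banana,
    Strawberry,
    Other(String),
}

impl From<&str> for Flavor {
    fn from(s: &str) -> Self {
        match s {
            "Chocolate" => Flavor::Chocolate,
            "Banana" => Flavor::Banana,
            "Strawberry" => Flavor::Strawberry,
            other => Flavor::Other(other.to_string()),
        }
    }
}

impl fmt::Display for Flavor {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Flavor::Chocolate => f.write_str("Chocolate"),
            Flavor::Banana => f.write_str("Banana"),
            Flavor::Strawberry => f.write_str("Strawberry"),
            Flavor::Other(name) => f.write_str(name),
        }
    }
}
```

One remaining wrinkle: `Flavor::Other("Banana".to_string())` can still be constructed directly, giving a second spelling of Banana; routing construction through `From<&str>` avoids that.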
        
       ___________________________________________________________________
       (page generated 2025-03-02 23:00 UTC)