[HN Gopher] API design note: Beware of adding an "Other" enum value
___________________________________________________________________
API design note: Beware of adding an "Other" enum value
Author : luu
Score : 80 points
Date : 2025-02-27 10:47 UTC (3 days ago)
(HTM) web link (devblogs.microsoft.com)
(TXT) w3m dump (devblogs.microsoft.com)
| zdw wrote:
| I wonder how this aligns with the protobuf best practice of
| having the first value be UNSPECIFIED:
|
| https://protobuf.dev/best-practices/dos-donts/#unspecified-e...
| jmole wrote:
| The example code used added "other" as the last option, which
| was the source of the problems he described.
|
| This doesn't happen when you make the first value in the enum
| unknown/unspecified
| plorkyeran wrote:
| No, the problem described in the article is entirely
| unrelated to where in the enum the Other option is located.
| There is a different problem where _keeping_ the Other option
| at the end of the enum changes the value of Other, but that
| is not the problem that the article is about.
| jmole wrote:
| Well it simplifies the logic considerably - if you see an
| enum value you don't recognize (mint), you treat it as
| uninitialized (0).
|
| So any future new flavor will be read back as '0' in older
| versions.
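A minimal Rust sketch of the fallback jmole describes. It assumes a hypothetical `Flavor` enum whose zero value is `Unspecified`. Real protobuf runtimes differ in detail (open vs. closed enums), so treat this only as an illustration of "an unknown value reads back as 0".

```rust
/// Hypothetical Flavor enum following the protobuf convention that
/// value 0 means "unspecified / unknown".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Flavor {
    Unspecified = 0,
    Vanilla = 1,
    Chocolate = 2,
    Strawberry = 3,
}

/// Decode a wire integer. Any value this build does not recognize
/// (for example a Mint = 4 added by a newer peer) falls back to
/// Unspecified, i.e. it reads back as 0.
pub fn decode_flavor(raw: i32) -> Flavor {
    match raw {
        1 => Flavor::Vanilla,
        2 => Flavor::Chocolate,
        3 => Flavor::Strawberry,
        _ => Flavor::Unspecified,
    }
}
```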
| beart wrote:
| "Unspecified" is semantically different from "other". The
| former is more like a default value whereas the latter is
| actually "specified, but not one of these listed options".
| hamandcheese wrote:
| Standard practice in protobuf is to never assign semantic
| meaning to the default value. I think some linters enforce
| that enum 0 is named "unknown" which is actually more
| semantically correct than "other" or "unspecified".
| seeknotfind wrote:
| This is the same as a null pointer, and the requirement is very
| deeply tied to protobuf as it is used on large distributed
| systems that always need to handle version mismatch, and this
| advice doesn't necessarily apply to API design in general.
| eddd-ddde wrote:
| Even in the simplest web apps you can encounter version
| mismatch when a client requests a response from a server that
| just updated.
| hansvm wrote:
| Hence the advice to make that situation not happen. Update
| the client and server to support both versions and prefer
| the new one, then update both to not support the old
| version. With load balancers and other real-world problems
| you might have to break that down into 4 coordinated steps.
| Joker_vD wrote:
| That only really works if you control the clients, or can
| force them to update.
| LoganDark wrote:
| > or can force them to update.
|
| I've used a few clients that completely lock me out for
| every tiniest minor version update. Very top-tier
| annoying imho.
| seeknotfind wrote:
| This implies an API where the server has a single shared
| implementation. Imagine for instance that the server
| implements a shim for each version of the interface, then
| there isn't a need for the null in the API. Imagine another
| alternative, that the same API never adds a field, but you
| add a new method which takes the new type. Imagine yet
| again an API where you are able to version the clients in
| lockstep. So it's how the API is used and evolves that
| determines whether it needs a null default. In a different
| environment or with different practices, you can avoid the
| null. Of course the reason to avoid the null is so that you can
| statically enforce that this value is provided in new clients,
| though this
| also assumes your client language is typed. So in the end,
| protobuf teaches us, but it's not always the best in every
| situation.
| bocahtie wrote:
| When the deserializing half of the protobuf definitions
| encounters an unknown value, it gets deserialized as the zero
| value. When that client updates, it will then be able to
| deserialize the new value appropriately (in this case, "Mint").
| The advice on that page also specifies to not make the value
| semantically meaningful, which I take to mean to never set it
| to that value explicitly.
| chen_dev wrote:
| > it gets deserialized as the zero value
|
| It's more complicated:
|
| https://protobuf.dev/programming-guides/enum/
|
| >> What happens when a program parses binary data that
| contains field 1 with the value 2?
|
| >- Open enums will parse the value 2 and store it directly in
| the field. Accessor will report the field as being set and
| will return something that represents 2.
|
| >- Closed enums will parse the value 2 and store it in the
| message's unknown field set. Accessors will report the field
| as being unset and will return the enum's default value.
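The open-enum behavior can be sketched in Rust. This assumes a hypothetical `Flavor` newtype over `i32` and a toy `Message` type; it is not generated protobuf code. The raw value is stored directly, so the unknown value 2 survives parsing and the field reports as set.

```rust
/// Hypothetical "open" enum: a thin wrapper around the wire integer,
/// with named constants for the values this build knows about.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Flavor(pub i32);

impl Flavor {
    pub const UNSPECIFIED: Flavor = Flavor(0);
    pub const VANILLA: Flavor = Flavor(1);

    pub fn raw(self) -> i32 {
        self.0
    }

    /// True only for values this build has a name for.
    pub fn is_known(self) -> bool {
        matches!(self, Flavor::UNSPECIFIED | Flavor::VANILLA)
    }
}

/// Toy message (hypothetical): parsing stores the value as-is, so a
/// newer peer's value survives and the accessor reports the field as set.
pub struct Message {
    flavor: Option<Flavor>,
}

impl Message {
    pub fn parse(raw_flavor: i32) -> Message {
        Message { flavor: Some(Flavor(raw_flavor)) }
    }

    pub fn has_flavor(&self) -> bool {
        self.flavor.is_some()
    }

    pub fn flavor(&self) -> Flavor {
        self.flavor.unwrap_or(Flavor::UNSPECIFIED)
    }
}
```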
| NoboruWataya wrote:
| > Just document that the enumeration is open-ended, and programs
| should treat any unrecognized values as if they were "Other".
|
| Possibly just showing my lack of knowledge here but are open-
| ended enumerations a common thing? I always thought the whole
| point of an enum is that it is closed-ended?
| sd9 wrote:
| I've worked on systems where the set of enum values was
| fixed at any particular point in time, but could change over
| time as business requirements changed.
|
| For instance, we had an enum that represented a sport that we
| supported. Initially we supported some sports (say FOOTBALL and
| ICE_HOCKEY), and over time we added support for other sports,
| so the enum had to be expanded.
|
| Unfortunately this always required the entire estate to be
| redeployed. Thankfully this didn't happen often.
|
| At great expense, we eventually converted this and other enums
| to "open-ended" enums (essentially Strings with a bit more
| structure around them, so that you could operate on them as if
| they were "real" enums). This made upgrades significantly
| easier.
|
| Now, whether those things should have been enums in the first
| place is open for debate. But that decision had been made long
| before I joined the team.
|
| Another example is gender. Initially an enum might represent
| MALE, FEMALE, UNKNOWN. But over time you might decide you have
| need for other values: PREFER_NOT_TO_SAY, OTHER, etc.
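The following Rust sketch shows one way to build the kind of "open-ended" enum sd9 describes. It uses a hypothetical `Sport` newtype over `String` and exposes the known values as constants. The names and the uppercase normalization are assumptions for illustration, not the system from the comment.

```rust
use std::fmt;

/// Hypothetical open-ended "enum" backed by a string. Known values are
/// constants, but any other string (a sport added later) still parses.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Sport(String);

impl Sport {
    pub const FOOTBALL: &'static str = "FOOTBALL";
    pub const ICE_HOCKEY: &'static str = "ICE_HOCKEY";

    pub fn new(name: &str) -> Sport {
        // Assumption: normalize so "football" and "FOOTBALL" compare equal.
        Sport(name.trim().to_ascii_uppercase())
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }

    /// True if this build has dedicated handling for the sport.
    pub fn is_known(&self) -> bool {
        matches!(self.as_str(), Sport::FOOTBALL | Sport::ICE_HOCKEY)
    }
}

impl fmt::Display for Sport {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}
```

Older deployments can then pass a new sport through unchanged and only branch on `is_known()` where they need specific behavior.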
| XorNot wrote:
| The first time you have to add a new schema value, you'll
| realise you needed "unknown" or similar - because during an
| upgrade your old systems need a way to deal with new values (or
| during a rollback you need to handle new entries in the
| database).
| sitkack wrote:
| Your comment is the only one in the entire discussion that
| mentions "schema". Having an "other" in a schema is a way to
| ensure you can run n and n+1 versions at the same time.
|
| It is Data Model design, of which API design is a subset.
|
| You can only ever avoid having an other if 1) your schema is
| fixed and 2) if it is total over the universe of values.
| hansvm wrote:
| It's common when mixing many executables over time.
|
| I prefer to interpret those as an optional/nullable _closed_
| enum (or, situationally, a parse error) if I have to switch on
| them and let ordinary language conventions guide my code rather
| than having to understand some sort of pseudo-null without
| language support.
|
| In something like A/B tests it's not uncommon to have something
| that's effectively runtime reflection on enum fields too. Your
| code has one or more enums of experiments you support. The UI
| for scaling up and down is aware of all of those. Those two
| executables have to be kept in sync somehow. A common solution
| is for the UI to treat everything as strings with weights
| attached and for the parsers/serializers in your application
| code to handle that via some scheme or another (usually
| handling it poorly when people scale up experiments that no
| longer exist in your code). The UI though is definitely open-
| ended as it interprets that enum data, and the only question is
| how it's represented internally.
| int_19h wrote:
| Both are valid depending on what you're modelling.
|
| As far as programming languages go, all enums are explicitly
| open-ended in C, C++, and C#, at least, because casting an
| integer (of the underlying type) to enum is a valid operation.
| jay_kyburz wrote:
| My pet hate is when folks start doing math on enums or
| assuming ranges of values within an enum have meaning.
| tbrownaw wrote:
| Does a foreign key count as an enum type?
| fweimer wrote:
| Enumerations are open-ended in C and C++. They are just integer
| types with some extra support for defining constants (although
| later C++ versions give more control over the available
| operations).
| o11c wrote:
| The approach in the link is fine for _consumers_, but for
| producers you really do need some way of saying "create a value
| that's not one of the known values". Still, there's nothing that
| says this needs to be pretty.
| coin wrote:
| Just call it "unknown" or "unspecified" or better yet use an
| optional to hold the enum.
| 101011 wrote:
| This ended up being the preferred pattern we moved into.
|
| If, like us, you were passing the object between two
| applications, the owning API would serialize the enum value as
| a String value, then we had a client helper method that would
| parse the string value into an Optional enum value.
|
| If the original service started transferring a new String
| object between services, it wouldn't break any downstream
| clients, because the clients would just end up with an empty
| Optional.
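Here is a Rust sketch of the client helper 101011 describes. It assumes the wire format carries the enum as an uppercase string, and `parse_flavor` is a hypothetical name. Rust's `Option` plays the role of the `Optional` mentioned in the comment.

```rust
use std::str::FromStr;

/// Hypothetical enum known to this client build.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Flavor {
    Vanilla,
    Chocolate,
    Strawberry,
}

impl FromStr for Flavor {
    type Err = ();

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "VANILLA" => Ok(Flavor::Vanilla),
            "CHOCOLATE" => Ok(Flavor::Chocolate),
            "STRAWBERRY" => Ok(Flavor::Strawberry),
            _ => Err(()),
        }
    }
}

/// Parse the wire string into an optional enum. A string introduced by
/// a newer service, e.g. "MINT", yields None instead of an error.
pub fn parse_flavor(wire: &str) -> Option<Flavor> {
    wire.parse().ok()
}
```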
| janci wrote:
| How does that work when you need to distinguish between "no
| value provided" and "a value that is not in the list"? In some
| applications they have different semantics.
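One way to answer janci's question is sketched below in Rust, using a hypothetical three-way `FlavorField` type. A missing field and a present-but-unrecognized value decode to different variants, and the raw text is kept in the latter case.

```rust
/// Hypothetical enum known to this build.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Flavor {
    Vanilla,
    Chocolate,
}

/// Keeps "absent" and "present but unrecognized" apart.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FlavorField {
    NotProvided,
    Known(Flavor),
    Unrecognized(String), // raw text kept for logging or pass-through
}

pub fn read_flavor(wire: Option<&str>) -> FlavorField {
    match wire {
        None => FlavorField::NotProvided,
        Some("VANILLA") => FlavorField::Known(Flavor::Vanilla),
        Some("CHOCOLATE") => FlavorField::Known(Flavor::Chocolate),
        Some(other) => FlavorField::Unrecognized(other.to_string()),
    }
}
```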
| remram wrote:
| Rust has the "non_exhaustive" attribute that lets you declare
| that an enum might get more variants in the future. In practice
| that means that when you match on an enum value, you have to add
| a default case. It's like an "Other" variant in the enum, except
| you can't reference it directly; you use a default case.
|
| IIRC a secret 'other' variant (or '__non_exhaustive' or something)
| is actually how we did things before non_exhaustive was
| introduced.
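The older workaround remram recalls can be sketched as follows, assuming a hidden variant named `__NonExhaustive` (the exact name varied between crates). Because the variant exists, every `match` needs a catch-all arm. Unlike `#[non_exhaustive]`, though, nothing stops a caller from naming the hidden variant directly.

```rust
/// Pre-`#[non_exhaustive]` pattern (sketch). The hidden variant forces
/// a wildcard arm in every match, so adding a real variant later does
/// not break code that followed the convention.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Flavor {
    Vanilla,
    Chocolate,
    #[doc(hidden)]
    __NonExhaustive, // hypothetical name; never meant to be constructed
}

pub fn describe(f: Flavor) -> &'static str {
    match f {
        Flavor::Vanilla => "vanilla",
        Flavor::Chocolate => "chocolate",
        // Required: without this arm the match is not exhaustive.
        _ => "unknown flavor",
    }
}
```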
| airstrike wrote:
| TIL
|
| https://doc.rust-lang.org/reference/attributes/type_system.h...
| kibwen wrote:
| Note that the stance of the OP here is broadly in agreement
| with what Rust does. His main objection is this:
|
| _> The word "other" means "not mentioned elsewhere", so the
| presence of an Other logically implies that the enumeration is
| exhaustive._
|
| In Rust, because all enums are exhaustive by default and
| exhaustive matching is enforced by the compiler, there is no
| risk of this sort of confusion. And then the fact that his
| proposed solution is:
|
| _> Just document that the enumeration is open-ended_
|
| The non_exhaustive attribute is effectively compiler-enforced
| documentation; users now cannot forget to treat the enum as
| open-ended.
|
| Of course, adding non_exhaustive to Rust was not without its
| own detractors; its usage for any given enum fundamentally means
| shifting power away from library consumers (who lose the
| ability to guarantee exhaustive matching) and towards library
| authors (who gain the ability to evolve their API without
| causing guaranteed compilation errors in all of their users
| (which some users desire!)). As such, the guidance is that it
| should be used sparingly, mostly for things like error types.
| But that's an argument against open-ended enums in general, not
| against the mechanisms we use to achieve those (which, as you
| say, was already possible in Rust via hacks).
| tyre wrote:
| Maybe there should be a compiler option or function to assert
| that a match is exhaustive. If the match does not handle a
| defined case, it blows up.
| aecsocket wrote:
| Rust already asserts that a match is exhaustive at compile
| time - if you don't include a branch for each option, it
| will fail to compile. This extends to integer range
| matching and string matching as well.
|
| It's just that with #[non_exhaustive], you _must_ specify a
| default branch (`_ => { .. }`), even if you've already
| explicitly matched on all the values. The idea is that
| you've written code which matches on all the values _which
| exist right now_, but the library author is free to add
| new variants without breaking your code - since it's now
| your responsibility as a user of the library to handle the
| default case.
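Here is a small Rust sketch of what aecsocket describes. The `Flavor` variants and `price_cents` values are hypothetical. `#[non_exhaustive]` only takes effect across crate boundaries, so inside the defining crate the wildcard arm is redundant. That is why the function carries an `allow`.

```rust
/// Library-side enum (hypothetical). Downstream crates must include a
/// wildcard arm when matching on it; the defining crate need not.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Flavor {
    Vanilla,
    Chocolate,
    Strawberry,
}

/// Shape of downstream code: every variant that exists today is matched
/// explicitly, plus the default branch the compiler demands elsewhere.
#[allow(unreachable_patterns)]
pub fn price_cents(f: Flavor) -> u32 {
    match f {
        Flavor::Vanilla => 300,
        Flavor::Chocolate => 350,
        Flavor::Strawberry => 325,
        // Handles variants the library adds after this code was written.
        _ => 400,
    }
}
```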
| hchja wrote:
| This is why language syntax is so important.
|
| Swift allows a 'default' case when switching over an enum, which
| is similar to Other, but you should use it with caution.
|
| It's better not to use it unless you're 110% sure that no
| additional enum cases will be added in the future.
|
| Otherwise, in Swift, when you add an additional enum case, code
| that switches over the enum will not compile until you handle the
| new case at each respective call site.
| layer8 wrote:
| The better solution is to have two different "default" cases
| in the language, one that expresses handling "future" values
| (values that aren't currently defined), and one that
| expresses "the rest of the currently defined values". The
| "future" case wouldn't be considered for exhaustiveness
| checks.
| SkiFire13 wrote:
| What would the "future" default case actually do though?
| When you're in the past there's no value for it, and the
| moment you get to the future the values will become part of
| the "present" and will still not fall under the "future"
| case. You would need some kind of versioning support in the
| enum itself, but that's a much bigger change.
| layer8 wrote:
| "Future" values only become defined ("present" in your
| sense) at compile-time, but may occur before that at
| runtime. Note that this mostly presumes a language with
| separate compilation, or situations like coding against a
| remote-API spec, where the server may deploy a newer
| version but your client remains unchanged. Once you
| compile against the new spec, you'd get errors/warnings
| about the new, not explicitly handled values, but your
| existing binary would nevertheless handle those values
| under the "future" case.
|
| The issue with traditional "default" cases is that they
| shadow warnings/errors about unhandled cases, but you'd
| still want to have some form of default case for forward
| compatibility.
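Rust has no built-in "future" default, but a similar split can be approximated at the decoding boundary. This sketch uses hypothetical `Flavor`, `Decoded`, `decode`, and `label` items. Values unknown to this binary land in a `Future` arm, while the inner match over known flavors stays exhaustive, so adding a variant produces a compile error there. Unlike Swift's `@unknown default`, `decode` itself still has to be updated by hand.

```rust
/// Flavors this build was compiled against (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Flavor {
    Vanilla,
    Chocolate,
    Strawberry,
}

/// Keeps "values newer than this binary" separate from the known set.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decoded {
    Known(Flavor),
    Future(u32),
}

pub fn decode(raw: u32) -> Decoded {
    match raw {
        1 => Decoded::Known(Flavor::Vanilla),
        2 => Decoded::Known(Flavor::Chocolate),
        3 => Decoded::Known(Flavor::Strawberry),
        other => Decoded::Future(other),
    }
}

pub fn label(d: Decoded) -> String {
    match d {
        // No wildcard: a new Flavor variant is a compile error here
        // until it is handled ("the rest of the current values").
        Decoded::Known(f) => match f {
            Flavor::Vanilla => "vanilla".to_string(),
            Flavor::Chocolate => "chocolate".to_string(),
            Flavor::Strawberry => "strawberry".to_string(),
        },
        // The "future" default: runtime values unknown at compile time.
        Decoded::Future(raw) => format!("unknown flavor #{raw}"),
    }
}
```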
| mayoff wrote:
| Swift allows an enum to be marked `@frozen`, which is an
| API (and ABI) stability guarantee that the enum will never
| gain more cases. Apple uses this quite sparingly in their
| APIs.
|
| Swift also has two versions of a `default` case in switch
| statements, like you described. It has regular `default`
| and it has `@unknown default`. The `@unknown default` case
| is specifically for use with non-frozen enums, and gives a
| warning if you haven't handled all known cases.
|
| So with `@unknown default`, the compiler tells you if you
| haven't been exhaustive (vs. the current API), but doesn't
| complain that your `@unknown default` case is unreachable.
| sunshowers wrote:
| There is currently a missing middle ground in stable Rust,
| which is to _lint_ on a missing variant rather than fail
| compilation. There's an unstable option for it, but it would
| be very useful for non-exhaustive enums where consumers care
| about matching against every known variant.
|
| You can practically use it today by gating on a nightly-only
| cfg flag. See https://github.com/guppy-rs/guppy/blob/fa61210b67bea233de52c... and
| https://github.com/guppy-rs/guppy/blob/fa61210b67bea233de52c...
| esafak wrote:
| Just add a free-form text field to hold the other value, and
| revise your enum as necessary, while migrating the data.
| AceJohnny2 wrote:
| I can't even tell if you're trolling.
| 1oooqooq wrote:
| jr: add other option
|
| sr: omit other option
|
| illuminated: add other option in front end only and alert when
| the backend crashes.
| layer8 wrote:
| Slight counterpoint: Unless there is some guarantee that the
| respective enum type will never ever be extended with a new
| value, each and every case distinction on an enum value needs to
| consider the case of receiving an unexpected value (like Mint in
| the example). When case distinctions do adhere to that principle,
| then the problem described doesn't arise.
|
| On the other hand, if the above principle is adhered to as it
| should be, then there is also little benefit in having an Other
| value. One minor conceivable benefit is that intermediate code
| can map unsupported values to Other in order to simplify logic in
| lower-level code. But I agree that it's usually better to not
| have it.
|
| A somewhat related topic that comes to mind is error codes. There
| is a common pattern, used for example by the HTTP status codes,
| where error codes are organized into categories by using
| different prefixes. For example in a five-digit error code
| scheme, the first three digits might indicate the category (e.g.
| 123 for "authentication errors"), and the remaining two digits
| represent a more specific error condition in that category. In
| that setup, the all-zeros code in each category represents a
| generic error for that category (i.e. 12300 would be "generic
| authentication error").
|
| When implementing code that detects a new error situation not
| covered by the existing specific error codes, the implementer
| now has the choice of either introducing a new error code (e.g. 12366
| -- this is analogous to adding a new enum value), which has to be
| documented and maybe its message text be localized, or else using
| the generic error code of the appropriate category.
|
| In any case, when error-processing code receives an unknown --
| maybe newly assigned -- error code, it can still map it
| according to the category. For example, if the above 12366 is
| unknown, it can be handled like 12300 (e.g. for the purpose of
| mapping it to a corresponding error message). This is quite
| similar to the case of having an Other enum value, but with a
| better justification.
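The category fallback takes only a few lines of Rust, assuming the five-digit scheme described here. The message table entries (such as 12301) are invented for illustration. Integer division by 100 followed by multiplication by 100 turns 12366 into its category's generic code, 12300.

```rust
/// Hypothetical message table: first three digits = category, last two
/// = specific error, and NNN00 = generic error for category NNN.
fn known_message(code: u32) -> Option<&'static str> {
    match code {
        12300 => Some("Authentication error"),
        12301 => Some("Password expired"),
        _ => None,
    }
}

/// Look up a message; if the exact code is unknown (e.g. a newly
/// assigned 12366), fall back to its category's generic code (12300).
pub fn message_for(code: u32) -> &'static str {
    known_message(code)
        .or_else(|| known_message(code / 100 * 100))
        .unwrap_or("Unknown error")
}
```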
| vadim_phystech wrote:
| ...since the set of all possible behaviour that is not
| specified is much greater, and denser, than one would initially
| feel and assume, one might cause lots of possible bad outcomes
| and success-breaking points by using an "Other" type in their API.
| Because "Other" is the first place to look for vulnerabilities,
| for attack vectors. Because the spirit of UB the Terrible lurks
| there! The spirit of UB feeds upon the juices of the "Other"
| omnimorphic (fel) type! A vile, shapeless "ANY" type!
| Depravity and disharmony! Decay and reducing
| heteromorphisms! Decomposition, descriptive semantic matrix rank
| reduction, richness degradation,
| devolution...impoverishment...scarcity pressure increase...
|
| </shutting_the_fuck_up_my_wetware_machine_whispering_kek>
| akamoonknight wrote:
| One of the tactics I end up using in Verilog, for better or
| worse, is to define enums with a '0 value (repeat 0s for the size
| of the variable), and '1 value (repeat 1s for the size of the
| value)
|
| '0 stays as "null"-like (e.g INVALID), and '1 (which would be
| 0xFF in an 8 bit byte for instance) becomes "something, but I'm
| not sure what" (e.g. UNKNOWN).
|
| Definitely has the same issues as referenced when needing to grow
| the variable, and the times where it's useful aren't super
| common, but I do feel like the general concept of an unknown-but-
| not-invalid value can help with tracking down errors in
| processing chains. I definitely do run into the need to "beware"
| with enums, though.
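Below is a rough Rust analogue of the '0/'1 convention for an 8-bit field. The `State` variants and the codes 0x01/0x02 are hypothetical. Mapping unassigned codes to `Unknown` rather than `Invalid` is an assumption. The all-ones constant is tied to the field width, which is the growth problem mentioned above.

```rust
/// Hypothetical 8-bit state field: all zeros = INVALID ("null-like"),
/// all ones = UNKNOWN ("something, but not sure what").
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Invalid,
    Idle,
    Busy,
    Unknown,
}

pub const INVALID: u8 = 0x00; // '0
pub const UNKNOWN: u8 = 0xFF; // '1, valid only for an 8-bit field

pub fn decode_state(raw: u8) -> State {
    match raw {
        INVALID => State::Invalid,
        0x01 => State::Idle,
        0x02 => State::Busy,
        UNKNOWN => State::Unknown,
        // Assumption: unassigned codes are "valid but unknown", like '1.
        _ => State::Unknown,
    }
}
```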
| _3u10 wrote:
| I usually use Unknown / Other as 0.
| bob1029 wrote:
| Making things into enums that shouldn't be enums is a fun trap to
| fall into. Much of the time what you really want is a complex
| type so that you can communicate these additional facts. In this
| case I'd do something like:
|
|     class Widget {
|         WidgetFlavor Flavor; // Undefined, Vanilla, Chocolate, Strawberry
|         string? OtherFlavor;
|     }
|
| This is easy to work from a consumer standpoint because if you
| have a deviant flavor to specify, you don't bother setting the
| Flavor member to anything at all. You just set OtherFlavor. Fewer
| moving pieces == less chance for bad times.
|
| The first (default) member in an enum should generally be
| something approximating "Undefined". This also makes working with
| serializers and databases easier.
| ryanschaefer wrote:
| > Fewer moving pieces == less chance for bad times.
|
| Is this not a case for explicitly specifying all flavors? Other
| flavor has essentially introduced infinite moving pieces.
| IshKebab wrote:
| This is not a good design. You've introduced representable
| invalid states (Flavor=Vanilla, Other flavor="DarkChocolate").
|
| At the least you want this...
|
|     enum Flavor { Chocolate, Banana, Strawberry, Other(String) }
|
| But that's not right either. What you really want is
|
|     #[non_exhaustive]
|     enum Flavor { Chocolate, Banana, Strawberry }
|
|     impl Display for Flavor ...
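The "at the least" shape from this comment can be sketched in Rust, with one possible `Display` impl (an assumption, since the comment elides it). Because the free-form name lives inside `Other`, the invalid combination (Vanilla, "DarkChocolate") from the previous design cannot be constructed.

```rust
use std::fmt;

/// Free-form names live inside Other, so no separate field can
/// contradict the enum value.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Flavor {
    Chocolate,
    Banana,
    Strawberry,
    Other(String),
}

// One possible Display impl (hypothetical wording for each variant).
impl fmt::Display for Flavor {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Flavor::Chocolate => f.write_str("chocolate"),
            Flavor::Banana => f.write_str("banana"),
            Flavor::Strawberry => f.write_str("strawberry"),
            Flavor::Other(name) => f.write_str(name),
        }
    }
}
```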
___________________________________________________________________
(page generated 2025-03-02 23:00 UTC)