[HN Gopher] Idempotence Now Prevents Pain Later
___________________________________________________________________
Idempotence Now Prevents Pain Later
Author : zdw
Score : 95 points
Date : 2021-04-07 15:37 UTC (1 day ago)
(HTM) web link (ericlathrop.com)
(TXT) w3m dump (ericlathrop.com)
| ibirman wrote:
| Watch out for edge cases. What happens to accounts that change at
| 11:35PM on the last day of the month?
| bob1029 wrote:
| I feel like immutability and a 100% leak-proof layer of domain
| methods which in turn manage domain mutations would ultimately
| bring more value than explicitly adding idempotence throughout.
|
| If I have an idempotent method like "CreateCustomerRecord", this
| can cause a lot of pain for audit features and other aspects of
| the domain model if it is internally making determinations about
| whether to actually create or silently skip creation. For me, I
| would much rather that the method throw an exception if there is
| a duplicate business key than have it silently complete without
| taking any actual action. Exceptions indicating attempts at
| invalid state transitions can be extremely valuable if you have
| the discipline to create & use them properly.
|
| Generally, seeking idempotence in otherwise mutable methods is a
| band-aid for when you have broken immutability rules and allowed
| things to leak out of the sacred garden of unit-tested state
| machines and other provably-correct code items.
|
| If you should only conditionally execute some method, perhaps the
| solution is to investigate the caller(s) of the method, rather
| than attempt to infer the intent of all possible callers within
| the method itself.
| eweise wrote:
| You have the caller pass in a unique ID to the
| CreateCustomerRecord. Don't create a new customer record if you
| receive a duplicate ID.
| jayd16 wrote:
| This can create false negatives when a request must be retried
| due to network failure but actually succeeded because the
| failure was during the response.
|
| Idempotency is great for "debouncing" requests. If you want to
| tell the difference between identical requests that are
| different transactions, add a unique transaction id of some kind.
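A minimal sketch of the caller-supplied ID idea from this subthread, assuming an in-memory store. The names `CustomerStore`, `request_id`, and the `(customer, created)` return shape are hypothetical, not from the article. Returning a `created` flag is one way to keep the retry safe while still telling the caller (or an audit log) whether anything actually happened.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Customer:
    request_id: str  # caller-supplied unique ID (hypothetical field name)
    name: str


class CustomerStore:
    """In-memory stand-in for a table keyed by the caller-supplied ID."""

    def __init__(self) -> None:
        self._by_request_id: Dict[str, Customer] = {}

    def create_customer_record(self, request_id: str, name: str) -> Tuple[Customer, bool]:
        """Return (customer, created); created is False on a replayed request."""
        existing = self._by_request_id.get(request_id)
        if existing is not None:
            # Retry of a request that already succeeded: return the stored
            # record instead of creating a second one.
            return existing, False
        customer = Customer(request_id=request_id, name=name)
        self._by_request_id[request_id] = customer
        return customer, True


store = CustomerStore()
first, created_first = store.create_customer_record("req-123", "Ada")
# Same ID again, e.g. the response was lost and the client retried.
retry, created_retry = store.create_customer_record("req-123", "Ada")
```

The retry gets back the same record with `created` set to False, so a lost response does not turn into a duplicate customer or a spurious error.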
| adrusi wrote:
| There's a place for idempotency tokens. They're relatively
| easy to retrofit onto old systems, and occasionally they are
| the best way to go about making changes idempotent, but they
| should be a red flag - an indication that you should step
| back and see if maybe you can redesign an API to make
| idempotency a natural guarantee rather than something you
| artificially strap on with a token. As a rule of thumb, I
| would always mention the idea of adding an idempotency token,
| and prompt for alternatives, with all stakeholders present.
| smiley1437 wrote:
| I had only ever encountered idempotence in the context of system
| management (Ansible, Puppet, Chef, etc)
|
| Article made me think it's actually applicable to other
| management as well
| void_mint wrote:
| PUT requests are intended to be idempotent, which is one of the
| things that distinguishes them from POST requests. This (in my
| experience) is most non-CS-backgrounded software developers'
| exposure to idempotence, but it actually has tons of value
| pretty much wherever you can apply it. The ability to
| (sometimes accidentally) do a thing twice and have it leave no
| unintended consequence is huge.
|
| UPSERTs can be idempotent as well. "If this doesn't exist,
| create it, and if it does, update it to match this state",
| implies that running it twice will leave no unintended side
| effects.
| Animats wrote:
| It's a basic property of HTTP "GET". Or it's supposed to be.
| "GET" is not supposed to change server-side state. That's what
| "POST" is for. This matters if there's a cache in the middle,
| since caches tend to assume that GET requests are idempotent
| and can be served from cache. Cloudflare assumes that. POST
| requests have to go through to the real server.
| elcomet wrote:
| Get is idempotent because it's the identity function. It does
| not change the state of the data. So it's a trivial case of
| idempotency.
|
| A more interesting function is PUT (idempotent) vs POST (not
| idempotent)
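To make the PUT vs POST distinction concrete, here is a rough sketch using a plain dict as a stand-in for server state. The `put` and `post` functions and the `item-N` id scheme are made up for illustration; they are not any framework's API. The PUT-style call is effectively an upsert: "make the resource at this id look like this".

```python
import itertools
from typing import Dict

# Hypothetical in-memory "server" state: resource id -> body.
resources: Dict[str, dict] = {}
_next_id = itertools.count(1)


def put(resource_id: str, body: dict) -> None:
    """PUT-style: set the resource at a caller-chosen id to exactly `body`."""
    resources[resource_id] = dict(body)


def post(body: dict) -> str:
    """POST-style: create a new resource under a server-chosen id."""
    resource_id = f"item-{next(_next_id)}"
    resources[resource_id] = dict(body)
    return resource_id


put("user-1", {"name": "Ada"})
put("user-1", {"name": "Ada"})   # repeat: state is unchanged
count_after_puts = len(resources)

post({"name": "Bob"})
post({"name": "Bob"})            # repeat: a second resource appears
count_after_posts = len(resources)
```

Repeating the `put` leaves one resource; repeating the `post` leaves two new ones, which is exactly the accidental double-submit problem.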
| Justsignedup wrote:
| I've done this exact thing many times before...
|
| I can honestly say that Eric is 100% right with his approach. It
| always leads to fewer headaches and more flexibility (oh trust me,
| someone is always gonna have a "but... there's like a special
| thing that I sometimes have to do" and it breaks some
| assumptions).
|
| In any case... yeah... let's just say any time you have to be
| worried "did we already schedule this", really think "can this
| never care if it was or not? Should be always safe to schedule it
| again"
| brsg wrote:
| Idempotency is a pretty critical concept in system design, and I
| think most developers have run into issues related to it even if
| they aren't directly familiar with the term.
|
| To give another simple example like the OP's: suppose you have a
| product that relies on time series data. For demo purposes you
| might create a curated data set to present to clients, but the
| presenter doesn't want to show data from 2019 as the "most
| recent".
|
| Naturally, you decide to write a script. Do you
|
| A) Write a script that moves the data forward by 1 week
| explicitly, and simply run this once per week or
|
| B) Write a script that compares the current date to the data and
| moves it forward as much as it needs
|
| At first glance, these two approaches work the same, but what if
| (A) triggers twice? What if it runs once every 6 days by mistake?
| (B) is idempotent however - subsequent executions won't change
| the state. It's usually impossible to predict all of the ways
| that software breaks, but designing with idempotency in mind
| eliminates a lot of them.
| jayd16 wrote:
| I don't think B is technically idempotent either. Change still
| occurs but with minimal difference. You cannot cache the
| results and use them again next week.
|
| An idempotent change would be to pass in the current time
| instead of checking system time. In this case, as long as the
| input is the same, the result is the same. You could use cached
| results, but most likely you want to use new inputs.
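A small sketch of the two approaches discussed here, under the assumption that the data set is just a list of dates. The function names are invented for this example. Approach (B) takes `today` as an argument, following the suggestion above, so the same inputs always give the same result.

```python
from datetime import date, timedelta
from typing import List


def shift_by_one_week(dates: List[date]) -> List[date]:
    """Approach (A): a relative move. Every extra run adds another week."""
    return [d + timedelta(weeks=1) for d in dates]


def shift_to_end_at(dates: List[date], today: date) -> List[date]:
    """Approach (B), with `today` passed in: an absolute move.

    Shifts the whole series so its newest point lands on `today`.
    """
    if not dates:
        return []
    offset = today - max(dates)
    return [d + offset for d in dates]


demo = [date(2019, 1, 1), date(2019, 1, 8)]
today = date(2021, 4, 7)

once = shift_to_end_at(demo, today)
twice = shift_to_end_at(once, today)   # second run computes a zero offset

a_once = shift_by_one_week(demo)
a_twice = shift_by_one_week(a_once)    # second run drifts a further week
```

With a fixed `today`, running (B) twice equals running it once, while (A) keeps drifting with every accidental extra run.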
| pbreit wrote:
| The idempotency I've seen is usually an unnecessary extra
| complexity.
| jacobsenscott wrote:
| If you design for it from the start it makes your system much
| less complex. Consider all the errors, special cases, and
| ultimately data cleanup you need to handle if your
| transactions are not idempotent. Idempotency is table stakes
| for any production app.
| mrbadideas wrote:
| Is that really idempotence?
| omarhaneef wrote:
| I'm assuming a lot of people click on it to see what the word
| Idempotence means. From the article:
|
| "Idempotence is the property of a software that when run 1 or
| more times, it only has the effect of being run once."
|
| And the example is, instead of a cron job just running a process
| once a month or on some other schedule, it runs more frequently
| but checks if the change has already been made.
|
| (From the Latin idem which means "same" and potence is of course
| power/potent, so it has the same power/effect however many times
| you run it)
| bobbylarrybobby wrote:
| When writing a Jupyter notebook, always try to make your cells
| idempotent. You'll save yourself a lot of headache down the
| line.
| throwawayboise wrote:
| One of my first jobs was at an investment bank. They had a lot
| of programs that ran overnight, in a batch fashion. Everything
| had to be done before the markets opened the next morning. The
| term they used for idempotency was "free rerun." Being able to
| rerun any program with no special setup work was a high
| priority.
|
| The value in programs being a "free rerun" was that every so
| often the program would barf on a bad bit of data in a record.
|
| The programming environment was interpreted BASIC, so if an
| error occurred the program would print a message on the console
| and drop to an interactive prompt.
|
| The operators running the batch schedule would see this and
| call the programmer on call for that night. You'd log in (over
| dial up at this time) and attach to the process, look at the
| error, figure out what went wrong, either correct the data or
| (more likely) skip the record and deal with it the next day. It
| was more important to have the programs finish on time;
| individual issues could be dealt with later.
|
| Often you could just start up the program from where it left
| off, but if things were more screwed up it was important to be
| able to re-run it without any negative consequence.
|
| Edit: this was ~30 years ago, so my point is that it's not any
| kind of new idea or something that wasn't recognized long ago.
| omarhaneef wrote:
| I hope this example makes it evident that one of the primary
| innovations of the last 30 years is defaulting to Latin terms
| so that they are taken more seriously in business and
| technology circles to acquire ... you know... gravitas.
| 6t6t6t6 wrote:
| I used to be an operator in night shift in my twenties and
| the job was exactly how you said. Good memories. Lots of
| sleeping at work and some days of panic when shit broke.
|
| And a lot of "secret" scripts that automated a big part of
| our job.
| treve wrote:
| > And the example is, instead of a cron job just running a
| process once a month or on some other schedule, it runs more
| frequently but checks if the change has already been made.
|
| As a property, I think it's even nicer if a script can
| literally fully run twice and leave the same outcome as if it
| had only run once (so skipping the 'did I run before?' check).
|
| Even though this check is useful in general, if you can define
| your data in such a way that a second run, if it _did_ somehow
| happen, is not destructive and does not create incorrect data,
| it makes the system more robust.
|
| Of course this is not always possible though. For example, if
| the process results in an email being sent, you need an
| explicit check to not do that twice.
| gen220 wrote:
| In situations like these, it's a legitimate goal to implement
| an idempotent, or "functional" core.
|
| So the goal of your functional core is to fully construct the
| email, and return it to the caller, who then has the choice
| to send the email, print it, write it to disk, etc.
|
| The program you deploy looks like this
|
| EmailSender().send_email(construct_email(args))
|
| You can test by implementing a "safe" EmailSender interface,
| so that you're executing the same code that's in prod.
|
| In general, if a job/function is mutating state deep in the
| syntax tree (i.e. sending emails in the middle of a batch
| job), I personally see that as a violation of the Single
| Responsibility Principle.
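One possible shape for the "functional core" described above, as a sketch only. The `Email` fields, the arguments to `construct_email`, and `RecordingEmailSender` are assumptions made for this example; the original comment only names `EmailSender`, `send_email`, and `construct_email`.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class Email:
    to: str
    subject: str
    body: str


def construct_email(name: str, address: str, balance_cents: int) -> Email:
    """Pure: same inputs always give the same Email, with no side effects."""
    return Email(
        to=address,
        subject="Your monthly statement",
        body=f"Hello {name}, your balance is ${balance_cents / 100:.2f}.",
    )


class EmailSender(Protocol):
    """Anything that can deliver an Email (SMTP, a queue, a test double)."""

    def send_email(self, email: Email) -> None: ...


class RecordingEmailSender:
    """A "safe" sender for tests: records emails instead of delivering them."""

    def __init__(self) -> None:
        self.sent: List[Email] = []

    def send_email(self, email: Email) -> None:
        self.sent.append(email)


sender = RecordingEmailSender()
sender.send_email(construct_email("Ada", "ada@example.com", 1250))
```

Calling `construct_email` any number of times is harmless; only the single `send_email` call at the edge has an effect, so that is the one place that needs an "already sent?" guard.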
| sdenton4 wrote:
| Mathematically, it's x^2 = x, which implies x^n = x for all
| positive integers n.
|
| Nilpotence (x^2 = 0) is also very helpful sometimes: it's a
| process which is self-reversing. Like the discrete Fourier
| transform (if you set up the constants properly).
| elcomet wrote:
| self-reversing is not nilpotence.
|
| In mathematics, a self-reversing function is called an
| involution, and it's f^2 (or f(f) ) = Id, the identity
| function.
|
| Nilpotence is very different. It says that if you apply your
| function a certain number of times, you end up with zero no
| matter what the input is. For example, projection on x axis +
| 90 deg rotation of a vector is nilpotent.
| carreau wrote:
| No, you are confusing it with involution. 1/x is an involution.
| Symmetries are often involutions.
|
| An upper triangular matrix with 0 on the diagonal is
| nilpotent. Differentiating a polynomial of degree N is nilpotent:
| the result is zero after N+1 iterations.
| contravariant wrote:
| Careful, for nilpotence the power doesn't have to be 2.
|
| Also you may be confusing it with x^n = 1 (which I'm not sure
| how to name, 'root of unity' perhaps). This would be the case
| for the Fourier transform (with n=4).
|
| If x^2 = 0 then applying the Fourier transform twice would
| null your function, which isn't the case.
| pdpi wrote:
| X^2 is a weird way to describe it. A function f is idempotent
| if f(f(x)) = f(x).
| creata wrote:
| It's not _that_ weird. People often write iterated
| composition as f^k, and this is especially true with
| matrices, where composition and multiplication mean the
| same thing.
| corty wrote:
| It is quite common in some fields. Operator application is
| written without parentheses, and functions are a kind of
| operator. Therefore:
|
| f(x) = f(f(x)) = f f x = f^2 x = f x
|
| And leaving out the x, because it is just a placeholder
| anyways:
|
| f f = f^2 = f
|
| And of course this means that
|
| f^n = f because f^(n-1) f = f^(n-1) by induction.
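For anyone who wants to poke at the three definitions in this subthread, a quick sketch that checks them on sample inputs. The helper `compose_n` and the choice of examples (`abs` as idempotent, negation as an involution, the projection-plus-rotation map mentioned above as nilpotent) are mine; checking a few samples is an illustration, not a proof.

```python
from typing import Callable, Tuple

Vec = Tuple[float, float]


def compose_n(f: Callable, n: int) -> Callable:
    """Return f applied n times (f^n in the notation above)."""
    def composed(x):
        for _ in range(n):
            x = f(x)
        return x
    return composed


def project_then_rotate(v: Vec) -> Vec:
    """Project onto the x axis, then rotate 90 degrees counterclockwise."""
    x, _y = v
    return (0.0, x)   # (x, 0) rotated by 90 degrees is (0, x)


def negate(x: float) -> float:
    return -x


samples = [-3.0, 0.0, 2.5]

# Idempotent: f(f(x)) == f(x), so f^n == f for every n >= 1.
idempotent_ok = all(
    compose_n(abs, n)(x) == abs(x) for x in samples for n in (1, 2, 5)
)

# Involution: f(f(x)) == x.
involution_ok = all(negate(negate(x)) == x for x in samples)

# Nilpotent: some power of f sends every input to zero (here the power is 2).
nilpotent_ok = all(
    compose_n(project_then_rotate, 2)((x, y)) == (0.0, 0.0)
    for x in samples for y in samples
)
```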
| globular-toast wrote:
| I try to write idempotent software whenever I can. It's usually
| not much more difficult to make it work and affords so much more
| flexibility and less worry when it's done.
| staticassertion wrote:
| https://lostechies.com/jimmybogard/2013/06/06/acid-2-0-in-ac...
|
| If you can build a system with ACID 2.0 life gets really easy.
| You can reason about your system without worrying about ordering,
| time, 'exactly once' semantics, etc.
|
| Idempotency is usually one of the simplest pieces to implement,
| and you definitely get a ton of benefit right off the bat - it's
| worth designing systems from scratch with it in mind.
| [deleted]
| firebaze wrote:
| We had one "special" team member who insisted on everything being
| idempotent.
|
| This was his only leading principle. Result: absolute chaos - the
| code aspired to be idempotent, but due to idempotency he avoided
| thinking problems through and just created a mess of individual
| functions - each being idempotent, aside from the unavoidable
| bugs - which didn't form a coherent flow at all.
|
| We did a major refactoring, threw out about all that code,
| rewrote everything in a logical manner. Now everything is still
| idempotent, but comprehensible.
|
| TLDR: idempotency is the same snakeoil as the majority of guiding
| principles: alone, it doesn't help at all. There are lots of
| other factors to consider, which make the developer/architect
| role demanding (and fun).
|
| Craftsmanship at least, a sense for architecture (better) or
| understanding the whole picture of the requirements as a team of
| developers (best) is still required.
| longhairedhippy wrote:
| I don't see this as any reflection on idempotency as a principle
| (or other principles in general). Building systems poorly,
| without a plan, and no testing, will result in a bug-riddled
| mess, regardless of what pattern is being used.
| smitty1e wrote:
| Sure, one little hobby horse, e.g. "inversion of control" can
| run amok to negative effect (looking at you, Java projects with
| object traces 75 layers deep) but that doesn't make idempotency
| or inversion of control into bad ideas.
|
| A bit of pragmatism goes a long way, like Python's odd
|
| x = (some_tuple,)
|
| . . .syntax amidst its generally clean approach.
|
| Inflexibility itself is the bugaboo.
| spaetzleesser wrote:
| That is a general problem with any principle used as rigid
| ideology. Almost every principle becomes a problem if applied
| too dogmatically. This applies to software dev but also to other
| fields like politics or economics.
| telekid wrote:
| In general, you should be thinking about the delivery semantics
| of the systems calling your code. Many very useful callers offer
| "at least once" delivery guarantees, implying that your system
| should behave idempotently to their calls.
| dang wrote:
| Possibly related past threads:
|
| _What Is Idempotence?_ -
| https://news.ycombinator.com/item?id=19570815 - April 2019 (51
| comments)
|
| _Idempotence: What is it and why should I care?_ -
| https://news.ycombinator.com/item?id=17804617 - Aug 2018 (73
| comments)
|
| _You know how HTTP GET requests are meant to be idempotent?_ -
| https://news.ycombinator.com/item?id=16964907 - May 2018 (304
| comments)
|
| _Implementing Stripe-Like Idempotency Keys in Postgres_ -
| https://news.ycombinator.com/item?id=15569478 - Oct 2017 (41
| comments)
|
| _APIs, robustness, and idempotency_ -
| https://news.ycombinator.com/item?id=13707681 - Feb 2017 (50
| comments)
|
| _A simple distributed algorithm for small idempotent
| information_ - https://news.ycombinator.com/item?id=7276491 - Feb
| 2014 (14 comments)
|
| _Idempotent Web APIs: What benefit do I get?_ -
| https://news.ycombinator.com/item?id=5662138 - May 2013 (53
| comments)
|
| A word like that is particularly easy to search for:
|
| https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
| tigger0jk wrote:
| > 1. Query the database to find all dormant accounts with a
| balance, which haven't been charged the fee this month.
|
| > 2. Charge each of these accounts a fee
|
| > 3. Setup a cron job to run this every hour
|
| Note that if this job ever runs successfully, but takes more than
| an hour, you will double-count. Can easily happen if the box
| running these crons is overloaded. One fix is to automatically
| halt the job after 55 minutes, another would be to have the
| middle step be idempotent: for each user you're processing,
| ensure (ideally in a thread-safe manner) that they still need the
| operation to be done.
| alex_young wrote:
| Sounds like a good reason to use a pidfile or mutex so you can
| eliminate the possibility of any concurrent jobs.
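A rough sketch of the lock-file idea, assuming a Unix-like system (it uses `fcntl.flock`, which is not available on Windows). The `single_instance` helper, the `AlreadyRunning` exception, and the lock path are hypothetical names for this example. Note this only prevents overlapping runs; it does not by itself make the charge idempotent.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


class AlreadyRunning(Exception):
    """Raised when another run of the job holds the lock."""


@contextmanager
def single_instance(lock_path: str) -> Iterator[None]:
    """Hold an exclusive, non-blocking flock on lock_path for the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            raise AlreadyRunning(lock_path) from None
        yield
        # The lock is released when the descriptor is closed below, and by
        # the OS if the process dies, so a crash cannot leave it stuck.
    finally:
        os.close(fd)


# Hypothetical lock location for the hourly fee job.
lock_path = os.path.join(tempfile.mkdtemp(), "charge_fees.lock")

overlap_detected = False
with single_instance(lock_path):
    try:
        with single_instance(lock_path):   # simulates an overlapping run
            pass
    except AlreadyRunning:
        overlap_detected = True
```

A cron entry would wrap the whole job in `single_instance(...)` and exit quietly on `AlreadyRunning`, leaving the work to the run that already holds the lock.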
| jchw wrote:
| This is good but not enough. You also need to be sure that you
| can't charge twice if the job runs twice. When you do that same
| query twice, you will get the same list of users. This could be
| done by exploiting database consistency rules, like using
| strongly isolated transactions. One simple more general approach
| is to use an idempotence token. You could, say, have a table with
| a uniqueness constraint, and generate IDs that will match for the
| same user in the same month. Then add that in the same
| transaction that subtracts the money. The table could be cleaned
| up periodically.
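A sketch of the uniqueness-constraint approach from the paragraph above, using SQLite from the Python standard library. The table and column names (`accounts`, `fee_charges`, `balance_cents`) and the `charge_fee` helper are invented for illustration; the token here is simply the pair (account id, month), recorded in the same transaction as the balance change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance_cents INTEGER NOT NULL);
    CREATE TABLE fee_charges (
        account_id INTEGER NOT NULL,
        month      TEXT    NOT NULL,    -- e.g. '2021-04'
        PRIMARY KEY (account_id, month) -- the idempotence token
    );
    INSERT INTO accounts VALUES (1, 10000);
""")


def charge_fee(conn: sqlite3.Connection, account_id: int, month: str, fee_cents: int) -> bool:
    """Charge the fee at most once per account per month. Returns True if charged."""
    try:
        with conn:  # one transaction: commits on success, rolls back on error
            conn.execute(
                "INSERT INTO fee_charges (account_id, month) VALUES (?, ?)",
                (account_id, month),
            )
            conn.execute(
                "UPDATE accounts SET balance_cents = balance_cents - ? WHERE id = ?",
                (fee_cents, account_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The token already exists: this account was charged this month.
        return False


first = charge_fee(conn, 1, "2021-04", 500)
second = charge_fee(conn, 1, "2021-04", 500)   # rerun of the same job
balance = conn.execute("SELECT balance_cents FROM accounts WHERE id = 1").fetchone()[0]
```

Because the token insert and the balance update commit or roll back together, a rerun (or an overlapping run) hits the constraint and leaves the balance alone.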
|
| If you're making or using an API where repeating would be bad,
| consider using idempotency tokens for those too. I believe Stripe
| supports them. The basic idea is the same: if you pass a token
| into them, they will guarantee that in a certain time frame, no
| other requests with that ID can be duplicated. This is useful
| when the network flakes during the response. Is it safe to retry?
|
| Things get trickier when you combine network and database
| consistency measures; that's when you get into locks, multi-stage
| commits, etc., and it helps to know your database's
| consistency model, since it's often not as solid as you think!
| (In the past, even PostgreSQL had issues with providing
| serializable isolation.)
___________________________________________________________________
(page generated 2021-04-08 23:00 UTC)