[HN Gopher] Show HN: Retry a command with exponential backoff an...
___________________________________________________________________
Show HN: Retry a command with exponential backoff and jitter (+
Starlark exprs)
Author : networked
Score : 65 points
Date : 2024-11-15 08:04 UTC (5 days ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| evgpbfhnr wrote:
| Looks similar to https://github.com/rye/eb
| networked wrote:
| Thanks. I have added eb to
| https://github.com/dbohdan/recur#alternatives.
| mariusor wrote:
| I created a library for Go that includes retry mechanisms. I
| wouldn't call it an alternative, but it can be used to do
| similar things:
| https://pkg.go.dev/git.sr.ht/~mariusor/ssm#example-Retry
| chubot wrote:
| Great list! I appreciate whenever an open source project
| links to similar ones
|
| (FWIW my own contribution is https://github.com/oils-for-
| unix/oils/wiki/Alternative-Shell... )
| greatgib wrote:
| Typically the kind of library that is useless and a root cause of
| the dependency hell we are now living in.
|
| That kind of simple thing should be a basic part of one's own
| program, or at worst a simple snippet copied from Stack Overflow
| or anything like that
| yoavm wrote:
| It does not seem to be a library at all, so very little to do
| with dependency hell. It's something you prepend to your
| commands if you want them to retry until they succeed. Seems
| pretty useful to me.
| bobnamob wrote:
| Also, retries are more nuanced than most people expect, see
| [1][2]. Getting them right is exactly something I'd
| appreciate in a library and not something I'd want to
| reimplement per project/service.
|
| [1] https://brooker.co.za/blog/2022/02/28/retries.html
|
| [2] https://medium.com/yandex/good-retry-bad-retry-an-
| incident-s...
| crest wrote:
| A non-trivial application should not add a dependency for just
| exponential backoff + proportional jitter. An easy-to-use
| wrapper to put around a quick script is a good idea, though, and
| the lack of such a "basic" defensive programming technique has
| made untold initially small problems a lot worse by creating a
| thundering herd.
| Terretta wrote:
| For Python, consider Tenacity:
| https://tenacity.readthedocs.io/en/latest/
|
| At the CLI, this is nice for not depending on Node.
| lordswork wrote:
| Does recur depend on node?
| derhuerst wrote:
| for Node.js, consider retry:
| https://www.npmjs.com/package/retry
| yodon wrote:
| >For Python, consider Tenacity
|
| >>For node.js, consider retry
|
| For .NET, consider Polly[0]
|
| [0]https://www.pollydocs.org/
| derr1 wrote:
| Would also recommend opnieuw:
| https://github.com/channable/opnieuw
| ddorian43 wrote:
| Stamina for good defaults on Tenacity:
| https://github.com/hynek/stamina
| maleldil wrote:
| backoff is good, too: https://github.com/litl/backoff
|
| I moved away from tenacity because of type-checking issues. I
| might check out stamina next time.
| stevekemp wrote:
| I have a collection of small sysadmin/scripting utilities
| distributed as a single binary here:
|
| https://github.com/skx/sysbox
|
| One of those is "splay" to sleep a random amount of time, before
| running a command. Very useful to avoid lots of things running
| across a fleet at the same time.
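The "sleep a random amount of time, then run a command" idea described above can be sketched in a few lines of Python. This is only an illustration of the technique, not SysBox's implementation; the function name `splay_run` and its parameters are hypothetical.

```python
import random
import subprocess
import time


def splay_run(argv, max_delay=300.0, sleep=time.sleep, rng=random):
    """Sleep a random time in [0, max_delay] seconds, then run argv.

    `sleep` and `rng` are injectable (an assumption made for testability);
    by default the real clock and the global random generator are used.
    Returns the exit status of the command.
    """
    delay = rng.uniform(0.0, max_delay)
    sleep(delay)
    return subprocess.run(argv).returncode
```

Run from cron on many machines, a wrapper like this spreads the start times over the `max_delay` window instead of having every host fire at the same second.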
| networked wrote:
| I actually had SysBox starred, but `splay` wasn't on the list
| of alternatives. This has been fixed.
| iamjackg wrote:
| Very cool project! Just a suggestion: since you do have pre-built
| releases on GitHub, you should mention that in the Installation
| section of your readme.
| networked wrote:
| Thank you. It is a good suggestion. I have mentioned the
| binaries in the readme and linked to
| https://github.com/dbohdan/recur/releases.
| broken_broken_ wrote:
| Hey, that's funny, I wrote a blog post about the many ways you
| can implement such a program, and it was discussed on HN:
| https://news.ycombinator.com/item?id=42103200
| netvarun wrote:
| This looks cool - will give it a try (hah!) Curious why you
| picked Starlark instead of CEL for the conditional scripting
| part?
| networked wrote:
| All right, let me tell you the history of recur to explain this
| choice. :-)
|
| I wrote the initial version in Python in 2023 and used
| simpleeval [1] for the condition expressions. The readme for
| simpleeval states:
|
| > I've done the best I can with this library - but there's no
| warranty, no guarantee, nada. A lot of very clever people think
| the whole idea of trying to sandbox CPython is impossible. Read
| the code yourself, and use it at your own risk.
|
| In early 2024, simpleeval had a vulnerability report that drove
| the point home [2]. I wanted to switch to a safer expression
| library in the rare event someone passed untrusted arguments to
| `--condition` and as a matter of craft. (I like simpleeval, but
| I think it should be used for expressions in trusted or semi-
| trusted environments.)
|
| First, I evaluated cel-python [3]. I didn't adopt it over the
| binary dependency on PyYAML and concerns about maturity. I also
| wished to avoid drastically changing the condition language.
|
| Next in line was python-starlark-go [4], which I had only used
| in a project that ultimately didn't need complex config. I had
| been interested in Starlark for a while. It was an attractive
| alternative to conventional software configuration. I saw an
| opportunity to really try it.
|
| A switch to python-starlark-go would have made platform-
| independent zipapps I built with shiv [5] no longer an option.
| This was when I realized I might as well port recur to Go, use
| starlark-go natively, and get static binaries out of it. I
| could have gone with cel-go, but like I said, I was interested
| in Starlark and wanted to keep expressions similar to how they
| were with simpleeval.
|
| [1] https://github.com/danthedeckie/simpleeval
|
| [2] https://github.com/danthedeckie/simpleeval/issues/138
|
| [3] https://github.com/cloud-custodian/cel-python
|
| [4] https://github.com/caketop/python-starlark-go
|
| [5] https://github.com/linkedin/shiv#gotchas
| itslennysfault wrote:
| If I had a nickel for every time I've written exponential backoff
| with jitter I'd have like several nickels.
| jstanley wrote:
| My view is that you basically never want exponential backoff.
|
| The only time exponential backoff is useful is if the failure is
| due to a _rate limit_ and you specifically need a mechanism to
| reduce the rate at which you are attempting to use it.
|
| In the common case that the thing you're trying to talk to is just
| down, exponential backoff with base N (e.g. wait 2x longer each
| time) increases your expected downtime by a _factor_ of N (e.g.
| 2), because by the time your dependency is working again, you may
| be waiting up to the same amount of time again before you even
| retry it! Meanwhile, your service is down and your customers
| can't use it and your program is doing nothing but sleeping for
| another 30 minutes before it even _checks_ to see if it can work.
|
| And for what? What is the downside to you if your program retries
| much more frequently?
|
| I much prefer setting a fixed time period to wait between retries
| (would you call that linear backoff? no backoff?), so for example
| if the thing fails you just sleep 1 second and try again,
| forever. And then your service is working again within 1 second
| of your dependency coming back up.
|
| If you really must use exponential backoff then pick a quite-low
| upper bound on how long you'll wait between retries. It is
| extremely frustrating to find out that something wasn't working
| just because it was sleeping for a long time because the previous
| handful of attempts failed.
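The trade-off described in the comment above can be made concrete with a small calculation: given a retry schedule, how long after the dependency recovers does the next retry actually happen? The sketch below compares a fixed 1-second interval, uncapped exponential backoff with base 2, and the same backoff with an upper bound. The 30-second cap and the 600-second outage are assumptions picked for illustration, and jitter is left out to keep the arithmetic exact.

```python
from itertools import count


def retry_times(delay_for_attempt):
    """Yield the clock times of successive retries after a failure at t=0."""
    t = 0.0
    for attempt in count():
        t += delay_for_attempt(attempt)
        yield t


def detection_lag(recovered_at, delay_for_attempt):
    """Seconds between the dependency recovering and the first retry after it."""
    for t in retry_times(delay_for_attempt):
        if t >= recovered_at:
            return t - recovered_at


def fixed(attempt):
    return 1.0  # always wait one second


def exponential(attempt):
    return 2.0 ** attempt  # 1, 2, 4, 8, ... seconds


def capped(attempt):
    return min(2.0 ** attempt, 30.0)  # exponential, but never above 30 s


for schedule in (fixed, exponential, capped):
    print(schedule.__name__, detection_lag(600.0, schedule))
```

For a dependency that comes back 600 seconds after the first failure, the fixed schedule notices immediately, the uncapped schedule sleeps until t=1023 (a lag of 423 seconds), and the capped schedule notices after 1 second. The number of retries sent during the outage is the other side of the trade: roughly 600 for the fixed schedule versus about 10 for the uncapped one.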
| jperras wrote:
| > The only time exponential backoff is useful is if the failure
| is due to a rate limit and you specifically need a mechanism to
| reduce the rate at which you are attempting to use it.
|
| That's what you should be using exponential backoff for. In
| actuality, the new latency introduced by the backoff should be
| maintained for some time even after a successful request has
| been received, and the interval reduced gradually over time.
|
| > I much prefer setting a fixed time period to wait between
| retries (would you call that linear backoff? no backoff?)
|
| I've heard it referred to as truncated exponential backoff.
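The suggestion above, keeping the increased delay for a while after a success and only reducing it gradually, might look like the following sketch. The class name `AdaptiveDelay` and the `factor`, `decay`, and `cap` parameters are hypothetical; the comment does not specify how fast the interval should shrink.

```python
class AdaptiveDelay:
    """Tracks a delay that grows quickly on failure and shrinks slowly on success."""

    def __init__(self, base=1.0, cap=60.0, factor=2.0, decay=0.9):
        self.base = base      # smallest delay ever used
        self.cap = cap        # largest delay ever used
        self.factor = factor  # multiplier applied after each failure
        self.decay = decay    # multiplier (< 1) applied after each success
        self.delay = base

    def on_failure(self):
        """Back off: multiply the delay, but never exceed the cap."""
        self.delay = min(self.delay * self.factor, self.cap)
        return self.delay

    def on_success(self):
        """Recover gradually: shrink the delay, but never go below the base."""
        self.delay = max(self.delay * self.decay, self.base)
        return self.delay
```

A caller would sleep for `delay` between requests, calling `on_failure()` or `on_success()` after each one, so that a single success does not immediately return the client to full speed.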
| dragonwriter wrote:
| That's only truncated exponential backoff if you do
| exponential backoff to some point.
|
| If it's just a fixed retry interval, then it's... a fixed retry
| interval.
| throwaway314155 wrote:
| > basically never want exponential backoff.
|
| > [unless] due to a rate limit
|
| Pretty common use-case for automatic retries...
| mplewis wrote:
| I've used a fixed-time retry before. It DOSed the target
| server. Now, I use exponential backoff.
| thegrim33 wrote:
| The jitter part is important too, to spread the retries out
| more in time.
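As a rough illustration of how jitter spreads retries out, here are two common ways to randomize an exponential delay. The function names and default values are hypothetical, and neither is claimed to be what recur or any library mentioned in this thread does.

```python
import random


def backoff_ceiling(attempt, base=1.0, factor=2.0, cap=60.0):
    """Un-jittered delay before retry number `attempt` (0-based), capped."""
    return min(cap, base * factor ** attempt)


def full_jitter(attempt, rng=random, **kwargs):
    """Pick the delay uniformly between 0 and the exponential ceiling."""
    return rng.uniform(0.0, backoff_ceiling(attempt, **kwargs))


def proportional_jitter(attempt, ratio=0.25, rng=random, **kwargs):
    """Vary the exponential delay by up to +/- `ratio` of its value."""
    delay = backoff_ceiling(attempt, **kwargs)
    return delay * (1.0 + rng.uniform(-ratio, ratio))
```

Without jitter, clients that failed at the same moment retry at the same moments too; either variant breaks up that synchronization, with full jitter spreading the retries over a wider window.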
| dragonwriter wrote:
| > The only time exponential backoff is useful is if the failure
| is due to a rate limit and you specifically need a mechanism to
| reduce the rate at which you are attempting to use it.
|
| Exponential backoff is applicable to any failure where the time
| it has so far gone unresolved is the primary piece of available
| data on how long it is likely to take before being resolved,
| which is a very common situation. That is why it is a good
| default for most situations where you don't have better
| knowable-in-advance information at hand on the probability
| distribution of time to resolve, and where delays aren't super
| costly (though knowledge of when delays become costly can be
| used to set a cap on exponential backoff, too).
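Putting the pieces discussed in this thread together, a command-retrying wrapper with capped exponential backoff and jitter could be sketched as below. This is a minimal illustration, not recur's implementation: the function name `retry_command`, its parameters, and the default values are all assumptions.

```python
import random
import subprocess
import time


def retry_command(argv, max_attempts=5, base=1.0, factor=2.0, cap=30.0,
                  sleep=time.sleep, rng=random):
    """Run argv until it exits with status 0 or the attempts run out.

    The wait before retry n is drawn uniformly from
    [0, min(cap, base * factor**n)], so `cap` bounds how long any single
    sleep can be. Returns the exit status of the last run.
    """
    status = 1
    for attempt in range(max_attempts):
        status = subprocess.run(argv).returncode
        if status == 0:
            break
        # Do not sleep after the final attempt; there is nothing left to wait for.
        if attempt + 1 < max_attempts:
            ceiling = min(cap, base * factor ** attempt)
            sleep(rng.uniform(0.0, ceiling))
    return status
```

Something like `retry_command(["curl", "-fsS", "https://example.com/"], max_attempts=10)` would then retry a flaky download up to ten times. The cap is where knowledge of "when delays become costly" goes: it should be set to the longest pause that is acceptable after the dependency has already recovered.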
___________________________________________________________________
(page generated 2024-11-20 23:01 UTC)