[HN Gopher] Show HN: Retry a command with exponential backoff an...
       ___________________________________________________________________
        
       Show HN: Retry a command with exponential backoff and jitter (+
       Starlark exprs)
        
       Author : networked
       Score  : 65 points
       Date   : 2024-11-15 08:04 UTC (5 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | evgpbfhnr wrote:
       | Looks similar to https://github.com/rye/eb
        
         | networked wrote:
         | Thanks. I have added eb to
         | https://github.com/dbohdan/recur#alternatives.
        
           | mariusor wrote:
           | I created a library for Go that includes retry mechanisms. I
           | wouldn't call it an alternative, but it can be used to do
           | similar things:
           | https://pkg.go.dev/git.sr.ht/~mariusor/ssm#example-Retry
        
           | chubot wrote:
           | Great list! I appreciate whenever an open source project
           | links to similar ones
           | 
           | (FWIW my own contribution is https://github.com/oils-for-
           | unix/oils/wiki/Alternative-Shell... )
        
       | greatgib wrote:
       | Typically the kind of library that is useless and root cause of
       | the dependency hell we are now living in.
       | 
       | That kind of simple thing should be a basic part of one's program
       | or at worst a simple snippet copied from Stack Overflow or
       | anything like that
        
         | yoavm wrote:
         | It does not seem to be a library at all, so very little to do
         | with dependency hell. It's something you prepend to your
         | commands if you want them to retry until they succeed. Seems
         | pretty useful to me.
        
           | bobnamob wrote:
           | Also, retries are more nuanced than most people expect, see
           | [1][2]. Getting them right is exactly something I'd
           | appreciate in a library and not something I'd want to
           | reimplement per project/service.
           | 
           | [1] https://brooker.co.za/blog/2022/02/28/retries.html
           | 
           | [2] https://medium.com/yandex/good-retry-bad-retry-an-
           | incident-s...
        
         | crest wrote:
         | A non-trivial application should not add a dependency for just
         | exponential backoff + proportional jitter. An easy-to-use
         | wrapper to put around a quick script is a good idea, though:
         | the lack of such a "basic" defensive programming technique has
         | made untold initially small problems a lot worse by creating a
         | thundering herd.
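
For readers who have not written one, here is a minimal sketch of the kind of wrapper being discussed: rerun a command until it exits with status 0, sleeping for an exponentially growing, proportionally jittered delay between tries. This is not recur's implementation. The names `backoff_delay` and `retry_command`, and every default value, are assumptions made for illustration.

```python
import random
import subprocess
import time


def backoff_delay(attempt, base=1.0, factor=2.0, jitter=0.5, rng=random):
    """Delay in seconds before retry number `attempt` (0-based).

    The nominal delay is base * factor**attempt. Proportional jitter then
    scales it by a random factor in [1 - jitter, 1 + jitter].
    All defaults are illustrative assumptions.
    """
    nominal = base * factor ** attempt
    return nominal * rng.uniform(1.0 - jitter, 1.0 + jitter)


def retry_command(argv, max_tries=5, sleep=time.sleep, run=subprocess.run,
                  **delay_opts):
    """Run argv until it exits with status 0 or max_tries is used up.

    Returns the last exit status. `sleep` and `run` can be replaced,
    which makes the loop easy to test without waiting.
    """
    code = None
    for attempt in range(max_tries):
        code = run(argv).returncode
        if code == 0:
            return code
        if attempt < max_tries - 1:  # no point sleeping after the last try
            sleep(backoff_delay(attempt, **delay_opts))
    return code
```

A call such as `retry_command(["curl", "-fsS", "https://example.com"], max_tries=8)` would then retry the download up to eight times. The URL is a placeholder.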
        
       | Terretta wrote:
       | For Python, consider Tenacity:
       | https://tenacity.readthedocs.io/en/latest/
       | 
       | At the CLI, this is nice for not depending on Node.
        
         | lordswork wrote:
         | Does recur depend on node?
        
         | derhuerst wrote:
         | for Node.js, consider retry:
         | https://www.npmjs.com/package/retry
        
           | yodon wrote:
           | >For Python, consider Tenacity
           | 
           | >>For node.js, consider retry
           | 
           | For .NET, consider Polly[0]
           | 
           | [0]https://www.pollydocs.org/
        
         | derr1 wrote:
         | Would also recommend opnieuw:
         | https://github.com/channable/opnieuw
        
         | ddorian43 wrote:
         | Stamina for good defaults on Tenacity:
         | https://github.com/hynek/stamina
        
         | maleldil wrote:
         | backoff is good, too: https://github.com/litl/backoff
         | 
         | I moved away from tenacity because of type-checking issues. I
         | might check out stamina next time.
        
       | stevekemp wrote:
       | I have a collection of small sysadmin/scripting utilities
       | distributed as a single binary here:
       | 
       | https://github.com/skx/sysbox
       | 
       | One of those is "splay" to sleep a random amount of time, before
       | running a command. Very useful to avoid lots of things running
       | across a fleet at the same time.
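
The idea behind a "splay" fits in a few lines. The sketch below is not SysBox's code and does not mirror its command-line interface. The function name and the `max_delay` parameter are assumptions chosen for illustration.

```python
import random
import subprocess
import time


def splay(argv, max_delay=60.0, sleep=time.sleep, run=subprocess.run,
          rng=random):
    """Sleep a random time in [0, max_delay] seconds, then run argv.

    Returns the command's exit status. Each machine in a fleet picks its
    own delay, so a job scheduled everywhere at the same minute does not
    start everywhere in the same second.
    """
    sleep(rng.uniform(0.0, max_delay))
    return run(argv).returncode
```

Unlike backoff, the delay here happens once, before the first run, and does not depend on any failure.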
        
         | networked wrote:
         | I actually had SysBox starred, but `splay` wasn't on the list
         | of alternatives. This has been fixed.
        
       | iamjackg wrote:
       | Very cool project! Just a suggestion: since you do have pre-built
       | releases on GitHub, you should mention that in the Installation
       | section of your readme.
        
         | networked wrote:
         | Thank you. It is a good suggestion. I have mentioned the
         | binaries in the readme and linked to
         | https://github.com/dbohdan/recur/releases.
        
       | broken_broken_ wrote:
       | Hey, that's funny, I wrote a blog post about the many ways you
       | can implement such a program, and it was discussed on HN:
       | https://news.ycombinator.com/item?id=42103200
        
       | netvarun wrote:
       | This looks cool - will give it a try (hah!). Curious why you
       | picked Starlark instead of CEL for the conditional scripting
       | part?
        
         | networked wrote:
         | All right, let me tell you the history of recur to explain this
         | choice. :-)
         | 
         | I wrote the initial version in Python in 2023 and used
         | simpleeval [1] for the condition expressions. The readme for
         | simpleeval states:
         | 
         | > I've done the best I can with this library - but there's no
         | warranty, no guarantee, nada. A lot of very clever people think
         | the whole idea of trying to sandbox CPython is impossible. Read
         | the code yourself, and use it at your own risk.
         | 
         | In early 2024, simpleeval had a vulnerability report that drove
         | the point home [2]. I wanted to switch to a safer expression
         | library in the rare event someone passed untrusted arguments to
         | `--condition` and as a matter of craft. (I like simpleeval, but
         | I think it should be used for expressions in trusted or semi-
         | trusted environments.)
         | 
         | First, I evaluated cel-python [3]. I didn't adopt it over the
         | binary dependency on PyYAML and concerns about maturity. I also
         | wished to avoid drastically changing the condition language.
         | 
         | Next in line was python-starlark-go [4], which I had only used
         | in a project that ultimately didn't need complex config. I had
         | been interested in Starlark for a while. It was an attractive
         | alternative to conventional software configuration. I saw an
         | opportunity to really try it.
         | 
         | A switch to python-starlark-go would have made platform-
         | independent zipapps I built with shiv [5] no longer an option.
         | This was when I realized I might as well port recur to Go, use
         | starlark-go natively, and get static binaries out of it. I
         | could have gone with cel-go, but like I said, I was interested
         | in Starlark and wanted to keep expressions similar to how they
         | were with simpleeval.
         | 
         | [1] https://github.com/danthedeckie/simpleeval
         | 
         | [2] https://github.com/danthedeckie/simpleeval/issues/138
         | 
         | [3] https://github.com/cloud-custodian/cel-python
         | 
         | [4] https://github.com/caketop/python-starlark-go
         | 
         | [5] https://github.com/linkedin/shiv#gotchas
        
       | itslennysfault wrote:
       | If I had a nickel for every time I've written exponential backoff
       | with jitter I'd have like several nickels.
        
       | jstanley wrote:
       | My view is that you basically never want exponential backoff.
       | 
       | The only time exponential backoff is useful is if the failure is
       | due to a _rate limit_ and you specifically need a mechanism to
       | reduce the rate at which you are attempting to use it.
       | 
       | In the common case that the thing you're trying to talk to is just
       | down, exponential backoff with base N (e.g. wait 2x longer each
       | time) increases your expected downtime by a _factor_ of N (e.g.
       | 2), because by the time your dependency is working again, you may
       | be waiting up to the same amount of time again before you even
       | retry it! Meanwhile, your service is down and your customers
       | can't use it and your program is doing nothing but sleeping for
       | another 30 minutes before it even _checks_ to see if it can work.
       | 
       | And for what? What is the downside to you if your program retries
       | much more frequently?
       | 
       | I much prefer setting a fixed time period to wait between retries
       | (would you call that linear backoff? no backoff?), so for example
       | if the thing fails you just sleep 1 second and try again,
       | forever. And then your service is working again within 1 second
       | of your dependency coming back up.
       | 
       | If you really must use exponential backoff then pick a quite-low
       | upper bound on how long you'll wait between retries. It is
       | extremely frustrating to find out that something wasn't working
       | just because it was sleeping for a long time because the previous
       | handful of attempts failed.
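
The cost described above can be put into numbers with a small worked example. It is a simplified model with assumed values: the first attempt fails at t=0, the dependency recovers after `outage` seconds, and each attempt itself takes no time. The helper names `fixed`, `exponential`, and `recovery_lag` are made up for this sketch.

```python
import itertools


def fixed(interval):
    """Wait the same interval before every retry."""
    return itertools.repeat(interval)


def exponential(base, factor=2.0, cap=None):
    """Wait base, base*factor, base*factor**2, ... optionally capped."""
    delay = base
    while True:
        yield delay if cap is None else min(delay, cap)
        delay *= factor


def recovery_lag(delays, outage):
    """Seconds between the dependency recovering and the next retry."""
    t = 0.0
    for delay in delays:
        t += delay
        if t >= outage:
            return t - outage


# The dependency comes back 512 seconds after the first failure.
print(recovery_lag(fixed(1), 512))                # 0.0
print(recovery_lag(exponential(1), 512))          # 511.0
print(recovery_lag(exponential(1, cap=30), 512))  # 29.0
```

With doubling delays the retries land at 1, 3, 7, ..., 511, 1023 seconds, so an outage that ends just after the retry at 511 is only noticed at 1023. This is the worst case for this schedule, not the average. The cap bounds the lag by the cap itself, at the price of retrying more often during a long outage.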
        
         | jperras wrote:
         | > The only time exponential backoff is useful is if the failure
         | is due to a rate limit and you specifically need a mechanism to
         | reduce the rate at which you are attempting to use it.
         | 
         | That's what you should be using exponential backoff for. In
         | actuality, the new latency introduced by the backoff should be
         | maintained for some time even after a successful request has
         | been received, and gradually over time the interval reduced.
         | 
         | > I much prefer setting a fixed time period to wait between
         | retries (would you call that linear backoff? no backoff?)
         | 
         | I've heard it referred to as truncated exponential backoff.
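
The first point in this comment (keep the longer interval for a while after a success, then shorten it gradually) could look roughly like the sketch below. It is one possible reading of the comment, not an established algorithm from any particular library. The class name and every parameter value are assumptions.

```python
class AdaptiveInterval:
    """Request interval that grows fast on failure and shrinks slowly
    on success. All defaults are illustrative assumptions."""

    def __init__(self, minimum=1.0, maximum=60.0, grow=2.0, shrink=0.9,
                 hold=3):
        self.minimum = minimum
        self.maximum = maximum
        self.grow = grow        # multiplier applied after each failure
        self.shrink = shrink    # multiplier applied after a late success
        self.hold = hold        # successes to sit out before shrinking
        self.current = minimum
        self._successes = 0

    def on_failure(self):
        """Back off: lengthen the interval, up to the maximum."""
        self._successes = 0
        self.current = min(self.current * self.grow, self.maximum)
        return self.current

    def on_success(self):
        """Keep the interval for `hold` successes, then ease it down."""
        self._successes += 1
        if self._successes > self.hold:
            self.current = max(self.current * self.shrink, self.minimum)
        return self.current
```

After three failures the interval is 8 seconds. It stays there for the next three successes and only then starts dropping by 10% per success, so one lucky request does not immediately restore full load on a service that is still recovering.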
        
           | dragonwriter wrote:
           | That's only truncated exponential backoff if you do
           | exponential backoff to some point.
           | 
           | If it's just a fixed retry interval, then it's... a fixed retry
           | interval.
        
         | throwaway314155 wrote:
         | > basically never want exponential backoff.
         | 
         | > [unless] due to a rate limit
         | 
         | Pretty common use-case for automatic retries...
        
         | mplewis wrote:
         | I've used a fixed-time retry before. It DOSed the target
         | server. Now, I use exponential backoff.
        
           | thegrim33 wrote:
           | The jitter part is important too, to spread the retries out
           | more in time.
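
A small simulation illustrates the effect. The model is an assumption made for this sketch: 1000 clients all fail at the same instant and each retries five times with doubling delays. Jitter here means "full jitter", where each delay is drawn uniformly between 0 and the nominal delay. The function names and bucket size are likewise made up for illustration.

```python
import random
from collections import Counter


def retry_times(retries, base=1.0, factor=2.0, jitter=False, rng=random):
    """Times (seconds after the shared failure) at which one client retries."""
    t, times = 0.0, []
    for attempt in range(retries):
        nominal = base * factor ** attempt
        t += rng.uniform(0.0, nominal) if jitter else nominal
        times.append(t)
    return times


def peak_load(clients, retries, jitter, bucket=0.1, seed=0):
    """Largest number of retries landing in any one `bucket`-second window."""
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(clients):
        for t in retry_times(retries, jitter=jitter, rng=rng):
            hits[int(t / bucket)] += 1
    return max(hits.values())


print(peak_load(1000, 5, jitter=False))  # every client hits at once
print(peak_load(1000, 5, jitter=True))   # far fewer in the busiest window
```

Without jitter all 1000 clients retry at exactly 1, 3, 7, 15, and 31 seconds, so the server sees the whole herd five more times. With jitter the same retries are smeared across each interval.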
        
         | dragonwriter wrote:
         | > The only time exponential backoff is useful is if the failure
         | is due to a rate limit and you specifically need a mechanism to
         | reduce the rate at which you are attempting to use it.
         | 
         | Exponential backoff is applicable to any failure where the time
         | it has so far gone unresolved is the primary piece of available
         | data on how long it is likely to take before being resolved.
         | That is a very common situation, which is why it is a good
         | default wherever you don't have better knowable-in-advance
         | information at hand on the probability distribution of time to
         | resolve, and where delays aren't super costly (though knowledge
         | of when delays become costly can be used to set a cap on
         | exponential backoff, too).
        
       ___________________________________________________________________
       (page generated 2024-11-20 23:01 UTC)