[HN Gopher] Show HN: FlakyBot - identify and suppress flaky tests
       ___________________________________________________________________
        
       Show HN: FlakyBot - identify and suppress flaky tests
        
       Author : ankitdce
       Score  : 18 points
       Date   : 2021-10-28 17:32 UTC (5 hours ago)
        
 (HTM) web link (www.flakybot.com)
 (TXT) w3m dump (www.flakybot.com)
        
       | tossaway9000 wrote:
       | How about you fix the flaky tests? Am I insane for thinking that?
        | The whole concept of "fixing" the problem by "just reboot
        | it" or "just rerun it" is at least one reason the modern
        | world sits on a mountain of complete garbage software.
        
         | ankitdce wrote:
          | Haha, great point. What we have learned from our users is
          | that "fixing" tests typically ends up as "delete most of
          | them". Fixing tests can be a time-consuming effort.
         | 
          | Another way to think about it: are flaky tests worth
          | keeping at all? If tests fail often, do they really add
          | value? We think they do. If you can separate flakiness
          | from real failures and reduce the noise, the tests can
          | still catch real failures.
        
           | rio517 wrote:
            | Wow. That sounds like really poor technical leadership.
            | Fixing flaky tests (as opposed to deleting them) is
            | indeed time consuming, but it is far cheaper than letting
            | your test suite become untrustworthy.
           | 
           | There may be a point where the cost of ownership for a
           | specific test exceeds its utility, but the way to resolve
           | that is usually to reevaluate your code and supporting tests.
           | Suppressing flaky tests seems a very unwise choice.
           | 
           | Perhaps under extreme circumstances and with unhealthy code
           | bases there may be a case for this, but I struggle to imagine
           | it.
        
             | ankitdce wrote:
              | That is a fair argument. Not all organizations have the
              | bandwidth to measure and manage build stability. Some
              | companies build internal tooling or a dev-productivity
              | team for this purpose. A flaky test is usually
              | commented out with the best of intentions, with the
              | mindset of coming back to it, but in most cases it is a
              | very low-priority item when you have to ship new
              | features.
              | 
              | Fixing a flaky test can very commonly take longer than
              | writing a new one.
        
         | manacit wrote:
         | This is how we think about testing for the most part - if a
         | test is 'flaky', it gets looked at very quickly, and if it's
         | not urgent (e.g. the behavior is fine and it's actually a
         | flake), it's skipped in code.
         | 
         | Once the test is skipped, a domain expert can come back and
         | take a look and figure out why it was flaky, and fix it.
         | 
         | If it's urgently broken (e.g. there is real impact), we treat
         | it like an incident and gather people with the right context to
         | fix it quickly.
         | 
         | As long as everyone agrees to these norms, it's not a huge
         | burden to keep this up with thousands of tests. People
         | generally write their tests to be more resilient when they know
         | they're on the hook for them not being flaky, and nobody stays
         | blocked for long when they are permitted to skip a flaky test.
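          | 
          | A minimal sketch of "skipped in code", assuming pytest
          | (the runner and the issue reference here are illustrative,
          | not necessarily what we use):
          | 
          |     import pytest
          |     # behavior verified fine by hand; skip the flake and
          |     # leave a pointer so a domain expert can come back and
          |     # fix it (the issue id is a placeholder)
          |     @pytest.mark.skip(reason="flaky, tracked in ISSUE-1234")
          |     def test_checkout_totals():
          |         ...
          | 
          | The reason string keeps the "why" visible in every test
          | report until someone picks it up.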
        
           | ankitdce wrote:
            | Curious, how often do you see a flaky test in your
            | system? In my past experience at a mid-size startup, we
            | used to get a new flaky test almost weekly in a monorepo.
            | We started flagging them as ignored (we created a
            | separate tag for flaky tests), but later realized that
            | the backlog of flaky tests to fix never came down.
            | 
            | In another case I observed, devs just got used to
            | rerunning the entire suite (the flakiness rate there was
            | about 10-20%).
        
       | ankitdce wrote:
        | Hi HN, we are Spriha and Ankit, and we are building
        | FlakyBot, a tool to automatically identify and suppress test
        | flakiness so that developers can better trust their test
        | results.
        | 
        | Most CI systems leave it up to teams to manually identify
        | and debug test flakiness. Since most CI systems today don't
        | handle test reruns, teams just end up manually rerunning
        | tests that are flaky. Ultimately, tribal knowledge builds up
        | over time about which tests are known to be flaky, but the
        | flakiness itself isn't addressed. Our solution, FlakyBot,
        | removes one of the hardest parts of the problem: identifying
        | flaky tests in the first place.
       | 
       | We ingest test artifacts from CI systems, and note when builds
       | are healthy (so that we can mark them as "known-good builds" to
        | use while testing for flakiness). This lets us automatically
        | identify flakiness and proactively offer mitigation
        | strategies, short term and long term. You can read more about
       | this here: https://ritzy-angelfish-3de.notion.site/FlakyBot-How-
       | it-work...
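        | 
        | As a toy sketch of the core idea (illustrative only, not our
        | actual implementation): a test that has both passed and
        | failed against the same known-good commit is a flakiness
        | candidate, since the code didn't change but the result did.
        | 
        |     from collections import defaultdict
        |     def flaky_candidates(runs):
        |         # runs: (commit_sha, test_name, passed) tuples,
        |         # limited to commits whose builds were otherwise
        |         # healthy ("known-good")
        |         outcomes = defaultdict(set)
        |         for sha, test, passed in runs:
        |             outcomes[(sha, test)].add(passed)
        |         # mixed outcomes on a single commit => flaky
        |         return {test for (sha, test), seen in outcomes.items()
        |                 if seen == {True, False}}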
       | 
       | We're in the early stages of development and are opening up
        | FlakyBot for a private beta to companies that have serious test-
       | flakiness issues. The CI systems we currently support are
       | Jenkins, CircleCI and BuildKite, but if your team uses a
       | different CI and has very serious test-flakiness problems, sign
       | up anyway and we'll reach out. During the private beta, we'll
       | work closely with our users to ensure their test flakiness issues
       | are resolved before we open it up more broadly.
        
       | ncmncm wrote:
       | What I want is a tool to make flaky tests fail reliably.
       | 
       | They won't be fixed until they start actually preventing commits.
       | If somebody deletes a test, that is on that person. I don't want
       | a tool automatically suppressing testing.
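        | 
        | Something in that spirit is possible today (a sketch
        | assuming the pytest-repeat plugin; any repeat-N-times runner
        | works): hammer the suspect test until its flakiness shows up
        | as a hard failure in the run.
        | 
        |     import pytest
        |     # run the same test 50 times in one session; a flake
        |     # will surface as at least one failed iteration
        |     # (requires the pytest-repeat plugin)
        |     @pytest.mark.repeat(50)
        |     def test_suspected_flake():
        |         ...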
        
         | ankitdce wrote:
          | That's very interesting feedback. We certainly don't have
          | a way to force-simulate a failure today.
          | 
          | A related capability we are working on is rerunning the
          | identified flaky tests up to X times until they pass. This
          | depends on the capabilities of the test runner, so it will
          | work with specific ones first (cypress, pytest, etc). That
          | way flaky tests still have to actually pass instead of
          | being suppressed.
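          | 
          | For pytest, the rerun behavior we have in mind looks
          | roughly like this (a sketch assuming the
          | pytest-rerunfailures plugin; our integration may differ):
          | 
          |     import pytest
          |     # retry this test up to 3 times, 1s apart, before
          |     # reporting a real failure (requires the
          |     # pytest-rerunfailures plugin)
          |     @pytest.mark.flaky(reruns=3, reruns_delay=1)
          |     def test_eventually_consistent_read():
          |         ...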
        
       | parthi wrote:
       | We've been relying on manual testing so far. We're just starting
       | to think about unit tests and integration tests. We don't know
       | where to start. Would be cool if you could provide guidance on
       | setting up good testing practices in the first place so that we
        | avoid flaky tests altogether.
        
         | ankitdce wrote:
          | Yeah, flaky tests generally creep into a big service due
          | to several issues. There are best practices for avoiding
          | them, but they require discipline and good oversight! We
         | wrote some stuff around it: https://www.flakybot.com/blog/five-
         | causes-for-flaky-tests
         | 
         | This is by no means an exhaustive list, but our goal with
          | FlakyBot is to get better at pinpointing root causes as we
          | detect flakiness across systems.
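          | 
          | As one illustrative example of a fix (my sketch, not
          | necessarily from the post): replacing a fixed sleep before
          | an assertion with polling against a deadline removes a
          | classic timing-based source of flakiness.
          | 
          |     import time
          |     def wait_for(predicate, timeout=5.0, interval=0.1):
          |         # poll until predicate() is true instead of
          |         # sleeping a fixed amount; fixed sleeps pass or
          |         # fail depending on machine load, a classic
          |         # flakiness cause
          |         deadline = time.monotonic() + timeout
          |         while time.monotonic() < deadline:
          |             if predicate():
          |                 return True
          |             time.sleep(interval)
          |         return False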
        
       ___________________________________________________________________
       (page generated 2021-10-28 23:02 UTC)