[HN Gopher] Introducing Qodo Cover: Automate Test Coverage
___________________________________________________________________
Introducing Qodo Cover: Automate Test Coverage
Author : timbilt
Score : 12 points
Date : 2024-12-04 17:10 UTC (5 hours ago)
(HTM) web link (www.qodo.ai)
(TXT) w3m dump (www.qodo.ai)
| m3kw9 wrote:
| Why can't I just use Cursor to "generate tests" instead?
| timbilt wrote:
| > validates each test to ensure it runs successfully, passes,
| and increases code coverage
|
| This seems to be based on the cover agent open source which
| implements Meta's TestGen-LLM paper.
| https://www.qodo.ai/blog/we-created-the-first-open-source-im...
|
| After generating each test, it's automatically run -- it needs
| to pass and increase coverage, otherwise it's discarded.
|
| This means you're guaranteed to get working tests that aren't
| repetitions of existing tests. You just need to do a quick
| review to check that they aren't doing something strange and
| they're good to go.
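|
| Roughly, the loop is as follows (a Python sketch of my own, not
| the actual cover-agent code; run_suite and
| generate_candidate_test are hypothetical stand-ins):
|
|     # Keep a candidate test only if the whole suite still passes
|     # and measured line coverage strictly increases.
|     def extend_suite(llm, suite, source_file, max_attempts=20):
|         baseline = run_suite(suite).coverage
|         for _ in range(max_attempts):
|             candidate = llm.generate_candidate_test(
|                 source_file, suite)
|             result = run_suite(suite + [candidate])
|             if result.all_passed and result.coverage > baseline:
|                 suite.append(candidate)  # valid and adds coverage
|                 baseline = result.coverage
|             # otherwise the candidate is discarded
|         return suite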
| torginus wrote:
| What's the reasoning behind generating tests until they pass?
| Isn't the point of tests to discover erroneous corner cases?
|
| What purpose does this serve besides the bragging rights of
| 'we need 90% coverage otherwise SonarQube fails the build'?
| timbilt wrote:
| Unit tests are more commonly written to future-proof code
| from issues down the road, rather than to discover existing
| bugs. A code base with good test coverage is considered
| more maintainable -- you can make changes without worrying
| that it will break something in an unexpected place.
|
| I think automating test coverage would be really useful if
| you needed to refactor a legacy project -- you want to be
| sure that as you change the code, the existing
| functionality is preserved. I could imagine running this to
| generate tests and get to good coverage before starting the
| refactor.
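|
| A common shape for that is a characterization ("golden master")
| test: pin down what the legacy code currently returns for
| representative inputs, then refactor against it. A toy Python
| example of my own (legacy_price is made up):
|
|     # Characterization test: we don't assert what the code
|     # *should* do, only that a refactor doesn't change what it
|     # currently does.
|     def legacy_price(qty):  # stand-in for gnarly legacy code
|         return round(qty * 9.99 * (0.9 if qty > 10 else 1.0), 2)
|
|     def test_legacy_price_is_unchanged():
|         snapshot = {1: 9.99, 5: 49.95, 20: 179.82}
|         for qty, expected in snapshot.items():
|             assert legacy_price(qty) == expected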
| HideousKojima wrote:
| > Unit tests are more commonly written to future-proof
| code from issues down the road, rather than to discover
| existing bugs. A code base with good test coverage is
| considered more maintainable -- you can make changes
| without worrying that it will break something in an
| unexpected place.
|
| The problem is a lot of unit tests could accurately be
| described as testing "that the code does what the code
| does." If the future changes to your code also require
| you to modify your tests (which they likely will) then
| your tests are largely useless. And if tests for parts of
| your code that you _aren't_ changing start failing when
| you make code changes, that means you made terrible
| design decisions in the first place that led to your code
| being too tightly coupled (or had too many side effects,
| or something like global mutable state).
|
| Integration tests are far, far more useful than unit
| tests. A good type system and avoiding the bad design
| patterns I mentioned handle 95% of what unit tests could
| conceivably be useful for.
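|
| To make that concrete, this is the sort of test I mean (a
| contrived Python example of my own, not one of Qodo's):
|
|     # A "tautological" unit test: it mirrors the implementation
|     # (which collaborator gets called, with what arguments)
|     # rather than any externally observable behavior, so any
|     # refactor breaks it even when behavior is unchanged.
|     from unittest.mock import Mock
|
|     def charge_order(order, gateway):
|         gateway.charge(order["customer_id"], order["total"])
|
|     def test_charge_order_calls_gateway():
|         gateway = Mock()
|         charge_order({"customer_id": 42, "total": 10.0}, gateway)
|         gateway.charge.assert_called_once_with(42, 10.0)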
| torginus wrote:
| I disagree. In my experience, poorly designed tests test
| implementation rather than behavior. To test behavior you
| must know what is actually supposed to happen when the
| user presses a button.
|
| One of the issues with getting high coverage is that
| often tests need to be written for testing
| implementation, rather than desired outcomes.
|
| Why is this an issue? As you mentioned, testing is useful
| for future proofing codebases and making sure changing
| the code doesn't break existing use cases.
|
| When tests look for desired behavior, this usually means
| that unless the spec changes, all tests should pass.
|
| The problem is when you test implementation - suppose you
| do a refactoring, cleanup, or extend the code to support
| future use cases - the tests start failing. Clearly
| something must be changed in the tests - but what? Which
| cases encode actual important rules about how the code
| should behave, and which ones were just tautologically
| testing that the code did what it did?
|
| This introduces murkiness and diminishes the value of
| tests.
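|
| A toy contrast (my own example): the first test encodes the
| spec ("a discount never exceeds 50%") and survives any refactor
| that preserves behavior; the commented-out assertion encodes
| the current implementation and would not.
|
|     def apply_discount(price, percent):
|         return price * (1 - min(percent, 50) / 100)
|
|     def test_discount_is_capped_at_half_price():  # behavior
|         assert apply_discount(100.0, 80) == 50.0
|
|     # vs. an implementation-coupled assertion such as
|     # clamp_mock.assert_called_once_with(80, 50)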
| swyx wrote:
| Congrats team! We just had Itamar back on the pod, where he
| reintroduced Qodo and AlphaCodium and teased Qodo Cover:
| https://www.latent.space/p/bolt
| foundry27 wrote:
| First off, congratulations folks! It's never easy getting a new
| product off the ground, and I wish you the best of luck. So
| please don't take this as anything other than genuine
| constructive criticism as a potential customer: generating tests
| to increase coverage is a misunderstanding of the point of
| collecting code coverage metrics, and businesses that depend on
| getting verification activities right will know this when they
| evaluate your product.
|
| A high-quality test passes when the functionality of the software
| under test is consistent with the design intent of that software.
| If the software doesn't do the Right Thing, the test must fail.
| It's why TDD is effective: you're essentially specifying the
| intent and then implementing code against it, like a self-
| verifying requirements specification. Looking at the Qodo tests
| in the GitHub PRs you've linked, the working definition of a
| high-quality test seems to be one that:
|
| 1. Executes successfully
|
| 2. Passes all assertions
|
| 3. Increases overall code coverage
|
| 4. Tests previously uncovered behaviors (as specified in the LLM
| prompt)
|
| So, given source code for a project as input, a hypothetical
| "perfect AI" built into Qodo that always writes a high-quality
| test would (naturally!) _never fail_ to write a passing test for
| that code; the semantics of the code would be perfectly encoded
| in the test. If the code had a defect, it follows logically that
| optimizing the quality of your AI for the metrics Qodo is aiming
| for will actually LOWER the probability of finding that defect!
| The generated test would have successfully managed to validate
| the code against itself, enshrining defective behavior as
| correct. It's easy to say that higher code coverage is good, more
| maintainable, etc., but this outcome is actually the exact
| opposite of maintainable and actively undermines confidence in
| the code under test and the ability to refactor.
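|
| A toy illustration of that failure mode (my own example, not a
| Qodo output):
|
|     # The function has an off-by-one defect. A test generated to
|     # "execute, pass, and add coverage" will assert the buggy
|     # output, enshrining the defect as correct behavior.
|     def days_inclusive(start_day, end_day):
|         return end_day - start_day  # bug: should be end-start+1
|
|     def test_days_inclusive():  # derived from the code itself
|         assert days_inclusive(1, 5) == 4  # passes; spec says 5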
|
| There are better ways to do this, and you've got competitors who
| are already well on the way to doing them using a diverse range
| of inputs besides code. It boils down to answering two questions:
|
| 1. Can a technique be applied so that an LLM, with or without
| explicit specifications and understanding of developer
| intentions, will reliably reconstruct the intended behavior of
| code?
|
| 2. Can a technique be applied so that tests generated by an LLM
| truly verify the specific behaviors the LLM was prompted to test,
| as opposed to writing a valid test but not the one that was asked
| for?
___________________________________________________________________
(page generated 2024-12-04 23:02 UTC)