[HN Gopher] My Python testing style guide (2017)
       ___________________________________________________________________
        
       My Python testing style guide (2017)
        
       Author : rbanffy
       Score  : 176 points
       Date   : 2021-03-24 11:24 UTC (11 hours ago)
        
 (HTM) web link (blog.thea.codes)
 (TXT) w3m dump (blog.thea.codes)
        
       | tjpnz wrote:
       | What are people using in terms of testing frameworks now?
        
         | globular-toast wrote:
         | Most of the codebases I maintain use unittest, but pytest is
         | much better and my preferred framework.
        
           | codethief wrote:
           | What makes pytest so much better in your opinion?
        
           | teddyh wrote:
           | unittest is in the standard library; this counts for _a lot_.
        
             | globular-toast wrote:
             | I used to use unittest for this reason, but it's pretty
             | silly. Having extra dependencies for the tests makes no
             | difference for end users and these days it barely makes a
             | difference to developers.
        
               | codethief wrote:
               | > Having extra dependencies for the tests
               | 
               | What do you mean, _extra_ dependencies? The only
               | difference between pytest and unittest in this regard is
               | that tests using unittest declare their dependency
                | _explicitly_, using an import[0]. Most pytest tests
               | still implicitly require pytest as a dependency, though.
               | (Think of fixtures etc. etc.)
               | 
               | I actually like unittest's approach here - in my book,
               | explicit is better than implicit.
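                | 
                | (For illustration -- a minimal sketch, the functions
                | are made up:)
                | 
                |     import unittest
                | 
                |     def add(a, b):
                |         return a + b
                | 
                |     # Explicit: the framework is imported.
                |     class TestAdd(unittest.TestCase):
                |         def test_add(self):
                |             self.assertEqual(add(1, 2), 3)
                | 
                |     # Implicit: tmp_path appears out of nowhere;
                |     # pytest injects it with no import of pytest.
                |     def test_add_and_save(tmp_path):
                |         out = tmp_path / "sum.txt"
                |         out.write_text(str(add(1, 2)))
                |         assert out.read_text() == "3"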
        
           | Rendello wrote:
           | This is a good talk by the core developer Raymond Hettinger
           | [1]. He prefers pytest too. I don't do any crazy testing, but
           | I really like property-based testing with Hypothesis, which
           | is also mentioned. This video isn't Python but it's a great
           | intro to property-based testing [2].
           | 
           | 1. https://www.youtube.com/watch?v=ARKbfWk4Xyw
           | 
           | 2. https://www.youtube.com/watch?v=AfaNEebCDos
        
       | tcbasche wrote:
       | I've been looking for something like this for ages. I'm excited
       | to try some of this stuff out, like spec'd Mocks.
       | 
       | I'm curious if anyone else who has been drawn in by the allure of
       | the Mock has some strategies to avoid the footguns associated
       | with them? (Python specifically)
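        | 
        | (For anyone curious, spec'd mocks reject calls and attributes
        | the real object doesn't have, which removes a lot of the
        | classic footguns. A minimal sketch, class and test names made
        | up:)
        | 
        |     import pytest
        |     from unittest import mock
        | 
        |     class MailClient:
        |         def send(self, to, body): ...
        | 
        |     def test_autospec_catches_bad_calls():
        |         m = mock.create_autospec(MailClient, instance=True)
        |         m.send("a@example.com", "hi")      # ok: matches signature
        |         with pytest.raises(TypeError):
        |             m.send("a@example.com")        # missing 'body'
        |         with pytest.raises(AttributeError):
        |             m.sned("a@example.com", "hi")  # typo'd method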
        
         | animal_spirits wrote:
         | What footguns are you looking to avoid with mocks? I've been
          | using them for about a year now and haven't run into many
         | issues.
        
           | AlexCoventry wrote:
           | If the mock's model of the mocked-out component is
           | inaccurate, it reduces the relevance of the test using the
           | mock.
        
           | [deleted]
        
           | nerdponx wrote:
           | Check out the talk by Edwin Jung, "Mocking and Patching
           | pitfalls": https://www.youtube.com/watch?v=Ldlz4V-UCFw
        
           | UK-Al05 wrote:
            | I've seen things like the mock returning None for an error,
            | while the real thing raises an exception.
        
         | travisjungroth wrote:
         | I'd really recommend this video. I had seen it years ago and
         | just came back to it. It's about changing your architecture,
         | one of the effects of that being changing how you need/use
         | mock. https://www.youtube.com/watch?v=DJtef410XaM&t
        
         | globular-toast wrote:
         | Just don't use them unless you have to. The two main reasons to
         | mock are for network requests and because something is too
         | slow. Other than that, test things for real. Do not isolate
         | parts of your code from other parts of your code by using
         | mocks. If your code does side effects on your own machine, like
         | writes to the file system, let it write to a temporary
         | directory.
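          | 
          | (A minimal sketch of the temporary-directory approach using
          | pytest's built-in tmp_path fixture; the function under test
          | is made up:)
          | 
          |     def save_report(path, text):
          |         path.write_text(text)
          | 
          |     def test_save_report_writes_file(tmp_path):
          |         target = tmp_path / "report.txt"
          |         save_report(target, "hello")
          |         assert target.read_text() == "hello"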
         | 
         | I linked this excellent talk in another thread recently. I'll
         | put it here again: https://www.youtube.com/watch?v=EZ05e7EMOLM
        
           | scrollaway wrote:
            | Indeed. And if you're mocking e.g. API calls in a client
           | library, try to have tests for the real things as well. They
           | don't have to be part of your normal test suite, they can run
           | only if env vars are set with the API keys needed.
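            | 
            | (One way to do that with pytest -- a sketch, the env var
            | name is made up:)
            | 
            |     import os
            |     import pytest
            | 
            |     needs_api_key = pytest.mark.skipif(
            |         "EXAMPLE_API_KEY" not in os.environ,
            |         reason="set EXAMPLE_API_KEY for live API tests",
            |     )
            | 
            |     @needs_api_key
            |     def test_live_endpoint():
            |         ...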
        
             | tmarice wrote:
             | VCR.py (https://github.com/kevin1024/vcrpy) is a great
             | utility for mocking APIs. It will run each request once,
             | save the responses to YAML files, and then replay the
             | responses every time you re-run the tests. It's also very
             | useful for caching API responses (e.g. you have a trial
              | account with a limited number of requests). Unfortunately, if
             | used for testing, it will not cover the case when the
             | original API changes its interface.
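              | 
              | (Basic usage looks roughly like this -- a sketch, the URL
              | and cassette path are made up:)
              | 
              |     import requests
              |     import vcr
              | 
              |     # First run hits the network and records the
              |     # response; later runs replay it from the YAML file.
              |     @vcr.use_cassette("fixtures/cassettes/users.yaml")
              |     def test_list_users():
              |         resp = requests.get("https://api.example.com/users")
              |         assert resp.status_code == 200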
        
           | mumblemumble wrote:
            | That Ian Cooper talk is just fantastic. It's perhaps the best
            | contribution to the subject of TDD anyone has produced since
           | Kent Beck popularized the idea in the first place.
        
         | f00_ wrote:
         | assert on the call_count attribute of a mock instead of trying
         | to use methods on it like .assert_called_once_with()
         | 
         | "a mock's job is to say, "You got it, boss" whenever anyone
         | calls it. It will do real work, like raising an exception, when
         | one of its convenience methods is called, like
         | assert_called_once_with. But it won't do real work when you
         | call a method that only resembles a convenience method, such as
         | assert_called_once (no _with!)."
         | 
         | https://engineeringblog.yelp.com/2015/02/assert_called_once-...
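          | 
          | (A sketch of the safer style -- asserting on call_count and
          | call_args directly; the mocked function is made up:)
          | 
          |     from unittest import mock
          | 
          |     def test_notify_called_once():
          |         notify = mock.Mock()
          |         notify("deploy finished")
          |         # Explicit and hard to typo:
          |         assert notify.call_count == 1
          |         assert notify.call_args == mock.call("deploy finished")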
        
           | alasdairnicol wrote:
           | This behaviour has changed in Python 3.5 [1], and it was also
           | backported to the mock package.
           | 
           | When unsafe=False (the default), accessing an attribute that
           | begins with assert will raise an error.
           | 
           | [1]:
           | https://docs.python.org/3/library/unittest.mock.html#the-
           | moc...
        
       | returningfory2 wrote:
       | The author doesn't like pytest fixtures, but personally they're
       | one of my favorite features of pytest.
       | 
       | Here's an example use case: I have a test suite that tests my
       | application's interactions with the DB. In my experience, the
       | most tedious part of these kinds of tests is setting up the
       | initial DB state. The initial DB state will generally consist of
       | a few populated rows in a few different tables, many linked
       | together through foreign keys. The initial DB state varies in
       | each test.
       | 
       | My approach is to create a pytest fixture for each row of data I
       | want in a test. (I'm using SQLAlchemy, so a row is 1-1 with a
       | populated SQLAlchemy model.) If the row requires another row to
       | exist through a foreign key constraint, the fixture for the child
       | row will depend on the fixture for the parent. This way, if you
       | add the child test fixture to insert the child row, pytest will
       | automatically insert the parent row first. The fixtures
       | ultimately form a dependency tree.
       | 
       | Finally in a test, creating initial DB state is simple: you just
       | add fixtures corresponding to the rows you want to exist in the
       | test. All dependencies will be created automatically behind the
       | scenes by pytest using the fixtures graph. (In the end I have
       | about ~40 fixtures which are used in ~240 tests.)
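        | 
        | (A sketch of the dependency-tree idea with made-up rows; in the
        | real thing each fixture would insert a SQLAlchemy model via a
        | shared session fixture:)
        | 
        |     import pytest
        | 
        |     @pytest.fixture
        |     def organization():
        |         return {"id": 1, "name": "Acme"}
        | 
        |     @pytest.fixture
        |     def user(organization):
        |         # Depends on the parent row; pytest builds it first.
        |         return {"id": 7, "org_id": organization["id"]}
        | 
        |     def test_user_belongs_to_org(user, organization):
        |         assert user["org_id"] == organization["id"]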
        
         | epage wrote:
         | I'm mixed on fixtures.
         | 
          | On one hand, I've been impressed with how they compose and
         | have let me do some great things. For example, I had system
         | tests that needed hardware identifiers. I had a `conftest.py`
         | to add CLI args for them. I then made fixtures to wrap the
         | lookup of these. In the fixture, I marked it as Skip if the arg
         | was missing. This was then propagated to all of the tests, only
         | running the ones the end-user had the hardware for.
         | 
         | On the other hand, when I need to vary the data between tests
         | and that data is an input to something that I'd like to
         | abstract the creation of, fixtures break down and I have to
         | instead use a function call.
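          | 
          | (Roughly what the first part looks like -- a sketch, the
          | option name is made up; this lives in conftest.py:)
          | 
          |     import pytest
          | 
          |     def pytest_addoption(parser):
          |         parser.addoption("--device-serial", default=None,
          |                          help="serial of attached hardware")
          | 
          |     @pytest.fixture
          |     def device_serial(request):
          |         serial = request.config.getoption("--device-serial")
          |         if serial is None:
          |             pytest.skip("no hardware serial provided")
          |         return serial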
        
         | emptysea wrote:
          | One thing I've encountered with pytest fixtures is that they
          | have a tendency to balloon in size.
         | 
         | We started out with like 50 fixtures, but now we have a
         | conftest.py file that has `institution_1`, ...,
         | `institution_10`.
         | 
         | My end conclusion is that fixtures are nice for some things,
         | like managing mocks, and clearing the databases after tests,
         | but for data it's better to write some functions to create
         | stuff.
         | 
          | So instead of `def test_something(institution_with_some_flag_b)`
          | you'd write in your test body:
          | 
          |     def test_something() -> None:
          |         institution = create_institution(some_flag="b")
         | 
          | Another benefit is you can click into the function, whereas
          | with fixtures you have to grep.
        
           | sirlantis wrote:
            | I rewrote a bunch of our tests to this factory pattern
           | last week, too (the factory is a fixture though - FactoryBoy
           | is worth a look).
           | 
           | I'd argue that too many global fixtures in conftest have a
            | high risk of becoming "Mystery Guests" or overly general
           | fixtures. For a test reader it's impossible to know the
           | semantics of "institution_10".
           | 
           | I believe this to be rooted in DRY obsession leading to
           | coupling of tests: "We need a second institution in two
           | modules? Let's lift it up to global!"
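            | 
            | (The "factory as fixture" shape, sketched with a made-up
            | model; FactoryBoy replaces the hand-written inner function:)
            | 
            |     import pytest
            | 
            |     @pytest.fixture
            |     def make_institution():
            |         def _make(name="Test Institution", some_flag=None):
            |             return {"name": name, "some_flag": some_flag}
            |         return _make
            | 
            |     def test_flagged_institution(make_institution):
            |         institution = make_institution(some_flag="b")
            |         assert institution["some_flag"] == "b"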
        
         | codethief wrote:
         | I'm the exact opposite, I absolutely _hate_ pytest fixtures.
         | They are effectively global state, so adding a fixture
         | _somewhere_ in your code base might affect the tests in a
         | completely different location. This gets even worse with every
         | fixture you add because, being global state, fixtures can
         | interact with one another - often in unexpected ways. Finally,
          | readers unfamiliar with your code won't know where the
         | arguments for a given `test_xy()` function come from, i.e. the
         | dependency injection is completely unclear and your IDE won't
         | help you much.
         | 
         | There are _so many_ other (better) ways to achieve the same
         | goal, such as decorators or - as already mentioned by emptysea
         | in their sibling comment - explicitly invoking some function
          | from within the test to do the setup/teardown.
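          | 
          | (For example, explicit setup/teardown via a context manager
          | -- a sketch, names made up:)
          | 
          |     import contextlib
          | 
          |     @contextlib.contextmanager
          |     def logged_in_user(name="alice"):
          |         user = {"name": name}      # setup
          |         try:
          |             yield user
          |         finally:
          |             pass                   # teardown would go here
          | 
          |     def test_profile_page():
          |         with logged_in_user() as user:
          |             assert user["name"] == "alice"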
        
       | michaericalribo wrote:
       | I'm curious how others test code that operates on large datasets
        | -- e.g., transformations of a dataframe, parsing complicated
       | responses, important implementations of analytics functions.
       | 
        | I've previously used serialized data -- JSON, or joblib if there
        | are complex types (e.g., numpy) -- but these seem pretty brittle...
        
       | sambalbadjak wrote:
        | I'd add to that that tests should be readable. Personally I
        | prefer to use GIVEN, WHEN, THEN as comments in the tests. Also,
        | it's ok not to be DRY while writing tests.
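        | 
        | (E.g., a short sketch with made-up names:)
        | 
        |     def test_discount_applied_for_members():
        |         # GIVEN a member with an active subscription
        |         member = {"active": True}
        |         # WHEN they check out a 100-unit order
        |         total = 100 * (0.9 if member["active"] else 1.0)
        |         # THEN a 10% discount is applied
        |         assert total == 90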
        
         | mumblemumble wrote:
         | > it's ok not to be DRY
         | 
         | Depending on context and implementation details, I'd say DRYing
         | tests can be anywhere from indispensable to toxic.
         | 
         | I'm fine with creating libraries of shared functionality that
         | tests can use, especially when it helps readability. If you've
         | got several tests with the same precondition, having them all
         | call a function named "givenTheUserHasLoggedIn()" in order to
         | do the setup is a nice readability win. And, since it's a
         | function call, it's not too difficult to pick apart if a test's
         | preconditions diverge from the others' at a later date.
         | 
         | What I absolutely cannot stand is using inheritance to make
         | tests DRY. If you've got an inheritance hierarchy for handling
         | test setup, the cost of implementing a change to the test setup
         | requirements is O(N) where N is the hierarchy depth, with
         | constant factors on the order of, "Welp, there goes my
         | afternoon."
        
           | BurningFrog wrote:
           | I'm an "it depends" fan myself.
           | 
           | It does annoy the many programmers who want clear and
           | absolute rules for everything.
           | 
           | Then again they are always annoyed, living in a world where
           | so many things "depend".
        
           | travisjungroth wrote:
           | I've gotten lured into the inheritance stuff and it's super
           | nice at the very, very beginning and becomes a nightmare to
           | maintain. Obviously a horrible tradeoff for software.
           | 
           | I've found that having a class/function as a parameter and
           | explicitly listing the classes/functions that get tested is a
           | small step back and way easier to maintain and read. It sets
            | off some DRY alarms, because usually that whole list is just
            | "subclasses of X". And it seems like a burden to update. "So
            | if I make a new subclass, I have to add it everywhere?" Yes.
           | Yes you do. Familiarity with the test suite is table stakes
           | for development. You'll need to add your class name to like
           | ten lists, and get 90% coverage for your work, then write a
           | few tests about what's special about your class. When
           | something breaks, you'll know exactly what's being tested.
           | And you'll be able to opt out a class from that test with one
           | keystroke.
           | 
           | That being said... I still have a dream of writing a library
           | for generating tests for things that inherit from
           | collections.abc. Something like "oh, you made a
           | MutableSequence? let's test it works like a list except where
           | you opt-out."
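            | 
            | (The explicit-list approach, sketched with made-up
            | classes:)
            | 
            |     import pytest
            | 
            |     class Base:
            |         def greet(self):
            |             return "hi"
            | 
            |     class Loud(Base):
            |         def greet(self):
            |             return "HI"
            | 
            |     # Every subclass is listed by hand; opting one out of
            |     # a test is a one-line change.
            |     @pytest.mark.parametrize("cls", [Base, Loud])
            |     def test_greet_is_nonempty(cls):
            |         assert cls().greet()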
        
         | mxz3000 wrote:
         | The given, when, then breakdown is interesting, though I've
         | never seen language test utilities actually enforce that
         | structure. Maybe an interesting potential experiment
          | (regardless of language)?
         | 
         | I feel like your last point is especially important. Sooooo
         | many times have I seen over-abstracted unit tests that are
         | unreadable and are impossible to reason about, because somebody
         | decided that they needed to be concise (which they don't).
         | 
         | I'd much rather tests be excessively verbose and
         | obvious/straightforward than over abstracted. It also avoids
         | gigantic test helper functions that have a million flags
         | depending on small variations in desired test behaviour...
        
           | disgruntledphd2 wrote:
           | As always, there are tradeoffs.
           | 
            | Personally, I work with some incredibly long (100+ line)
           | "unit" tests and they are a nightmare to work with.
           | 
           | Especially when the logic is repeated across multiple tests,
           | and it's incorrect (or needs to be changed).
           | 
           | I really, really like shorter tests with longer names, but
           | I'd imagine there are definitely pathologies at either end.
        
       | psing wrote:
       | If you're in the serverless space, a useful addendum:
       | https://towardsdatascience.com/how-i-write-meaningful-tests-...
        
       | mumblemumble wrote:
       | Personally, I've come to really dislike test names like
       | "test_refresh_failure". They tell you what component is being
        | tested, but not what kind of behavior is expected. Which can
        | lead to a whole lot of unnecessary confusion (or bugs) when
        | you're trying to maintain a test whose implementation is
        | difficult to understand, or if you're not sure it's asserting
        | the right things.
       | 
       | It also encourages tests that do too much. If the test is named
       | "test_refresh", well, it says right there in the name that it's a
       | test for any old generic refresh behavior. So why not just keep
       | dumping assertions in there?
       | 
       | I'm much more happy with names like,
       | "test_displays_error_message_when_refresh_times_out". Right
        | there, you know _exactly_ what's being verified, because it's
       | been written in plain English. Which means you can recognize a
       | buggy test implementation when you see it, and you know what
       | behavior you're supposed to be restoring if the test breaks, and
       | you are prepared to recognize an erroneously passing test, and
       | all sorts of useful things like that.
        
         | BurningFrog wrote:
         | We don't need to put this much responsibility on test names. If
         | there is more to explain, write a few words in a comment.
        
           | exdsq wrote:
           | You don't see comments in a test report though -- maybe have
           | an optional description as part of the framework for more
           | detail along with -v
        
             | masklinn wrote:
             | `unittest` prints the docstring alongside the name of the
             | test in verbose mode (e.g. failure).
             | 
             | pytest does not though.
        
             | nerdponx wrote:
             | I know that the docstring in Unittest is part of the
             | reported output, and I was pretty sure that it's the same
             | in Pytest.
        
               | masklinn wrote:
               | I would've thought so but no, pytest will show the
               | docstring in `--collect-only -v` (badly), but it doesn't
               | show any sort of description when running the tests, even
               | in verbose mode (see issue #7005 which doesn't seem to be
               | very popular)
        
             | munchbunny wrote:
             | I think this is one of those conventions where your team
             | agrees on one convention, and you just try to follow it
             | consistently. If that's looking at the test in code to find
             | descriptive comments, great. If that's long test names,
             | cool. Just do the same thing consistently.
        
               | exdsq wrote:
               | True - I think it depends on who uses your test outputs.
               | If PMs and BAs care, having a more verbose output
               | (descriptions, pretty graphs, etc) helps a lot. If it's
               | just for devs then they can more happily go through the
               | code base.
        
           | stinos wrote:
           | It's a choice of course, but I look at test functions like
           | other functions. If you can name a function such that it
           | doesn't need a comment (but also doesn't use 100 characters
            | to do that), I'll gladly take that over a comment. Same as
            | in all other code: if you need comments to explain what it
            | does, it's likely not good enough and/or needs to be put in a
            | function which tells what it does. Comments on _why_ it does
            | stuff the way it does are of course where it's at.
        
           | xapata wrote:
           | s/comment/docstring/
        
             | BurningFrog wrote:
             | I have yet to understand the point of docstrings.
             | 
             | How are they, in practical reality, better than comments?
        
               | powersnail wrote:
               | The difference is in tooling. Docstrings are collected
               | and displayed in many contexts. The intended purpose is
               | writing "documentation" in the same place as code.
               | Comments are only seen if you open the source file.
        
               | BurningFrog wrote:
               | OK, I can see how that's useful for a certain workflow.
               | 
               | The way we work, it's just a different comment syntax. I
                | do like having a dedicated place for it.
        
               | f00_ wrote:
                | You can access it from object.__doc__, and there is
                | tooling in IDEs like PyCharm to quick-view them to see
                | what a function does, plus auto-generated documentation.
                | 
                | Prior to mypy/type hints it also allowed you to document
                | the types of a function.
        
           | mumblemumble wrote:
           | I've just never seen the "this belongs in comments" approach
           | work out in practice. Maybe it's something about human
           | psychology. Perhaps things might seem obvious when you've
           | just written the test, so you don't think to comment them.
           | Perhaps code reviewers feel less comfortable saying, "I don't
           | understand this, could you please comment it?" Perhaps it's
           | the simple fact that optional means optional, which means,
           | "You don't have to do it if you aren't in the mood."
           | Regardless of the reason, though, it's a thing I've seen play
           | out so many times that I've become convinced that asking
           | people to do otherwise is spitting into the wind.
        
             | BurningFrog wrote:
             | All this is true, but doesn't it apply just as much to
             | writing descriptive test function names?
             | 
             | Test function names are less sensitive than regular
             | functions, since they're not explicitly called, but I still
             | don't want to read
             | a_sentence_with_spaces_replaced_by_underscores.
        
               | mumblemumble wrote:
               | I don't want to either, but until Python gives us
                | backtick symbols or we get something like Kotlin's kotest
               | that lets the test name just be an actual string, that's
               | sort of the choice we're left with. And I'm inclined to
               | take the option that I've known to lead to more
               | maintainable tests over the long run over the option that
               | I've known to engender problems, even if it is harder on
               | the eyes. Form after function.
               | 
               | As far as whether or not people do a better job with
               | descriptive test function names, what I've seen is that
               | they do? I of course can't share any data because this is
               | all in private codebases, so I guess this could quickly
               | devolve into a game of dueling anecdotes. But what I've
               | observed is that people tend to take function names -
               | and, by extension, function naming conventions -
               | seriously, and they are more likely to think of comments
               | as expendable clutter. (Probably because they usually
               | are.) Which means that they'll think harder about
               | function names in the first place, and also means that
               | code reviewers are more likely to mention if they think a
               | function name isn't quite up to snuff.
               | 
               | And I just don't like to cut those sorts of human
               | behavior factors out of the picture, even when they're
               | annoying or hard to understand. Because, at the end of
               | the day, it's all about human factors.
        
               | BurningFrog wrote:
               | I don't disagree with much of this.
               | 
               | I was talking specifically about a `def
               | test_displays_error_message_when_refresh_times_out()`
               | function.
               | 
               | That's too big a name for me to keep in my head, so I'd
               | look for other solutions.
        
             | rbanffy wrote:
              | I really hate the pylint rule that complains about missing
              | docstrings, though I can see its virtue.
             | 
             | In tests, however, I prefer not to have them as my favorite
             | test runners replace the full name of the test method with
             | its docstring, which makes it a lot harder to find the test
             | in the code.
        
               | ben509 wrote:
               | My only issue is that the rule doesn't give you a good
               | way to annotate that the object is documented by its
               | signature. Sometimes devs are lazy, but the rule doesn't
               | make them not lazy so you get pointless docs like:
                | 
                |     def matriculate_flange(via: Worble):
                |         "Matriculates flange via a Worble."
        
         | darioush wrote:
          | Have you considered using "test_refresh__failure" instead? It
          | makes clear "refresh" is the component being tested and
          | "failure" is a description of the behavior.
        
         | [deleted]
        
         | rowanseymour wrote:
         | Sometimes there are practical reasons to avoid this approach
         | and have fewer test methods that test multiple behaviors of a
          | single thing. For example, a lot of our projects set up and
          | tear down a database between test methods, so a few fatter test
         | methods run a lot faster than a large number of small test
         | methods. We rely on good commenting within the methods to
         | understand what exactly is being checked.
        
           | codethief wrote:
           | In that case, why not use some auxiliary function to load the
           | resources (in your case, the database) and decorate it with
            | functools.cache[0] to keep the function from getting
           | executed multiple times? Sure, this means re-using resources
           | between multiple tests (which is discouraged for good
           | reasons) but your current test effectively does the same
           | thing, the only difference being that everything is being
           | tested inside one single test function.
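            | 
            | (I.e. something like this -- a sketch, the loader is made
            | up:)
            | 
            |     import functools
            | 
            |     @functools.cache
            |     def seeded_db():
            |         # Expensive setup runs only on the first call;
            |         # later tests reuse the same object.
            |         return {"users": ["alice", "bob"]}
            | 
            |     def test_user_count():
            |         assert len(seeded_db()["users"]) == 2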
           | 
           | PS: How come your setup and teardown operations are so
           | expensive in the first place? Why do you even need to set up
           | an entire database? Can't you mock out the database, set up
           | only a few tables or use a lighter database layer?
           | 
           | [0]: https://docs.python.org/3/library/functools.html#functoo
           | ls.c...
        
         | jxub wrote:
         | I think that behave (https://behave.readthedocs.io/en/stable/)
          | is more useful for testing these more real-life use cases. I
         | tend to have a `test_my_function` in pytest tests and the more
         | integration and functionality-related testing in the behave
         | tests.
        
         | wodenokoto wrote:
          | One of the most common test modules in R is called "testthat",
          | and you invoke a test by calling a function (rather than
          | defining one) called "test_that": the first argument is a
          | string containing a description of what you want to test and
          | the second argument is the code you want to test.
          | 
          | That way, all your unit tests read: "test that error message
         | is displayed when refresh times out" etc.
         | 
         | I think it's a really nice way to lay things out and it avoids
         | all the "magic" of some functions being executed by virtue of
         | their name.
        
       | MrPowers wrote:
        | I've found pytest to encourage tests with really long method
        | names, examples from the post:
        | 
        |     test_refresh_failure
        |     test_refresh_with_timeout
        | 
        | These get even longer, like
        | test_refresh_with_timeout_when_username_is_not_found, for example.
       | 
       | pytest-describe allows for a much nicer testing syntax. There's a
       | great comparison here: https://github.com/pytest-dev/pytest-
       | describe#why-bother
       | 
        | TL;DR, this is nicer:
        | 
        |     def describe_my_function():
        |         def with_default_arguments():
        |             ...
        |         def with_some_other_arguments():
        |             ...
        | 
        | This isn't as nice:
        | 
        |     def test_my_function_with_default_arguments():
        |         ...
        | 
        |     def test_my_function_with_some_other_arguments():
        |         ...
        
         | TuringNYC wrote:
         | I just saw the github readme for this project. How is the
         | describe variant different from just grouping the tests
         | together into a module called test_describe_my_function.py and
          | then having functions with shorter names inside?
        
           | klenwell wrote:
           | This grouping convention reminds me a lot of Better Specs
           | from the Ruby world:
           | 
           | https://www.betterspecs.org/
           | 
           | With rspec, you use the describe and context keywords.
           | 
           | At one level, yes, it's mainly syntactical sugar. As the
           | test-writer, the two approaches may seem interchangeable.
           | 
           | Where I find it really helps is when I'm not the test-writer
           | but rather I'm reviewing another developer's tests, say in
           | PR. I find this syntax and hierarchy produces a much more
           | coherent test suite and makes it easier for me to twig
           | different use cases and test quality generally.
        
             | theptip wrote:
             | The readme says:
             | 
             | > With pytest, it's possible to organize tests in a similar
             | way with classes. However, I think classes are awkward. I
             | don't think the convention of using camel-case names for
             | classes fit very well when testing functions in different
             | cases. In addition, every test function must take a "self"
             | argument that is never used.
             | 
             | So there's no reason to do this, aside from aesthetics.
             | 
             | I'd recommend against doing un-Pythonic stuff like this, it
             | makes your code harder to pick up for new engineers.
        
               | ben509 wrote:
               | You could call it aesthetics, but it's also readability,
               | and that's an important aspect of tests.
        
         | w0tintarnation wrote:
          | Do you have an opinion on grouping pytest tests in classes?
          | 
          |     class Test_my_function:
          |         def with_default_arguments(self):
          |             ...
          |         def with_some_other_arguments(self):
          |             ...
          | 
          | If you can make your eye stop twitching after seeing snake-
          | cased class names, this is at least another option for
          | grouping tests for a single function.
        
           | poooogles wrote:
           | They're already grouped by module which normally provides
           | enough granularity (in my experience, I've only scaled this
           | up to 50k LOC apps though so YMMV).
        
         | ben509 wrote:
         | I like the concept, but using the profiler to grab locally
         | declared tests is a bit more magic than I'm comfortable with in
         | my tests.
         | 
          | Something like this might be a good compromise:
          | 
          |     def describe_my_function(register):
          |         @register
          |         def with_this_thing():
          |             ...
         | 
         | I think most Python devs understand that "register" can have a
         | side-effect.
        
       ___________________________________________________________________
       (page generated 2021-03-24 23:01 UTC)