[HN Gopher] My Python testing style guide (2017)
___________________________________________________________________
My Python testing style guide (2017)
Author : rbanffy
Score : 176 points
Date : 2021-03-24 11:24 UTC (11 hours ago)
(HTM) web link (blog.thea.codes)
(TXT) w3m dump (blog.thea.codes)
| tjpnz wrote:
| What are people using in terms of testing frameworks now?
| globular-toast wrote:
| Most of the codebases I maintain use unittest, but pytest is
| much better and my preferred framework.
| codethief wrote:
| What makes pytest so much better in your opinion?
| teddyh wrote:
| unittest is in the standard library; this counts for _a lot_.
| globular-toast wrote:
| I used to use unittest for this reason, but it's pretty
| silly. Having extra dependencies for the tests makes no
| difference for end users and these days it barely makes a
| difference to developers.
| codethief wrote:
| > Having extra dependencies for the tests
|
| What do you mean, _extra_ dependencies? The only
| difference between pytest and unittest in this regard is
| that tests using unittest declare their dependency
| _explicitly_, using an import[0]. Most pytest tests
| still implicitly require pytest as a dependency, though.
| (Think of fixtures etc. etc.)
|
| I actually like unittest's approach here - in my book,
| explicit is better than implicit.
| Rendello wrote:
| This is a good talk by the core developer Raymond Hettinger
| [1]. He prefers pytest too. I don't do any crazy testing, but
| I really like property-based testing with Hypothesis, which
| is also mentioned. This video isn't Python but it's a great
| intro to property-based testing [2].
|
| 1. https://www.youtube.com/watch?v=ARKbfWk4Xyw
|
| 2. https://www.youtube.com/watch?v=AfaNEebCDos
| tcbasche wrote:
| I've been looking for something like this for ages. I'm excited
| to try some of this stuff out, like spec'd Mocks.
|
| I'm curious if anyone else who has been drawn in by the allure of
| the Mock has some strategies to avoid the footguns associated
| with them? (Python specifically)
| animal_spirits wrote:
| What footguns are you looking to avoid with mocks? I've been
| using them for about a year now and haven't run into many
| issues.
| AlexCoventry wrote:
| If the mock's model of the mocked-out component is
| inaccurate, it reduces the relevance of the test using the
| mock.
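One mitigation (a stdlib sketch, not from the thread): spec the mock against the real callable so at least signature drift is caught. `fetch_user` here is a hypothetical function.

```python
from unittest.mock import create_autospec

def fetch_user(user_id: int) -> dict:
    """Hypothetical real function; the mock copies its signature."""
    raise NotImplementedError

mock_fetch = create_autospec(fetch_user)
mock_fetch(42)  # matches the real signature: accepted

try:
    mock_fetch(42, "extra")  # the real function would reject this
except TypeError:
    print("signature drift caught")
```

This catches calls that drift from the real interface, though not behavioral differences like return values vs. raised exceptions.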
| [deleted]
| nerdponx wrote:
| Check out the talk by Edwin Jung, "Mocking and Patching
| pitfalls": https://www.youtube.com/watch?v=Ldlz4V-UCFw
| UK-Al05 wrote:
| I've seen things like a mock returning null for an error
| while the real thing throws an exception.
| travisjungroth wrote:
| I'd really recommend this video. I had seen it years ago and
| just came back to it. It's about changing your architecture,
| one of the effects of that being changing how you need/use
| mock. https://www.youtube.com/watch?v=DJtef410XaM&t
| globular-toast wrote:
| Just don't use them unless you have to. The two main reasons to
| mock are for network requests and because something is too
| slow. Other than that, test things for real. Do not isolate
| parts of your code from other parts of your code by using
| mocks. If your code does side effects on your own machine, like
| writes to the file system, let it write to a temporary
| directory.
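A minimal sketch of that "let it really write" approach using the stdlib's `tempfile`; `save_report` is a hypothetical function under test:

```python
import tempfile
from pathlib import Path

def save_report(directory: Path, text: str) -> Path:
    """Hypothetical code under test: a real file-system side effect."""
    path = directory / "report.txt"
    path.write_text(text)
    return path

def test_save_report():
    # No mocks: the code really writes, just into a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        path = save_report(Path(tmp), "all good")
        assert path.read_text() == "all good"

test_save_report()
```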
|
| I linked this excellent talk in another thread recently. I'll
| put it here again: https://www.youtube.com/watch?v=EZ05e7EMOLM
| scrollaway wrote:
| Indeed. And if you're mocking eg. API calls in a client
| library, try to have tests for the real things as well. They
| don't have to be part of your normal test suite, they can run
| only if env vars are set with the API keys needed.
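One stdlib way to express that gating, assuming a made-up `EXAMPLE_API_KEY` variable name:

```python
import os
import unittest

@unittest.skipUnless("EXAMPLE_API_KEY" in os.environ,
                     "set EXAMPLE_API_KEY to run live-API tests")
class LiveApiTests(unittest.TestCase):
    def test_real_endpoint(self):
        # Would exercise the real API using os.environ["EXAMPLE_API_KEY"];
        # skipped entirely when the key isn't configured.
        pass
```

The live tests stay in the repo but only run on machines where the credentials exist.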
| tmarice wrote:
| VCR.py (https://github.com/kevin1024/vcrpy) is a great
| utility for mocking APIs. It will run each request once,
| save the responses to YAML files, and then replay the
| responses every time you re-run the tests. It's also very
| useful for caching API responses (e.g. you have a trial
| account with limited number of requests). Unfortunately, if
| used for testing, it will not cover the case when the
| original API changes its interface.
| mumblemumble wrote:
| That Ian Cooper talk is just fantastic. It's perhaps the best
| contribution to the subject of TDD anyone has produced since
| Kent Beck popularized the idea in the first place.
| f00_ wrote:
| assert on the call_count attribute of a mock instead of trying
| to use methods on it like .assert_called_once_with()
|
| "a mock's job is to say, "You got it, boss" whenever anyone
| calls it. It will do real work, like raising an exception, when
| one of its convenience methods is called, like
| assert_called_once_with. But it won't do real work when you
| call a method that only resembles a convenience method, such as
| assert_called_once (no _with!)."
|
| https://engineeringblog.yelp.com/2015/02/assert_called_once-...
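Roughly, the suggested style (with a throwaway example mock):

```python
from unittest.mock import Mock, call

fetch = Mock(name="fetch")
fetch("https://example.com")

# Asserting on plain attributes can't be misspelled into a silent pass:
assert fetch.call_count == 1
assert fetch.call_args == call("https://example.com")
```

A misspelled attribute here just makes the `assert` fail, instead of returning a child mock that "passes".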
| alasdairnicol wrote:
| This behaviour has changed in Python 3.5 [1], and it was also
| backported to the mock package.
|
| When unsafe=False (the default), accessing an attribute that
| begins with assert will raise an AttributeError.
|
| [1]:
| https://docs.python.org/3/library/unittest.mock.html#the-
| moc...
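A quick check of the modern behaviour (the misspelling below is deliberate):

```python
from unittest.mock import Mock

m = Mock()
m("x")
try:
    m.assert_called_wiht("x")  # typo: old mock versions silently passed
except AttributeError:
    print("typo rejected")
```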
| returningfory2 wrote:
| The author doesn't like pytest fixtures, but personally they're
| one of my favorite features of pytest.
|
| Here's an example use case: I have a test suite that tests my
| application's interactions with the DB. In my experience, the
| most tedious part of these kinds of tests is setting up the
| initial DB state. The initial DB state will generally consist of
| a few populated rows in a few different tables, many linked
| together through foreign keys. The initial DB state varies in
| each test.
|
| My approach is to create a pytest fixture for each row of data I
| want in a test. (I'm using SQLAlchemy, so a row is 1-1 with a
| populated SQLAlchemy model.) If the row requires another row to
| exist through a foreign key constraint, the fixture for the child
| row will depend on the fixture for the parent. This way, if you
| add the child fixture to a test to insert the child row, pytest
| will automatically insert the parent row first. The fixtures
| ultimately form a dependency tree.
|
| Finally in a test, creating initial DB state is simple: you just
| add fixtures corresponding to the rows you want to exist in the
| test. All dependencies will be created automatically behind the
| scenes by pytest using the fixtures graph. (In the end I have
| ~40 fixtures which are used in ~240 tests.)
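A toy sketch of such a dependency tree, with a plain list standing in for the database session and hypothetical user/order rows:

```python
import pytest

DB = []  # stand-in for a real database session

@pytest.fixture
def user_row():
    row = {"table": "users", "id": 1}
    DB.append(row)
    return row

@pytest.fixture
def order_row(user_row):  # child row depends on its parent via the graph
    row = {"table": "orders", "id": 10, "user_id": user_row["id"]}
    DB.append(row)
    return row

def test_order_links_to_user(order_row):
    # Requesting only order_row makes pytest insert user_row first.
    assert DB[0]["table"] == "users"
    assert order_row["user_id"] == 1
```

With real SQLAlchemy models, each fixture would `session.add()` a populated model instead of appending a dict.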
| epage wrote:
| I'm mixed on fixtures.
|
| On one hand, I've been impressed with how they compose and
| have let me do some great things. For example, I had system
| tests that needed hardware identifiers. I had a `conftest.py`
| to add CLI args for them. I then made fixtures to wrap the
| lookup of these. In the fixture, I marked it as Skip if the arg
| was missing. This was then propagated to all of the tests, only
| running the ones the end-user had the hardware for.
|
| On the other hand, when I need to vary the data between tests
| and that data is an input to something that I'd like to
| abstract the creation of, fixtures break down and I have to
| instead use a function call.
| emptysea wrote:
| One thing I've encountered with pytest fixtures is they have a
| tendency to balloon in size.
|
| We started out with like 50 fixtures, but now we have a
| conftest.py file that has `institution_1`, ...,
| `institution_10`.
|
| My end conclusion is that fixtures are nice for some things,
| like managing mocks, and clearing the databases after tests,
| but for data it's better to write some functions to create
| stuff.
|
| So instead of `def test_something(institution_with_some_flag_b)`
| you'd write in your test body:
|
|     def test_something() -> None:
|         institution = create_institution(some_flag="b")
|
| Also another benefit is you can click into the function whereas
| fixtures you have to grep.
| sirlantis wrote:
| I've rewritten a bunch of our tests to this factory pattern
| last week, too (the factory is a fixture though - FactoryBoy
| is worth a look).
|
| I'd argue that too many global fixtures in conftest have a
| high risk of becoming "Mystery Guests" or overly general
| fixtures. For a test reader it's impossible to know the
| semantics of "institution_10".
|
| I believe this to be rooted in DRY obsession leading to
| coupling of tests: "We need a second institution in two
| modules? Let's lift it up to global!"
| codethief wrote:
| I'm the exact opposite, I absolutely _hate_ pytest fixtures.
| They are effectively global state, so adding a fixture
| _somewhere_ in your code base might affect the tests in a
| completely different location. This gets even worse with every
| fixture you add because, being global state, fixtures can
| interact with one another - often in unexpected ways. Finally,
| readers unfamiliar with your code won't know where the
| arguments for a given `test_xy()` function come from, i.e. the
| dependency injection is completely unclear and your IDE won't
| help you much.
|
| There are _so many_ other (better) ways to achieve the same
| goal, such as decorators or - as already mentioned by emptysea
| in their sibling comment - explicitly invoking some function
| from within the test to do the setup/teardown.
| michaericalribo wrote:
| I'm curious how others test code that operates on large datasets
| --eg, transformations of a dataframe, parsing complicated
| responses, important implementations of analytics functions.
|
| I've previously used serialized data--JSON, or joblib if there
| are complex types (eg, numpy)--but these seem pretty brittle...
| sambalbadjak wrote:
| I'd add to that that tests should be readable. Personally I
| prefer to use GIVEN, WHEN, THEN as comments in the tests. Also,
| it's ok not to be DRY while writing tests.
| mumblemumble wrote:
| > it's ok not to be DRY
|
| Depending on context and implementation details, I'd say DRYing
| tests can be anywhere from indispensable to toxic.
|
| I'm fine with creating libraries of shared functionality that
| tests can use, especially when it helps readability. If you've
| got several tests with the same precondition, having them all
| call a function named "givenTheUserHasLoggedIn()" in order to
| do the setup is a nice readability win. And, since it's a
| function call, it's not too difficult to pick apart if a test's
| preconditions diverge from the others' at a later date.
|
| What I absolutely cannot stand is using inheritance to make
| tests DRY. If you've got an inheritance hierarchy for handling
| test setup, the cost of implementing a change to the test setup
| requirements is O(N) where N is the hierarchy depth, with
| constant factors on the order of, "Welp, there goes my
| afternoon."
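The named-precondition helper might look like this (all names hypothetical):

```python
class FakeClient:
    """Stand-in for a test client, just for illustration."""
    def __init__(self):
        self.user = None

    def login(self, user):
        self.user = user

def given_the_user_has_logged_in(client):
    # Shared precondition with a readable name; being a plain function,
    # it is easy to inline later if one test's setup diverges.
    client.login("alice")

def test_dashboard_greets_user():
    client = FakeClient()
    given_the_user_has_logged_in(client)
    assert client.user == "alice"

test_dashboard_greets_user()
```

Because it's a flat function call rather than an inherited `setUp`, each test can swap or inline it without touching a class hierarchy.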
| BurningFrog wrote:
| I'm an "it depends" fan myself.
|
| It does annoy the many programmers who want clear and
| absolute rules for everything.
|
| Then again they are always annoyed, living in a world where
| so many things "depend".
| travisjungroth wrote:
| I've gotten lured into the inheritance stuff and it's super
| nice at the very, very beginning and becomes a nightmare to
| maintain. Obviously a horrible tradeoff for software.
|
| I've found that having a class/function as a parameter and
| explicitly listing the classes/functions that get tested is a
| small step back and way easier to maintain and read. It sets
| off some DRY alarms, cause usually that whole list is just
| "subclasses of X". And it seems like a burden to update. "So if
| I make a new subclass, I have to add it everywhere?". Yes.
| Yes you do. Familiarity with the test suite is table stakes
| for development. You'll need to add your class name to like
| ten lists, and get 90% coverage for your work, then write a
| few tests about what's special about your class. When
| something breaks, you'll know exactly what's being tested.
| And you'll be able to opt out a class from that test with one
| keystroke.
|
| That being said... I still have a dream of writing a library
| for generating tests for things that inherit from
| collections.abc. Something like "oh, you made a
| MutableSequence? let's test it works like a list except where
| you opt-out."
| mxz3000 wrote:
| The given, when, then breakdown is interesting, though I've
| never seen language test utilities actually enforce that
| structure. Maybe an interesting potential experiment
| (regardless of language) ?
|
| I feel like your last point is especially important. Sooooo
| many times have I seen over-abstracted unit tests that are
| unreadable and are impossible to reason about, because somebody
| decided that they needed to be concise (which they don't).
|
| I'd much rather tests be excessively verbose and
| obvious/straightforward than over abstracted. It also avoids
| gigantic test helper functions that have a million flags
| depending on small variations in desired test behaviour...
| disgruntledphd2 wrote:
| As always, there are tradeoffs.
|
| Personally, I work with some incredibly long (100+ line)
| "unit" tests and they are a nightmare to work with.
|
| Especially when the logic is repeated across multiple tests,
| and it's incorrect (or needs to be changed).
|
| I really, really like shorter tests with longer names, but
| I'd imagine there are definitely pathologies at either end.
| psing wrote:
| If you're in the serverless space, a useful addendum:
| https://towardsdatascience.com/how-i-write-meaningful-tests-...
| mumblemumble wrote:
| Personally, I've come to really dislike test names like
| "test_refresh_failure". They tell you what component is being
| tested, but not what kind of behavior is expected. Which can lead
| to a whole lot of unnecessary confusion (or bugs) when you're trying
| to maintain a test whose implementation is difficult to
| understand, or if you're not sure it's asserting the right
| things.
|
| It also encourages tests that do too much. If the test is named
| "test_refresh", well, it says right there in the name that it's a
| test for any old generic refresh behavior. So why not just keep
| dumping assertions in there?
|
| I'm much more happy with names like,
| "test_displays_error_message_when_refresh_times_out". Right
| there, you know _exactly_ what's being verified, because it's
| been written in plain English. Which means you can recognize a
| buggy test implementation when you see it, and you know what
| behavior you're supposed to be restoring if the test breaks, and
| you are prepared to recognize an erroneously passing test, and
| all sorts of useful things like that.
| BurningFrog wrote:
| We don't need to put this much responsibility on test names. If
| there is more to explain, write a few words in a comment.
| exdsq wrote:
| You don't see comments in a test report though -- maybe have
| an optional description as part of the framework for more
| detail along with -v
| masklinn wrote:
| `unittest` prints the docstring alongside the name of the
| test in verbose mode (e.g. failure).
|
| pytest does not though.
| nerdponx wrote:
| I know that the docstring in unittest is part of the
| reported output, and I was pretty sure that it's the same
| in pytest.
| masklinn wrote:
| I would've thought so but no, pytest will show the
| docstring in `--collect-only -v` (badly), but it doesn't
| show any sort of description when running the tests, even
| in verbose mode (see issue #7005 which doesn't seem to be
| very popular)
| munchbunny wrote:
| I think this is one of those conventions where your team
| agrees on one convention, and you just try to follow it
| consistently. If that's looking at the test in code to find
| descriptive comments, great. If that's long test names,
| cool. Just do the same thing consistently.
| exdsq wrote:
| True - I think it depends on who uses your test outputs.
| If PMs and BAs care, having a more verbose output
| (descriptions, pretty graphs, etc) helps a lot. If it's
| just for devs then they can more happily go through the
| code base.
| stinos wrote:
| It's a choice of course, but I look at test functions like
| other functions. If you can name a function such that it
| doesn't need a comment (but also doesn't use 100 characters
| to do that), I'll gladly take that over a comment. Same like
| in all other code: if you need comments to explain what it
| does, it's likely not good enough and/or needs to be put in a
| function which tells what it does. Comments on _why_ it does
| stuff the way it does are of course where it's at.
| xapata wrote:
| s/comment/docstring/
| BurningFrog wrote:
| I have yet to understand the point of docstrings.
|
| How are they, in practical reality, better than comments?
| powersnail wrote:
| The difference is in tooling. Docstrings are collected
| and displayed in many contexts. The intended purpose is
| writing "documentation" in the same place as code.
| Comments are only seen if you open the source file.
| BurningFrog wrote:
| OK, I can see how that's useful for a certain workflow.
|
| The way we work, it's just a different comment syntax. I
| do like having a dedicated place for it.
| f00_ wrote:
| You can access it from object.__doc__, and there is
| tooling in IDEs like PyCharm to quickly view them to see
| what a function does, plus auto-generated documentation.
|
| Prior to mypy/type hints, it allowed you to document the
| types of a function.
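For instance, a test's docstring can be surfaced without opening the source file:

```python
def test_refresh_failure():
    """Displays an error message when the refresh times out."""

# What unittest's verbose runner and IDE quick-docs read:
print(test_refresh_failure.__doc__)
```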
| mumblemumble wrote:
| I've just never seen the "this belongs in comments" approach
| work out in practice. Maybe it's something about human
| psychology. Perhaps things might seem obvious when you've
| just written the test, so you don't think to comment them.
| Perhaps code reviewers feel less comfortable saying, "I don't
| understand this, could you please comment it?" Perhaps it's
| the simple fact that optional means optional, which means,
| "You don't have to do it if you aren't in the mood."
| Regardless of the reason, though, it's a thing I've seen play
| out so many times that I've become convinced that asking
| people to do otherwise is spitting into the wind.
| BurningFrog wrote:
| All this is true, but doesn't it apply just as much to
| writing descriptive test function names?
|
| Test function names are less sensitive than regular
| functions, since they're not explicitly called, but I still
| don't want to read
| a_sentence_with_spaces_replaced_by_underscores.
| mumblemumble wrote:
| I don't want to either, but until Python gives us
| backtick symbols or we get something like Kotlin's kotest
| that lets the test name just be an actual string, that's
| sort of the choice we're left with. And I'm inclined to
| take the option that I've known to lead to more
| maintainable tests over the long run over the option that
| I've known to engender problems, even if it is harder on
| the eyes. Form after function.
|
| As far as whether or not people do a better job with
| descriptive test function names, what I've seen is that
| they do? I of course can't share any data because this is
| all in private codebases, so I guess this could quickly
| devolve into a game of dueling anecdotes. But what I've
| observed is that people tend to take function names -
| and, by extension, function naming conventions -
| seriously, and they are more likely to think of comments
| as expendable clutter. (Probably because they usually
| are.) Which means that they'll think harder about
| function names in the first place, and also means that
| code reviewers are more likely to mention if they think a
| function name isn't quite up to snuff.
|
| And I just don't like to cut those sorts of human
| behavior factors out of the picture, even when they're
| annoying or hard to understand. Because, at the end of
| the day, it's all about human factors.
| BurningFrog wrote:
| I don't disagree with much of this.
|
| I was talking specifically about a `def
| test_displays_error_message_when_refresh_times_out()`
| function.
|
| That's too big a name for me to keep in my head, so I'd
| look for other solutions.
| rbanffy wrote:
| I really hate the pylint rule that complains about missing
| docstrings, though I can see its virtue.
|
| In tests, however, I prefer not to have them as my favorite
| test runners replace the full name of the test method with
| its docstring, which makes it a lot harder to find the test
| in the code.
| ben509 wrote:
| My only issue is that the rule doesn't give you a good
| way to annotate that the object is documented by its
| signature. Sometimes devs are lazy, but the rule doesn't
| make them not lazy so you get pointless docs like:
|     def matriculate_flange(via: Worble):
|         "Matriculates flange via a Worble."
| darioush wrote:
| Have you considered using "test_refresh__failure" instead? It
| makes clear "refresh" is the component being tested and
| "failure" is a description of the behavior.
| [deleted]
| rowanseymour wrote:
| Sometimes there are practical reasons to avoid this approach
| and have fewer test methods that test multiple behaviors of a
| single thing. For example a lot of our projects setup and
| teardown a database between test methods so a few fatter test
| methods run a lot faster than a large number of small test
| methods. We rely on good commenting within the methods to
| understand what exactly is being checked.
| codethief wrote:
| In that case, why not use some auxiliary function to load the
| resources (in your case, the database) and decorate it with
| functools.cache[0] to keep the function from getting
| executed multiple times? Sure, this means re-using resources
| between multiple tests (which is discouraged for good
| reasons) but your current test effectively does the same
| thing, the only difference being that everything is being
| tested inside one single test function.
|
| PS: How come your setup and teardown operations are so
| expensive in the first place? Why do you even need to set up
| an entire database? Can't you mock out the database, set up
| only a few tables or use a lighter database layer?
|
| [0]: https://docs.python.org/3/library/functools.html#functoo
| ls.c...
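A sketch of that caching trick; `load_db` is a hypothetical expensive setup:

```python
import functools

CALLS = []

@functools.cache  # the expensive setup runs at most once per process
def load_db():
    CALLS.append(1)  # record how often the real work happens
    return {"users": ["alice"]}

def test_a():
    assert "alice" in load_db()["users"]

def test_b():
    assert load_db()["users"]  # reuses the cached resource

test_a()
test_b()
assert len(CALLS) == 1  # the setup executed a single time
```

Note this shares one resource across tests for the whole process, which is exactly the trade-off the comment acknowledges.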
| jxub wrote:
| I think that behave (https://behave.readthedocs.io/en/stable/)
| is more useful for testing these more real-life usecases. I
| tend to have a `test_my_function` in pytest tests and the more
| integration and functionality-related testing in the behave
| tests.
| wodenokoto wrote:
| One of the most common test modules in R is called "test that"
| and you invoke a test by calling a function (rather than
| defining one) called "test_that", the first argument is a
| string containing a description of what you want to test and
| the second argument is the code you want to test.
|
| That way, all your unit tests read: "test that error message
| is displayed when refresh times out" etc.
|
| I think it's a really nice way to lay things out and it avoids
| all the "magic" of some functions being executed by virtue of
| their name.
| MrPowers wrote:
| I've found pytest to encourage tests with really long method
| names, examples from the post:
|
|     test_refresh_failure
|     test_refresh_with_timeout
|
| These get even longer like
| test_refresh_with_timeout_when_username_is_not_found for example.
|
| pytest-describe allows for a much nicer testing syntax. There's a
| great comparison here: https://github.com/pytest-dev/pytest-
| describe#why-bother
|
| TL;DR, this is nicer:
|
|     def describe_my_function():
|         def with_default_arguments(): ...
|         def with_some_other_arguments(): ...
|
| This isn't as nice:
|
|     def test_my_function_with_default_arguments(): ...
|     def test_my_function_with_some_other_arguments(): ...
| TuringNYC wrote:
| I just saw the github readme for this project. How is the
| describe variant different from just grouping the tests
| together into a module called test_describe_my_function.py and
| then having smaller named functions inside?
| klenwell wrote:
| This grouping convention reminds me a lot of Better Specs
| from the Ruby world:
|
| https://www.betterspecs.org/
|
| With rspec, you use the describe and context keywords.
|
| At one level, yes, it's mainly syntactic sugar. As the
| test-writer, the two approaches may seem interchangeable.
|
| Where I find it really helps is when I'm not the test-writer
| but rather I'm reviewing another developer's tests, say in
| PR. I find this syntax and hierarchy produces a much more
| coherent test suite and makes it easier for me to twig
| different use cases and test quality generally.
| theptip wrote:
| The readme says:
|
| > With pytest, it's possible to organize tests in a similar
| way with classes. However, I think classes are awkward. I
| don't think the convention of using camel-case names for
| classes fit very well when testing functions in different
| cases. In addition, every test function must take a "self"
| argument that is never used.
|
| So there's no reason to do this, aside from aesthetics.
|
| I'd recommend against doing un-Pythonic stuff like this, it
| makes your code harder to pick up for new engineers.
| ben509 wrote:
| You could call it aesthetics, but it's also readability,
| and that's an important aspect of tests.
| w0tintarnation wrote:
| Do you have an opinion on grouping pytest tests in classes?
|     class Test_my_function:
|         def with_default_arguments(self): ...
|         def with_some_other_arguments(self): ...
|
| If you can make your eye stop twitching after seeing snake
| cased class names, this is at least another option of grouping
| tests for a single function.
| poooogles wrote:
| They're already grouped by module which normally provides
| enough granularity (in my experience, I've only scaled this
| up to 50k LOC apps though so YMMV).
| ben509 wrote:
| I like the concept, but using the profiler to grab locally
| declared tests is a bit more magic than I'm comfortable with in
| my tests.
|
| Something like this might be a good compromise:
|     def describe_my_function(register):
|         @register
|         def with_this_thing(): ...
|
| I think most Python devs understand that "register" can have a
| side-effect.
___________________________________________________________________
(page generated 2021-03-24 23:01 UTC)