[HN Gopher] Doctests in R
___________________________________________________________________
Doctests in R
Author : dash2
Score : 37 points
Date : 2022-11-26 22:57 UTC (1 days ago)
(HTM) web link (hughjonesd.github.io)
(TXT) w3m dump (hughjonesd.github.io)
| tialaramex wrote:
| I don't think "doctests built in" means the same thing in both
| languages. Python has a module you must choose to import and add
| to your test infrastructure, whereas Rust just tests your
| documented examples without any special steps, unless you tell
| it not to (either by marking a specific example as unsuitable for
| testing, or by categorically not running doctests).
|
| This idea (doctests) is one of those crucial "You can, you
| should, but you probably don't" things where the point is to make
| doing The Right Thing(tm) so easy that you actually do it rather
| than just nodding when people say you _should_ do it. You clearly
| should check that your documented APIs match reality, that
| examples work, but in many languages that's not easy enough to
| do out of the box, so either a project invests time and effort to
| do this or they go without and most projects will go without.
|
| Over a year ago I wrote some MS Teams integration code in C#.
| Microsoft publishes documentation with examples for how to use
| the APIs for this. The documentation is wrong. There are open
| bugs on long abandoned (as is usual for Microsoft) repositories,
| nobody cares. But chances are that if the examples had failed a
| test back when they _shipped_ the documentation, somebody would
| have fixed it. Maybe it'd take a few minutes or even an hour, but
| it'd save a nasty experience for, let's say conservatively,
| thousands of developers. Instead it's just a much upvoted Stack
| Overflow answer with a workaround.
| masklinn wrote:
| > I don't think "doctests built in" means the same thing when
| Python has a module you must choose to import and add to your
| test infrastructure
|
| If you're using pytest, then it's just a flag away (well two if
| you're using both docstring-doctests and document-doctests, but
| the latter seems unlikely).
|
| But that it's opt-in makes sense either way, as doctest support
| was added later on, whereas in Rust it's been there all along.
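To make the opt-in point concrete, here is a minimal sketch of Python's standard-library `doctest` module being invoked explicitly. The `double` function and the `run_examples` helper are hypothetical names made up for illustration. In a real module the usual entry point is `doctest.testmod()`, and with pytest the flags in question are `--doctest-modules` (docstrings) and `--doctest-glob` (text files).

```python
import doctest


def double(x):
    """Return twice the input.

    >>> double(21)
    42
    >>> double("ab")
    'abab'
    """
    return x * 2


def run_examples():
    """Run the examples in double's docstring; return (failed, attempted)."""
    # Nothing is tested until you ask: parse the docstring, then run it.
    parser = doctest.DocTestParser()
    test = parser.get_doctest(double.__doc__, {"double": double}, "double", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


if __name__ == "__main__":
    print(run_examples())
```

Without the explicit call nothing checks the docstring, which is the difference from Rust that the parent comment describes.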
| vharuck wrote:
| Not a fan.
|
| - Example sections will be cluttered with unit tests.
|
| - Doc tests asserting warnings or errors will produce examples of
| bad code. This might make sense for the example `safe_mean`
| function, where its only purpose is wigging out for improper
| input. But most functions should just show how to use them.
|
| - Test scripts are still useful for setting up loops, creating
| helper functions, or other stuff. But then test code will be
| split between the roxygen comments and those test scripts.
|
| I use doc tests in python scripts, because they're quick sanity
| checks that fit in the same file. I don't use them in packages.
| If R had doc tests, I'd rather use them in single-file scripts.
| Maybe a function that acts like `source` but also generates and
| inserts the tests.
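For comparison, this is roughly what an error-asserting doctest looks like in Python. It is a sketch only: this `safe_mean` is a hypothetical Python analogue, not the R function from the article, and `check_docstring` is a helper invented here. The second example shows the "bad code in the docs" effect mentioned in the list above.

```python
import doctest


def safe_mean(values):
    """Return the arithmetic mean of a non-empty list of numbers.

    >>> safe_mean([1, 2, 3])
    2.0
    >>> safe_mean([])
    Traceback (most recent call last):
        ...
    ValueError: values must not be empty
    """
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


def check_docstring():
    """Run both docstring examples; return (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(
        safe_mean.__doc__, {"safe_mean": safe_mean}, "safe_mean", None, 0
    )
    results = doctest.DocTestRunner(verbose=False).run(test)
    return results.failed, results.attempted


if __name__ == "__main__":
    print(check_docstring())
```

doctest matches only the traceback header and the final exception line, so the stack in between can be elided with `...`.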
| masklinn wrote:
| > - Example sections will be cluttered with unit tests.
|
| The fundamental purpose of doctests is not to write unit tests,
| but to ensure your examples are valid. It's easy to write
| examples which don't work in docstrings.
|
| Running a doctest system on your documentation doesn't preclude
| having actual tests, quite the opposite. Edge cases or
| complicated scenarios often don't make for great examples, but
| are usually valuable tests.
|
| For instance in Rust most methods of Vec have an example, which
| is doctested, and yet Vec still has an extensive suite of unit
| tests: https://github.com/rust-lang/rust/blob/master/library/alloc/...
|
| Technically you could use doctests as a literate-ish test
| framework (assuming that's even supported, which it may not
| be), but the oddball environment tends to make that not great,
| and the "literate" part is not very useful when unit testing.
| It's way more valuable to ensure docstrings and standalone
| documentation are valid.
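A small sketch of that division of labour, in Python since that is where the thread's doctest examples come from: the docstring carries one readable example, while edge cases live in an ordinary unit test class. `clamp`, `ClampEdgeCases` and `run_all` are hypothetical names chosen for illustration.

```python
import doctest
import io
import unittest


def clamp(x, low, high):
    """Limit x to the closed interval [low, high].

    >>> clamp(15, 0, 10)
    10
    """
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(x, high))


class ClampEdgeCases(unittest.TestCase):
    """Cases that matter for correctness but would clutter the docs."""

    def test_bounds_are_inclusive(self):
        self.assertEqual(clamp(0, 0, 10), 0)
        self.assertEqual(clamp(10, 0, 10), 10)

    def test_inverted_bounds_raise(self):
        with self.assertRaises(ValueError):
            clamp(5, 10, 0)


def run_all():
    """Return (doctest_failures, unit_tests_passed)."""
    example = doctest.DocTestParser().get_doctest(
        clamp.__doc__, {"clamp": clamp}, "clamp", None, 0
    )
    doc_result = doctest.DocTestRunner(verbose=False).run(example)
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClampEdgeCases)
    # Send the unittest report to a buffer so only the summary is returned.
    unit_result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return doc_result.failed, unit_result.wasSuccessful()


if __name__ == "__main__":
    print(run_all())
```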
| apwheele wrote:
| Yeah, and the R ecosystem has this built in (checking that the
| examples run, not that their output is correct). `R CMD check`
| has as one of its checks whether the examples you build in the
| help docs generate runtime errors.
|
| I use roxygen example help file generation as well for R
| packages, but have mixed feelings relative to python
| documentation.
| crispyambulance wrote:
| I am inclined to agree. Unit tests and documentation are two
| SEPARATE things with different intentions. IMHO, mixing these
| together harms both.
|
| Unit tests are primarily intended for the developers of the
| library. They do help users, sometimes, when you're trying to
| work out some fundamental misconception about what the library
| DOES, but generally speaking the granularity of unit tests is
| too fine unless you're REALLY digging in.
|
| Much better, I think, to spend effort on writing clear
| documentation. R has a problem with that. Docs typically have
| overwhelmingly terse detail followed by anemic examples. Couple
| this with the fact that R users tend to always be in the
| middle of something urgent and completely unrelated to writing
| libraries (like, analyzing data and making decisions based on
| that data) and you get a recipe for frustration.
|
| I find myself referring over and over again to the tidyverse
| "cheatsheets" [https://posit.co/resources/cheatsheets/]. These
| show, explicitly and clearly, what the things actually do. I
| wish someone put that kind of, often graphical, content into
| the docs for all functions.
| StarlaAtNight wrote:
| I SERIOUSLY relate to you on the "anemic examples" part. It's
| one of my biggest frustrations with R.
|
| And granted, this is a complaint about base R and other older
| spaces in the R world... the tidyverse packages (and modern
| ones inspired by them) tend to have pretty great examples with
| lots of iron.
| cardosof wrote:
| Not really related to the post but since R is so rare here in
| Hacker News, I will ask anyway: is R still worth using in
| 2022-23? Even RStudio gave up its R brand to focus on Python.
| nerdponx wrote:
| Use it if you find it useful. It still has a much better and
| more vibrant ecosystem for statistics, including Bayesian
| statistics and certain kinds of time series analysis.
| Data.table is also a serious "power tool", although other non-
| Pandas data frame libraries like Polars might be dethroning it.
| Also GGPlot is still awesome, even if you can now get it in
| Python with Plotnine.
| BrandonS113 wrote:
| R has much, much better statistical packages than Python. If it
| is statistics, you can probably find a package in R to do it;
| not so with Python. And the programming language is much better
| for statistics than numpy/pandas if a package is not
| sufficient. I use both, and for statistics have no choice but
| to use R. For data, I use Python.
| vhhn wrote:
| There are still several areas where R beats Python: tabular
| data crunching, data analysis (plotting, stats), finance
| (econometrics etc...) but it's less and less obvious.
| throwaway_2341 wrote:
| > Even RStudio gave up its R brand to focus on Python
|
| Wouldn't R still be the primary language in RStudio, with
| Python being made available as necessary? Or is the idea that
| RStudio will turn into a proper Python IDE? Curious what makes
| you say that RStudio is putting its 'focus' on Python.
| cardosof wrote:
| They changed their name to Posit so yeah, that's a conscious
| move away from R.
| jstx1 wrote:
| If you work with mostly tabular data, never deploy anything and
| don't need any deep learning, then it's fine.
| goosedragons wrote:
| I think so. It's still better at the things it was always
| better at, data analysis. I could be biased since it's my main
| language though.
| kickout wrote:
| Yes, it's worth using IMO. Plotting and grokking are better than
| in Python.
| closed wrote:
| One interesting thing about R examples is their outputs tend to
| be bigger. I think this is in direct contrast to python
| docstrings, where outputs are very concise--because you manually
| include the output for doctest.
|
| I wonder if a challenge for doctests in R is they often have to
| test larger, more realistic outputs?
|
| For example, in dplyr's mutate doc, one example is this:
|     starwars %>%
|       select(name, mass) %>%
|       mutate(
|         mass2 = mass * 2,
|         mass2_squared = mass2 * mass2
|       )
|
| This example's output is a data frame with 4 columns, and it
| displays the first 5 rows.
|
| On the other hand, in siuba (a port of dplyr to python), I often
| have to truncate the example output, because it's hard coded in
| the docstring:
|
|     (cars
|       >> mutate(
|         cyl2 = _.cyl * 2,
|         cyl4 = _.cyl2 * 2
|       )
|       >> head(2)
|     )
|
|        cyl   mpg   hp  cyl2  cyl4
|     0    6  21.0  110    12    24
|     1    6  21.0  110    12    24
|
| It's nice you can see the full example in the docstring in
| python, but also very handy seeing complex examples on R doc
| pages:
|
| https://dplyr.tidyverse.org/reference/mutate.html#ref-exampl...
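One standard-library way to shorten hard-coded output, besides trimming the data with something like `head(2)`, is doctest's `ELLIPSIS` option. The sketch below uses a plain list rather than a data frame, and `squares` and `check_squares` are hypothetical names, so treat it as an illustration of the mechanism rather than how siuba's docs are written.

```python
import doctest


def squares(n):
    """Return the squares of 0..n-1.

    >>> squares(100)  # doctest: +ELLIPSIS
    [0, 1, 4, 9, ..., 9604, 9801]
    """
    return [i * i for i in range(n)]


def check_squares():
    """Run the docstring example; return (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(
        squares.__doc__, {"squares": squares}, "squares", None, 0
    )
    results = doctest.DocTestRunner(verbose=False).run(test)
    return results.failed, results.attempted


if __name__ == "__main__":
    print(check_squares())
```

The `...` in the expected output matches any run of text, so only the head and tail of the long result need to be written out in the docstring.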
| civilized wrote:
| Couldn't people just add expect_* tests to their examples? What's
| the benefit of adding all this new notation and magic?
|
| Disclaimer: I'm an R programmer but not deeply familiar with
| authoring packages.
| vharuck wrote:
| The idea in TFA is to keep a function's definition,
| documentation, and unit tests next to each other in a single
| file.
|
| >Couldn't people just add expect_* tests to their examples?
|
| Users can run examples with the `example` function. So if you
| use the `testthat` package in examples, then you should add it
| to your package's imports. Which means more to load with the
| package, but only for a small benefit that's rarely used.
|
| Also, raising warnings or errors in examples and not catching
| them is a no-no. The CRAN package repository will not accept a
| package like that.
|
| _Edit: I originally wrote that this wouldn't create any
| examples in the final manual pages, but I was wrong._
| civilized wrote:
| Ok, I think I might get it now? The tests are written down in
| the example, but are only run by the package developer and
| the results are hidden from the user? That seems like a good
| thing. The user wants to see the example but doesn't care
| about whether your test passed.
|
| The magic makes sense now too. You already need a roxygen2
| header to set up the auto-generated tests, so why not call it
| @expect and then write equal instead of expect_equal, so as
| not to repeat yourself?
| masklinn wrote:
| > Ok, I think I might get it now? The tests are written
| down in the example, but are only run by the package
| developer and the results are hidden from the user? That
| seems like a good thing. The user wants to see the example
| but doesn't care about whether your test passed.
|
| But surely the user wants to see what the result of the
| call is, if it's relevant? That's why rust examples (which
| are also doctests) include the corresponding assertions.
|
| You _can_ hide them, but usually you don't, because e.g.
| showing what the result of `str::len` is is the point of
| having an example:
| https://doc.rust-lang.org/std/primitive.str.html#method.len
|
| Unless roxygen or Rd independently runs the code and embeds
| the output independent of the doctests succeeding or
| failing?
| civilized wrote:
| Right, the user wants to see the result, but doesn't care
| about the developer's test that the result is the
| expected result.
| nonrandomstring wrote:
| With tests, sometimes you want to embed test data as a
| here-document, so that tests don't get separated from the minimal
| datasets they need. In Perl it was customary to use
| <<"EOF";...EOF, and """ triple quotes """ serve a similar purpose
| in Python. What's the deal in R? Just make a vector in the test?
| civilized wrote:
| The strategies I usually see are (1) use a built-in dataset or
| (2) make the data at the beginning of the example.
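For reference, the Python triple-quote approach mentioned in the question looks roughly like this. It is a sketch with made-up sample data, and `SAMPLE_CSV`, `mean_mass` and `test_mean_mass` are hypothetical names. It shows the Python side only and does not answer what the R idiom is.

```python
import csv
import io

# A tiny dataset embedded next to the test, in the spirit of a here-document.
SAMPLE_CSV = """\
name,mass
a,77
b,49
c,80
"""


def mean_mass(csv_text):
    """Parse CSV text with a 'mass' column and return the mean mass."""
    rows = csv.DictReader(io.StringIO(csv_text))
    masses = [float(row["mass"]) for row in rows]
    return sum(masses) / len(masses)


def test_mean_mass():
    # The expected value is computed from the same three embedded rows.
    assert mean_mass(SAMPLE_CSV) == (77 + 49 + 80) / 3


if __name__ == "__main__":
    test_mean_mass()
    print("ok")
```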
___________________________________________________________________
(page generated 2022-11-27 23:01 UTC)