___________________________________________________________________
Why Clinical Laboratorians Should Embrace the R Programming
Language
Author : teleforce
Score : 47 points
Date : 2021-05-22 10:11 UTC (12 hours ago)
(HTM) web link (www.aacc.org)
| refactor_master wrote:
| To me, R seems more like a cobbled together ecosystem of
| automation within statistics, rather than an actual language.
|
| Compared with Python, the language ergonomics of R are confusing
| and inconsistent.
|
| I guess momentum and establishment are also a feature in
| themselves, though I've personally never felt that one of the
| selling points, the esoteric statistics packages at the edge,
| would be of any use to me.
|
| The use I've seen is reminiscent of Spyder and notebooks: a
| tangled, unreadable mess of line-by-line execution where people
| are prone to re-running stuff out of order.
| akg_67 wrote:
| Actually, it's most probably the reverse. Python, NumPy, pandas,
| etc. are cobbled together with duct tape to do what R does
| elegantly. There is no consistency in the Python ecosystem.
| Maxion wrote:
| Agreed. One thing that is very clear in the R vs. Python debate
| is that a lot of programmers seem to know one or the other, not
| both.
|
| They are different tools for different purposes.
| disgruntledphd2 wrote:
| I know both pretty well at this point, and would say that
| python is a much better language overall, but the stats/ML
| parts (especially pandas) are pretty inconsistent with both
| Python and themselves.
|
| That being said, R is amazing for exploratory work, while
| Python is better for integrating with the rest of the
| world.
| hichamino wrote:
| I am a clinical laboratorian and I find this article very
| useful. Thank you for sharing. Can you help me explore this field?
| wheelinsupial wrote:
| What do you currently use for data analysis?
|
| I've taken courses on statistical computing in R and
| statistical computing in SAS in my statistics degree. We were
| always told that SAS is the standard for anything health care,
| pharmaceutical, or where regulation and publication comes into
| play.
|
| Anecdotally, my friends who did PhDs in biochem and immunology
| all used SAS for their data analysis.
|
| Have I been misled or is this up to individual preference?
| cinntaile wrote:
| https://r4ds.had.co.nz/ you could use this to get started.
| bogeholm wrote:
| This seems like a reasonable place to start:
| https://online-learning.harvard.edu/subject/r
| todd8 wrote:
| Tilman M. Davies's book, _The Book of R: A First Course in
| Programming and Statistics_ is a good place to start. See [1].
|
| [1] https://www.amazon.com/Book-First-Course-Programming-Statist...
| hichamino wrote:
| I am a clinical laboratorian and find this article very useful.
| Can you help me explore this new field?
| clircle wrote:
| Reasons I prefer R to python:
|
| - Rmarkdown. I prefer a text document over a web notebook for
| exploratory research
|
| - The standard library is for statistics and data: data.frame,
| lm, anova, etc. are built in
|
| - A huge range of probability distributions are built-in. I don't
| need to import extra libs to do simulations.
|
| - Between functional programming techniques and vectorization, I
| can write very clean and concise code
|
| - Tidyverse and data.table are lovely and coherent approaches to
| data management. Data.table is fast and memory efficient.
|
| - Advanced models are trustworthy: For example, glmnet, mgcv,
| nlme, rms are authored by statistical heavyweights, and have
| accompanying books that are excellent. I don't have the same
| confidence in python's statsmodels.
|
| - CRAN is easy to use, I can access it from my R session, and
| there are rarely problems (big thanks to Uwe Ligges)
|
| - Libraries for design of experiments and surveys are available.
| R supports the entire design -> data management -> model cycle.
|
| - base graphics/lattice/ggplot2 are excellent for plotting. If I
| need something advanced, I can use grid. If I need vector
| graphics, I can use tikzDevice for LaTeX.
|
| - RStudio is an excellent IDE, and Emacs Speaks Statistics is
| an excellent Emacs plugin
|
| - It is very easy to get help without going to Google (?foo,
| ??bar, etc.). Documentation is well organized, and the documents
| often contain citations and relevant links.
|
| - Lots of advanced models can't be found outside of R. Today I
| fit a splines-on-a-sphere model using mgcv
| (https://stat.ethz.ch/R-manual/R-patched/library/mgcv/html/sm...)
|
| - Rapid iteration in modeling using Wilkinson notation formulas.
| Built-in formulas are the actual killer app of R, IMO.
|
| - Things are generally fast, but if you need extra horsepower,
| plugging into c++ is easy using Rcpp.
|
| - R feels like lisp. Experimentation is easy, and I don't feel
| forced into any particular paradigm while using R. I have a lot
| of ways to evaluate code
| (https://ess.r-project.org/Manual/ess.html#Evaluating-code)
| jhoechtl wrote:
| Does RStudio work on Wayland yet? Last time I tried, it failed
| because of a Qt component, but AFAIK the whole thing is
| transitioning to Electron.
| jarenmf wrote:
| Agreed. Also, R with Vim (the Nvim-R plugin) is a real joy to
| work with; I can't replicate the experience with any other IDE.
| For example, I can define my own key bindings to show a certain
| summary statistic or a custom plot for the variable I'm at.
| clircle wrote:
| Thanks for this. I might give Nvim-R a shot. I've been using
| Evil+ESS for a long time, but Emacs runs like a dog on
| Windows 10.
| qsort wrote:
| This is more of a sales pitch for the general concept of
| programming rather than R in particular. R is as good a choice as
| any...
| teruakohatu wrote:
| I would say you could replace R with Python or Julia and do just
| fine. Anything must be better than SPSS.
|
| That said, R has come a long way in recent years and is
| enjoyable to use. It is very complete as far as statistics go.
| ekianjo wrote:
| R has much better syntax for data wrangling with the tidyverse,
| and even for modelling with tidymodels and the like. Python in
| comparison is hard to read.
| z77dj3kl wrote:
| What... Have you seen the piping syntax in tidyverse? It's
| incomprehensible unless you put in a lot of effort to
| understand all that's going on.
| teruakohatu wrote:
| What is incomprehensible about it?
| tomrod wrote:
| Grandparent commenter mentioned the piping syntax,
| specifically.
| disgruntledphd2 wrote:
| I dunno, I got pretty sick of writing head(filter(df,
| value>10)) and it's (a little) easier to indent as df %>%
| filter(value>10) %>% head().
|
| It's problematic because people abuse it for everything
| (300 line pipes are common, sadly) but it's a really
| useful tool in moderation.
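| A rough Python analogue of that nested-vs-piped contrast, as a
| minimal sketch only: `pipe`, `keep_over` and `first_n` are
| hypothetical helpers written for illustration (not from any
| library), and `first_n` defaults to 6 rows on the assumption that
| it should mirror R's `head()`.

```python
from functools import reduce
from typing import Callable, List


def pipe(value, *funcs: Callable):
    """Thread `value` through `funcs` left to right, in the spirit of magrittr's %>%."""
    return reduce(lambda acc, f: f(acc), funcs, value)


def keep_over(threshold: float) -> Callable[[List[dict]], List[dict]]:
    """Hypothetical stand-in for dplyr's filter(value > threshold)."""
    return lambda rows: [r for r in rows if r["value"] > threshold]


def first_n(n: int = 6) -> Callable[[List[dict]], List[dict]]:
    """Hypothetical stand-in for head(), which shows 6 rows by default in R."""
    return lambda rows: rows[:n]


# Toy "data frame" as a list of row dicts.
df = [{"id": i, "value": v} for i, v in enumerate([3, 12, 15, 7, 40, 11, 25, 99, 13])]

# Nested form: read inside-out, like head(filter(df, value > 10)).
nested = first_n()(keep_over(10)(df))

# Piped form: read left to right, like df %>% filter(value > 10) %>% head().
piped = pipe(df, keep_over(10), first_n())

print([r["value"] for r in piped])
```

| Both forms compute the same rows; the pipe only changes reading
| order, which is also why very long pipes become hard to follow.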
| lordgroff wrote:
| This is such an odd take. I program almost exclusively
| Python these days and I miss the elegance of piped
| functions daily.
| Maxion wrote:
| It depends on what you're doing; there are a lot of
| genomics-related packages for R that do not exist in Python.
| nabla9 wrote:
| I think that R libraries have an edge over Python and Julia in
| quality and quantity.
|
| If you like to write lots of code, Python might be better. But
| if you use it in clinical research, R probably has better
| packages for whatever you need.
| tomrod wrote:
| R packages are the wild west as far as code quality goes. In
| one corner you have Hadley Wickham producing phenomenal
| efforts like tidyverse and ggplot. In the other corner you
| have a herd of feral cats.
|
| Python gets scrutiny, but most packages are on GitHub and
| feedback can be received. That said, you should read the
| source code regardless.
|
| EDIT: And I re-emphasize: never trust the source code, even
| if the company you work with indemnifies you. One commercial
| product we used had an insidious bug in one of its new time
| series packages that got corrected in later versions, but we
| never would have found it if our model testing requirements
| hadn't also required implementing it in R or Python. Since the
| package didn't exist for Python and we wrote it ourselves, we
| found the performance issues.
| nabla9 wrote:
| There are different qualities. Code quality and quality of
| the functionality.
|
| In R you get more packages that do what you expect
| (mathematically), but the implementation is inelegant and
| slow. They are written by someone who knows exactly what they
| need and what the package should do, but has difficulty
| writing it down.
|
| In Python you get many neat packages where the code is well
| implemented and performs well, but doesn't do exactly what
| the user needs, or skips important features because they are
| conceptually difficult.
| tomrod wrote:
| > In R you more packages that do what you expect
| (mathematically)
|
| I disagree with this point explicitly. Many packages are
| not only poorly written programmatically and
| systemically, they also produce bad results in common
| cases and fail silently. This has been discussed well
| before on our very own YCombinator.
|
| https://news.ycombinator.com/item?id=17308554
|
| > is not exactly doing what user need or skips important
| features because they are conceptually difficult.
|
| This is why classes are cool. You can extend them, modify
| them, and so on. Perhaps we should teach this more to
| fellow data scientists.
|
| https://docs.python.org/3/tutorial/classes.html
| disgruntledphd2 wrote:
| > This is why classes are cool
|
| You may be unaware, but there are vast similarities
| between the object systems of R and Python, mostly due to
| their common inheritance from _The Art of the Metaobject
| Protocol_. They look very different (generic functions vs
| classes), but they are equally extensible.
|
| The trouble with R's systems is that there are three of
| them, and people use whichever works without really
| understanding any of them, but all the tools are there.
| tomrod wrote:
| Aware, yes. My point was that you can implement what you
| want in Python relatively easily. That R has a wild west set
| of systems is exactly my point.
| disgruntledphd2 wrote:
| I don't really get what you mean.
|
| You said, in the context of ensuring conceptually
| difficult parts of a model/method were implemented:
|
| > This is why classes are cool. You can extend them,
| modify them, and so on. Perhaps we should teach this more
| to fellow data scientists.
|
| I pointed out that both R and Python have similar object
| systems.
|
| > you can implement what you want in Python relatively
| easily.
|
| I don't get why this is necessarily true, but I might be
| missing something. Can you clarify?
| warlog wrote:
| The only thing worse than an object system is... three
| object systems. I never quite know which is the cool-kid
| implementation, and I don't even like OO.
| akg_67 wrote:
| Those who come from a programming background prefer Python.
| Those who come from a scientific or statistics background prefer
| R.
|
| IME, for production Python rules, and for data exploration,
| graphics, stats and ad hoc analysis R rules.
| CalRobert wrote:
| R might be a perfectly fine language, but the culture and
| ecosystem around R seem to produce a lot of untested, difficult
| to read code. Globals everywhere, mediocre requirements
| resolution, and a lack of forced namespacing come to mind. Maybe
| it's a result of being used by people who are not primarily
| programmers.
| rout39574 wrote:
| That's my diagnosis. I've tried on and off since at least 2006
| to affect the course of that culture, to make managing R
| packages more tractable.
|
| I'd say the R dev community was then actively hostile to a
| culture change to support any model other than individual
| contributors working at their desktop.
| disgruntledphd2 wrote:
| Most R is written by people who are not (and mostly don't want
| to be) professional programmers. This is both a strength
| (industrial strength discipline specific tools) and a weakness
| (oh dear lord the code, my eyeeeeees).
|
| It's also important to note that much of the original core of R
| is based on S, which was developed around the same time as C,
| so some baggage would be expected.
| lordgroff wrote:
| I see a lot of "R is very hard to reproduce, use Python" or "R is
| hardly a programming language", and I honestly have to wonder
| whether this is written in good faith, on a forum that's
| supposedly a bastion of Scheme love, no less.
|
| R is far more of a Lisp than Python, and in a field that heavily
| relies on DSL abstractions (which definitely includes clinical
| laboratories), R is going to fight you a lot less than most
| choices.
|
| Regarding packaging, you have MRAN snapshots, you have conda
| (which will give you binaries on Linux), and you have renv,
| roughly in order of preference. The situation is not ideal, but
| it's certainly not worse than Python's; this is not a hill I'd
| die on!
|
| Julia might be an exciting and welcome alternative to both; from
| where I'm sitting, it hardly even enters the conversation
| currently (in data science where I'm at, it's all Python and R,
| with Python unfortunately taking by far the larger slice of the
| pie), but I wish it a bright future, it's a great language.
| roonilwaslib wrote:
| I run operations for a company that relies heavily on R, and I'd
| strongly advise against using the language. R's package
| management system makes reproducing work difficult. We've had to
| rely on renv, a snapshot of CRAN (the default source of R
| packages: some FTP servers), and a bunch of Docker to get vaguely
| reproducible installs. However, since installing an R package
| involves compiling the code you just downloaded from a public
| FTP server, installs are extremely slow.
|
| I'd recommend Python based on its slightly saner tooling. I've
| found that Python with conda/pipenv/poetry results in mostly
| reproducible installs of the tools needed to run a computation.
| disgruntledphd2 wrote:
| So, I actually get what you mean here, and have used both R and
| Python in anger for a number of years.
|
| This is all about tradeoffs. Fundamentally, if your package
| doesn't compile on the latest version of R, it gets removed
| from CRAN. This means that each version of R has a consistent
| set of packages that (mostly) work together.
|
| Contrast this with Python, which does facilitate reproducible
| builds, because you can hack together ancient versions of Python
| and make them continue working. I could go into a massive rant here
| about pip, but it's trending in the right direction now and I
| don't want to discourage any of the people working on it.
|
| R is better in terms of being able to ensure that for a given R
| version, any package you install will be compatible, Python is
| better for making sure that that one application built three
| years ago keeps working in the same fashion.
|
| Also, it sounds like you're running a *nix-based system; have
| you considered (I'm sure you have) using the system packages?
| For example, the Debian/Ubuntu ones are pretty comprehensive,
| at the cost of using older versions. I _believe_ that RStudio
| also has pre-built packages for Linux (but I have not tested
| this), so that could also work.
|
| To be fair, conda is pretty good as a package manager, because
| it handles the C++/C dependencies. But to your Docker point,
| that's how I handle the insanity that is python packaging,
| especially in the data science space, so it may just be an
| issue with the field itself.
| lordgroff wrote:
| Conda supports R and it gives you R binaries on any platform.
| We've used this setup for years at my old workplace, and it
| gives you sane reproducible builds.
| _Wintermute wrote:
| I basically worked as an R troubleshooter in a pharmaceutical
| company, and honestly I wish Python or Julia would take its
| place. There were so many instances where R would return a
| nonsense answer rather than fail, and you wouldn't realise until
| you did a deep dive into someone's code.
| fastaguy88 wrote:
| It would be so easy to write a very similar article on why
| clinical laboratories should NOT use 'R' in their analyses. I
| use 'R' extensively for data presentation, but I am constantly
| bitten by plots that look great but do not in fact represent
| the data, because of some weird "factor" issue. I have never
| used a language where it is so easy to get beautiful results
| that are wrong. 'R' has very limited error checking: if it can
| figure out a way to do something (incorrectly), it's happy to
| do it. With its hidden 'statefulness' and tricky 'factor'
| effects, it would be a disaster in the clinical setting.
| Clinical labs need languages and procedures that will fail
| rather than present an incorrect result. Unfortunately, 'R'
| does the opposite.
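| One well-known factor pitfall of this kind (not necessarily the
| one the commenter hit): in R, `as.numeric(factor(c("10", "5",
| "20")))` returns the internal level codes 1 3 2 rather than the
| numbers 10 5 20, because the levels are sorted as strings. The
| Python sketch below mimics that encoding with a hypothetical
| `make_factor` helper (an illustration, not R's implementation) and
| contrasts the silent result with a fail-fast conversion.

```python
from typing import List, Tuple


def make_factor(values: List[str]) -> Tuple[List[int], List[str]]:
    """Hypothetical mimic of R's factor(): 1-based integer codes plus string-sorted levels."""
    levels = sorted(set(values))  # "10" < "20" < "5" when compared as strings
    codes = [levels.index(v) + 1 for v in values]
    return codes, levels


def naive_as_numeric(codes: List[int], levels: List[str]) -> List[float]:
    """Like as.numeric(f) in R: silently returns the codes, not the labels."""
    return [float(c) for c in codes]


def checked_as_numeric(codes: List[int], levels: List[str]) -> List[float]:
    """Fail-fast alternative: convert the labels, raising if any label is not a number."""
    out = []
    for c in codes:
        label = levels[c - 1]
        try:
            out.append(float(label))
        except ValueError:
            raise ValueError(f"level {label!r} is not numeric") from None
    return out


codes, levels = make_factor(["10", "5", "20"])
print(levels)                             # ['10', '20', '5']
print(naive_as_numeric(codes, levels))    # [1.0, 3.0, 2.0]: plausible-looking, but wrong
print(checked_as_numeric(codes, levels))  # [10.0, 5.0, 20.0]
```

| In R itself the usual fix is `as.numeric(as.character(f))` or
| `as.numeric(levels(f))[f]`; the point of the sketch is only that
| the wrong version produces numbers without any error.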
| eigenspace wrote:
| So many dynamic languages take this strange ethos of never
| wanting to throw an error, instead guessing what the programmer
| meant and doing something wacky. It's a real problem.
| vixen99 wrote:
| So the question is, who was writing the code and why were they
| so evidently incompetent? Easy for anyone to pick up a bit of R
| and start working with it. Thus it's hardly surprising to find
| the situation you describe. Why weren't these folk put through
| a rigorous course before being let loose in a pharmaceutical
| company of all places? Hardly their fault unless they
| exaggerated their skills.
| fastaguy88 wrote:
| While your standards could be different from mine, I don't
| think that all programmers who fail to write perfect code the
| first time are incompetent. Many competent programmers are
| not perfect, and rely on error messages and debuggers to
| produce correct code. Unfortunately, R does often fails to
| give the information required to find bugs.
| 41b696ef1113 wrote:
| My go-to resource on this would be aRrgh [0], which goes
| through some of the many rough edges of the language. Silent
| failures and data type casts can bite even those experienced in
| the language.
|
| R is a data analysis DSL that also happens to be a full
| programming language.
|
| [0] http://arrgh.tim-smith.us/
| toolslive wrote:
| > Unlike Excel and many other graphical user interface
| (GUI)-based programs, R's reliance on text-based structure makes
| it straightforward to review at any time the commands used in a
| data processing pipeline to ensure that the correct steps were
| taken.
|
| > Furthermore, the ability to view the underlying commands
| facilitates transparency and reproducibility of analyses.
|
| The article seems to be targeted at people with zero programming
| knowledge. The arguments here are valid, but don't rely on R. The
| title could just have been ".... Should Embrace a Programming
| Language".
| adimitrov wrote:
| I disagree. As a software engineer, R is a nuisance, it's a
| terrible language, and I hate doing complicated things in it.
|
| But it's very powerful, it's exactly right for these use cases
| and its ecosystem is mindbogglingly huge. Also, it tends to be
| easier to grasp for folks who don't have prior programming
| knowledge (anecdotal, but I've seen people pick it up very
| quickly who struggled a lot with, say, Python. And Python is
| the only language/ecosystem that comes close to R.)
|
| So, yeah, a lot of languages _could_ be used for the use cases
| in TFA, but R is uniquely suited, weaknesses notwithstanding.
| fastaguy88 wrote:
| But, to focus on the original article, clinical labs, where
| data analysis is literally the basis for life-and-death
| decisions, are not a sensible use case.
| hermitcrab wrote:
| R, for all its strengths, seems to have quite a learning curve
| and a lot of syntax to remember. The other approach is to use a
| GUI-based environment such as Easy Data Transform, Alteryx or
| Knime. These are never going to be quite as flexible as a
| language-based approach, but they are a lot easier to get started
| with - especially for people with no programming background.
| kardianos wrote:
| No.
|
| With R, the library setup is hard to reproduce, and there is no
| compilation or static validation.
| kubb wrote:
| My impression is that people outside of maths and statistics are
| more likely to choose Python than R, because they're able to get
| started with it more easily.
|
| Conventional programmers seem to be somewhat reluctant to learn
| R's syntax and adjust their programming model.
|
| Non-programmer types think in maths even less, so they like
| Python's "straightforwardness".
| z77dj3kl wrote:
| R is used almost exclusively in stats. Most in maths use
| Python, C++ (there's a surprising amount of HPC code, e.g. PDE
| solvers and other stuff, floating around in C++), MATLAB, etc.
| the_only_law wrote:
| No more fortran I guess?
| MohammedAShahid wrote:
| From what I've seen, there are actually still people using
| Fortran for applied math. I had several professors who use
| it for CFD.
| nnm wrote:
| It's interesting to observe that whenever R is discussed, someone
| says that Python is better than R in this or that area.
| ppod wrote:
| There are some specific advantages that R has over other
| languages in this context. One is that you can use (almost)* a
| single source document to produce docx, pdf, and interactive html
| output.
|
| * I can't quite get htmlwidgets and docx formats to work together
| in bookdown without using separate commands for the interactive
| tables (DT and flextable).
| ekianjo wrote:
| R Markdown is certainly a big win in terms of reporting, and its
| integration with RStudio makes it a breeze to work with.
___________________________________________________________________
(page generated 2021-05-22 23:01 UTC)