[HN Gopher] Why Clinical Laboratorians Should Embrace the R Prog...
       ___________________________________________________________________
        
       Why Clinical Laboratorians Should Embrace the R Programming
       Language
        
       Author : teleforce
       Score  : 47 points
       Date   : 2021-05-22 10:11 UTC (12 hours ago)
        
 (HTM) web link (www.aacc.org)
 (TXT) w3m dump (www.aacc.org)
        
       | refactor_master wrote:
       | To me, R seems more like a cobbled together ecosystem of
       | automation within statistics, rather than an actual language.
       | 
       | Compared with Python the language ergonomics of R are confusing
       | and inconsistent.
       | 
       | I guess momentum and establishment are also a feature in
       | themselves, though I've personally never felt that one of the
       | selling points, the esoteric statistics packages at the edge,
       | would be of any use to me.
       | 
       | The use I've seen is reminiscent of Spyder and Notebooks:
       | tangled, unreadable mess of line-by-line execution where people
       | are prone to re-running stuff out of order.
        
         | akg_67 wrote:
         | Actually, it's most probably the reverse. Python, numpy,
         | pandas, etc. are cobbled together with duct tape to do what R
         | does elegantly. There is no consistency within the Python
         | ecosystem.
        
           | Maxion wrote:
           | Agree. One thing that is very clear in the R vs. Python
           | debate is that a lot of programmers seem to know one or the
           | other, not both.
           | 
           | They are different tools for different purposes.
        
             | disgruntledphd2 wrote:
             | I know both pretty well at this point, and would say that
             | python is a much better language overall, but the stats/ML
             | parts (especially pandas) are pretty inconsistent with both
             | Python and themselves.
             | 
             | That being said, R is amazing for exploratory work, while
             | Python is better for integrating with the rest of the
             | world.
        
       | hichamino wrote:
       | I am a clinical laboratorian and I find this article very useful.
       | Thank you for sharing. Can you help me to explore this field?
        
         | wheelinsupial wrote:
         | What do you currently use for data analysis?
         | 
         | I've taken courses on statistical computing in R and
         | statistical computing in SAS in my statistics degree. We were
         | always told that SAS is the standard for anything health care,
         | pharmaceutical, or where regulation and publication comes into
         | play.
         | 
         | Anecdotally, my friends who did PhDs in biochem and immunology
         | all used SAS for their data analysis.
         | 
         | Have I been misled or is this up to individual preference?
        
         | cinntaile wrote:
         | https://r4ds.had.co.nz/ you could use this to get started.
        
         | bogeholm wrote:
         | This seems like a reasonable place to start:
         | https://online-learning.harvard.edu/subject/r
        
         | todd8 wrote:
         | Tilman M. Davies's book, _The Book of R: A First Course in
         | Programming and Statistics_ is a good place to start. See [1].
         | 
         | [1] https://www.amazon.com/Book-First-Course-Programming-
         | Statist...
        
       | hichamino wrote:
       | I am a clinical laboratorian and find this article very useful.
       | Can you help me to explore this new field?
        
       | clircle wrote:
       | Reasons I prefer R to python:
       | 
       | - Rmarkdown. I prefer a text document over a web notebook for
       | exploratory research
       | 
       | - The standard library is for statistics and data: dataframe, lm,
       | anova, etc. are builtin
       | 
       | - A huge range of probability distributions are built-in. I don't
       | need to import extra libs to do simulations.
       | 
       | - Between functional programming techniques and vectorization, I
       | can write very clean and concise code
       | 
       | - Tidyverse and data.table are lovely and coherent approaches to
       | data management. Data.table is fast and memory efficient.
       | 
       | - Advanced models are trustworthy: For example, glmnet, mgcv,
       | nlme, rms are authored by statistical heavyweights, and have
       | accompanying books that are excellent. I don't have the same
       | confidence in python's statsmodels.
       | 
       | - CRAN is easy to use, I can access it from my R session, and
       | there are rarely problems (big thanks to Uwe Ligges)
       | 
       | - Libraries for design of experiments and surveys are available.
       | R supports the entire design -> data management -> model cycle.
       | 
       | - base graphics/lattice/ggplot2 are excellent for plotting. If I
       | need something advanced, I can use grid. If I need vector
       | graphics, I can use tikzDevice for latex.
       | 
       | - RStudio is an excellent IDE, and Emacs Speaks Statistics is
       | an excellent Emacs plugin
       | 
       | - It is very easy to get help without going to google. (?foo,
       | ??bar, etc) Documentation is well organized, and the documents
       | often contain citations and relevant links.
       | 
       | - Lots of advanced models can't be found outside of R. Today I
       | fit a splines-on-a-sphere model using mgcv
       | (https://stat.ethz.ch/R-manual/R-patched/library/mgcv/html/sm...)
       | 
       | - Rapid iteration in modeling using Wilkinson notation formulas.
       | Built-in formulas are the actual killer app of R, IMO.
       | 
       | - Things are generally fast, but if you need extra horsepower,
       | plugging into c++ is easy using Rcpp.
       | 
       | - R feels like lisp. Experimentation is easy, and I don't feel
       | forced into any particular paradigm while using R. I have a lot
       | of ways to evaluate code
       | (https://ess.r-project.org/Manual/ess.html#Evaluating-code)
        
         | jhoechtl wrote:
         | Does RStudio already work on Linux Wayland? Last time it failed
         | because of a Qt component but AFAIK the whole thing is
         | transitioning to Electron.
        
         | jarenmf wrote:
         | Agreed, also R with vim is really a joy to work with (Nvim-R
         | plugin) I can't replicate the experience with any other IDE.
         | For example, I can define my own key bindings to show a certain
         | summary statistic or a custom plot for the variable I'm at.
        
           | clircle wrote:
           | Thanks for this. I might give Nvim-R a shot. I've been
           | using Evil+ESS for a long time, but Emacs runs like a dog on
           | Windows 10.
        
       | qsort wrote:
       | This is more of a sales pitch for the general concept of
       | programming rather than R in particular. R is as good a choice as
       | any...
        
       | teruakohatu wrote:
       | I would say you could replace R with Python or Julia and do just
       | fine. Anything must be better than SPSS.
       | 
       | That said, R has come a long way in recent years and is enjoyable
       | to use. It is very complete as far as statistics go.
        
         | ekianjo wrote:
         | R has much better syntax with tidyverse for data wrangling and
         | even up to models with tidymodels and all. Python in comparison
         | is hard to read.
        
           | z77dj3kl wrote:
           | What... Have you seen the piping syntax in tidyverse? It's
           | incomprehensible unless you put in a lot of effort to
           | understand all that's going on.
        
             | teruakohatu wrote:
             | What is incomprehensible about it?
        
               | tomrod wrote:
               | Grandparent commenter mentioned the piping syntax,
               | specifically.
        
               | disgruntledphd2 wrote:
               | I dunno, I got pretty sick of writing head(filter(df,
               | value>10)) and it's (a little) easier to indent as df %>%
               | filter(value>10) %>% head().
               | 
               | It's problematic because people abuse it for everything
               | (300 line pipes are common, sadly) but it's a really
               | useful tool in moderation.
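
A minimal sketch of the nested-versus-piped contrast described in the comment above, written in plain Python rather than R. The `pipe` helper and the `keep_over_10` and `head` functions are hypothetical stand-ins (not dplyr or pandas APIs), and the data is made up for illustration.

```python
from functools import reduce


def pipe(value, *steps):
    """Pass value through each step in order, left to right."""
    return reduce(lambda acc, step: step(acc), steps, value)


# Hypothetical stand-ins for filter() and head(), working on a list of dicts.
def keep_over_10(rows):
    return [row for row in rows if row["value"] > 10]


def head(rows, n=6):
    return rows[:n]


# Made-up data standing in for the data frame `df`.
df = [{"value": v} for v in (3, 12, 25, 7, 40)]

# Nested form: read inside-out, like head(filter(df, value > 10)).
nested = head(keep_over_10(df))

# Piped form: read left to right, like df %>% filter(value > 10) %>% head().
piped = pipe(df, keep_over_10, head)
```

Both forms produce the same rows; the piped one just lists the steps in the order they run, which is also why very long pipes can get out of hand.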
        
             | lordgroff wrote:
             | This is such an odd take. I program almost exclusively
             | Python these days and I miss the elegance of piped
             | functions daily.
        
         | Maxion wrote:
         | It depends on what you're doing, there's a lot of genomics
         | related packages for R that do not exist in python.
        
         | nabla9 wrote:
         | I think that R libraries have an edge over Python and Julia in
         | quality and quantity.
         | 
         | If you like to write lots of code, Python might be better. But
         | if you use it in clinical research, R probably has better
         | packages for whatever you need.
        
           | tomrod wrote:
           | R packages are the wild west as far as code quality goes. In
           | one corner you have Hadley Wickham producing phenomenal
           | efforts like tidyverse and ggplot. In the other corner you
           | have a herd of feral cats.
           | 
           | Python gets scrutiny but most packages are on github and
           | feedback can be received. Although, you should read the
           | source code regardless.
           | 
           | EDIT: And I re-emphasize -- never trust the source code, even
           | if the company you work with indemnifies. One commercial
           | product we used in particular had an insidious bug in one of
           | the new time series packages that got corrected in later
           | versions, but we never would have found it if model testing
           | requirements didn't also require implementing in R or Python.
           | Since the package didn't exist for Python, and we wrote it
           | ourselves, we found the performance issues.
        
             | nabla9 wrote:
             | There are different qualities. Code quality and quality of
             | the functionality.
             | 
             | In R you have more packages that do what you expect
             | (mathematically) but the implementation is inelegant and
             | slow. Written by someone who knows exactly what they need
             | and what the package should do, but has difficulty
             | writing it down.
             | 
             | In Python you have many neat packages where the code is
             | well implemented and performs well, but is not exactly
             | doing what the user needs or skips important features
             | because they are conceptually difficult.
        
               | [deleted]
        
               | tomrod wrote:
               | > In R you have more packages that do what you expect
               | (mathematically)
               | 
               | I disagree with this point explicitly. Many packages are
               | not only poorly written programmatically and
               | systemically, they also produce bad results in common
               | cases and fail silently. This has been discussed well
               | before on our very own YCombinator.
               | 
               | https://news.ycombinator.com/item?id=17308554
               | 
               | > is not exactly doing what the user needs or skips
               | important features because they are conceptually
               | difficult.
               | 
               | This is why classes are cool. You can extend them, modify
               | them, and so on. Perhaps we should teach this more to
               | fellow data scientists.
               | 
               | https://docs.python.org/3/tutorial/classes.html
        
               | disgruntledphd2 wrote:
               | > This is why classes are cool
               | 
               | You may be unaware, but there are vast similarities
               | between the object system of R and Python, mostly due to
               | their common inheritance from the Art of the Metaobject
               | protocol. They look very different (generic functions vs
               | classes), but they are equally extensible.
               | 
               | The trouble with R's systems is that there's three of
               | them, and people use whichever works without really
               | understanding any of them, but all the tools are there.
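
A rough illustration of the "generic functions vs classes" distinction mentioned above, sketched entirely in Python. This is an analogy only, not a model of any of R's object systems: `functools.singledispatch` stands in for the generic-function style, and the `LinearFit` and `RobustFit` classes are hypothetical names invented for the example.

```python
from functools import singledispatch


class LinearFit:
    def __init__(self, slope, intercept):
        self.slope = slope
        self.intercept = intercept

    # Class style: the behaviour lives on the class as a method.
    def describe(self):
        return f"y = {self.slope}x + {self.intercept}"


class RobustFit(LinearFit):
    # Extending by subclassing and overriding the method.
    def describe(self):
        return "robust " + super().describe()


# Generic-function style: the behaviour lives in a free function that
# dispatches on the type of its first argument.
@singledispatch
def summarize(obj):
    return f"<{type(obj).__name__}>"


@summarize.register
def _(obj: LinearFit):
    return f"linear fit with slope {obj.slope}"


# Extending by registering a new implementation, without touching the class.
@summarize.register
def _(obj: RobustFit):
    return f"robust fit with slope {obj.slope}"
```

Either style lets a user add behaviour for a new type; the difference is whether the extension point is the class or the function.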
        
               | tomrod wrote:
               | Aware, yes. The point I made before was pointing out that
               | you can implement what you want in Python relatively
               | easy. That R has a wild west set of systems is exactly my
               | point.
        
               | disgruntledphd2 wrote:
               | I don't really get what you mean.
               | 
               | You said, in the context of ensuring conceptually
               | difficult parts of a model/method were implemented:
               | 
               | > This is why classes are cool. You can extend them,
               | modify them, and so on. Perhaps we should teach this more
               | to fellow data scientists.
               | 
               | I pointed out that both R and Python have similar object
               | systems.
               | 
               | > You can implement what you want in Python relatively
               | easy.
               | 
               | I don't get why this is necessarily true, but I might be
               | missing something. Can you clarify?
        
               | warlog wrote:
               | The only thing worse than an object system is...three
               | object systems. I never quite know what is the cool kid
               | implementation, and I don't even like oo.
        
         | akg_67 wrote:
         | The ones who come from a programming background prefer Python.
         | The ones who come from a scientific or statistics background
         | prefer R.
         | 
         | IME, for production Python rules, and for data exploration,
         | graphics, stats and ad hoc analysis R rules.
        
       | CalRobert wrote:
       | R might be a perfectly fine language, but the culture and
       | ecosystem around R seem to produce a lot of untested, difficult
       | to read code. Globals everywhere, mediocre requirements
       | resolution, and a lack of forced namespacing come to mind. Maybe
       | it's a result of being used by people who are not primarily
       | programmers.
        
         | rout39574 wrote:
         | That's my diagnosis. I've tried on and off since at least 2006
         | to affect the course of that culture, to make managing R
         | packages more tractable.
         | 
         | I'd say the R dev community was then actively hostile to a
         | culture change to support any model other than individual
         | contributors working at their desktop.
        
         | disgruntledphd2 wrote:
         | Most R is written by people who are not (and mostly don't want
         | to be) professional programmers. This is both a strength
         | (industrial strength discipline specific tools) and a weakness
         | (oh dear lord the code, my eyeeeeees).
         | 
         | It's also important to note that much of the original core of R
         | is based on S, which was developed around the same time as C,
         | so some baggage would be expected.
        
       | lordgroff wrote:
       | I see a lot of "R is very hard to reproduce, use Python" or "R is
       | hardly a programming language", and I honestly have to wonder if
       | this is really written in good faith, and on a forum that's
       | supposedly a bastion of Scheme love, no less.
       | 
       | R is far more of a Lisp than Python, and in a field that heavily
       | relies on DSL abstractions (which definitely includes clinical
       | laboratories), R is going to fight you a lot less than most
       | choices.
       | 
       | In regards to packaging, you have MRAN snapshots, you have conda
       | (which will give you binaries on Linux), you have renv, roughly
       | in order of preference. The situation is not ideal, but it's
       | certainly not worse than Python, this is not a hill I'd die on!
       | 
       | Julia might be an exciting and welcome alternative to both; from
       | where I'm sitting, it hardly even enters the conversation
       | currently (in data science where I'm at, it's all Python and R,
       | with Python unfortunately taking by far the larger slice of the
       | pie), but I wish it a bright future, it's a great language.
        
       | roonilwaslib wrote:
       | I run operations for a company that relies heavily on R, and I'd
       | strongly advise against using the language. R's package
       | management system makes reproducing work difficult. We've had to
       | rely on using renv, a snapshot of CRAN (the default source of R
       | packages: some FTP servers), and a bunch of Docker to get vaguely
       | reproducible installs. However, since installing a package in R
       | involves compiling the code that you just downloaded from a
       | public FTP server, installs are extremely slow.
       | 
       | I'd recommend python based on the slightly-saner tooling. I've
       | found that python with conda/pipenv/poetry results in mostly
       | reproducible installs of the tools needed to run a computation.
        
         | disgruntledphd2 wrote:
         | So, I actually get what you mean here, and have used both R and
         | Python in anger for a number of years.
         | 
         | This is all about tradeoffs. Fundamentally, if your package
         | doesn't compile on the latest version of R, it gets removed
         | from CRAN. This means that each version of R has a consistent
         | set of packages that (mostly) work together.
         | 
         | Contrast with Python which does facilitate reproducible builds,
         | because you can hack together ancient versions of Python and
         | make them continue working. I could go into a massive rant here
         | about pip, but it's trending in the right direction now and I
         | don't want to discourage any of the people working on it.
         | 
         | R is better in terms of being able to ensure that for a given R
         | version, any package you install will be compatible, Python is
         | better for making sure that that one application built three
         | years ago keeps working in the same fashion.
         | 
         | Also, it sounds like you're running a nix based system, have
         | you considered (I'm sure you have) using the system packages?
         | For example, the Debian/Ubuntu ones are pretty comprehensive,
         | at the cost of using older versions. I _believe_ that RStudio
         | also has pre-built packages for Linux (but have not tested
         | this) so that could also work.
         | 
         | To be fair, conda is pretty good as a package manager, because
         | it handles the C++/C dependencies. But to your Docker point,
         | that's how I handle the insanity that is python packaging,
         | especially in the data science space, so it may just be an
         | issue with the field itself.
        
         | lordgroff wrote:
         | Conda supports R and it gives you R binaries on any platform.
         | We've used this setup for years at my old workplace, and it
         | gives you sane reproducible builds.
        
       | _Wintermute wrote:
       | I basically worked as an R troubleshooter in a Pharmaceutical
       | company, and honestly I wish python or Julia would take its
       | place. There's so many instances when R would return a nonsense
       | answer rather than fail, but you wouldn't realise until you did a
       | deep dive of someone's code.
        
         | fastaguy88 wrote:
         | It would be so easy to write a very similar article on why
         | clinical laboratories should NOT use 'R' in their analyses. I use
         | 'R' extensively for data presentation, but I am constantly
         | bitten by plots that look great, but do not in fact represent
         | the data, because of some weird "factor" issue. I have never
         | used a language where it is so easy to get beautiful results
         | that are wrong. 'R' has very limited error checking, if it can
         | figure out a way to do something (incorrectly), it's happy to
         | do it. With its hidden 'statefulness' and tricky 'factor'
         | effects, it would be a disaster in the clinical setting.
         | Clinical labs need languages and procedures that will fail,
         | rather than present an incorrect result. Unfortunately, 'R'
         | does the opposite.
        
         | eigenspace wrote:
         | So many dynamic languages take this strange ethos of never
         | wanting to throw an error and instead just guessing what the
         | programmer meant and just doing something wacky instead of
         | throwing an error. It's a real problem.
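
A small sketch of the "fail rather than present an incorrect result" idea from the comments above, in Python. Both functions and the readings are made up for illustration; `lenient_mean` shows the risky guess-and-continue pattern, and `strict_mean` is one possible fail-loudly alternative, not a recommendation of any particular library.

```python
import math


def lenient_mean(values):
    """Coerce what it can and silently skip the rest (the risky pattern)."""
    parsed = []
    for v in values:
        try:
            parsed.append(float(v))
        except ValueError:
            continue  # the bad entry is dropped without any warning
    return sum(parsed) / len(parsed)


def strict_mean(values):
    """Refuse to return a number if any entry cannot be parsed."""
    parsed = []
    for i, v in enumerate(values):
        try:
            x = float(v)
        except ValueError:
            raise ValueError(f"entry {i} is not numeric: {v!r}") from None
        if math.isnan(x):
            raise ValueError(f"entry {i} is NaN")
        parsed.append(x)
    return sum(parsed) / len(parsed)


# Made-up readings; "<0.5" mimics a below-detection-limit flag.
readings = ["4.1", "3.9", "<0.5", "4.3"]
```

Here `lenient_mean(readings)` returns a plausible-looking number computed from three of the four entries, while `strict_mean(readings)` raises a `ValueError` naming the offending entry.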
        
         | vixen99 wrote:
         | So the question is, who was writing the code and why were they
         | so evidently incompetent? Easy for anyone to pick up a bit of R
         | and start working with it. Thus it's hardly surprising to find
         | the situation you describe. Why weren't these folk put through
         | a rigorous course before being let loose in a pharmaceutical
         | company of all places? Hardly their fault unless they
         | exaggerated their skills.
        
           | fastaguy88 wrote:
           | While your standards could be different from mine, I don't
           | think that all programmers who fail to write perfect code the
           | first time are incompetent. Many competent programmers are
           | not perfect, and rely on error messages and debuggers to
           | produce correct code. Unfortunately, R often fails to
           | give the information required to find bugs.
        
           | 41b696ef1113 wrote:
           | My go-to resource on this would be aRrgh [0], which goes
           | through some of the many rough edges of the language. Silent
           | failures and data type castings can bite even those
           | experienced in the language.
           | 
           | R is a data analysis DSL that also happens to be a full
           | programming language.
           | 
           | [0] http://arrgh.tim-smith.us/
        
       | toolslive wrote:
       | > Unlike Excel and many other graphical user interface
       | (GUI)-based programs, R's reliance on text-based structure makes
       | it straightforward to review at any time the commands used in a
       | data processing pipeline to ensure that the correct steps were
       | taken.
       | 
       | > Furthermore, the ability to view the underlying
       | commands facilitates transparency and reproducibility of
       | analyses.
       | 
       | The article seems to be targeted at people with zero programming
       | knowledge. The arguments here are valid, but don't rely on R. The
       | title could just have been ".... Should Embrace a Programming
       | Language".
        
         | adimitrov wrote:
         | I disagree. As a software engineer, R is a nuisance, it's a
         | terrible language, and I hate doing complicated things in it.
         | 
         | But it's very powerful, it's exactly right for these use cases
         | and its ecosystem is mindbogglingly huge. Also, it tends to be
         | easier to grasp for folks who don't have prior programming
         | knowledge (anecdotal, but I've seen people pick it up very
         | quickly who struggled a lot with, say, Python. And Python is
         | the only language/ecosystem that comes close to R.)
         | 
         | So, yeah, a lot of languages _could_ be used for the use cases
         | in TFA, but R is uniquely suited, weaknesses notwithstanding.
        
           | fastaguy88 wrote:
           | But, to focus on the original article, Clinical Labs, where
           | data analysis is literally the basis for life-and-death
           | decisions, is not a sensible use case.
        
       | hermitcrab wrote:
       | R, for all its strengths, seems to have quite a learning curve
       | and a lot of syntax to remember. The other approach is to use a
       | GUI-based environment such as Easy Data Transform, Alteryx or
       | Knime. These are never going to be quite as flexible as a
       | language-based approach, but they are a lot easier to get started
       | with - especially for people with no programming background.
        
       | kardianos wrote:
       | No.
       | 
       | R library setups are hard to reproduce. It also can't be compiled
       | or statically validated.
        
       | kubb wrote:
       | My impression is that people outside of maths and statistics are
       | more likely to choose Python than R, because they're able to get
       | started with it more easily.
       | 
       | Conventional programmers seem to be somewhat reluctant to learn
       | R's syntax and adjust their programming model.
       | 
       | Non-programmer types think in maths even less so they like the
       | python "straightforwardness".
        
         | z77dj3kl wrote:
         | R is used almost exclusively in stats. Most in maths use
         | python, c++ (there's a surprising amount of hpc code, e.g. pde
         | solvers and other stuff floating around in c++), matlab, etc.
        
           | the_only_law wrote:
           | No more fortran I guess?
        
             | MohammedAShahid wrote:
             | From what I've seen, there are actually still people using
             | FORTRAN for Applied Math. I had several professors who use
             | it for CFD.
        
       | nnm wrote:
       | It's interesting to observe that whenever R is discussed, someone
       | says that Python is better than R in this or that area.
        
       | ppod wrote:
       | There are some specific advantages that R has over other
       | languages in this context. One is that you can use (almost)* a
       | single source document to produce docx, pdf, and interactive html
       | output.
       | 
       | * I can't quite get htmlwidgets and docx formats to work together
       | in bookdown without using separate commands for the interactive
       | tables (DT and flextable).
        
         | ekianjo wrote:
         | Rmarkdown is certainly a big win in terms of reporting, and its
         | integration with RStudio makes it a breeze to work with.
        
       ___________________________________________________________________
       (page generated 2021-05-22 23:01 UTC)