[HN Gopher] RStudio: Integrated development environment (IDE) for R
___________________________________________________________________
Author : _benj
Score : 93 points
Date : 2024-03-20 11:02 UTC (11 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| rrjjww wrote:
| As someone who learned most of my initial coding abilities
| through R and RStudio in a data science context, and since moved
| on to more "standard" languages and IDEs, I've yet to find
| anything that comes close to the flexibility and integration of
| RStudio for hacking together data analytics.
|
| VS Code/Python has made some major improvements in the past
| couple of years but it's still very clunky compared to the ease of
| running R code line by line without having to start up a debug
| instance. And now with copilot the most frustrating parts of R
| (such as remembering all the Tidyverse syntax) have been
| abstracted away.
| qudat wrote:
| My partner does a lot of biostats in RStudio and I really think
| it breeds terrible habits. Instead of categorizing code by
| files, everything is shoved into massive files. Instead of
| running a file top-to-bottom, code is run out-of-order which
| makes the code organization and flow of a program a complete
| disaster.
|
| There is something to be said for running and processing
| large CSVs and keeping that in memory while running other parts
| of the program as well as having clickable access to all the
| dataframes loaded into memory.
| mjhay wrote:
| There's nothing about RStudio that encourages big single
| files or writing huge unstructured scripts. RStudio is a
| pretty good IDE, and R is a highly expressive functional-
| first [0] language. R was heavily influenced by Scheme, and
| has its own powerful metaprogramming [1] system - which is
| used to great effect in Tidyverse[2] libraries to make APIs
| that are nicer and more convenient than anything reasonably
| practical in Python.
|
| The problem with a lot of end-user R code is that it is
| written by statisticians, not programmers. They'd write the
| same garbage and huge scripts in Python (trust me, I know).
|
| [0] http://adv-r.had.co.nz/Functional-programming.html
|
| [1] https://adv-r.hadley.nz/metaprogramming.html
|
| [2] https://www.tidyverse.org/
| cameronh90 wrote:
| I agree that RStudio isn't too awful, but the packaging
| management and reproducibility situation in R is dire, even
| compared to Python.
|
| I have to deal with getting code from data scientists into
| production, and simply getting it to run outside of their
| mutant local environment can take days. Things are starting
| to get a bit better with packrat initially and now
| renv/pak/rig and the like, but most DS haven't heard of
| them, and major breakages between minor library versions
| are still commonplace, as are undocumented system library
| dependencies. Then there is the whole stringsAsFactors
| nightmare, thankfully slowly on its way out but still
| around causing occasional catastrophic breakage.
|
| There are lots of nice things about R, but it makes it very
| easy to shoot yourself in the foot.
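| To illustrate the stringsAsFactors breakage: before R 4.0.0,
| data.frame() and read.csv() converted character columns to factors
| by default, and as.numeric() on a factor returns its integer level
| codes, not the values. The Python sketch below emulates that with a
| toy `Factor` class (hypothetical, not an R or pandas API).

```python
class Factor:
    """Toy emulation of an R factor: 1-based integer codes into sorted unique levels."""

    def __init__(self, values: list[str]):
        self.levels = sorted(set(values))  # R sorts levels; for strings that is lexical order
        index = {lvl: i + 1 for i, lvl in enumerate(self.levels)}
        self.codes = [index[v] for v in values]

    def as_numeric(self) -> list[float]:
        # Mirrors as.numeric(f): returns the codes, NOT the original values.
        return [float(c) for c in self.codes]

    def as_numeric_safe(self) -> list[float]:
        # Mirrors as.numeric(as.character(f)): decode the levels first.
        return [float(self.levels[c - 1]) for c in self.codes]


f = Factor(["5", "10", "20", "10"])
print(f.levels)             # ['10', '20', '5']  (string sort, not numeric)
print(f.as_numeric())       # [3.0, 1.0, 2.0, 1.0]  silently wrong
print(f.as_numeric_safe())  # [5.0, 10.0, 20.0, 10.0]
```

| Nothing errors in the "wrong" path, which is why this kind of bug
| could slip through unnoticed.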
| mjhay wrote:
| Yeah, the package management situation is a big weak
| spot. There are some issues with renv, but it is usable.
| It definitely helps to keep a lid on the number of
| dependencies, and for God's sake never pull anything in
| from Bioconductor. IMO, new code should always prefer
| Tidyverse libs for basic stuff, and avoid relying on the
| ancient and warty standard library.
|
| All that said, I still greatly prefer it over Python for
| DS work.
| levocardia wrote:
| >I agree that RStudio isn't too awful, but the packaging
| management and reproducibility situation in R is dire,
| even compared to Python.
|
| I've had exactly the opposite experience. For R, I
| download R and install it, and download Rstudio and
| install it. Then when I need a new package I just
| install.packages("coolnewpackage") and it just works
| (TM). Occasionally I get info messages about packages
| being built in newer versions of R, and once a year or so
| I eventually get around to looking up how to use the
| updateR() function, but in five years of doing biostats
| in R I can't remember a single time I had a dependency
| issue.
|
| Python, on the other hand, is a nightmare. Conda makes
| life a lot easier, but it is not easy to learn if you are
| not a software engineer (remember, R was made not just
| _by_ statisticians, but _for_ them as well). For many
| projects, my Python flow was something like...
|
| Try creating a new conda env with the packages I think I
| need. Try starting the project, oops I don't have spyder-
| kernels installed. Oh, and my environment isn't
| compatible with it. How about just running it in VScode?
| Well now I don't have my variable explorer. How about
| Jupyter? How do I get Jupyter to find my conda env again?
| Oh wait I need this other library it's only on conda-
| forge, and then the conda environment solver fails. I
| guess I'll start from scratch with a new conda env, and
| maybe after several trial-and-error sessions of carefully
| composing the correct "conda create -n ..." incantation
| in a text editor before copy-pasting them to the command
| line, I _might_ get the environment I need up and
| running, after conda finishes its 10-minute compatibility
| search and downloads 80 GB of python libraries.
|
| And using conda is the _easy_ way of doing it! Don't
| even get me started on pip and venv...
| extr wrote:
| Great summary of the situation. If you've ever been in
| the position of trying to explain to a bunch of R users
| why Python packaging is so much harder to deal with, you
| know the struggle. R/RStudio really makes it incredibly
| easy to get up and going for non-developers in a way
| that's probably hard to appreciate for many people on HN
| who are SWEs by trade.
| 0thgen wrote:
| I have never needed anything more than pip in 8 years of
| development, and have always run into issues with R
| packages (every new version of R seems to break 30% of
| existing tidyverse packages)
| disgruntledphd2 wrote:
| Do you do much DS/ML in Python? I definitely agree that
| pip is totally fine otherwise.
|
| At work, I've been giving out about pip to one of our DEs
| for a while, and when he needed to upgrade a bunch of DS
| packages he finally started coming around to my opinion.
| th0ma5 wrote:
| With R on Windows, you get some binary dependencies, but
| on Linux you need the system libraries for any package
| that uses an external library. R uses the HTTP headers to
| determine which binary package to send you, and no roll-your-own
| package system (for virus scanning and the like) supports either
| the Conda contrib patterns or R's HTTP-header binary scheme. I
| think Conda used to be kind of
| cool, but I have the same problems, and its position was
| always to make a ton of assumptions about what you want
| to do. R is like that... Sensible and automatic defaults
| that you can't find or aren't told about.
| listenallyall wrote:
| Your own experience seems to disprove the claim that
| conda makes running analytical/numerical code easier in
| Python. Simple venv and pip really is the simpler choice.
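| For comparison, the venv + pip route can be driven entirely from
| Python's standard library. A minimal sketch, assuming a Python 3
| install that ships the built-in `venv` module; `make_env`,
| `runs_isolated`, and `demo` are hypothetical helper names.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def make_env(env_dir: Path, with_pip: bool = True) -> Path:
    """Create an isolated virtual environment and return its Python executable.

    Roughly `python -m venv <env_dir>`; symlinks are used on POSIX to match
    that command's default there.
    """
    venv.create(env_dir, with_pip=with_pip, clear=True,
                symlinks=(sys.platform != "win32"))
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def runs_isolated(python: Path) -> bool:
    """True if the interpreter's prefix differs from its base install (how a venv identifies itself)."""
    out = subprocess.run(
        [str(python), "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip() == "True"


def demo() -> bool:
    """Create a throwaway env (without pip, so nothing is downloaded) and check it."""
    with tempfile.TemporaryDirectory() as d:
        py = make_env(Path(d) / "env", with_pip=False)
        return py.exists() and runs_isolated(py)


# Typical use (pip install needs network access, so it is only shown here):
#   py = make_env(Path(".venv"))
#   subprocess.run([str(py), "-m", "pip", "install", "-r", "requirements.txt"], check=True)
```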
| _Wintermute wrote:
| I think a lot of the problem is that R does everything it
| can to prevent people from writing modular code.
|
| It doesn't have modules or namespaces, and the current
| fashion is for packages to use non-standard evaluation
| which adds friction to users writing their own functions.
| t-kalinowski wrote:
| R does have namespaces. Take a look at the NAMESPACE file
| found at the root of every R package, which defines the
| symbols and methods exported by the package.
|
| Note for many R packages, the NAMESPACE file is
| autogenerated from roxygen docs:
| https://cran.r-project.org/web/packages/roxygen2/vignettes/n...
| _Wintermute wrote:
| > which defines the symbols and methods exported by the
| package
|
| Which are all dumped into the one single global namespace
| regardless of whether you want everything or not.
|
| I can't remember the exact number, but tidyverse package
| imports literally _thousands_ of things into your global
| namespace on package load, coupled with any other
| dependencies and you have a hell of a time figuring out
| where any function or constant came from.
| mjhay wrote:
| Calling library() is kind of an antipattern in production
| R code. You can either call namespaced functions (like
| say dplyr::mutate()), or use roxygen.
|
| https://roxygen2.r-lib.org/articles/namespace.html
| disgruntledphd2 wrote:
| Agreed but the GP isn't wrong. It's much much nicer to
| import a library with an alias in Python.
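| The Python behaviour being compared is roughly this: an aliased
| import keeps every call site qualified, while a star import binds
| all of a module's public names unqualified, closer to what library()
| does in R. `star_import_names` is a hypothetical helper for
| inspecting that, and the standard-library `statistics` module stands
| in for a real data-science package.

```python
import statistics as st


def star_import_names(module_name: str) -> list[str]:
    """Names that `from <module_name> import *` would bind, collected in a scratch namespace."""
    scratch: dict = {}
    exec(f"from {module_name} import *", scratch)
    return sorted(name for name in scratch if name != "__builtins__")


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Aliased import: each call says where the function comes from (st = statistics).
print(st.mean(data), st.pstdev(data))  # 5.0 2.0

# A star import would bind every one of these names directly in the importing module.
print(star_import_names("statistics"))
```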
| dxbydt wrote:
| > it is written by statisticians, not programmers. They'd
| write the same garbage in Python
|
| I guess I should take offense as a statistician. But it's a
| fairly common complaint. The reality is, most of us
| statisticians are trying to compute a result. Like once. Or
| sometimes twice. For a paper. Or a task. If someone comes
| to me with a time series and asks me to test it for
| stationarity, or find the p lags to make it MA(p)
| stationary, they aren't asking me to write a program. The
| goal is not reproducibility. The goal is a fast answer.
| I've used R at trading desks & financial institutions - the
| goal has seldom been "run the same program again, but with
| this new input". If that was the case, I would write a
| function & stick it in a nice library with documentation.
| But these aren't tech firms. We aren't shipping software.
| The goal is to compute something fast so you can get on
| with life & make the trade, or draft the next paragraph in
| your paper, or... Like if they give me a set of bespoke
| mortgages with some hairy constraints & ask me to compute
| the value at risk, there is not much point in building some
| VaR function. Because it's a once-in-a-while thing. Next
| time it will involve a different set of args & they'd be
| different constraints & so forth. So just write some 10
| line script & get the number & move on. Yeah, sometimes I
| would stash the script in some repo & write a 1-line
| comment on how it works - but it's kinda pointless, it
| doesn't get much play/reuse. We aren't programmers in that
| sense, we are just trying to solve problems.
|
| My kid knocked on my office door yesterday. He's in some
| AoPS course where they use generating functions to count
| stuff. So he had a problem about the number of ways to add
| three odd numbers to make 1001. He had worked out the
| algebra & gotten some number, but before he hits Submit, he
| wants to double-check with me because wrong answers have a
| penalty. Now, I don't have the time to go back to school
| and learn what is a generating function. And I don't want
| to write lots of for loops & if statements & fight with
| syntax errors & so forth. So my 1-liner in R
|
| dim(subset(expand.grid(a=seq(1,1001,2), b=seq(1,1001,2),
| c=seq(1,1001,2)), a+b+c==1001))
|
| tells me there are 125250 ways. He says he got the same
| number with generating functions. Boom done! So that's what
| R is for. Quick & easy.
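| The same sanity check can be sketched in Python three ways. The
| brute force solves for c instead of enumerating a full 501^3 grid;
| the closed form substitutes a = 2x + 1 (and likewise for b, c), so
| x + y + z = 499 with x, y, z >= 0, giving C(501, 2) = 125250 by
| stars and bars; the last function reads off the coefficient of
| t^1001 in (t + t^3 + t^5 + ...)^3, assuming that is the generating
| function the course had in mind.

```python
from math import comb


def brute_force(target: int) -> int:
    """Count ordered triples (a, b, c) of positive odd integers with a + b + c == target."""
    count = 0
    for a in range(1, target + 1, 2):
        for b in range(1, target + 1, 2):
            c = target - a - b  # c is determined, so no third loop is needed
            if c >= 1 and c % 2 == 1:
                count += 1
    return count


def closed_form(target: int) -> int:
    """Stars and bars on x + y + z = (target - 3) / 2 with x, y, z >= 0."""
    if target < 3 or target % 2 == 0:
        return 0  # three odd numbers always sum to an odd number >= 3
    n = (target - 3) // 2
    return comb(n + 2, 2)


def generating_function(target: int) -> int:
    """Coefficient of t**target in (t + t**3 + t**5 + ...)**3, via truncated polynomial products."""
    poly = [1] + [0] * target  # the polynomial 1
    for _ in range(3):  # multiply by the odd-power series three times
        product = [0] * (target + 1)
        for i, coeff in enumerate(poly):
            if coeff:
                for j in range(1, target + 1 - i, 2):  # odd exponents j with i + j <= target
                    product[i + j] += coeff
        poly = product
    return poly[target]


print(brute_force(1001), closed_form(1001), generating_function(1001))  # 125250 125250 125250
```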
| RedCardRef wrote:
| I have been an R "user" for a while now, after reading
| your single line approach to the problem I am reminded of
| the saying which goes something like this "An idiot
| admires complexity, A genius admires simplicity!".
| Perfectly splendid!
| cjk2 wrote:
| This is, as I understand it, the de facto standard way of
| operating it, which is mostly just hacking at stuff in small
| chunks until it sort of works and leaving comments throughout
| it with "run this bit on Tuesdays only".
|
| I recently had to inherit someone's R stuff and I had to
| learn R and fix it all. It now runs from a makefile
| repeatably.
|
| Anyway it could be worse. It could be Minitab.
| ellisv wrote:
| > Instead of categorizing code by files, everything is shoved
| into massive files.
|
| That's not really RStudio's fault. It is just how many people
| use R and how they were taught.
|
| > code is run out-of-order which makes the code organization
| and flow of a program a complete disaster.
|
| In my experience, with R Markdown, this is untrue. I see
| Jupyter Notebooks with cells run out of order much more
| often.
| madcaptenor wrote:
| I have done a lot in R Markdown, and the project I'm
| currently working on has me mostly working in Databricks
| notebooks (which are very similar to Jupyter notebooks). My
| execution gets out of order a lot more often in Databricks.
| bachmeier wrote:
| > Instead of running a file top-to-bottom, code is run out-
| of-order which makes the code organization and flow of a
| program a complete disaster.
|
| That's more a REPL issue than specific to a particular
| language. It's the tradeoff you make. I write my R programs
| in Geany and then run the whole thing using Rscript. That
| gives me a clean environment on every run.
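| The trade-off can be made concrete with a small Python simulation
| (the cell contents and the `run_cells` helper are hypothetical):
| running every cell once, top to bottom, in a fresh namespace is what
| Rscript or `python script.py` gives you, while re-running cells in a
| live session can leave different state behind.

```python
from typing import Optional


def run_cells(cells: list[str], order: list[int], namespace: Optional[dict] = None) -> dict:
    """Execute notebook-style code cells in the given order against one shared namespace."""
    ns: dict = {} if namespace is None else namespace
    for i in order:
        exec(cells[i], ns)
    return ns


cells = [
    "x = 10",     # cell 0: load
    "x = x * 2",  # cell 1: transform in place
    "y = x + 1",  # cell 2: derive
]

# Clean run, like Rscript: fresh namespace, every cell once, top to bottom.
clean = run_cells(cells, [0, 1, 2])
print(clean["y"])    # 21

# Interactive session: the same cells, but cells 1 and 2 get re-run.
session = run_cells(cells, [0, 1, 2])
run_cells(cells, [1, 2], session)
print(session["y"])  # 41
```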
| goosedragons wrote:
| Emacs + ESS? Way more flexible. Maybe less integration because
| many of the big R package devs work for Posit. RStudio has a
| lot of superfluous junk in the UI I just don't need or care
| about.
| kqr wrote:
| I've used ESS for the past few years and recently tried using
| RStudio when I'm on Windows. For my purposes, which amount to a
| little industrial statistics on the side, they are remarkably
| similar. I feel right at home in either!
| lylejantzi3rd wrote:
| > I've yet to find anything that comes close to the flexibility
| and integration of RStudio for hacking together data analytics.
|
| Is there a good demo or video you can point to that shows this?
| I have no experience with R, RStudio, or data science, but
| you've piqued my interest.
| ellisv wrote:
| Any of David Robinson's (or anyone else's) Tidy Tuesday
| videos.
|
| https://www.youtube.com/@safe4democracy/featured
| Kalanos wrote:
| jupyter
| silveraxe93 wrote:
| This works out of the box in VSCode?
|
| Just open a .py file, then select the snippet of code you want
| to run and cmd+enter
|
| It will open a new REPL for you (using your selected
| interpreter) the first time, and after that all commands are
| run in that same one.
| wodenokoto wrote:
| RStudio is just way better at choosing what code to send (if
| you only send the line the cursor rests on you're gonna have
| a bad time. VSCode is a bit better than that but not great.
| Also, where do your plots get drawn when you use this?
| RStudio just works in this regard.)
| ubiquitination wrote:
| I agree - I teach statistics at a University and there is
| really no alternative to Rstudio for working with R. This is
| especially true considering that the vast majority of folk
| using R (in my field) have no prior programming experience.
| Downloading R, Vscode, downloading some R plugin, getting them
| to talk to each other, and only then starting to learn R -
| isn't very straightforward. It's also remarkably consistent on
| different operating systems - something to consider when half
| the students are on Windows, half on macOS...
| bachmeier wrote:
| RStudio Server on a Digital Ocean instance made my life a lot
| easier. Students fire up a browser, log in, and they're using
| R with all the packages. It was horrible when students ran R
| on their own machines back in the old days. Most of the
| questions I got were tech support rather than related to the
| material. And these days it has good Python support too.
| dcreater wrote:
| Jupyter (ipynb) notebooks in VS Code.
| jurimasa wrote:
| If you work with Python, Spyder comes really, really close and
| is way better than jupyter
| RobinL wrote:
| It looks like, as far as I can tell, VS Code doesn't support
| the interactive window for working in R, which was a bit of a
| surprise to me when I looked it up.
|
| The python interactive window has pretty much fully replaced my
| use of jupyter, since it gives you notebook-style output
| without the annoyance of the notebook format. My usual workflow
| is highlighting lines of code and shift-enter to execute
| (there's also a cells syntax).
|
| I'm surprised by this because it _is_ possible to use R in
| Jupyter (although I never really liked the experience, R Studio
| was far superior).
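| The cells syntax referred to is, as far as I know, the `# %%`
| comment marker: in a plain .py file, each `# %%` line starts a cell
| that VS Code can send to the Python Interactive window on its own,
| while the file still runs top to bottom as an ordinary script. A
| minimal sketch with toy values:

```python
# %%
# Cell 1: load some (toy) data.
values = [3, 1, 4, 1, 5, 9, 2, 6]

# %%
# Cell 2: summarise.
total = sum(values)
mean = total / len(values)
print(f"n={len(values)} total={total} mean={mean}")  # n=8 total=31 mean=3.875

# %%
# Cell 3: running total.
running = []
acc = 0
for v in values:
    acc += v
    running.append(acc)
print(running)  # [3, 4, 8, 9, 14, 23, 25, 31]
```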
| yabbs wrote:
| ?
|
| Yes it does.
| aiisjustanif wrote:
| Please supply references for the audience.
| RobinL wrote:
| I'm specifically referring to:
| https://code.visualstudio.com/docs/python/jupyter-support-py
|
| The support for R looks a bit different (to me at least?):
| https://code.visualstudio.com/docs/languages/r
|
| In the screenshot the window on the right does not look
| comparable to the output in a jupyter notebook. It looks
| more like a standard terminal. e.g. does it support
| interactive charts, html tables etc?
|
| The Python interactive window uses the ipykernel package to
| allow rich outputs like that.
|
| I still might be wrong and would like to be corrected on
| this, since it would mean R support in VS Code is now
| better than I thought (I haven't tried it for a while).
| jakupovic wrote:
| cat, grep, sort and awk come pretty close :)
| dcchuck wrote:
| Came here to share that same experience. RStudio truly made me
| feel "close" to the data.
| ivan_ah wrote:
| An alternative in the Python world that is definitely worth
| looking into is the JupyterLab Desktop app, which is a
| standalone installer that is cross-platform and works great for
| beginners (no command line needed):
| https://github.com/jupyterlab/jupyterlab-desktop?tab=readme-...
|
| See my other comment in the main thread with more info.
| ellisv wrote:
| Are we just submitting GitHub repos as posts now?
| JR1427 wrote:
| I was thinking the same. RStudio is certainly not new, either.
| forgotpwd16 wrote:
| Hasn't this been happening ever since GitHub opened?
| gdevenyi wrote:
| If I complain here will they fix my year old bug?
|
| https://github.com/rstudio/rstudio/issues/12508
| jmcphers wrote:
| Can't make any promises -- our dev team is pretty small! -- but
| it's been flagged for triage.
| cdrv wrote:
| This particular issue should be resolved in the latest daily
| builds of RStudio. The underlying issue here was a conda patch
| included in the conda-provided builds of R, which interfered
| with the way RStudio attempted to load R. Please see
| https://github.com/rstudio/rstudio/issues/13184#issuecomment...
| for more details.
| gdevenyi wrote:
| The answer, it turns out, was yes!
| fumeux_fume wrote:
| It's really nice to have everything you need in one spot. Plus
| it'll run on any OS and is free. I started learning how to
| program with C++ back in the early 2000s which required Windows
| and a Visual Studio license and it was still a pain to get stuff
| done. Whether it's RStudio or Jupyter there's really never been a
| better time to start picking up a language and building something
| useful. Three cheers for the creators, maintainers and community
| who support tools like this.
| tetris11 wrote:
| Freemium is what they ("Posit") are pivoting to now.
|
| https://posit.co/pricing/individual-products/
|
| If you want a Rstudio server to host for a research group
| containing more than 5 people, talk to their sales rep.
|
| Otherwise each person will need to host their own Rstudio
| server side-by-side on the same machine.
|
| Jupyter and JupyterHub is the way forward.
|
| Especially if they get multi-kernel notebooks mainlined (read:
| what Org-Mode has been doing for decades)
| jmcphers wrote:
| That pricing sheet is for Posit Workbench; RStudio Server[0]
| can host as many people as you have the compute for, and it's
| free and open source. It does only support one session per
| user, but might meet the needs of a small research group.
|
| [0] https://posit.co/download/rstudio-server/
| wjholden wrote:
| The killer feature of RStudio for me is RMarkdown.
|
| I composed almost all my homeworks in grad school using RMarkdown
| in RStudio. You get LaTeX whenever you need it, code (I usually
| use it for R or Julia), and markdown for ordinary text. The kable
| function renders tables nicely from data frames and ggplot2
| creates beautiful plots.
|
| Mathematica and Jupyter have a few advantages, but overall I'm
| very happy with RStudio.
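| For readers unfamiliar with kable: with format = "pipe" it turns a
| data frame into a Markdown pipe table that the document then renders.
| A simplified Python sketch of the idea (hypothetical helper with toy
| values; real kable output also handles column alignment and padding):

```python
def to_markdown_table(rows: list[dict]) -> str:
    """Render a list of records as a Markdown pipe table, taking headers from the first record."""
    headers = list(rows[0])
    lines = [
        "| " + " | ".join(headers) + " |",
        "|" + "|".join("---" for _ in headers) + "|",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)


print(to_markdown_table([
    {"group": "a", "n": 12},
    {"group": "b", "n": 7},
]))
# | group | n |
# |---|---|
# | a | 12 |
# | b | 7 |
```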
| minimaxir wrote:
| RMarkdown in RStudio _was_ the killer feature, until the VSCode
| R extension matured. Not only does it support RMarkdown, it
| adds a ton of features RStudio doesn't have and runs a lot
| faster.
| https://github.com/REditorSupport/vscode-R/wiki/R-Markdown
|
| For my uses, it replaced RStudio 100% of the time.
| dr_kiszonka wrote:
| Thanks for the link! Is it possible to display plots inline
| like in notebooks? (The screenshot shows a plot in a preview
| pane.)
| minimaxir wrote:
| Unfortunately no. (tbh I don't like that feature in RStudio
| anyways: it makes large notebooks take longer to scroll
| through, and ggsave is better at rendering charts than
| R's native rendering)
|
| For knitting, you can use Markdown image links.
| dr_kiszonka wrote:
| Thanks for letting me know.
| adr1an wrote:
| Can you use quarto in vscode? It's the next magic from
| Posit.co
| minimaxir wrote:
| Yes, quarto has native support for VSCode:
| https://quarto.org/docs/get-started/hello/vscode.html
|
| There isn't much advantage to using it over RMarkdown for
| R, IMO.
| Tarq0n wrote:
| That's a lot of prerequisites for something that just works
| in rstudio.
| minimaxir wrote:
| It takes 5-10 minutes to set up the dependencies.
| mightyham wrote:
| RStudio and the R language are a couple of my absolute favorite
| pieces of software. While I'm a software engineer by trade, every
| once in a while I need to do some data analysis work and throwing
| together a notebook in RStudio always makes me feel like I'm
| using a cheat code. For simple tasks, everything is incredibly
| seamless, plus coworkers who are unfamiliar with R are usually
| impressed by how nice ggplot visualizations can look.
| lvl102 wrote:
| I enjoy RStudio but the best feature of R is data.table. It's
| simply unmatched.
| ProjectArcturis wrote:
| Once you climb that steep learning curve, absolutely.
| th0ma5 wrote:
| Polars is faster? Data.table was a pioneering speed improvement
| at one point for sure.
| lvl102 wrote:
| It is but if we are talking speed, I'd just opt for RAPIDS.
| uptownfunk wrote:
| I think it's one of the most underrated pieces of software in modern
| history. Absolutely brilliant. Huge fan. I am glad to see it
| getting love. I've moved on from data science in a professional
| capacity but for some pet projects of mine it has been
| indispensable. I think managing the namespace was one non-trivial
| concern (which may be resolved in modern versions). Otherwise
| very well built for data science applications. Interesting that
| it didn't catch on for LLM training - I think a missed
| opportunity.
| dclaw wrote:
| Ahh cool, now r-studio brings up this instead of the 24-year-old
| data recovery program.... :-(
| stonogo wrote:
| RStudio is thirteen years old so I'm not sure what changed that
| makes the search results different "now"
| matttproud wrote:
| I'm about as old school as you can get with preference for CLI
| and simple text-oriented development environments. I recently
| picked up R again for a long-term data science project
| (https://matttproud.com/blog/posts/teaser-weather-temp-repres...)
| after having not used it since university. In spite of a fair bit
| of annoyance with the R language
| (https://matttproud.com/blog/posts/rant-and-r-melt-function.h...),
| I found RStudio to make the prototyping process
| with R actually tolerable. Big kudos to Posit and the R community
| for RStudio.
|
| There are a couple of things I would love for the R ecosystem:
| project scaffolding to do bulk data generation (e.g., from
| continuously generated data sets). What's the best way to do
| this: makefiles, or what? I have a relatively short entrypoint R
| file that sources other leaf files to run specific analyses, but
| it makes the software engineer inside of me want to curl up and
| die.
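| On the makefile question: the core of what make offers is a
| staleness rule, i.e. rebuild an output only if it is missing or older
| than one of its inputs. A Python sketch of that rule, with
| hypothetical file names and a hypothetical `count_rows` analysis
| step, in case a full Makefile feels heavyweight:

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def is_stale(target: Path, sources: list[Path]) -> bool:
    """make's rule: rebuild if the target is missing or older than any source."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(src.stat().st_mtime > target_mtime for src in sources)


def build(target: Path, sources: list[Path],
          recipe: Callable[[list[Path], Path], None]) -> bool:
    """Run recipe(sources, target) only when the target is stale; return True if it ran."""
    if is_stale(target, sources):
        recipe(sources, target)
        return True
    return False


def count_rows(sources: list[Path], target: Path) -> None:
    """Hypothetical analysis step: write the number of data rows in the first CSV."""
    lines = sources[0].read_text().splitlines()
    target.write_text(f"{len(lines) - 1}\n")


def demo() -> list[bool]:
    """Build, rebuild (no-op), 'regenerate' the raw data, then rebuild again."""
    with tempfile.TemporaryDirectory() as d:
        raw, summary = Path(d, "raw.csv"), Path(d, "summary.txt")
        raw.write_text("t,temp\n0,1.5\n1,2.0\n")
        ran = [build(summary, [raw], count_rows), build(summary, [raw], count_rows)]
        newer = summary.stat().st_mtime + 10
        os.utime(raw, (newer, newer))  # pretend the raw data was regenerated
        ran.append(build(summary, [raw], count_rows))
        return ran


print(demo())  # [True, False, True]
```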
| mjhay wrote:
| reshape2 (where `melt` is from) has been deprecated for some
| time, and for pretty good reasons. Try dplyr and tidyr instead
| - they are much nicer and more modern. The equivalent of melt would
| be pivot_longer. For packaging, renv is the usual choice. I
| wouldn't structure the package as a bunch of scripts with an
| entrypoint. Just write functions as you would in other
| languages, and keep any specific analysis script small.
|
| https://tidyr.tidyverse.org/
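| What melt()/pivot_longer() do is a wide-to-long reshape: every
| non-identifier column becomes a (name, value) pair on its own row.
| A stdlib-only Python sketch with toy values; the function is
| hypothetical and, unlike tidyr (which takes the columns to pivot via
| `cols`), it takes the identifier columns to keep. The
| names_to/values_to defaults mirror tidyr's "name"/"value".

```python
def pivot_longer(rows: list[dict], id_cols: list[str],
                 names_to: str = "name", values_to: str = "value") -> list[dict]:
    """Wide-to-long reshape: one output row per (record, non-id column) pair."""
    long_rows = []
    for row in rows:
        ids = {col: row[col] for col in id_cols}
        for col, value in row.items():
            if col not in id_cols:
                long_rows.append({**ids, names_to: col, values_to: value})
    return long_rows


# Toy wide table: one row per station, one column per month.
wide = [
    {"station": "A", "jan": 1.5, "feb": 2.0},
    {"station": "B", "jan": -0.5, "feb": 0.25},
]
for r in pivot_longer(wide, ["station"], names_to="month", values_to="temp"):
    print(r)
# {'station': 'A', 'month': 'jan', 'temp': 1.5}
# {'station': 'A', 'month': 'feb', 'temp': 2.0}
# {'station': 'B', 'month': 'jan', 'temp': -0.5}
# {'station': 'B', 'month': 'feb', 'temp': 0.25}
```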
| melondonkey wrote:
| Weird: one minute it feels like the internet is screaming that I'm
| an out-of-touch dinosaur for using R and the next a simple link
| to its most popular IDE makes the front of HN.
| ivan_ah wrote:
| The closest Python equivalent to RStudio is the JupyterLab
| Desktop app[1,2], which I highly recommend. I've entirely
| switched to using it for teaching, and it is a godsend, since it
| works the same way across platforms (win/mac/linux), installs its
| own Python interpreter independent of any system Python the
| student might have, and even comes with
| NumPy/SciPy/Pandas/Seaborn/statsmodels already installed, which
| makes it possible for me to skip the `pip ...` or `conda ...`
| instructions altogether.
|
| Between the standalone desktop app, and the convenience of
| running JupyterLab in the cloud thanks to https://mybinder.org/
| links, there is now a smooth path for beginners getting into
| stats/ML/data science: (1) read notebook on github or nbviewer,
| (2) run notebooks in the cloud via mybinder links, (3) install
| JupyterLab Desktop app, (4) learn to install Python+env-manager
| via command line. Previously, new learners were forced to jump
| straight to (4), but now there are logical steps along the way!
|
| [1] https://github.com/jupyterlab/jupyterlab-desktop?tab=readme-...
|
| [2] https://blog.jupyter.org/jupyterlab-desktop-app-now-availabl...
| Kalanos wrote:
| I use Jupyter a lot for Python. I occasionally have to use
| RStudio for bioinformatics. The UX is much, much worse. I just
| haven't bothered to get the R kernel for Jupyter working.
| rubslopes wrote:
| Is there a way to visualize a dataframe like a spreadsheet, as
| RStudio does, but for VSCode?
| HayBale wrote:
| Ahhh I started my programming with Rstudio. Since then I changed
| to Emacs with ESS.
|
| Rstudio is nice but lacks a lot of nice things from something
| bigger.
___________________________________________________________________
(page generated 2024-03-20 23:02 UTC)