[HN Gopher] Learn R Through Examples (2020)
       ___________________________________________________________________
        
       Learn R Through Examples (2020)
        
       Author : diplodocusaur
       Score  : 129 points
       Date   : 2021-06-05 11:43 UTC (11 hours ago)
        
 (HTM) web link (gexijin.github.io)
 (TXT) w3m dump (gexijin.github.io)
        
       | nojito wrote:
       | R's such a fantastic language and is leagues ahead of all others
       | for data work.
       | 
       | I do however recommend picking up data.table along the way
       | because that is easily one of the best reasons to continue using
       | R today.
        
       | AndyPatterson wrote:
       | I work using R almost every day and I think many of the problems
       | that are unique to R could be solved by having a couple of
       | experienced SWEs in the core team to point R in the right
       | direction. As it stands, I think R will be left behind until it
       | fixes things like performance and scalability (e.g. intuitive
       | byref semantics and a faster runtime) and a consistent scoping
       | model and OOP.
       | 
       | Apart from that, I think a bigger challenge that still needs to
       | be addressed is that analysis/modelling projects tend to be
       | worked on by individuals and/or thrown away after the initial
       | value is pulled out of them.
       | 
       | Going forward, I think we need to start identifying design
       | methodologies that would make collaborating on this sort of work
       | pain-free and more agile. Doing so should give us more value,
       | sooner and for longer.
        
       | civilized wrote:
       | Here is my take on R as a guy who does stats as well as some
       | software engineering in more mainstream languages like Python.
       | 
       | R is a fantastic DSL for data manipulation and statistical
       | analysis, with both traditional and modern tools, on datasets up
       | to the gigabyte scale. It has great, easy-to-use data structures
       | and unparalleled APIs in the tidyverse. It is not the thing for
       | the latest deep learning implementation on petascale data, but
       | most data science work doesn't need or benefit from that.
       | Surprisingly, some machine learning methods have their nicest
       | APIs in R, because that's where the users of those methods are.
       | 
       | It has its warts, but so do JavaScript and SQL, and I think few
       | people dispute that these are very powerful DSLs. Statistical
       | analysis is just as legitimate a computing task as building
       | webpages or querying databases. It is not the same as general-
       | purpose programming, and it needs a good DSL.
        
         | ryando wrote:
         | Agree with this. Nvim-R plus dplyr and the plotting libraries
         | are the best tools I've found for manipulating and
         | understanding the characteristics of a dataset quickly.
         | Eventually I moved to Python for the specific reason that the R
         | packages for interacting with cloud platforms couldn't really
         | keep up with the development of those platforms, and I got
         | tired of having projects that mashed together both languages. I
         | haven't checked, maybe things have stabilized enough now that I
         | would be happy going back to R.
        
           | civilized wrote:
           | That makes sense. I don't much like crossing streams between
           | languages on a project. It's great if you can keep R for
           | analysis and Python for production, but if R can't keep up
           | with the wrangling part, you're a bit up a creek. I'm lucky
           | in that most of my analysis only needs to query fairly well-
           | behaved SQL databases.
        
         | tylurp wrote:
         | As an experienced R programmer who doesn't do much statistics
         | or have a background in computer science: what language
         | characteristics does R lack that make it a DSL instead of a GPL?
         | 
         | Edit: I found an interesting quote from a guy named Martin
         | Fowler about the subject.
         | 
         | "Languages can have a domain focus but still be general-purpose
         | languages. A good example of this is R, a language and platform
         | for statistics; it is very much targeted at statistics work,
         | but has all the expressiveness of a general-purpose programming
         | language. Thus, despite its domain focus, I would not call it a
         | DSL."
        
       | specproc wrote:
       | R is such a horrible language to learn. I gave up entirely and
       | now just use rpy2 for the few things it can do that Python can't.
        
         | vasili111 wrote:
         | Have you tried reading an R book? Lots of people learn from
         | videos, tutorials, etc., and that is not a good approach to
         | learning R.
        
       | qntty wrote:
       | If you already know another dynamic language and want to
       | understand R, I would recommend skipping all the intros to R
       | based around data analysis and starting with Advanced R by Hadley
       | Wickham. It will explain all the weirdness you'll encounter right
       | up front before it confuses you. Then you can read the data
       | analysis tutorials and focus on the content rather than the R
       | weirdness.
        
         | wodenokoto wrote:
         | I was quite surprised how easy that book was to read and
         | understand. I'd expected "advanced" any language to be much
         | more difficult.
        
           | beforeolives wrote:
           | The title is kind of misleading - the book is more of a list
           | of exceptions, edge cases, unexpected behaviour and other
           | gotchas.
        
             | carljv wrote:
             | That is not an accurate description of the book.
        
         | CornCobs wrote:
         | I agree that Advanced R is a fantastic resource; however I do
         | not think reading it is a good way to learn R. Advanced R
         | explains the whys, not the whats, and someone coming into R who
         | hasn't encountered any of the stuff Wickham gives intuitions
         | for will likely find it an inexplicable truckload of concepts.
         | 
         | Instead I would propose using R in some capacity, encountering
         | its weird quirks (Why does evaluating an expression sometimes
         | print something to the console and sometimes not? How is dplyr
         | using my column names as variables? Why do I get warnings when
         | using & instead of &&?), and then turning to Advanced R as a
         | source of sanity.
         | 
         | Another source I would recommend is in fact the R language
         | definition. It's very approachable and you quickly realize that
         | R is pretty simple at the core, buried under piles of cruft.
        
         | claytonjy wrote:
         | It's also available online under a creative commons license:
         | https://adv-r.hadley.nz/
        
       | temp8964 wrote:
       | I recently tried to migrate my R code to Julia. Even though I
       | already knew R data.table is faster than DataFrames.jl, I was
       | totally blown away by how slow Julia is. So I quickly gave up. I
       | think I will have to write the unavoidable hard loops in C++,
       | which I really don't want to do...
        
         | ku-man wrote:
         | My experience as well.
         | 
         | In order to get those much-vaunted C-like speeds the Julia
         | fanboys claim, you need lots of contortions and hacks. Off the
         | bat, Julia speeds are mediocre.
        
         | xiaodai wrote:
         | yeah. time to first plot (ttfp) is a real issue.
        
         | cbkeller wrote:
         | There are a few tricks to getting Julia to be actually fast,
         | and while it's not hard _per se_ if you know them all (at least
         | for numerical work), it's definitely not trivial.
         | 
         | IMHO, you really have to embrace dispatch-oriented programming,
         | and that includes being scrupulous about avoiding _type
         | instability_. You also have to be a bit conscious about
         | allocations, since it's easy to write Julia code (especially
         | if you're trying to write in a "vectorized" style as is common
         | in R, Python, Matlab) that generates absurd numbers of
         | allocations, which must then be garbage-collected. But it's
         | also easy to avoid those allocations if you know how.
         | 
         | It took about two years, but after picking up more of this, I
         | was eventually able to switch everything my group does from a
         | two-language solution of Matlab for scripts and plotting and C
         | (with MPI) for HPC to all-Julia. This [1] was originally
         | targeted at academics making the same switch, but much of it
         | could be relevant to those with an R background as well.
         | 
         | [1]
         | https://github.com/brenhinkeller/JuliaAdviceForMatlabProgram...
        
         | nojito wrote:
         | If it's grouping-related, check out the collapse package:
         | 
         | https://sebkrantz.github.io/collapse/
        
         | glial wrote:
         | For whatever it's worth, I was pleasantly surprised at how easy
         | Rcpp is to use.
        
       | [deleted]
        
       | montmorencie wrote:
       | I am currently a data scientist. Educational background in cs,
       | few hobby web projects and currently updating my skills in java/
       | kotlin with the idea to go in mobile dev.
       | 
       | I use only Python in my work. I learnt R and honestly, it's the
       | same thing as using Python scientific packages. It's mostly
       | vectorized operations, spaghetti functional, if people know how
       | to write functions, code just to get it done. To make graphs, web
       | dashboards (no, we are not doing web dev, it's dark magic
       | frameworks), build machine learning and eventually some reports.
       | Stuff like that.
       | 
       | I do some software engineering but that's optional and I do it
       | because I can. Most data scientists/ML engineers can't. So you
       | guys are not being fair. R and Python in these environments are
       | not even being used for building stuff. This language is not
       | built for that. Hence it's not good from the perspective you look
       | at it from (software engineers).
       | 
       | Unlike Python, R is solely for statistics, data science and
       | probably some basic ML (I haven't tried, though). Also Shiny for
       | building web dashboards. But don't look at the code for
       | dashboards, it's bad, with a 'get it done and forget' approach.
       | 
       | That being said, good luck scraping, mining, cleaning data with
       | something not called R/Python. Good luck with data engineering.
       | Exploring and visualizing trends. Creating dashboards even.
       | Machine learning. Monitoring and reporting in a scientific manner.
       | 
       | Try this type of work with your favorite languages. Then see how
       | quickly and easily it's done with R/Python. Come back and say
       | it's a bad language.
       | 
       | It's the same thing as embedded dev complaining about how bad js
       | is for his job. You just totally ignore the context.
        
       | AuthorizedCust wrote:
       | R _plus the tidyverse_ is what makes it a great language. Some
       | tidyverse concepts are being baked into base R, like the pipe,
       | but base R by itself feels hollow.
       | 
       | R's future is inseparable from the tidyverse. We need to just
       | lump them together in any serious discussion of R.
       | 
       | I teach a graduate level R course mainly for economics and
       | statistics majors. (My educational background and career are
       | computer science and technical; while that may stereotype me into
       | Python, I just love R.) I spend the first three weeks on base R,
       | to convey language-essential concepts, like vectorized objects,
       | then the rest of the course is tidyverse-centric.
        
         | CalChris wrote:
         | > I spend the first three weeks on base R, to convey language-
         | > essential concepts ...
         | 
         | Awesome. When I was at Berkeley, Linear Systems had Matlab
         | assignments. The real engineers (ME, CE, NE, ...) had taken a
         | Matlab class and knew the language. We computer scientists
         | hadn't and suffered horribly as a consequence. Your three weeks
         | of learning base R instead of sink or swim will pay dividends.
        
           | notagoodidea wrote:
           | Got the reverse experience. Trained in Matlab, R and Python,
           | we had to follow a database/application class with computer
           | scientists and software engineers where the big project was
           | to make a basic Android application with a SQLite database.
           | It was painful to be dropped into Android Studio without any
           | Java knowledge. And because the class was focused on
           | databases, we had no Java introduction or whatever. We were
           | able to team up with students from the other program who
           | already had multiple Java projects and classes under their
           | belt, but it was a bitter pill to swallow.
        
         | kgwgk wrote:
         | Just for the record, many users are happy with base R. There
         | are dozens of us!
         | 
         | R by itself is nice. But the tidyverse is creeping in and
         | bringing dependency hell with it.
         | 
         | http://www.tinyverse.org/
        
         | ProjectArcturis wrote:
         | Personally I much prefer data.table. The syntax is a bit harder
         | to get a handle on, but you can do just about anything with it,
         | and it's much faster at runtime than tidyverse.
        
         | acomjean wrote:
         | For those who don't know, the "tidyverse" is a set of R
         | packages that make using base R much easier/better (in my
         | opinion).
         | 
         | https://www.tidyverse.org/ There is also a free ebook that's a
         | good reference.
         | 
         | ggplot2, the included plotting package, is pretty awesome.
         | 
         | I took a biostatistics class and, after the basic examples in
         | R, using the tidyverse to analyze data for projects was very
         | helpful.
        
       | baron_harkonnen wrote:
       | Lots of negative comments in here about learning R from
       | experienced programmers. I've found this is largely because
       | experienced programmers have this unjustified bias that R is some
       | toy language that should be easy to learn and has nothing to
       | teach them. If you approached a language like Rust in the same
       | way you would likely be just as frustrated with it.
       | 
       | Certainly R has its quirks, but most of this comes from being one
       | of the oldest continuously existing programming languages there
       | is. It derives from S, which was written 46 years ago. Because of
       | this it has multiple object/class systems reflecting the changing
       | standards for OOP. Its most dominant one, S3, predates Java and
       | therefore uses the Generic Function paradigm of OOP similar to
       | Common Lisp's CLOS. If you're experienced but have never worked
       | with non-Java style OOP you're going to be a bit confused.
       | 
       | R's most important feature, which is well worth studying and
       | mastering for any serious programmer, is that it is a completely
       | vectorized programming language. It borrows this style from APL
       | (though is a million times more readable). Every value in R is a
       | vector and for the vast majority of operations the best approach
       | to solve your problem is by thinking in vector operations. This
       | makes simple things like string formatting with `paste` seem like
       | a confusing nightmare, but there is a real logic there. Functions
       | like `ifelse` can seem strange, and writing C-style code in R,
       | while possible, will result in horrible performance.
       | 
       | Once you do learn to think in vectors you realize that R isn't
       | just popular in the stats world because most statisticians
       | haven't seen a "real" programming language, but because you can
       | very rapidly iterate on models. Translating mathematical notation
       | into R, for the experienced R programmer, is easier than any other
       | language I've worked with by a long shot.
       | 
       | My advice to any experienced programmer approaching R is to have
       | some respect for the language. Most of the frustrations you'll
       | have aren't because R is a bad language, but because you have
       | less experience than you think and learning R well can expand
       | your programming views in a similar way to Haskell.
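
The "every value is a vector" idea behind `paste` and `ifelse` can be imitated in plain Python. This is only an analogy, not R's implementation: `recycle`, `paste` and `ifelse` below are hypothetical Python stand-ins that mimic element-wise evaluation, with shorter inputs repeated to match the longest one.

```python
from itertools import cycle, islice

def recycle(values, length):
    """Repeat `values` cyclically until there are `length` elements."""
    return list(islice(cycle(values), length))

def paste(*vectors, sep=" "):
    """Element-wise string join; shorter vectors are recycled to the longest."""
    n = max(len(v) for v in vectors)
    columns = [recycle(v, n) for v in vectors]
    return [sep.join(str(x) for x in row) for row in zip(*columns)]

def ifelse(test, yes, no):
    """Pick from `yes` or `no` element by element, following `test`."""
    n = len(test)
    yes, no = recycle(yes, n), recycle(no, n)
    return [y if t else f for t, y, f in zip(test, yes, no)]

# A length-1 "vector" is stretched across the longer one.
labels = paste(["sample"], [1, 2, 3], sep="_")
signs = ifelse([x > 0 for x in [-2, 5, 0]], ["pos"], ["non-pos"])
print(labels)  # ['sample_1', 'sample_2', 'sample_3']
print(signs)   # ['non-pos', 'pos', 'non-pos']
```

Seen this way, a function that takes whole vectors and returns a whole vector replaces the explicit loop a C-style programmer would reach for first.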
        
         | jmcdl wrote:
         | What is the advantage of "thinking in vectors" in R versus
         | "thinking in vectors" using numpy in Python (for example)?
        
           | carljv wrote:
           | There's some overlap, but vectors are essential to the
           | language. Every type of data in R is a vector. There are no
           | scalars, just vectors of length 1. Instead of dictionaries,
           | it's idiomatic in R to use "lists", which are vectors of
           | vectors. Data frames are lists (vectors of vectors)
           | constrained to have equal length element vectors (i.e.
           | columns). Classes are defined as lists with some metadata
           | (stored in a vector) to direct method dispatch.
           | 
           | It's not just vectorizing mathematical operations a la numpy.
        
         | xiaodai wrote:
         | NSE (non-standard evaluation) is what will do an experienced
         | programmer's head in. It's an interesting feature.
        
         | foxes wrote:
         | I find it highly unlikely that learning R will expand your
         | programming views anywhere near Haskell.
         | 
         | Haskell is an advanced functional programming language. Most R
         | stuff seems to be incoherent, hard to verify correctness,
         | hacky. It does not seem built on a solid foundation like
         | Haskell. Truly, everything being a vector is not a huge
         | takeaway.
         | 
         | As for "here's just a bunch of examples", well that seems sort
         | of a brute force way to learn something. I agree examples are
         | important, but they are usually to back something up. Having to
         | reverse engineer some fundamental ideas out of just examples is
         | more work. Seems like this is just promoting more hackiness.
         | Seems like it's training a neural network instead of actually
         | understanding something.
        
           | datastoat wrote:
           | R certainly expanded my programming views! Haskell did too,
           | but the lessons of Haskell didn't stick the way that R's
           | lessons did. Here are some of the things I learnt from R
           | (though they can be found in other languages of course).
           | 
           | * Multiple dispatch. Before learning R, I knew about
           | polymorphism in Java and C++, and multiple dispatch in R
           | broadened my mind and turns out to be very handy.
           | 
           | * The idea of "frames". In R, when you invoke
           | `lm(height~sex*age, data=mydataframe)`, the first argument
           | (the formula) doesn't get evaluated until the lm command asks
           | it to be evaluated, and lm can set up the "frame" for that
           | evaluation, i.e. the place where variables are looked up,
           | however it likes. In fact, lm sets it up to include variables
           | from both the scope in which you invoked lm, and also from
           | mydataframe. This is what makes R so wonderfully concise for
           | modelling in data science, compared to e.g. Python + pandas.
           | I knew about frames from interactive debuggers, but until R
           | it never occurred to me that the programming language could
           | manipulate them.
           | 
           | * "Held" arguments. In R, when you invoke `plot(x, y1+y2)`,
           | it doesn't just evaluate the arguments and then call the plot
           | function -- it leaves the arguments unevaluated, and invokes
           | plot. Plot then (1) decides when to evaluate them, (2) gets
           | access to the language expression `y1+y2`, which means that
           | it can print "y1+y2" on the plot label, (3) it can even
           | define extra variables to include in the scope when y1+y2
           | gets evaluated. (I knew about held arguments earlier, from
           | Mathematica, but they only clicked when I read the R
           | documentation.)
           | 
           | I've read that R is a descendant of Scheme, and that that's
           | where it gets all its "manipulate language expressions" from.
           | I don't know any Scheme, nor Lisp, and I should definitely
           | learn them -- but in the meantime, my experience has been
           | that R's ability to manipulate language expressions is what
           | makes it such a wonderful sweet spot as a data modelling
           | language. I mostly use Python + pandas nowadays, but it feels
           | such a slog in comparison.
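
The "frame" idea above can be approximated in Python with the built-in `eval` and a custom lookup chain. This is a rough sketch under stated assumptions, not how R does it (R uses environments and lazily evaluated arguments): `with_rows`, the sample rows and `baseline` are all made up for illustration. Names in the held expression are resolved in the data first and then in the caller's scope, and the expression text stays available as a label.

```python
from collections import ChainMap

def with_rows(expression, rows, caller_scope):
    """Evaluate a held expression once per row.

    Names are looked up in the row first, then in the caller's scope,
    loosely imitating the "frame" that lm() builds for a formula.
    """
    code = compile(expression, "<held expression>", "eval")
    no_builtins = {"__builtins__": {}}  # keep the lookup limited to our frame
    values = [eval(code, no_builtins, ChainMap(row, caller_scope)) for row in rows]
    return {"label": expression, "values": values}

rows = [{"height": 150, "age": 10}, {"height": 180, "age": 30}]
baseline = 100  # lives in the "calling scope", not in the data

result = with_rows("height - baseline + age", rows, {"baseline": baseline})
print(result["label"])   # height - baseline + age
print(result["values"])  # [60, 110]
```

In Python the expression has to travel as a string and the caller's scope has to be handed over explicitly, which is roughly the extra ceremony the comment describes as a slog.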
        
             | bookofsand wrote:
             | Syntactic forms ('frames', 'held arguments') are reasonably
             | useful, but have two flaws:
             | 
             | A. Understanding how to _implement_ functions using
             | syntactic forms is a steep learning curve. I remember
             | running out of dplyr and having to implement a udf. Fairly
             | unpleasant experience (enquo, !!, perhaps other unusual
             | constructs). Felt like programming C macros.
             | 
             | B. "A function can decide where variables are looked up
             | however it likes" is a significant obstacle in
             | understanding how even basic constructs like function calls
             | actually work. There is a non-trivial amount of hard-to-
             | debug dark magic lurking behind every corner.
             | 
             | A middle ground has never been achieved. For example,
             | `plot(expr(x), expr(y1 + y2))`, where the system limits the
             | dark magic to explicit uses of the `expr()` construct, and
             | `expr(x)` always means `{vars: vars(x), expr: (vars(x)) =>
             | x}`. Instead of patching interpreter environments, simply
             | call a lambda function.
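
The proposed middle ground can be sketched in Python, where nothing is held implicitly and the caller has to wrap each expression by hand. Everything here is hypothetical and only illustrates the shape of the idea: `Expr` bundles the source text, the variable names and a plain lambda, and `describe_plot` stands in for a real `plot()`.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass(frozen=True)
class Expr:
    source: str                # expression text, usable as an axis label
    names: Tuple[str, ...]     # variables the expression depends on
    fn: Callable[..., float]   # ordinary function that computes the value

    def evaluate(self, scope):
        """Call the lambda with the named variables taken from `scope`."""
        return self.fn(*(scope[name] for name in self.names))

def describe_plot(x, y, scope):
    """Stand-in for plot(): returns the labels and the evaluated values."""
    return {
        "xlabel": x.source,
        "ylabel": y.source,
        "x": x.evaluate(scope),
        "y": y.evaluate(scope),
    }

scope = {"x": 1.0, "y1": 2.0, "y2": 3.5}
x_expr = Expr("x", ("x",), lambda x: x)
y_expr = Expr("y1 + y2", ("y1", "y2"), lambda y1, y2: y1 + y2)
summary = describe_plot(x_expr, y_expr, scope)
print(summary["ylabel"], "=", summary["y"])  # y1 + y2 = 5.5
```

The trade-off is visible in the call site: the lookup rules are explicit and easy to debug, but the caller writes the expression twice, once as text and once as a lambda.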
        
               | datastoat wrote:
               | I completely agree about the steep learning curve and the
               | feeling of dark magic -- how many times have I had to
               | relearn what deparse(substitute(x)) means -- but oh the
               | satisfaction of broadening my programming horizons. For
               | me it didn't feel like C macros, it felt like "This must
               | be what it feels like to have the power of Lisp"!
               | 
               | That's the weird thing about R. All this dark magic is
               | hiding under the hood, but the core R team hid it so
               | deftly that to the casual statistician it's a
               | straightforward data modelling language that "just
               | works". I'm not sure that it's possible to get rid of the
               | dark magic and retain that data-modeller friendliness.
        
           | CornCobs wrote:
           | This sounds like a highly biased perspective of both R, and
           | what it means for a language to be respectable.
           | 
           | If you define a language that "will expand your programming
           | views" as one that can verify correctness easily then yeah, R
           | is terrible at that. But so are many languages that are as
           | flexible as R. Would you have the same opinion of FORTH? Or
           | LISP? Or TCL? I think these languages definitely count as
           | "hacky" languages and yet they don't seem to draw the same
           | derision as R (in my experience).
        
         | jghn wrote:
         | S4 is the one that is reminiscent of CLOS. Dylan was explicitly
         | cited as an inspiration [0]
         | 
         | [0] There's an old article from Robert Gentleman named
         | something like "S4 objects in 5 pages, more or less" but I
         | can't find it. However, there's a mention of Dylan and CLOS
         | here:
         | https://genomebiology.biomedcentral.com/articles/10.1186/gb-...
         | 
         | EDIT: Here's the document I was looking to cite:
         | https://www.stat.auckland.ac.nz/S-Workshop/Gentleman/S4Objec...
        
           | kgwgk wrote:
           | Arguably the S3 object system is also "functional" in spirit,
           | even if it's single-dispatch.
           | 
           | https://arxiv.org/pdf/1409.3531.pdf
           | 
           | Object-Oriented Programming, Functional Programming and R
           | (John M. Chambers)
           | 
           | "Chambers and Hastie (1992), in the discussion of classes and
           | methods, noted that S differed from other OOP languages
           | because of its functional programming style. In fact, this
           | version of functional OOP finessed the resulting distinction
           | from encapsulated OOP in two ways. First, the methods were
           | dispatched according to a single argument, the first formal
           | argument of the generic function in principle. As a result,
           | the methods were unambiguously associated with a single
           | class, as they would be in encapsulated OOP. Methods were
           | actually dispatched on either argument to the usual binary
           | operators, but a number of encapsulated OOP languages do the
           | same, under the euphemism of operator overloading.
           | 
           | "Second, the question of whether methods belonged to a class
           | or a function was avoided by not having them belong to
           | either. Methods were assigned as ordinary functions and
           | identified by the pattern of their name: "function.class". In
           | any case, there were no class objects and generic functions
           | were ordinary functions that invoked UseMethod() to select
           | and call the appropriate method. Neither the function nor the
           | class was able to own the methods."
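
The naming-pattern dispatch described in the quote can be mimicked in a few lines of Python. This is an analogy only, with hypothetical names throughout: `METHODS` plays the role of the global function namespace, `use_method` imitates what `UseMethod()` is described as doing, and objects are plain dicts carrying a `class` list.

```python
def print_default(obj):
    return "object of class " + "/".join(obj["class"])

def print_lm(obj):
    return "linear model with " + str(len(obj["coef"])) + " coefficients"

# Methods are ordinary functions, found only by their "function.class" name.
METHODS = {"print.default": print_default, "print.lm": print_lm}

def use_method(generic, obj, *args):
    """Call `generic.<class>` for the first class of `obj` that has a method."""
    for cls in list(obj.get("class", [])) + ["default"]:
        method = METHODS.get(generic + "." + cls)
        if method is not None:
            return method(obj, *args)
    raise TypeError("no applicable method for " + generic)

def print_generic(obj):
    """The generic is itself an ordinary function that just dispatches."""
    return use_method("print", obj)

fit = {"class": ["glm", "lm"], "coef": [1.0, 2.0]}
other = {"class": ["foo"]}
print(print_generic(fit))    # linear model with 2 coefficients
print(print_generic(other))  # object of class foo
```

Neither the dict-based "class" nor the generic owns `print_lm`; the association exists only through the name, which is the point the quoted passage makes.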
        
         | mraza007 wrote:
         | Okay, I'm not sure why R is getting a lot of hate. After all,
         | it's a programming language that gets the job done, and it's
         | very popular in the financial industry, especially among risk
         | modelers and quants. I have even seen it used in the analytics
         | space within the financial industry.
        
         | dwrodri wrote:
         | I have used R a few times now, and I definitely agree with the
         | statement that thinking in vectors is central to writing good R
         | scripts. However, as a computer engineer and performance
         | junkie, it's unfortunate that it doesn't get as much attention
         | as other "STEM DSLs" (Julia, MATLAB) when it comes to
         | performance.
         | 
         | The same could technically be argued for Python: the current
         | approaches to dealing with high-performance compute workloads
         | either rely on JITing (e.g. Numba, TensorFlow/JAX's XLA) or
         | bridging over to giant binary blobs through CPython's well-
         | supported C interop.
        
         | ttz wrote:
         | To add: in my experience, programmers who denigrate R think of
         | it as a software engineering language. Not all programming
         | languages are meant to be languages for building large scale
         | applications. Programming is not just about building business
         | applications, it's about getting a computer to do things.
         | 
         | And R excels at doing statistics and data science. If you keep
         | that mindset, I believe many will find that it's an excellent
         | programming language.
        
         | kgwgk wrote:
         | > S3, predates Java and therefore uses the Generic Function
         | paradigm of OOP similar to Common Lisp's CLOS
         | 
         | S3 appeared a few years before Java but there were other OOP
         | languages like C++ around at the time.
        
         | auto wrote:
         | Honestly, I'd argue that the issue I had in my experience in R
         | wasn't that I myself wasn't giving it the respect it deserved,
         | but rather that the course constructors for my degree didn't
         | give it that respect.
         | 
         | We were essentially told "Install R Studio, then just copy and
         | paste these library imports and you're good to go". Your
         | description of a vectorized language makes total sense, and
         | that single paragraph is more of an intro to R than we ever got
         | in class.
         | 
         | That said, I think the reason this happened with this class in
         | particular is the viewpoint of (what I perceive as) the
         | majority users of R: mathematics-focused researchers who never
         | learned the language. They have just done enough to get by and
         | don't _really_ appreciate the underpinnings, or the nuances of
         | running an environment on a machine that isn't theirs.
         | 
         | I can't entirely absolve myself of blame though. Once I
         | realized what was happening I should have gone and done some
         | more foundational R learning, but at that point I just wanted
         | to be done with the class.
        
           | hdkrgr wrote:
           | I wouldn't say this is true for "the majority users of R" at
           | all.
           | 
           | But for "the majority of professors who tangentially use R
           | code in classes on
           | statistics/bioinformatics/economics/finance (anything not
           | explicitly about R and/or Data Science best practices)"?
           | Absolutely.
           | 
           | The R code you see in industry (or academic labs where
           | someone cares about modern R) looks vastly different from
           | those script examples in college that are most people's first
           | impression of the language.
        
       | Closi wrote:
       | I currently do lots of data analysis in Excel and know basic
       | Python. I would be interested to get opinions on whether R is
       | better suited to data analysis than Python if that's all I was
       | doing.
        
         | trailrunner46 wrote:
         | Python is certainly more popular, and for job prospects I always
         | tell that to newer data folks. That being said, if you want to
         | load in some data, do some SQL-like manipulation, run some stats
         | and make a graph or output a report, I would argue R is a way
         | better experience than Python, but that's much more about the
         | package ecosystem and less a comment on the language. dplyr is
         | just friendlier to use than pandas (which often has 3-5 ways to
         | do something, and as a beginner this can be disorienting), and
         | the same goes for ggplot2 vs matplotlib. For interactive graphs
         | you are probably going to use plotly anyway from both languages.
         | 
         | One other thing I would mention is that knowing SQL well is the
         | most translatable skill. A lot of dplyr and pandas are doing
         | SQL-like operations (in fact, dbplyr will generate SQL-equivalent
         | commands for your dplyr code for various backends).
         | 
         | In summary, know how to manipulate data in SQL, then pick a
         | language (because you will need to do some IO/reporting stuff
         | outside just data work) where the ecosystem of packages feels
         | user-friendly to you and your workflow, and roll with that.
        
         | scottmcdot wrote:
         | I'm strong in R, Python and Excel. I'd say anyone transitioning
         | from Excel would be better off using R first, because the R
         | Integrated Development Environment (IDE), RStudio, is fantastic
         | compared to any Python IDE that you can actually figure out
         | how to install. The IDE makes it easy to visualise what you're
         | actually doing to your dataframes by using multiple table tabs.
        
       | auto wrote:
       | I consider myself a pretty experienced developer/software
       | engineer/whatever. Decade into my career, started in iOS in
       | Obj-C, learned Swift along the way, eventually migrated to the
       | backend with Java/SQL, and finally found myself where I really
       | wanted to be, embedded doing C/C++ work for a household name.
       | 
       | That said, about halfway through my master's about two years ago,
       | I found myself in an intro to data mining course that was sold as
       | a "we will teach you R" course. I had heard non-programmer math
       | friends talk about what they had accomplished in R, and was
       | excited to dive in.
       | 
       | Now, it didn't help that the class ended up being _heavy_ on the
       | statistics side (despite a math/cs double major, stats was
       | never my thing), but the actual learning-R part was 99% left as
       | an exercise for us alongside the classwork required.
       | 
       | I can say without a doubt, learning R is the worst programming
       | experience I've ever had. Our assignments would give some high
       | level direction on which libraries to use, but getting the right
       | libraries set up and in the environment was just an absolute
       | nightmare. All I remember from that class is hours every week
       | googling unreadable python pukes from R studio (because
       | apparently everything data mining/ML related in R is actually
       | just python), and then spending an hour or less actually doing
       | the statistics work.
       | 
       | I feel bad because I feel like I was set up to not be able to give
       | it a fair chance, but if that's what non-programmer math types
       | are subjected to when told "you need to do some programming for
       | your job", I can understand the apprehension.
        
         | CapmCrackaWaka wrote:
         | > All I remember from that class is hours every week googling
         | unreadable python pukes from R studio (because apparently
         | everything data mining/ML related in R is actually just python)
         | 
         | I'm curious what libraries you were using, I've had to go into
         | the source of quite a few popular libraries and I don't think
         | I've ever encountered Python. Lots of C and its derivatives,
         | lots of Stan and FORTRAN, but I don't think I've seen Python
         | yet.
        
           | auto wrote:
           | Just going back and looking through some of the homework,
           | here's what I'm seeing imported:
           | 
           | readr, caret, lattice, ggplot2, RColorBrewer, mlbench,
           | ElemStatLearn, klaR, dplyr, arules, arulesViz, tensorflow
        
             | malshe wrote:
             | In this list only tensorflow requires Python
        
               | BoiledCabbage wrote:
               | And I haven't used it, but there is Torch for R as the
               | alternative which isn't supposed to have any dependency
               | on Python.
               | 
               | https://torch.mlverse.org/
        
             | disgruntledphd2 wrote:
             | Yeah, TensorFlow is Python, and is a nightmare to work
             | with.
             | 
             | That's not R's fault though.
        
         | temp8964 wrote:
         | What you described is a common phenomenon in stats / data
         | mining / psychometrics / econometrics. They all need students
         | to use certain programming languages, such as SAS / Stata / R /
         | Python, but they don't really spend time teaching those
         | programming languages. I guess this could also be the same for
         | MATLAB in math?
        
           | pacbard wrote:
           | The problem is that there are no incentives to learn how
           | to program for people in those fields. Most people just get
           | their code to "work" (i.e., output the analyses that they
           | want) without really wanting to know how it works. Most of
           | the time code is passed from grad student to grad student and
           | modified to make it work for the specific analysis. As a
           | result, you get Frankenstein code that somewhat works but
           | that is good enough for writing a results section of a paper.
           | 
           | There are people that know how to code, but those are few and
           | far between. Usually they are pushed out of academic positions
           | because there are very few ways to fund work to develop
           | scientific code.
        
         | warlog wrote:
         | Dataframes...Python has R to thank...too bad they suck compared
         | to R.
        
         | malshe wrote:
         | > because apparently everything data mining/ML related in R is
         | actually just python
         | 
         | I think it is apparent only to you. The most popular package
         | for ML in R is `caret` and it has nothing to do with Python.
         | Similarly, `mlr` also has no Python. In fact, except for
         | tensorflow and keras, I can't think of any major ML package
         | that needs Python. Even the torch package, which brings PyTorch
         | to R, doesn't need Python
         | (https://cran.r-project.org/web/packages/torch/index.html).
         | 
         | What confuses me even more is that you were learning statistics
         | but using R packages that use Python as backend. In my
         | experience, almost all the new statistics and econometrics
         | methods are first released as R packages by the researchers.
         | Can you name any data mining R packages that you used that
         | required Python? I am really curious to know.
        
         | gonzo41 wrote:
         | I pretty much had the same experience with R. I've never been
         | able to get really productive with it as a software developer.
         | I feel like I know too much and can't break from old habits.
         | It's very academic, which I feel really holds it back from
         | software devs and also captures non-developers in its web.
         | 
         | Whilst it's certainly got some runs on the board, I think
         | having data science folk work in more standard languages would
         | actually make supporting them and their needs easier.
        
           | notagoodidea wrote:
           | Funny, I'd never say that R itself is very academic. The
           | environment maybe, the users sure, but the language is very
           | much a mutated Lisp with vectorized operations. Same for the
           | Python stuff: I never hit that problem working a lot with it,
           | but most of the time when I wanted to check a lib, I landed
           | in C++, aka I am not sure I understand how to read the code.
           | 
           | What in R made it feel academic for you?
        
         | beforeolives wrote:
         | > because apparently everything data mining/ML related in R is
         | actually just python
         | 
         | Everything? I'm just wondering what you've been doing exactly.
         | I know that Tensorflow in R is just a wrapper on top of Python,
         | not sure what else. If you're doing any deep learning, then
         | going straight to Python is certainly much better than using R.
         | For most other things it's not quite as clear of a decision.
        
           | _Wintermute wrote:
           | Hilariously there's an argparse library for R, which has
           | Python as a dependency.
        
       ___________________________________________________________________
       (page generated 2021-06-05 23:01 UTC)