[HN Gopher] Vapour: A typed superset of the R programming language
       ___________________________________________________________________
        
       Vapour: A typed superset of the R programming language
        
       Author : johncoene
       Score  : 61 points
       Date   : 2024-09-16 20:24 UTC (3 days ago)
        
 (HTM) web link (vapour.run)
 (TXT) w3m dump (vapour.run)
        
       | brudgers wrote:
       | [flagged]
        
         | johncoene wrote:
         | First, how is that "giving myself an excuse"? Second, it's a
         | total non sequitur, and even then, it's a day old; has it
         | broken?
        
           | brudgers wrote:
           | _the syntax might change, things will break, expect bugs._
           | 
           | Bugs are normal software development.
           | 
           | Changing syntax and breaking things make work for everyone
           | else for the convenience of developers. Reliability is what
           | makes a tool a tool.
        
             | Terretta wrote:
             | > _Changing syntax and breaking things make work_
             | 
             | How else might one explore a new language (vapour) in the
             | open among interested like-minded developers seeking to
             | iterate on a tool found lacking (R)?
             | 
             | Changing and iterating things _makes_.
        
               | ausbah wrote:
               | they aren't wrong. backwards compatibility is supposed
               | to be one of the first promises of any mature programming
               | language. unless you make it explicit via noting
               | breaking changes in major version updates (1.X.X -->
               | 2.X.X) or the language is purely for R&D and makes no
               | guarantee of anything
        
       | lloydatkinson wrote:
       | This looks nice. I find R to be an unreadable mess. The comparison
       | shows a great improvement.
        
         | qudat wrote:
         | The default IDE workflow is like a python "notebook" where code
         | can and is run in whatever order the creator wants. Every R
         | code I've read treats it as such and it results in an absolute
         | mess to read and manage.
        
       | andrewla wrote:
       | As an R programmer the examples given on the landing page seem
       | very foreign to me -- you are almost always writing vectorized
       | code in R, so I would think that would be front and center.
       | let x: int = 1
       | 
       | Is this a list of ints or a pure singleton? R doesn't have scalar
       | types, so it would seem the former, but the example makes it
       | unclear. Later in the docs it makes it clearer:
       | let x: int = (1, 2, 3)
       | 
       | And this, as an R developer, I can definitely get behind -- the
       | c(...) syntax is always awkward and having a native syntax for
       | static arrays is a welcome change.
        
         | juujian wrote:
         | Yeah, it's not an idiomatic example. I like the idea, but this
         | makes me worry that the project does not have the right
         | priorities. I.e., supporting my use cases :D
        
       | ecshafer wrote:
       | I think this is a great idea for the project. I don't dislike the
       | syntax, but the syntax seems more ML than R to me. I think
       | keeping the syntax more R-like could be worthwhile.
        
       | clircle wrote:
       | Statisticians and researchers, is this helpful?
        
         | tech_ken wrote:
         | I would say that the vast majority of type problems in data
         | science/stats workflows come from data tables "trojan-horsing"
         | type or missing data issues, rather than type problems strictly
         | at the code level. Type annotations won't help you when your
         | upstreams decide they want to change the format of their year-
         | quarter strings without telling you.
        
           | dragonwriter wrote:
           | > Type annotations won't help you when your upstreams decide
           | they want to change the format of their year-quarter strings
           | without telling you.
           | 
           | IME with both Python and JS/TS, it _helps_ a lot (which is
           | different than completely solving the problem), for reasons
           | which should generalize to other typing add-ons/supersets
           | for untyped languages. Typing your code forces validations at
           | the boundaries, which obviously doesn't stop upstream sources
           | from messing with formats but it does mean that you are much
           | more likely to catch it at the boundary rather than having
           | weird breakages deep in your code that you have to trace back
           | to bad upstream data.
        
             | tech_ken wrote:
             | Is the idea that if my year_quarter parser is properly
             | typed then it should detect the format change and throw an
             | error? (kind of a silly example, just trying to be
             | illustrative)
        
               | Nadya wrote:
               | Yes. Your type can encode what the proper format for a
               | string should be and if a string is passed that does not
               | meet that format it will throw an error allowing you to
               | make any necessary adjustments to handle the new date
               | year_quarter format.
               | 
               | eg. `type DateString = ${number}/${number}/${number}`
               | 
               | A super naive check for using "/" instead of "-" as the
               | separator character for a date formatted as a string. If
               | a date is provided with some other separator character it
               | will throw an error. If my function takes a DateString
               | the string must be formatted correctly to pass the type
               | check. Obviously this isn't enough (YYYY/MM/DD is
               | different than DD/MM/YYYY) but the intention was to show
               | a way to enforce something via types: rather than
               | validating a string to check that you have a DateString,
               | you can simply enforce that you have one.
        
               | dllthomas wrote:
               | "Typing your code forces validations at the boundaries"
               | was too strong because of course you can type your code
               | without actually doing the validations, but you _can_
               | structure your code such that that won't happen
               | accidentally:
               | https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...
               | 
               | The idea is that checking should be the only way of
               | making a value of the type. That prevents you from
               | forgetting to check when you turn some broader type (say,
               | string) into the more narrow one (date, in this case).
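
A minimal sketch of the "parse, don't validate" idea discussed above, in Python rather than Vapour. The "2024-Q3" wire format, the `YearQuarter` class and the `next_quarter` helper are all assumptions made up for illustration. Python cannot hide the dataclass constructor, so "checking is the only way to make a value" holds here only by convention: boundary code is expected to go through `YearQuarter.parse`.

```python
import re
from dataclasses import dataclass

# Hypothetical upstream format, e.g. "2024-Q3" (an assumption, not from the thread).
_YEAR_QUARTER = re.compile(r"(\d{4})-Q([1-4])")


@dataclass(frozen=True)
class YearQuarter:
    year: int
    quarter: int

    @classmethod
    def parse(cls, raw: str) -> "YearQuarter":
        """Turn an untrusted string into a YearQuarter, or raise ValueError."""
        match = _YEAR_QUARTER.fullmatch(raw.strip())
        if match is None:
            raise ValueError(f"unrecognised year-quarter string: {raw!r}")
        return cls(year=int(match.group(1)), quarter=int(match.group(2)))


def next_quarter(yq: YearQuarter) -> YearQuarter:
    """Internal code accepts the narrow type, so it never re-checks the format."""
    if yq.quarter == 4:
        return YearQuarter(yq.year + 1, 1)
    return YearQuarter(yq.year, yq.quarter + 1)
```

With this layout, an upstream switch to something like "2024/Q3" raises `ValueError` inside `YearQuarter.parse` at the boundary, instead of surfacing later as a confusing failure deep in the analysis code.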
        
               | dragonwriter wrote:
               | > "Typing your code forces validations at the boundaries"
               | was too strong because of course you can type your code
               | without actually doing the validations
               | 
               | Yeah, of course you can cheat the typechecking in the
               | code at the boundary in several ways, or convert from
               | wire format to internal types in a way which plugs in
               | type-valid defaults for bad data rather than erroring, or
               | just use too-broad internal types to start with (you can
               | have "stringly-typed code"), and fail to help the
               | problems. But if you use the types that make sense
               | internally for what the code is doing, then conversion
               | including validation at the boundary becomes the path of
               | least resistance in most cases. "Forces" is not strictly
               | true, but my experience is that adding types does create
               | a strong push for boundary validation.
        
         | levocardia wrote:
         | Not really, because honestly a lot of us who came into
         | programming via research never learned typed languages or unit
         | tests or any of those best practices - we were just hacking
         | around in MATLAB, R, or Python from the start. What I really
         | need is a seamless and easy way to run statistical models that
         | can only be fit in R, but from Python or Node. There are
         | several categories of statistical modeling where R completely
         | blows python out of the water, and it's incredibly wasteful
         | (and error-prone) to try to re-implement these yourself in
         | Python.
        
           | bachmeier wrote:
           | rpy2 can be used to call R from Python:
           | https://rviews.rstudio.com/2022/05/25/calling-r-from-python-...
           | 
           | reticulate works for going in the other direction:
           | https://rstudio.github.io/reticulate/
           | 
           | With the good interoperability these days, let's stop
           | rewriting functionality in other languages. If the
           | interoperability is no good, work on fixing that, please.
        
         | ellisv wrote:
         | It is probably helpful in some cases and unhelpful in others. R
         | uses multiple dispatch, so calling `foo` on different types can
         | produce different output. It isn't clear to me how Vapour
         | handles this. In general though, folks are passing around
         | data.frame or similar objects.
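
To make the dispatch concern above concrete, here is a rough Python analogue using `functools.singledispatch`. It dispatches on the class of the first argument only, so it is closer in spirit to R's S3 generics than to anything Vapour is known to do. The `describe` generic and its methods are hypothetical names chosen for this sketch.

```python
from functools import singledispatch


@singledispatch
def describe(x):
    """Generic entry point; the method actually run depends on type(x)."""
    raise TypeError(f"no describe() method for {type(x).__name__}")


@describe.register
def _(x: list) -> float:
    # For a list of numbers, return the arithmetic mean.
    return sum(x) / len(x)


@describe.register
def _(x: str) -> int:
    # For a string, return its length instead.
    return len(x)
```

The return type here depends on which method is selected at run time, so a static checker that only sees the generic signature cannot know whether `describe(value)` yields a float or an int. That is the kind of question a typed layer over R would need to answer for generics.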
        
       | mushufasa wrote:
       | The main reason we shy away from R for production apps is all the
       | silent errors where things seem to succeed while being horribly
       | wrong if you take a look. Typing would certainly help mitigate
       | that.
        
       | uptownfunk wrote:
       | Will this fix the problems it claims to? The power of R is the
       | rich package ecosystem. It caters to people who don't want to
       | think about engineering concerns but want a fast way to access
       | the powers of computation rather than building a scalable system,
       | two very different things. It excels at the former. A new
       | language will not fix this, because this type of thinking has
       | infected the entire package ecosystem. Frankly with code
       | translation you probably don't need a new language. Prototype in
       | R and code translate to Python or whatever you want to use in
       | prod. Or frankly just do code gen directly in Python so you can
       | skip having to confirm if the results match.
       | 
       | To be clear, I love R, it excels in prototyping but I have seen
       | too many real world struggles of folks trying to move to prod
       | that I would say save it for EDA projects and one time analyses.
        
         | _Wintermute wrote:
         | I often find I want a specific statistical package that's only
         | in R, but want a more general purpose language for all the
         | other stuff that's involved (parsing, filesystem stuff, error
         | handling etc). I don't want to risk re-writing the statistical
         | methods and all their dependencies in the sensible language, so
         | I end up calling R only for the statistical methods, but I can
         | see this as an alternative.
        
         | joshdavham wrote:
         | > A new language will not fix this, because this type of
         | thinking has infected the entire package ecosystem.
         | 
         | Do you think the culture of the package ecosystem could
         | possibly change in the future?
        
       | bachmeier wrote:
       | I took a couple stabs at this long ago (even before there was a
       | Typescript for inspiration). The first attempt was to add types
       | to the syntax of R, but that would have required a lot more time
       | than I had. Properly catching errors is a massive undertaking
       | requiring a lot of background I don't have. The second attempt
       | was to add syntax for types to R and then compile the code to
       | another language. That's easy to do, but really boring, so I
       | wasn't able to stick with it. It comes with the advantages of
       | static typing and R code that runs very fast. I gave up and went
       | with embedding R inside a statically typed language. Very happy
       | with my choice.
       | 
       | Good luck to the authors of this. I believe it solves an
       | important problem for R package authors and others wanting to
       | write bigger programs. It's hard to argue with the benefits of
       | static typing for this type of work.
        
       | layer8 wrote:
       | Sounds like vapourware. ;)
        
         | joshdavham wrote:
         | I mean, there is an alpha you can download. If it was just a
         | landing page and an email waitlist, then that would be
         | vaporware.
        
           | layer8 wrote:
           | I was commenting on the naming choice.
        
       | joshdavham wrote:
       | Looks interesting! What types of programs do you think people
       | would write in this language? I don't see an obvious need for
       | traditional R programs which are usually just scripts for working
       | with data, but maybe people could write R packages in this
       | language?
        
       | johnnybzane wrote:
       | How do I find jobs that use the R language? It's impossible to
       | search the letter "R" on LinkedIn or Indeed without getting a
       | bunch of unrelated job postings
       | 
       | "R" is the only programming language I know and I can't find a
       | job that uses R because job search engines don't allow you to
       | sort by skill
       | 
       | "R language" is the closest substitute on linkedin but the
       | results are still a jumbled mess of jobs, some looking moreso for
       | other skills (SQL/Python)
       | 
       | I know R-heavy jobs exist but finding them on LinkedIn is
       | virtually impossible
        
         | clircle wrote:
         | Why would you do that? R is just a tool for doing statistics
         | or research. You need to search for jobs in your subject area
         | like "ecologist", "econometrician", "green energy researcher",
         | etc.
        
           | johnnybzane wrote:
           | There are hedge funds that like hiring people who know how to
           | manipulate data in R using dplyr and data.table
           | 
           | Looking for a similar job where my desire/interest to spend
           | all day in Rstudio is a value add to a business
        
             | nickforr wrote:
             | With apologies if this breaks guidelines:
             | https://hymans.current-vacancies.com/Jobs/Advert/3525353?cid...
        
         | Balladeer wrote:
         | How does "R language" compare to searching for one of the
         | popular R packages? Searching for "tidyverse", "dplyr", or
         | "ggplot" seems to get a good chunk of hits. That being said,
         | yeah, there does seem to be a trio of skills that often go
         | together (R, python, SQL)
        
           | johnnybzane wrote:
           | If you search specific packages on LinkedIn the number of
           | jobs is usually very small
           | 
           | E.g. tidyverse or dplyr is like 20-40 jobs. ggplot is 88.
           | There's definitely way more than 100+ companies looking for
           | R-heavy users.
        
         | kagevf wrote:
         | I tried using "r" (with quotes) on indeed, and got some hits
         | where R was listed as one of the necessary skills.
        
         | dkga wrote:
         | Perhaps #rlang would work? Or #tidyverse if you are feeling
         | tibblish :)
        
       | russellbeattie wrote:
       | This isn't specifically about Vapour, just about what's become
       | the common way to specify types.
       | 
       | I know this is totally bike shedding, semantics, vi vs Emacs,
       | BigEndian vs LittleEndian and it's too late now to affect
       | anything, but to me using a colon _after_ the variable is just
       | wrong!
       | 
       | let x : int = 1
       | 
       | func add(x: int, y: int): int { return x + y }
       | 
       | I see that and it looks like int = 1 and the function's return
       | type is totally lost.
       | 
       | This seems completely backwards to me. Maybe I'm just used to the
       | way C did it, but the variable modifiers should come first.
       | 
       | let int x = 1
       | 
       | func int add(int x, int y) { return x + y }
       | 
       | Why we reversed it and added in the colon just doesn't make much
       | sense to me.
        
       | nerdponx wrote:
       | I have some questions that are not answered by the homepage.
       | 
       | 1) How does this work with function parameters that are intended
       | to be captured unevaluated with substitute()? Do you type the
       | input as "any" and document separately that the parameter is kept
       | "unevaluated" as a symbol/name or call?
       | 
       | 2) How does this work with existing untyped R code? Does it at
       | least include types for the standard library (or some subset
       | thereof?)
       | 
       | 3) Is there any type inference, or does it require explicit type
       | annotation everywhere?
       | 
       | 4) How do you propose to handle NA (which can appear "within" any
       | typed vector)? Does the compiler support refinement types? If
       | not, how does checking for and preventing nullability work, when
       | checking for NA values requires a runtime check?
       | 
       | 5) How do data frames work? Are they typed like structs?
       | 
       | 6) Which object systems does it support, if any? S3, S4,
       | Reference Classes, or the 3rd-party R6?
       | 
       | As much as I like static types, I feel like R is maybe the
       | language where I need or want them the _least_. How often do you
       | really run into a situation where you pass a character vector to
       | a function that requires a numeric vector and it crashes your
       | program?
       | 
       | 99% of the time what you really want is _known-valid data frames_
       | for data processing, and _statically-sized arrays_ for math
       | stuff.
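
On question 4 above, one common pattern in typed languages is to let the element type admit a missing value and to narrow it with a single runtime check. The sketch below uses Python's `None` as a stand-in for R's NA. It is only an illustration of the pattern, not a description of how Vapour handles NA, and `drop_missing` and `safe_mean` are hypothetical helpers.

```python
from typing import List, Optional, Sequence


def drop_missing(values: Sequence[Optional[float]]) -> List[float]:
    """Runtime check that narrows 'maybe missing' elements to plain floats."""
    return [v for v in values if v is not None]


def safe_mean(values: Sequence[Optional[float]]) -> float:
    """Mean of the non-missing values; raises if nothing is left."""
    present = drop_missing(values)
    if not present:
        raise ValueError("no non-missing values")
    return sum(present) / len(present)
```

The check still happens at run time, as the question anticipates, but the return type of `drop_missing` records that it has happened, so downstream code typed against `List[float]` does not need to repeat it.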
        
       | condwanaland wrote:
       | Cool idea! Looking forward to exploring it this weekend
        
       ___________________________________________________________________
       (page generated 2024-09-19 23:00 UTC)