[HN Gopher] Vapour: A typed superset of the R programming language
___________________________________________________________________
Vapour: A typed superset of the R programming language
Author : johncoene
Score : 61 points
Date : 2024-09-16 20:24 UTC (3 days ago)
(HTM) web link (vapour.run)
(TXT) w3m dump (vapour.run)
| brudgers wrote:
| [flagged]
| johncoene wrote:
| First, how is that "giving myself an excuse"? Second, it's a
| total non sequitur, and even then, it's a day old; has it
| broken?
| brudgers wrote:
| _the syntax might change, things will break, expect bugs._
|
| Bugs are normal software development.
|
| Changing syntax and breaking things make work for everyone
| else for the convenience of developers. Reliability is what
| makes a tool a tool.
| Terretta wrote:
| > _Changing syntax and breaking things make work_
|
| How else might one explore a new language (vapour) in the
| open among interested like-minded developers seeking to
| iterate on a tool found lacking (R)?
|
| Changing and iterating things _makes_.
| ausbah wrote:
| they aren't wrong. backwards compatibility is supposed to be
| one of the first promises of any mature programming
| language, unless you make breaks explicit by noting
| breaking changes in major version updates (1.X.X -->
| 2.X.X) or the language is purely for R&D and makes no
| guarantee of anything
| lloydatkinson wrote:
| This looks nice. I find R to be an unreadable mess. The comparison
| shows a great improvement.
| qudat wrote:
| The default IDE workflow is like a python "notebook" where code
| can be, and is, run in whatever order the creator wants. All the R
| code I've read treats it as such and it results in an absolute
| mess to read and manage.
| andrewla wrote:
| As an R programmer the examples given on the landing page seem
| very foreign to me -- you are almost always writing vectorized
| code in R, so I would think that would be front and center.
| let x: int = 1
|
| Is this a list of ints or a pure singleton? R doesn't have scalar
| types, so it would seem the former, but the example makes it
| unclear. Later in the docs it makes it clearer:
| let x: int = (1, 2, 3)
|
| And this, as an R developer, I can definitely get behind -- the
| c(...) syntax is always awkward and having a native syntax for
| static arrays is a welcome change.
| juujian wrote:
| Yeah, it's not an idiomatic example. I like the idea, but this
| makes me worry that the project does not have the right
| priorities. I.e., supporting my use cases :D
| ecshafer wrote:
| I think this is a great idea for the project. I don't dislike the
| syntax, but the syntax seems more ML than R to me. I think
| keeping the syntax more R-like could be worthwhile.
| clircle wrote:
| Statisticians and researchers, is this helpful?
| tech_ken wrote:
| I would say that the vast majority of type problems in data
| science/stats workflows come from data tables "trojan-horsing"
| type or missing data issues, rather than type problems strictly
| at the code level. Type annotations won't help you when your
| upstreams decide they want to change the format of their year-
| quarter strings without telling you.
| dragonwriter wrote:
| > Type annotations won't help you when your upstreams decide
| they want to change the format of their year-quarter strings
| without telling you.
|
| IME with both Python and JS/TS, it _helps_ a lot (which is
| different than completely solving the problem), for reasons
| which should generalize to other typing add-ons/supersets
| for untyped languages. Typing your code forces validations at
| the boundaries, which obviously doesn't stop upstream sources
| from messing with formats but it does mean that you are much
| more likely to catch it at the boundary rather than having
| weird breakages deep in your code that you have to trace back
| to bad upstream data.
| tech_ken wrote:
| Is the idea that if my year_quarter parser is properly
| typed then it should detect the format change and throw an
| error? (kind of a silly example, just trying to be
| illustrative)
| Nadya wrote:
| Yes. Your type can encode what the proper format for a
| string should be and if a string is passed that does not
| meet that format it will throw an error allowing you to
| make any necessary adjustments to handle the new date
| year_quarter format.
|
| e.g. type DateString = `${number}/${number}/${number}`
|
| A super naive check for using "/" instead of "-" as the
| separator character for a date formatted as a string. If
| a date is provided with some other separator character it
| will throw an error. If my function takes a DateString
| the string must be formatted correctly to pass the type
| check. Obviously this isn't enough (YYYY/MM/DD is
| different than DD/MM/YYYY) but the intention was to show
| a way to enforce something via types: rather than
| validating a string to check that you have a DateString,
| you can simply enforce that you have one.
| dllthomas wrote:
| "Typing your code forces validations at the boundaries"
| was too strong because of course you can type your code
| without actually doing the validations, but you _can_
| structure your code such that that won't happen
| accidentally:
| https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...
|
| The idea is that checking should be the only way of
| making a value of the type. That prevents you from
| forgetting to check when you turn some broader type (say,
| string) into the more narrow one (date, in this case).
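|
| A minimal sketch of that idea in Python, not taken from the
| thread. The "2024-Q3" format and the names YearQuarter, parse
| and next_quarter are hypothetical; the real upstream format is
| never given. Python cannot hide a constructor, so the range
| check is repeated in __post_init__ to keep direct construction
| honest too.

```python
import re
from dataclasses import dataclass

# Hypothetical wire format for illustration only: "2024-Q3".
_YEAR_QUARTER = re.compile(r"(\d{4})-Q(\d)")


@dataclass(frozen=True)
class YearQuarter:
    year: int
    quarter: int

    def __post_init__(self) -> None:
        # Runs on every construction, so an invalid quarter cannot exist.
        if not 1 <= self.quarter <= 4:
            raise ValueError(f"quarter out of range: {self.quarter}")

    @classmethod
    def parse(cls, raw: str) -> "YearQuarter":
        """Turn a raw string into a YearQuarter, or raise ValueError."""
        match = _YEAR_QUARTER.fullmatch(raw)
        if match is None:
            raise ValueError(f"not a year-quarter string: {raw!r}")
        return cls(year=int(match.group(1)), quarter=int(match.group(2)))


def next_quarter(yq: YearQuarter) -> YearQuarter:
    # Takes the narrow type only, so it never re-checks the string format.
    if yq.quarter == 4:
        return YearQuarter(yq.year + 1, 1)
    return YearQuarter(yq.year, yq.quarter + 1)
```

| If upstream switches to, say, "Q3 2024", parse raises at the
| point of ingestion and next_quarter is never reached with bad
| input.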
| dragonwriter wrote:
| > "Typing your code forces validations at the boundaries"
| was too strong because of course you can type your code
| without actually doing the validations
|
| Yeah, of course you can cheat the typechecking in the
| code at the boundary in several ways, or convert from
| wire format to internal types in a way which plugs in
| type-valid defaults for bad data rather than erroring, or
| just use too-broad internal types to start with (you can
| have "stringly-typed code"), and fail to help the
| problems. But if you use the types that make sense
| internally for what the code is doing, then conversion
| including validation at the boundary becomes the path of
| least resistance in most cases. "Forces" is not strictly
| true, but my experience is that adding types does create
| a strong push for boundary validation.
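|
| A small Python sketch of the two conversion styles described
| above, offered as an illustration rather than anyone's actual
| code. SalesRow, lenient_row, strict_row and the field names
| are all hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SalesRow:
    region: str
    revenue: float


def lenient_row(raw: Mapping[str, str]) -> SalesRow:
    # Anti-pattern: bad data is silently replaced by a type-valid default.
    try:
        revenue = float(raw.get("revenue", ""))
    except ValueError:
        revenue = 0.0
    return SalesRow(region=raw.get("region", ""), revenue=revenue)


def strict_row(raw: Mapping[str, str]) -> SalesRow:
    # Boundary validation: fail here instead of deep inside the analysis.
    try:
        region = raw["region"]
        revenue = float(raw["revenue"])
    except KeyError as exc:
        raise ValueError(f"missing field: {exc}") from exc
    except ValueError as exc:
        raise ValueError(f"bad revenue: {raw['revenue']!r}") from exc
    return SalesRow(region=region, revenue=revenue)
```

| Both functions satisfy a type checker. Only strict_row reports
| a changed upstream format (for example "1,200" instead of
| "1200") at the boundary.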
| levocardia wrote:
| Not really, because honestly a lot of us who came into
| programming via research never learned typed languages or unit
| tests or any of those best practices - we were just hacking
| around in MATLAB, R, or Python from the start. What I really
| need is a seamless and easy way to run statistical models that
| can only be fit in R, but from Python or Node. There are
| several categories of statistical modeling where R completely
| blows python out of the water, and it's incredibly wasteful
| (and error-prone) to try to re-implement these yourself in
| Python.
| bachmeier wrote:
| rpy2 can be used to call R from Python:
| https://rviews.rstudio.com/2022/05/25/calling-r-from-python-...
|
| reticulate works for going in the other direction:
| https://rstudio.github.io/reticulate/
|
| With the good interoperability these days, let's stop
| rewriting functionality in other languages. If the
| interoperability is no good, work on fixing that, please.
| ellisv wrote:
| It is probably helpful in some cases and unhelpful in others. R
| uses multiple dispatch, so calling `foo` on different types can
| produce different output. It isn't clear to me how Vapour
| handles this. In general though, folks are passing around
| data.frame or similar objects.
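|
| To illustrate the dispatch point, here is a rough Python
| analogue using functools.singledispatch. It dispatches on one
| argument only and says nothing about how Vapour treats R
| generics; foo and its two methods are hypothetical.

```python
from functools import singledispatch
from typing import List


@singledispatch
def foo(x: object) -> object:
    # Fallback when no method is registered for the argument's type.
    raise TypeError(f"no foo method for {type(x).__name__}")


@foo.register
def _(x: int) -> int:
    # Integers come back doubled.
    return x * 2


@foo.register
def _(x: str) -> List[str]:
    # Strings come back as a list of characters: a different result type.
    return list(x)
```

| The result type of foo(value) depends on which method is
| picked, which is the kind of thing a type checker for a
| dispatch-heavy language would have to model.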
| mushufasa wrote:
| The main reason we shy away from R for production apps is all the
| silent errors where things seem to succeed while being horribly
| wrong if you take a look. Typing would certainly help mitigate
| that.
| uptownfunk wrote:
| Will this fix the problems it claims to? The power of R is the
| rich package ecosystem. It caters to people who don't want to
| think about engineering concerns but want a fast way to access
| the powers of computation rather than building a scalable system,
| two very different things. It excels at the former. A new
| language will not fix this, because this type of thinking has
| infected the entire package ecosystem. Frankly with code
| translation you probably don't need a new language. Prototype in
| R and code translate to Python or whatever you want to use in
| prod. Or frankly just do code gen directly in Python so you can
| skip having to confirm if the results match.
|
| To be clear, I love R, it excels in prototyping but I have seen
| too many real world struggles of folks trying to move to prod
| that I would say save it for EDA projects and one time analyses.
| _Wintermute wrote:
| I often find I want a specific statistical package that's only
| in R, but want a more general purpose language for all the
| other stuff that's involved (parsing, filesystem stuff, error
| handling etc). I don't want to risk re-writing the statistical
| methods and all their dependencies in the sensible language, so
| I end up calling R only for the statistical methods, but I can
| see this as an alternative.
| joshdavham wrote:
| > A new language will not fix this, because this type of
| thinking has infected the entire package ecosystem.
|
| Do you think the culture of the package ecosystem could
| possibly change in the future?
| bachmeier wrote:
| I took a couple stabs at this long ago (even before there was a
| Typescript for inspiration). The first attempt was to add types
| to the syntax of R, but that would have required a lot more time
| than I had. Properly catching errors is a massive undertaking
| requiring a lot of background I don't have. The second attempt
| was to add syntax for types to R and then compile the code to
| another language. That's easy to do, but really boring, so I
| wasn't able to stick with it. It comes with the advantages of
| static typing and R code that runs very fast. I gave up and went
| with embedding R inside a statically typed language. Very happy
| with my choice.
|
| Good luck to the authors of this. I believe it solves an
| important problem for R package authors and others wanting to
| write bigger programs. It's hard to argue with the benefits of
| static typing for this type of work.
| layer8 wrote:
| Sounds like vapourware. ;)
| joshdavham wrote:
| I mean, there is an alpha you can download. If it was just a
| landing page and an email waitlist, then that would be
| vaporware.
| layer8 wrote:
| I was commenting on the naming choice.
| joshdavham wrote:
| Looks interesting! What types of programs do you think people
| would write in this language? I don't see an obvious need for
| traditional R programs which are usually just scripts for working
| with data, but maybe people could write R packages in this
| language?
| johnnybzane wrote:
| How do I find jobs that use the R language? It's impossible to
| search the letter "R" on linkedIn or Indeed without getting a
| bunch of unrelated job postings
|
| "R" is the only programming language I know and I can't find a
| job that uses R because job search engines don't allow you to
| sort by skill
|
| "R language" is the closest substitute on linkedin but the
| results are still a jumbled mess of jobs, some looking moreso for
| other skills (SQL/Python)
|
| I know R-heavy jobs exist but finding them on LinkedIn is
| virtually impossible
| clircle wrote:
| Why would you do that? R is just a tool for doing statistics
| or research. You need to search for jobs in your subject area
| like "ecologist", "econometrician", "green energy researcher",
| etc.
| johnnybzane wrote:
| There are hedge funds that like hiring people who know how to
| manipulate data in R using dplyr and data.table
|
| Looking for a similar job where my desire/interest to spend
| all day in Rstudio is a value add to a business
| nickforr wrote:
| With apologies if this breaks guidelines:
| https://hymans.current-vacancies.com/Jobs/Advert/3525353?cid...
| Balladeer wrote:
| How does "R language" compare to searching for one of the
| popular R packages? Searching for "tidyverse", "dplyr", or
| "ggplot" seems to get a good chunk of hits. That being said,
| yeah, there does seem to be a trio of skills that often go
| together (R, python, SQL)
| johnnybzane wrote:
| If you search specific packages on LinkedIn the number of
| jobs is usually very small
|
| E.g. tidyverse or dplyr is like 20-40 jobs. ggplot is 88.
| There are definitely way more than 100 companies looking for
| R-heavy users.
| kagevf wrote:
| I tried using "r" (with quotes) on indeed, and got some hits
| where R was listed as one of the necessary skills.
| dkga wrote:
| Perhaps #rlang would work? Or #tidyverse if you are feeling
| tibblish :)
| russellbeattie wrote:
| This isn't specifically about Vapour, just about what's become
| the common way to specify types.
|
| I know this is totally bike shedding, semantics, vi vs Emacs,
| BigEndian vs LittleEndian and it's too late now to affect
| anything, but to me using a colon _after_ the variable is just
| wrong!
|
| let x : int = 1
|
| func add(x: int, y: int): int { return x + y }
|
| I see that and it looks like int = 1 and the function's return
| type is totally lost.
|
| This seems completely backwards to me. Maybe I'm just used to the
| way C did it, but the variable modifiers should come first.
|
| let int x = 1
|
| func int add(int x, int y) { return x + y }
|
| Why we reversed it and added in the colon just doesn't make much
| sense to me.
| nerdponx wrote:
| I have some questions that are not answered by the homepage.
|
| 1) How does this work with function parameters that are intended
| to be captured unevaluated with substitute()? Do you type the
| input as "any" and document separately that the parameter is kept
| "unevaluated" as a symbol/name or call?
|
| 2) How does this work with existing untyped R code? Does it at
| least include types for the standard library (or some subset
| thereof?)
|
| 3) Is there any type inference, or does it require explicit type
| annotation everywhere?
|
| 4) How do you propose to handle NA (which can appear "within" any
| typed vector)? Does the compiler support refinement types? If
| not, how does checking for and preventing nullability work, when
| checking for NA values requires a runtime check?
|
| 5) How do data frames work? Are they typed like structs?
|
| 6) Which object systems does it support, if any? S3, S4,
| Reference Classes, or the 3rd-party R6?
|
| As much as I like static types, I feel like R is maybe the
| language where I need or want them the _least_. How often do you
| really run into a situation where you pass a character vector to
| a function that requires a numeric vector and it crashes your
| program?
|
| 99% of the time what you really want is _known-valid data frames_
| for data processing, and _statically-sized arrays_ for math
| stuff.
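|
| The runtime-check concern in question 4 can be sketched in
| Python, with None standing in for R's NA. That mapping is an
| assumption made for illustration, and require_complete and
| mean are hypothetical names, not part of Vapour or R.

```python
from typing import List, Optional, Sequence


def require_complete(values: Sequence[Optional[float]]) -> List[float]:
    """Return the values as a plain list if none is missing, else raise."""
    missing = [i for i, v in enumerate(values) if v is None]
    if missing:
        raise ValueError(f"missing values at positions {missing}")
    return [v for v in values if v is not None]


def mean(values: List[float]) -> float:
    # Accepts only the narrowed type, so no missing-value handling here.
    return sum(values) / len(values)
```

| The static types record that mean never sees a missing value,
| but the guarantee still comes from the runtime check inside
| require_complete.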
| condwanaland wrote:
| Cool idea! Looking forward to exploring it this weekend
___________________________________________________________________
(page generated 2024-09-19 23:00 UTC)