[HN Gopher] What would it take to add refinement types to Rust?
___________________________________________________________________
What would it take to add refinement types to Rust?
Author : Yoric
Score : 64 points
Date : 2024-12-23 09:15 UTC (1 days ago)
(HTM) web link (yoric.github.io)
(TXT) w3m dump (yoric.github.io)
| loeg wrote:
| The author didn't mention this, but maybe there could be a type
| (unit) canonicalization step that always produces the same
| reduced/simplified type for any equivalent set of Mul/Divs? So
| that you don't need the later equivalency check.
|
| E.g. with the article's example, where 'a' is Meters and 'b' is
| Seconds, 'a/(b*b)' and 'a/b/b' both have type 'Div<Meters,
| Mul<Seconds, Seconds>>' instead of one having the type
| 'Div<Div<Meters, Seconds>, Seconds>'.
|
| For only Muls and Divs you can basically just have a histogram of
| units to powers (e.g., m/s^2 => m: 1, s: -2) which uniquely
| represent equivalent types.
| hinkley wrote:
| James Gosling looked at this and wrote about it at the time. It
| can be a bit mind bending. Especially with units named after
| people.
|
| If you're looking at statistics for current or power the type
| system might try to convert it to joules even though you wanted
| to look at average wattage.
| loeg wrote:
| Yeah, but equivalent units with different names is sort of
| only a display/formatting issue, right?
| weinzierl wrote:
| This is a matter of perspective.
|
| For example the unit Sievert is an official SI unit despite
| being just J/kg. This is because confusing equivalent dose
| and absorbed dose, which also has the unit of J/kg, could
| be very dangerous.
|
| Note, that this is different from J sometimes being written
| as Ws. While there are informal conventions, when we use J
| and when Ws, using the unconventional one would not be
| technically wrong because 1 J is simply 1 Ws, whereas 1 Sv
| is not necessarily 1 J/kg when the later is an absorbed
| dose.
|
| I think one could reasonably disagree with these decisions
| but that is how the SI people see it.
| epcoa wrote:
| In this case though it's just the wrong unit. Average
| wattage is still watts, not joules.
| epcoa wrote:
| I do not understand, average wattage/power would be
| integration of wattage (joules) divided by time (so back to
| watts), it's still watts? Under what implementation would you
| end up with joules that isn't just oddly broken in general?
|
| I guess I could see a naive implementation (confusing
| integration/sampling and discrete summation) going the
| _other_ way, erroneously ending up with a nonsensical W /s.
|
| Also don't understand what it has to do with eponyms, which
| are just substitutes for base units, either your DA works or
| not, no? Average wattage is kg[?]m2[?]s-3 not kg[?]m2[?]s-2
| (joules) or kg[?]m2[?]s-4
| jiggawatts wrote:
| A simple way to do this is to store a vector of the powers.
|
| For example momentum is kilogram * meter / second, which is
| MASS^1 * LENGTH^1 * TIME^-1
|
| As a vector, this can be represented as (1,1,-1) where the
| positions are M, L, T respectively.
|
| In that format velocity is represented as (0,1,-1),
| acceleration is (0,1,-2), etc...
|
| This is automatically canonicalised and much easier to
| manipulate than a tree of operations.
|
| Of course, this assumes uniform units such as CGS or MKS in
| something sane like the metric system. Conversion back and
| forth is generally straightforward, as long as the types encode
| the system used. E.g.: CGS<1,1,-1> and MKS<1,1,-1> both
| represent momentum, but at different scales.
|
| Imperial also works, and other base units can be added to
| extend the system. This can include things like current,
| temperature, moles, etc...
|
| See: https://en.wikipedia.org/wiki/Dimensional_analysis
| jph wrote:
| Type refinements are a great concept and I'd love to see them in
| Rust. And double-refinement types are great for helping with
| conversions, such as with Rust From/Into, and potentially a
| dynamic converter function.
|
| Examples of double-refinements that I'd like:
|
| - Common units like Length:Meter and Length:Foot.
|
| - Color bits like Color:RGB24 and Color:CYMK24.
|
| - Worldwide currency like Money:USD and Money:GBP with a
| converter function that knows exchange rates.
|
| - Human languages like String:English vs String:Cymraeg with a
| converter function that knows translations.
| spockz wrote:
| The currency converter is the odd one out because it context
| dependent. (Time, contract, etc)
| 0xFF0123 wrote:
| Similar to time with a timezone?
| skissane wrote:
| > - Worldwide currency like Money:USD and Money:GBP with a
| converter function that knows exchange rates.
|
| Exchange rates vary over time, so you'd arguably need a type
| which includes a timestamp (e.g. "USD 1000 at 2024-12-25").
|
| And that's ignoring all these other complexities such as the
| spread, different currency converters offering differing rates,
| unofficial and multiple official rates in countries with
| currency controls (e.g. Argentina), hedging, etc
| jaza wrote:
| Plus, perhaps the biggest complexity of all is that the
| currency rates are often not free, particularly if you want
| "live pricing" (updated every few seconds), and particularly
| if it's for commercial use. And, the fact that they may or
| may not be free, also illustrates well the fact that there
| are no definitive rates, there are only "rates according to
| X".
| nmilo wrote:
| It should be easier now with const generics, I know the popular
| C++ library looks like Value<Unit, T, L, M, ...> where the
| letters are numbers representing the dimension of time, length,
| mass, etc. So m/s^2 would always be Value<f64, -2, 1, 0, ...>. By
| keeping it normalized you don't need Equivalent or whatever
| marxisttemp wrote:
| Could someone clarify to me the different between dependent types
| and refinement types?
| dietr1ch wrote:
| Refinements embed a predicate that elements of the new type
| must pass. def Nat = Int: (|x| -> x >= 0)
|
| Dependent types allow types to be computed from functions (and
| depend on arguments, otherwise it seems they become just weird
| constants), def Five(as_type: String) ->
| NumericType(as_type): match as_type:
| "string" => "five" "int" => 5 "float"
| => 5.0f "double" => 5.0d _ => panic()
| // Unnecessary if you refine `as_type` from a String to an enum
| or a fixed set of strings.
|
| Dependent types seem weird, but they help making types first-
| class (https://www.youtube.com/watch?v=mOtKD7ml0NU&t=325s) and
| gaining types like `Array<T, N>` that allow ensuring things are
| the right length, and define append/extend properly.
| demurgos wrote:
| Dependent types can make type constructors generic over
| _values_ (instead of only types). Refinement types keep the
| separation between types and values, but they let you wrap an
| existing type to enforce extra constraints or semantics.
| Another example of refinement are pattern types where you can
| attach a match pattern that is enforced to pass.
| burakemir wrote:
| Dependent types typically refers to type systems where a type
| can depend on a term. The canonical example is "Vector n" where
| n is some expression that evaluates to a natural number.
|
| Refinement types typically(1) refers to a type systems that
| lets you create a subtype of a type through refining
| (qualifying) with a predicate or constraint on the shape.
| Examples {x \in int | is_even x } or { x \in List | len(x) = 1
| }
|
| Refinement types can be very powerful but that may well make
| type checking undecidable (think of a type of Turing machines,
| and the refinement that keeps only the ones that halt). By
| being careful about the logic used in the refinements, one may
| retain decidability.
|
| (1) The article seems to have a different idea of what a
| refinement type is: quote "a type system that does its work
| after another type system has already done its work".
|
| I am not going to play orthodox guardian of type theory
| terminology here, yet to me personally, it does seem
| unfortunate to use that term. The author seems to really want a
| form of type-level computation, which could be interesting if
| it could be rigorously specified and it's relation to the
| existing type level reduction clarified.
| akdor1154 wrote:
| You could just read the first paragraph of TFA?
| Waterluvian wrote:
| If I understand this properly, I've needed this in typescript a
| lot.
|
| I can make `type uuid = string` for self documentation, but a lot
| of plugins will just label it "string" and developers can (and
| have) mistakenly put some other identifier, like the robot's
| hostname.
|
| Of course we validate at the API but it'd be more skookum if we
| could prevent accidental wiring together of front-end components
| that make this error.
|
| String literals help a ton. Gosh they're wonderful to care about
| the shape of a string in the type system. But sometimes I really
| want to say "strict uuid" as in "I don't care if it quacks like a
| duck, it's not called duck."
| mcfedr wrote:
| Check out what are often called tagged types.
|
| You basically define a `type A = string &{a: SomeSymbol}`
|
| And then have a type assertion function that just returns true,
| and you have control over the places A can come from
| Waterluvian wrote:
| Ah yeah I remember poking at that. I should review it again.
| I think at the time it felt like a bit of a hack, but maybe
| some features of the past years of TS updates have helped.
|
| I assume the idea is to lie to the type system about the
| existence of the symbol, and at runtime it is just a string.
| demurgos wrote:
| Yes, it's the standard way if you want implicit compat with
| the base type, so you can pass a value of type `A` to a
| function expecting a `string`. An other name for this pattern
| is "branded types".
| steve_adams_86 wrote:
| I really like the Effect schema library for tagged types. I'm
| not sure how well it works for primitives, though.
| spockz wrote:
| Haskell has had this with `newtype` for ages. Type safety
| without runtime overhead.
| amenghra wrote:
| Units might seem simple but they have a ton of edge cases. Do you
| want to be able to add inches and feet? Be careful about
| potential precision/rounding issues. What is the unit for a
| temperature delta? You can't simply keep the original unit (eg C
| or F) because conversion from F to C is a different rule than DF
| to DC. Etc.
|
| Units do prevent bugs in programs, so they have an important role
| to play. But they also need to be designed very carefully.
|
| Java adopted units via JSR 385 (https://belief-driven-
| design.com/java-measurement-jsr-385-21...)
| thesuperbigfrog wrote:
| I would love to see refinement types added to Rust.
|
| They are very handy for preventing errors and expressing intent
| in Ada:
|
| https://learn.adacore.com/courses/Ada_For_The_CPP_Java_Devel...
| weinzierl wrote:
| I find the idea of being able to specify numerical types with
| arbitrary ranges very appealing.
|
| In Pascal these are called range types, e.g.
|
| month: 1..12
|
| would define an integer where the type system would ensure that
| it is always between 1 and 12.
|
| Apart from Ada this seems to be an alien concept to all other
| languages. The concept of a "range type" also seems to have other
| meanings.
|
| What is the Pascal "range type" properly called in type theory
| and what is its relationship to refinement types?
| munchler wrote:
| > an integer where the type system would ensure that it is
| always between 1 and 12
|
| How does Pascal handle overflow/underflow? E.g. Month 10 +
| Month 11 = Month 21?
| bspammer wrote:
| [delayed]
| JoshTriplett wrote:
| As a related concept, we've talked about adding "pattern types"
| to Rust: `Result<T> is Err(_)` (the type of a Result that you've
| just confirmed is an `Err`, so you don't have to handle the `Ok`
| case), or `u32 is 1..` (equivalent to `NonZero<u32>` but more
| flexible).
| ordu wrote:
| I'd really like to have them. If I'm writing an algorithm that
| maps interval (0, 1) of probabilities onto different intervals of
| u32 integers, and probabilites are also represented as integers
| (an integer part of p*2^n), I will want to have different units
| for those integer probabilites and their mapped values. I don't
| want to mix them accidentally and to add probability and a mapped
| probability accidentally. Probably I dont want to add cumulative
| probability and a probability, though I'm not really sure about
| that, because sometimes I want to add (for example when
| calculating cumulative probabilities) and I want to compare them.
| Though maybe it will work with some exotic rules, like CumProb-
| CumProb -> Prob, while CumProb+CumProb is forbidden?
|
| And there are more examples of that, I can have probabilities
| from different distributions, I don't want to add them
| accidentally, though multiplications of them is all over the
| place, so let it be. Of course I have no hope that any language
| could deal with the explosion of types that are needed to
| represent this, but if compiler just gave me f64 as a result of
| multiplication of probabilites, I'd be happy.
|
| It can be done in Rust on case by case bases, but it is a lot of
| boilerplate. The issue is the typedef IntProb = u32, treat
| IntProb like an alias to u32 and rustc converts these types into
| each other silently like C compiler converts int to char. One can
| do struct IntProb(u32), but then it would be needed to implement
| traits like Add, Mul, Cmp and so on, which is possible but it is
| too much work. If it was possible to force rustc to treat typedef
| MyType = {NumericType} as a distinct numeric type that requires
| explicit conversion into NumericType, while retaining all the
| traits of NumericType, just substituting in them NumericType with
| MyType, it would be great.
|
| I think, that all the complexities described in the article stem
| from the attempt to create an universal instrument that can do
| everything and to keep bees. The real difficulty is to pick a
| small subset of wants, that will cover 80% of needs, while being
| really simple. I see no issues with occasional
| .into::<Length<f64>>(), like I see no issues with (my_struc.index
| as usize), if I keep index as u8 to spare memory, but use it as
| usize of course. I see no need in different units for the same
| quantity, because in any case I'd want to convert everything into
| the same uniform units before I start adding and multiplying. But
| I'd like to have restrictions on available operations between
| different types, and I'd like to have a possibility to add
| optional dynamic checks for type value, that I could turn on for
| a debug build and turn off for a release one. For example, I'd
| like to check that any probability p fits into [0, 1].
___________________________________________________________________
(page generated 2024-12-24 23:00 UTC)