[HN Gopher] What would it take to add refinement types to Rust?
       ___________________________________________________________________
        
       What would it take to add refinement types to Rust?
        
       Author : Yoric
       Score  : 64 points
       Date   : 2024-12-23 09:15 UTC (1 days ago)
        
 (HTM) web link (yoric.github.io)
 (TXT) w3m dump (yoric.github.io)
        
       | loeg wrote:
       | The author didn't mention this, but maybe there could be a type
       | (unit) canonicalization step that always produces the same
       | reduced/simplified type for any equivalent set of Mul/Divs? So
       | that you don't need the later equivalency check.
       | 
       | E.g. with the article's example, where 'a' is Meters and 'b' is
       | Seconds, 'a/(b*b)' and 'a/b/b' both have type 'Div<Meters,
       | Mul<Seconds, Seconds>>' instead of one having the type
       | 'Div<Div<Meters, Seconds>, Seconds>'.
       | 
       | For only Muls and Divs you can basically just have a histogram of
       | units to powers (e.g., m/s^2 => m: 1, s: -2) which uniquely
       | represent equivalent types.
        
         | hinkley wrote:
         | James Gosling looked at this and wrote about it at the time. It
         | can be a bit mind bending. Especially with units named after
         | people.
         | 
         | If you're looking at statistics for current or power the type
         | system might try to convert it to joules even though you wanted
         | to look at average wattage.
        
           | loeg wrote:
           | Yeah, but equivalent units with different names is sort of
           | only a display/formatting issue, right?
        
             | weinzierl wrote:
             | This is a matter of perspective.
             | 
             | For example the unit Sievert is an official SI unit despite
             | being just J/kg. This is because confusing equivalent dose
             | and absorbed dose, which also has the unit of J/kg, could
             | be very dangerous.
             | 
             | Note, that this is different from J sometimes being written
             | as Ws. While there are informal conventions, when we use J
             | and when Ws, using the unconventional one would not be
             | technically wrong because 1 J is simply 1 Ws, whereas 1 Sv
             | is not necessarily 1 J/kg when the later is an absorbed
             | dose.
             | 
             | I think one could reasonably disagree with these decisions
             | but that is how the SI people see it.
        
             | epcoa wrote:
             | In this case though it's just the wrong unit. Average
             | wattage is still watts, not joules.
        
           | epcoa wrote:
           | I do not understand, average wattage/power would be
           | integration of wattage (joules) divided by time (so back to
           | watts), it's still watts? Under what implementation would you
           | end up with joules that isn't just oddly broken in general?
           | 
           | I guess I could see a naive implementation (confusing
           | integration/sampling and discrete summation) going the
           | _other_ way, erroneously ending up with a nonsensical W /s.
           | 
           | Also don't understand what it has to do with eponyms, which
           | are just substitutes for base units, either your DA works or
           | not, no? Average wattage is kg[?]m2[?]s-3 not kg[?]m2[?]s-2
           | (joules) or kg[?]m2[?]s-4
        
         | jiggawatts wrote:
         | A simple way to do this is to store a vector of the powers.
         | 
         | For example momentum is kilogram * meter / second, which is
         | MASS^1 * LENGTH^1 * TIME^-1
         | 
         | As a vector, this can be represented as (1,1,-1) where the
         | positions are M, L, T respectively.
         | 
         | In that format velocity is represented as (0,1,-1),
         | acceleration is (0,1,-2), etc...
         | 
         | This is automatically canonicalised and much easier to
         | manipulate than a tree of operations.
         | 
         | Of course, this assumes uniform units such as CGS or MKS in
         | something sane like the metric system. Conversion back and
         | forth is generally straightforward, as long as the types encode
         | the system used. E.g.: CGS<1,1,-1> and MKS<1,1,-1> both
         | represent momentum, but at different scales.
         | 
         | Imperial also works, and other base units can be added to
         | extend the system. This can include things like current,
         | temperature, moles, etc...
         | 
         | See: https://en.wikipedia.org/wiki/Dimensional_analysis
        
       | jph wrote:
       | Type refinements are a great concept and I'd love to see them in
       | Rust. And double-refinement types are great for helping with
       | conversions, such as with Rust From/Into, and potentially a
       | dynamic converter function.
       | 
       | Examples of double-refinements that I'd like:
       | 
       | - Common units like Length:Meter and Length:Foot.
       | 
       | - Color bits like Color:RGB24 and Color:CYMK24.
       | 
       | - Worldwide currency like Money:USD and Money:GBP with a
       | converter function that knows exchange rates.
       | 
       | - Human languages like String:English vs String:Cymraeg with a
       | converter function that knows translations.
        
         | spockz wrote:
         | The currency converter is the odd one out because it context
         | dependent. (Time, contract, etc)
        
           | 0xFF0123 wrote:
           | Similar to time with a timezone?
        
         | skissane wrote:
         | > - Worldwide currency like Money:USD and Money:GBP with a
         | converter function that knows exchange rates.
         | 
         | Exchange rates vary over time, so you'd arguably need a type
         | which includes a timestamp (e.g. "USD 1000 at 2024-12-25").
         | 
         | And that's ignoring all these other complexities such as the
         | spread, different currency converters offering differing rates,
         | unofficial and multiple official rates in countries with
         | currency controls (e.g. Argentina), hedging, etc
        
           | jaza wrote:
           | Plus, perhaps the biggest complexity of all is that the
           | currency rates are often not free, particularly if you want
           | "live pricing" (updated every few seconds), and particularly
           | if it's for commercial use. And, the fact that they may or
           | may not be free, also illustrates well the fact that there
           | are no definitive rates, there are only "rates according to
           | X".
        
       | nmilo wrote:
       | It should be easier now with const generics, I know the popular
       | C++ library looks like Value<Unit, T, L, M, ...> where the
       | letters are numbers representing the dimension of time, length,
       | mass, etc. So m/s^2 would always be Value<f64, -2, 1, 0, ...>. By
       | keeping it normalized you don't need Equivalent or whatever
        
       | marxisttemp wrote:
       | Could someone clarify to me the different between dependent types
       | and refinement types?
        
         | dietr1ch wrote:
         | Refinements embed a predicate that elements of the new type
         | must pass.                   def Nat = Int: (|x| -> x >= 0)
         | 
         | Dependent types allow types to be computed from functions (and
         | depend on arguments, otherwise it seems they become just weird
         | constants),                   def Five(as_type: String) ->
         | NumericType(as_type):           match as_type:
         | "string" => "five"             "int" => 5             "float"
         | => 5.0f             "double" => 5.0d             _ => panic()
         | // Unnecessary if you refine `as_type` from a String to an enum
         | or a fixed set of strings.
         | 
         | Dependent types seem weird, but they help making types first-
         | class (https://www.youtube.com/watch?v=mOtKD7ml0NU&t=325s) and
         | gaining types like `Array<T, N>` that allow ensuring things are
         | the right length, and define append/extend properly.
        
         | demurgos wrote:
         | Dependent types can make type constructors generic over
         | _values_ (instead of only types). Refinement types keep the
         | separation between types and values, but they let you wrap an
         | existing type to enforce extra constraints or semantics.
         | Another example of refinement are pattern types where you can
         | attach a match pattern that is enforced to pass.
        
         | burakemir wrote:
         | Dependent types typically refers to type systems where a type
         | can depend on a term. The canonical example is "Vector n" where
         | n is some expression that evaluates to a natural number.
         | 
         | Refinement types typically(1) refers to a type systems that
         | lets you create a subtype of a type through refining
         | (qualifying) with a predicate or constraint on the shape.
         | Examples {x \in int | is_even x } or { x \in List | len(x) = 1
         | }
         | 
         | Refinement types can be very powerful but that may well make
         | type checking undecidable (think of a type of Turing machines,
         | and the refinement that keeps only the ones that halt). By
         | being careful about the logic used in the refinements, one may
         | retain decidability.
         | 
         | (1) The article seems to have a different idea of what a
         | refinement type is: quote "a type system that does its work
         | after another type system has already done its work".
         | 
         | I am not going to play orthodox guardian of type theory
         | terminology here, yet to me personally, it does seem
         | unfortunate to use that term. The author seems to really want a
         | form of type-level computation, which could be interesting if
         | it could be rigorously specified and it's relation to the
         | existing type level reduction clarified.
        
         | akdor1154 wrote:
         | You could just read the first paragraph of TFA?
        
       | Waterluvian wrote:
       | If I understand this properly, I've needed this in typescript a
       | lot.
       | 
       | I can make `type uuid = string` for self documentation, but a lot
       | of plugins will just label it "string" and developers can (and
       | have) mistakenly put some other identifier, like the robot's
       | hostname.
       | 
       | Of course we validate at the API but it'd be more skookum if we
       | could prevent accidental wiring together of front-end components
       | that make this error.
       | 
       | String literals help a ton. Gosh they're wonderful to care about
       | the shape of a string in the type system. But sometimes I really
       | want to say "strict uuid" as in "I don't care if it quacks like a
       | duck, it's not called duck."
        
         | mcfedr wrote:
         | Check out what are often called tagged types.
         | 
         | You basically define a `type A = string &{a: SomeSymbol}`
         | 
         | And then have a type assertion function that just returns true,
         | and you have control over the places A can come from
        
           | Waterluvian wrote:
           | Ah yeah I remember poking at that. I should review it again.
           | I think at the time it felt like a bit of a hack, but maybe
           | some features of the past years of TS updates have helped.
           | 
           | I assume the idea is to lie to the type system about the
           | existence of the symbol, and at runtime it is just a string.
        
           | demurgos wrote:
           | Yes, it's the standard way if you want implicit compat with
           | the base type, so you can pass a value of type `A` to a
           | function expecting a `string`. An other name for this pattern
           | is "branded types".
        
           | steve_adams_86 wrote:
           | I really like the Effect schema library for tagged types. I'm
           | not sure how well it works for primitives, though.
        
         | spockz wrote:
         | Haskell has had this with `newtype` for ages. Type safety
         | without runtime overhead.
        
       | amenghra wrote:
       | Units might seem simple but they have a ton of edge cases. Do you
       | want to be able to add inches and feet? Be careful about
       | potential precision/rounding issues. What is the unit for a
       | temperature delta? You can't simply keep the original unit (eg C
       | or F) because conversion from F to C is a different rule than DF
       | to DC. Etc.
       | 
       | Units do prevent bugs in programs, so they have an important role
       | to play. But they also need to be designed very carefully.
       | 
       | Java adopted units via JSR 385 (https://belief-driven-
       | design.com/java-measurement-jsr-385-21...)
        
       | thesuperbigfrog wrote:
       | I would love to see refinement types added to Rust.
       | 
       | They are very handy for preventing errors and expressing intent
       | in Ada:
       | 
       | https://learn.adacore.com/courses/Ada_For_The_CPP_Java_Devel...
        
       | weinzierl wrote:
       | I find the idea of being able to specify numerical types with
       | arbitrary ranges very appealing.
       | 
       | In Pascal these are called range types, e.g.
       | 
       | month: 1..12
       | 
       | would define an integer where the type system would ensure that
       | it is always between 1 and 12.
       | 
       | Apart from Ada this seems to be an alien concept to all other
       | languages. The concept of a "range type" also seems to have other
       | meanings.
       | 
       | What is the Pascal "range type" properly called in type theory
       | and what is its relationship to refinement types?
        
         | munchler wrote:
         | > an integer where the type system would ensure that it is
         | always between 1 and 12
         | 
         | How does Pascal handle overflow/underflow? E.g. Month 10 +
         | Month 11 = Month 21?
        
         | bspammer wrote:
         | [delayed]
        
       | JoshTriplett wrote:
       | As a related concept, we've talked about adding "pattern types"
       | to Rust: `Result<T> is Err(_)` (the type of a Result that you've
       | just confirmed is an `Err`, so you don't have to handle the `Ok`
       | case), or `u32 is 1..` (equivalent to `NonZero<u32>` but more
       | flexible).
        
       | ordu wrote:
       | I'd really like to have them. If I'm writing an algorithm that
       | maps interval (0, 1) of probabilities onto different intervals of
       | u32 integers, and probabilites are also represented as integers
       | (an integer part of p*2^n), I will want to have different units
       | for those integer probabilites and their mapped values. I don't
       | want to mix them accidentally and to add probability and a mapped
       | probability accidentally. Probably I dont want to add cumulative
       | probability and a probability, though I'm not really sure about
       | that, because sometimes I want to add (for example when
       | calculating cumulative probabilities) and I want to compare them.
       | Though maybe it will work with some exotic rules, like CumProb-
       | CumProb -> Prob, while CumProb+CumProb is forbidden?
       | 
       | And there are more examples of that, I can have probabilities
       | from different distributions, I don't want to add them
       | accidentally, though multiplications of them is all over the
       | place, so let it be. Of course I have no hope that any language
       | could deal with the explosion of types that are needed to
       | represent this, but if compiler just gave me f64 as a result of
       | multiplication of probabilites, I'd be happy.
       | 
       | It can be done in Rust on case by case bases, but it is a lot of
       | boilerplate. The issue is the typedef IntProb = u32, treat
       | IntProb like an alias to u32 and rustc converts these types into
       | each other silently like C compiler converts int to char. One can
       | do struct IntProb(u32), but then it would be needed to implement
       | traits like Add, Mul, Cmp and so on, which is possible but it is
       | too much work. If it was possible to force rustc to treat typedef
       | MyType = {NumericType} as a distinct numeric type that requires
       | explicit conversion into NumericType, while retaining all the
       | traits of NumericType, just substituting in them NumericType with
       | MyType, it would be great.
       | 
       | I think, that all the complexities described in the article stem
       | from the attempt to create an universal instrument that can do
       | everything and to keep bees. The real difficulty is to pick a
       | small subset of wants, that will cover 80% of needs, while being
       | really simple. I see no issues with occasional
       | .into::<Length<f64>>(), like I see no issues with (my_struc.index
       | as usize), if I keep index as u8 to spare memory, but use it as
       | usize of course. I see no need in different units for the same
       | quantity, because in any case I'd want to convert everything into
       | the same uniform units before I start adding and multiplying. But
       | I'd like to have restrictions on available operations between
       | different types, and I'd like to have a possibility to add
       | optional dynamic checks for type value, that I could turn on for
       | a debug build and turn off for a release one. For example, I'd
       | like to check that any probability p fits into [0, 1].
        
       ___________________________________________________________________
       (page generated 2024-12-24 23:00 UTC)