[HN Gopher] Nominal for Storing, Structural for Manipulating
       ___________________________________________________________________
        
       Nominal for Storing, Structural for Manipulating
        
       Author : todsacerdoti
       Score  : 40 points
       Date   : 2024-11-20 02:50 UTC (20 hours ago)
        
 (HTM) web link (welltypedwitch.bearblog.dev)
 (TXT) w3m dump (welltypedwitch.bearblog.dev)
        
        | PaulHoule wrote:
        | I think it is forgotten here that one of the benefits of
        | nominal typing is that the compiler can know the data layout
        | ahead of time, which yields performance benefits at run time.
       | 
        | There has been so much ink spilled on the question of what
        | kinds of type systems help programmers be productive, but
        | there is no such controversy on the performance side.
        
          | agentultra wrote:
          | Do you mean at compile time?
          | 
          | I'm mostly familiar with Haskell, which does type erasure. I
          | think that means there is no type checking at run time.
          | 
          | I think dependently typed languages would have the benefit
          | of knowing enough structure at compile time to detect things
          | like dead branches and impossible cases, which can optimize
          | by removing code. I'm not sure that all types are erased in
          | such languages, though.
        
            | taeric wrote:
            | I assume they mean the simple fact that nominal types can
            | be used in optimizations. That is why float versus double
            | or int can make a big difference.
            | 
            | For user-defined types, admittedly, it is less of a
            | benefit. But it can still show up.
        
             | PaulHoule wrote:
              | In the case of simple types, say a 64-bit int: if the
              | compiler knows what the types are ahead of time, it can
              | store the ints in registers and add them with one
              | assembly-language instruction. The same goes for
              | floating point.
             | 
             | If you want to add two things in Python they could be
             | totally different types so at runtime the program is likely
             | to have pointers in those registers and it is going to have
             | to look at the objects and figure out what it has to do to
             | add them, possibly calling the __add__ method on the
             | object. In the obvious implementation you are having to
             | fetch the int out of memory when you could have just got it
             | out of the register.
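              | 
              | A rough sketch of that dispatch in Python itself (an
              | illustrative model, not CPython's actual C
              | implementation):

```python
# Illustrative model of what `a + b` costs in a dynamic language: the
# interpreter must inspect both operands' types at run time before it
# can pick an addition strategy.
def dynamic_add(a, b):
    result = type(a).__add__(a, b)          # try the left operand first
    if result is NotImplemented and hasattr(type(b), "__radd__"):
        result = type(b).__radd__(b, a)     # then the reflected method
    return result

print(dynamic_add(2, 3))    # 5
print(dynamic_add(2, 3.5))  # 5.5
```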
             | 
              | Now many languages play tricks with pointers: we are not
              | filling out a 64-bit address space any time soon, so
              | 'pointers' with certain bit patterns can host integers
              | and other numbers inside. Still it is a lot harder
             | to add two of those than it is to just add two values we
             | know are integers.
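              | 
              | The tagging trick can be modeled in a few lines (a toy
              | illustration; real runtimes do this on machine words):

```python
# Toy model of pointer tagging: steal the low bit of a word so that
# small integers travel inline instead of behind a pointer.
def box_int(n):
    return (n << 1) | 1            # tag bit 1 marks an inline integer

def is_inline_int(v):
    return v & 1 == 1

def unbox_int(v):
    return v >> 1

def tagged_add(a, b):
    if is_inline_int(a) and is_inline_int(b):
        # even the fast path is untag + add + retag, not one bare ADD
        return box_int(unbox_int(a) + unbox_int(b))
    raise TypeError("heap object: full dispatch needed")

print(unbox_int(tagged_add(box_int(20), box_int(22))))  # 42
```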
             | 
             | With user types it is pretty much the same, particularly in
             | single inheritance languages like Java where it is dead
             | obvious that you can access field A of type X by adding a
             | fixed offset to the object pointer. In a language like
             | Python or Javascript you are looking everything up in a
             | hashtable, even if it is a fast hashtable. (You don't
             | consistently win using slots.)
             | 
             | A really advanced compiler/runtime could undo some of this
             | stuff, for instance it might be clear that in a particular
             | case you are adding x+y and those are both going to be
              | 64-bit ints, and you can specialize it. You could do
              | this at build time, or even at runtime: see that a
              | function has been called in a particular context 10,000
              | times, take the chance to inline it into the function
              | that calls it, and recompile with the fastest code. But
              | it is tricky to get right: if an x that isn't an int
              | finds its way in there, you need to despecialize the
              | function without breaking anything.
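              | 
              | A toy version of that guard-and-despecialize pattern
              | (hypothetical names; JITs like PyPy do this with machine
              | code, not Python):

```python
# Guard-and-deoptimize sketch: a specialized fast path for int
# arguments, with a safe fallback when the guard fails.
def generic_add(x, y):
    return x + y

calls = {"fast": 0, "slow": 0}

def specialized_add(x, y):
    if type(x) is int and type(y) is int:   # the guard
        calls["fast"] += 1
        return x + y                        # specialized fast path
    calls["slow"] += 1                      # despecialize: generic path
    return generic_add(x, y)

specialized_add(1, 2)      # hits the fast path
specialized_add(1.5, 2)    # guard fails, falls back safely
print(calls)               # {'fast': 1, 'slow': 1}
```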
             | 
             | PyPy shows that Python could be greatly improved over what
             | it is, I think CPython is going to approach it in speed
             | gradually.
        
          | lmm wrote:
          | Premature optimisation is the root of all evil. If you get
          | the semantics right, then good performance will generally
          | follow; if you build the wrong thing, then it doesn't matter
          | how fast it is.
        
            | taeric wrote:
            | Performant code in the wild largely disagrees? This is the
            | difference between a numpy array and a regular Python
            | list. It is stark.
            | 
            | You are correct that you can often engineer the
            | performance later. But layout concerns and nominal types
            | for performance are fairly non-controversial.
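            | 
            | numpy aside, the layout difference is visible even with
            | the stdlib `array` module, which stores machine doubles
            | contiguously instead of as a list of boxed objects:

```python
# A list holds pointers to individually allocated float objects; an
# array("d", ...) holds raw 8-byte doubles back to back.
import sys
from array import array

n = 100_000
boxed = [float(i) for i in range(n)]   # list of object pointers
packed = array("d", range(n))          # contiguous machine doubles

print(packed.itemsize)                               # 8 bytes per element
print(sum(sys.getsizeof(x) for x in boxed) > n * 8)  # True: boxing overhead
```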
        
              | thesz wrote:
              | Python does not do optimizations at all.
              | 
              | In contrast, you can find Fortran compilers that will
              | substitute your inefficient implementation of some
              | numerical algorithm with a different, much more
              | performant one.
        
               | PaulHoule wrote:
               | Python has a peephole optimizer:
               | https://akaptur.com/blog/2014/08/02/the-cpython-peephole-
               | opt...
        
                | maccard wrote:
                | I think what OP meant was "Python's optimisations are
                | not very fast", which (IMO) is a fair statement.
                | Python is many, many things, but its best-case
                | scenarios (without rewriting in C, at which point
                | you're writing C, not Python) are slower than the
                | naive implementations in many other languages.
        
           | biorach wrote:
           | > Premature optimisation is the root of all evil
           | 
           | If you're going to quote Knuth you better be damn sure you've
           | fully understood him.
           | 
           | The quote in question is about micro-optimisations which were
           | pointless on account of design issues. The situation you are
           | commenting on is kind of the opposite.
        
             | PaulHoule wrote:
             | I would say this debate
             | 
             | https://en.wikipedia.org/wiki/AoS_and_SoA
             | 
             | (which I think is weakened in the Wikipedia version) goes
             | to the heart of where profiling can hide huge opportunities
             | for optimization.
             | 
              | It is a good prop bet that rewriting something (that
              | doesn't allocate lots of dynamic memory) from AoS style
              | to SoA style will speed it up significantly. The victim
              | is emotionally attached to their code, and the profiler
              | would never show you the many small costs, often
              | distributed through the memory system, that add up. In a
              | case like that they might accuse you of cheating by
              | applying a bunch of small optimizations which are often
              | easier to express in SoA, if not downright obvious. When
              | they apply the same optimizations to their code they
              | will speed it up, but not as much.
             | 
              | Oddly, it's a discussion that's been going on ever since
              | people were writing games in assembly language in the
              | 1980s, and probably before that; it would be interesting
              | to see more tools which are SoA on the inside but look
              | AoS on the outside.
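              | 
              | The layout difference in miniature (illustrative Python;
              | the real payoff shows up with caches and SIMD):

```python
# AoS: one record per point, fields interleaved; SoA: one column per
# field, each stored contiguously.
n = 1000
aos = [{"x": float(i), "y": float(i) * 2} for i in range(n)]
soa = {"x": [float(i) for i in range(n)],
       "y": [float(i) * 2 for i in range(n)]}

def sum_x_aos(points):
    return sum(p["x"] for p in points)   # hops from record to record

def sum_x_soa(columns):
    return sum(columns["x"])             # one contiguous traversal

print(sum_x_aos(aos) == sum_x_soa(soa))  # True
```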
        
           | fanf2 wrote:
           | Good data layout is the root of all optimization.
        
             | thesz wrote:
             | Worst-case optimal join algorithm [1] might have a
             | different opinion.
             | 
             | [1] https://en.wikipedia.org/wiki/Worst-
             | case_optimal_join_algori...
        
           | jerf wrote:
           | This is basically a form of the "Sufficiently Smart Compiler"
           | claim. It is generally false, barring a definition of
           | "semantics right" that simply includes a presumption that the
           | "semantics" will be chosen for performance in which case the
           | claim becomes circular.
           | 
           | At this point in the collective journey we are all on
           | understanding programming languages and what they can do, the
           | evidence is _overwhelming_ that there are in fact plenty of
           | useful semantics that are intrinsically slower than other
           | useful semantics, relative to any particular chunk of
           | hardware executing them. That is, what is slow on CPU or GPU
           | may differ, but there is certainly some sort of difference
           | that will exist, and there is no amount of (feasible)
           | compiling around that problem.
           | 
           | Indeed, that's _why_ we have CPUs and GPUs and soon NPUs and
           | in the future perhaps other types of dedicated processors...
           | precisely _because_ not all semantics can be implemented
           | equally no matter how smart you are.
        
             | lmm wrote:
             | From where I stand the sufficiently smart compiler paradigm
             | is already dominant, and getting more so every day.
             | 
             | - The overwhelming majority of new code is written in high-
             | level languages
             | 
             | - High-level languages have continued to close what small
             | performance gaps remain
             | 
              | - There have been no serious efforts to implement a true
              | low-level language for post-Pentium (superscalar) CPUs,
              | let alone the CPUs of today
             | 
             | - Even GPUs and NPUs are largely programmed by using
             | languages that express largely the same semantics as
             | languages for CPUs, and relying heavily on compiler
             | optimisation
        
               | jerf wrote:
               | That's not what the "sufficiently smart compiler" is. The
               | "sufficiently smart compiler" is one that allows you to
               | write any amazingly capable language you want with
               | whatever amazing semantics you want, and it compiles it
               | into code that is competitive with C written for speed.
               | 
               | You can easily observe in any cross-language shootout in
               | 2024 that optimized code bases in the various languages
               | still have gradients and we do not live in a world where
               | you can just start slamming out Python code and expect no
               | performance delta against C.
               | 
               | https://prog21.dadgum.com/40.html
               | 
               | Merely _smart_ compilers are amazing; one of the defining
               | characteristics of the software world is that you can be
               | handed these things _for free_. The  "sufficiently smart
               | compiler", however, does not exist, and while there is no
               | mathematical proof that I'm aware of that they are
               | impossible, after at least two decades of people trying
               | to produce them and totally failing, the rational
               | expectation at this point must be that they do not exist.
        
        | readthenotes1 wrote:
        | It's odd to see Java thrown in there without methods...
        
       | molikto wrote:
        | convertTree doesn't work because Tree uses the Tree type
        | recursively.
        
         | jerf wrote:
         | Since the author is writing their own language, their language
         | can fix that; it's not hard to figure out how to implement
         | that.
         | 
         | What I find horrifying is that types are most useful when they
         | are more than just a shape. Just because two things have a type
         | X[A] {X[A], A, X[A]} doesn't mean it's the same binary tree. A
         | trivial example is that one tree could be sorting by the "id"
         | field and another could be sorting on a "username" field, and
         | simply casting one to the other means you've now got a garbage
         | data structure and none of the functions/methods operating on
         | it are going to return good data now. It's a bit of a culture
         | shock to see such Haskell-esque type notation combined with
         | excitement over blowing away everything about a type other than
         | its brute shape.
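          | 
          | The hazard in miniature (illustrative Python; the type and
          | field names are made up):

```python
# Two trees with the same shape but different invariants: one is
# ordered by a numeric id, the other by a username string. A purely
# structural cast between them type-checks, yet breaks every function
# that relied on the ordering.
from typing import NamedTuple, Optional

class Tree(NamedTuple):
    left: Optional["Tree"]
    key: object
    right: Optional["Tree"]

by_id = Tree(Tree(None, 1, None), 42, None)             # sorted by id
by_name = Tree(None, "alice", Tree(None, "bob", None))  # sorted by name

# Structurally both are just Tree; only nominal wrappers (distinct
# named types) would make confusing them a compile-time error.
print(by_id.key, by_name.key)  # 42 alice
```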
        
        | vrotaru wrote:
        | I guess Modula-3 was doing this as well.
        | 
        | Records were structurally typed, but you could "brand" a
        | record, and that would make it a nominal type.
        
       | cryptonector wrote:
       | ASN.1 is structurally typed too.
        
       | deredede wrote:
       | > What this means is that all nominal variants and records are
       | really just nominal wrappers over structural variants or records.
       | 
       | If you have both nominal types and structural types in your
       | language, you can already do this, while keeping the ability to
       | be nominal only or structural only when you want.
       | 
       | In the following OCaml code variant inference in pattern matching
       | is not automatic (although the compiler will warn you about the
       | missing cases, which helps you figuring what to write), but the
       | types do get refined in the corresponding branch.
        | 
        |     type 'a tree =
        |       Tree of
        |         [ `Empty
        |         | `Branch of ('a tree * 'a * 'a tree)
        |           (* Unfortunately, inline records are not supported
        |              here, so you have to use tuples, objects, or a
        |              mutually recursive record type. *)
        |         ]
        |     [@@unboxed]
        | 
        |     (* You can give aliases to subsets of constructors *)
        |     type 'a branch =
        |       [ `Branch of ('a tree * 'a * 'a tree) ]
        | 
        |     let f (x : 'a branch) = ()
        | 
        |     let f x =
        |       match x with
        |       | Tree `Empty -> ()
        |       (* You need to be explicit about the cases you want to
        |          handle. This pattern could also be written
        |          `Tree (#branch as rest)`. *)
        |       | Tree (`Branch _ as rest) -> f rest
       | 
        | The one feature I'd really like in this space would be the
        | ability to refer to a subset of the constructors of a nominal
        | type as if it were a (restricted) polymorphic variant that
        | could only be recombined with another subset of constructors
        | of the same nominal type. It would allow some of the power of
        | polymorphic variants without losing the efficient
        | representation allowed by nominal types knowing the possible
        | variants ahead of time.
        
        | js8 wrote:
        | The nominal vs structural distinction (in both programming and
        | math) goes beyond types. Should names in programs matter?
        | 
        | Consider two functions, say copySurname and copyStreetName.
        | They do the exact same thing, but in different contexts. No
        | sane person would call copySurname for the street, even though
        | the function is the same.
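        | 
        | In code (with illustrative names):

```python
# Structurally identical functions whose names tie them to different
# domains.
def copy_surname(dst, src):
    dst["surname"] = src["surname"]

def copy_street_name(dst, src):
    dst["street"] = src["street"]

person = {}
copy_surname(person, {"surname": "Curie"})
print(person)  # {'surname': 'Curie'}
```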
       | 
        | So there is this tension: should the name of the function (or
        | of the type) only reflect its internal structure
        | (structuralism), or should it also reflect the intended domain
        | application (nominalism)?
       | 
        | There is a formalist school of mathematics that is pretty
        | much hardcore structuralist, i.e. the names of the objects
        | don't matter, only their relationships. But most
        | mathematicians (and all programmers) reject that view.
        
         | bheadmaster wrote:
         | In real-world software writing, there's a huge pull towards
         | nominalism (i.e. naming code parts by their domain purpose),
         | but in my experience, the most useful way is to have structural
          | names for internal mechanisms, then use their interfaces in
          | a nominal way by describing their role in the business
          | logic.
         | 
         | That way, all nominalism is simply a descriptive wrapper over
         | structuralism.
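          | 
          | A sketch of that layering (all names here are illustrative):

```python
# Structural core: named for what it mechanically is.
class BoundedMap:
    def __init__(self, cap):
        self.cap = cap
        self.data = {}
    def put(self, k, v):
        if k not in self.data and len(self.data) >= self.cap:
            self.data.pop(next(iter(self.data)))  # evict oldest entry
        self.data[k] = v
    def get(self, k):
        return self.data.get(k)

# Nominal wrapper: named for its role in the business logic.
class RecentCustomerCache:
    def __init__(self):
        self._map = BoundedMap(cap=100)
    def remember(self, customer_id, record):
        self._map.put(customer_id, record)
    def lookup(self, customer_id):
        return self._map.get(customer_id)

cache = RecentCustomerCache()
cache.remember(7, {"name": "Ada"})
print(cache.lookup(7))  # {'name': 'Ada'}
```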
        
           | js8 wrote:
           | That's very similar to what the article is advocating.
        
         | sevensor wrote:
         | Although I have my frustrations with Python's bolted-on type
         | checking, particularly how clunky the type annotations can be
         | at runtime (this type is just the string "Foo | Bar | int",
         | good luck!), I think the pragmatic approach is working out
         | pretty well on the nominal versus structural front. I.e. I can
         | make it an error to assign a number in radians to a variable of
         | type meters (NewType), but I can also construct a dictionary of
         | the right shape and have it satisfy a TypedDict annotation. Or
         | I can write a Protocol, which is the biggest hammer in the type
         | system.
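          | 
          | The three mechanisms side by side (as seen by a checker like
          | mypy; at runtime most of this erases):

```python
from typing import NewType, Protocol, TypedDict

# Nominal: Meters is distinct from float to the type checker, though
# identical at runtime.
Meters = NewType("Meters", float)

# Structural: any dict with these keys/types satisfies Point.
class Point(TypedDict):
    x: float
    y: float

# Structural interface: anything with __len__ qualifies.
class Sized(Protocol):
    def __len__(self) -> int: ...

d = Meters(3.0)                   # runtime value is just 3.0
p: Point = {"x": 1.0, "y": 2.0}   # a plain dict of the right shape

def describe(s: Sized) -> int:
    return len(s)

print(describe([1, 2, 3]))  # 3: lists satisfy Sized structurally
```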
        
         | ojkelly wrote:
         | The name should represent what the function does, it should
         | indicate its purpose.
         | 
         | The distinction is useful even when it's structurally identical
         | to another function.
         | 
         | Two identical functions in different contexts or domains often
         | diverge. When they're not arbitrarily bound by the fact they
         | have the same contents it's easy to extend one and not the
         | other.
         | 
         | During compilation, they could both end up using the same
         | actual implementation (though I'm not sure if any compiler does
         | this).
        
       ___________________________________________________________________
       (page generated 2024-11-20 23:02 UTC)