[HN Gopher] Safe dead code removal in a pure functional language
       ___________________________________________________________________
        
       Safe dead code removal in a pure functional language
        
       Author : gbrown_
       Score  : 73 points
       Date   : 2021-01-28 13:19 UTC (9 hours ago)
        
 (HTM) web link (jfmengels.net)
 (TXT) w3m dump (jfmengels.net)
        
       | imoverclocked wrote:
       | Removing dead code is incredibly valuable. Even in time-consuming
       | environments with highly effectful languages it can mean the
       | difference between pulling in and maintaining old/archaic
       | dependencies or not. I wish more of our static analysis tools for
       | non-functional languages had better concepts of purity. There are
       | so many places where it fails in languages without this concept.
       | eg: Java annotations can change the behavior of a method from a
       | simple getter to something else entirely with hooks to be
       | used/referenced at runtime.
       | 
       | One place I have found that static analysis fails is with API
       | contexts. Either they get it wrong and offer to remove API
       | elements or they get it less-wrong and just don't touch anything
       | that can be exported. Granted getting it right would require all
       | usages of the API or a blessed list of things we know we want to
       | keep ... in which case we are potentially back at our source code
       | as a source of truth anyway.
       | 
       | Another interesting hole is "do we keep code that is only
       | referenced by tests?" In some cases, it makes sense but it's
       | definitely a knob that the user needs to adjust manually so far
       | as I have seen.
        
         | bjoli wrote:
         | DCE is also extremely valuable when writing lisp macros. Not
         | having to do DCE in the macro is saves a lot of code.
         | 
         | There is a talk by Kent Dybvig called "the macro writer's bill
         | of rights" on YouTube that goes into optimizations that help
         | macro writers.
        
         | Twisol wrote:
         | > eg: Java annotations can change the behavior of a method from
         | a simple getter to something else entirely with hooks to be
         | used/referenced at runtime.
         | 
         | To be slightly fair to Java, annotations don't themselves do
         | anything to the annotated method. Calling a method directly is
         | the same regardless of whatever annotations are on it -- of
         | course, unless it or one of its upstreams looks at the
         | annotations, but that's isomorphic to just passing the data
         | along.
         | 
         | Java annotations induce different behaviors in specific callers
         | that are designed to take advantage of those annotations, which
         | is arguably even worse -- like a meta analogue of the COMEFROM
         | keyword.
        
           | imoverclocked wrote:
           | Totally agree. The annotations basically just mark code.
           | Things like AspectJ can then place side-effects around marked
           | (and unmarked!) code.
           | 
           | Also, coming across transient runtime errors because you use
           | multi-catch (a basic Java language feature) and some java
           | library that injects bytecode is pretty lame. All the
           | benefits of a typed+compiled language with all the benefits
           | of monkey patching!
        
       | UncleMeat wrote:
       | Pure-function analysis is not that hard. If you want the easy
       | one, it is a pretty straightforward local analysis. If you want
       | the hard stuff you need a sound call graph and some fix-point
       | computation over the whole program, but it is still achievable.
       | 
       | The thing that is valuable here is the existence of pure
       | functions, not necessarily the requirement. For a huge number of
       | real programs written in java or whatever the exact same
       | technique applies.
        
         | amelius wrote:
         | > but it is still achievable
         | 
         | I would say it is undecidable, to be honest. For example
         | if(undecidable_function())             return lots_of_code();
         | else             return 0;
         | 
         | Can you remove lots_of_code() or not?
        
           | choeger wrote:
           | They didn't say dead code analysis was decidable, but purity
           | analysis.
           | 
           | Purity analysis is indeed trivially decidable, if you limit
           | yourself to "probably always pure", not "pure when executed
           | on a Tuesday". It boils down to "is this function calling
           | only pure functions and not using any of the impure
           | operators?". The trouble comes with dynamic languages where
           | you usually don't know shit about a function's arguments,
           | though. In that case you need a control flow Analysis of the
           | whole program.
        
           | shwestrick wrote:
           | IMO one of the most valuable things I've learned about static
           | analysis is that the halting problem is overly pessimistic.
           | It's easy to prove very interesting things about real-world
           | code.
           | 
           | Things like dead-code removal are a pretty standard for
           | modern optimizing compilers!
        
           | AlexCoventry wrote:
           | You can get 99% of the way there without solving the halting
           | problem, though. :)
        
             | jfmengels1 wrote:
             | Sure, but under some assumptions. Assumptions like that
             | some piece of code won't be transformed by a macro, that
             | reasonably looking code like `a.b` doesn't have side
             | effects, etc.
             | 
             | It's not that it's impossible, but that 1% (or 5%, 10%,
             | ???% depending on how the project is structured and whether
             | it uses a lot of side-effects or dynamic properties) can be
             | enough make your program crash if the assumption turns out
             | to be wrong.
             | 
             | Another example that you could go for, is can you determine
             | whether a property of an object is used or not. If you have
             | a language like JavaScript where you can do
             | `object[propertyName]`, that turns out to be very hard. In
             | a pure functional language, that is comparatively pretty
             | straightforward to detect.
        
               | wolfadex wrote:
               | The javascript example gets further compounded by doing
               | something like `object["prefix" + variable]`.
               | 
               | I've seen some very large companies abuse JS object
               | proxies so that code is nearly impossible to follow even
               | with the best IDEs and someone who's excellent at
               | grepping.
        
         | mrkeen wrote:
         | > Pure-function analysis is not that hard.
         | 
         | Awesome! My colleagues won't come to Haskell. But I can stop
         | asking them if Java can tell pure from impure.
        
       | notretarded wrote:
       | And if I have a function executed from user input or defined from
       | configuration.
        
       | pbiggar wrote:
       | In Dark, we label (either manually for stdlib, or automatically
       | for other functions) whether they are pure or not. Pure functions
       | (most of them) do not need their results stored in traces (Dark
       | has an always-on-debugger that uses "traces" saved from real
       | inputs). Making a (mostly) pure functional language enables so
       | much!
        
       | zepatrik wrote:
       | It would actually make sense to have linter rules in javascript
       | to report impure functions and avoid them as much as possible.
       | But as the article said, JS is kinda broken already as you can
       | have proxies and stuff that possibly adds side effects to the
       | most basic expressions.
        
         | jfmengels1 wrote:
         | The article author wrote this ESLint plugin to do just that
         | https://github.com/jfmengels/eslint-plugin-fp
         | 
         | then he (I) happily switched to Elm, because even with a lot of
         | static analysis rules, it's really hard to avoid impure
         | functions when a lot of the libraries and existing code use
         | that, and sometimes even require that.
        
       | brundolf wrote:
       | This is one reason I go out of my way to divide functions between
       | those that _return_ something and those that _do_ something. In
       | JS I use class methods as a shorthand for the latter:  "this will
       | probably mutate something". Plain functions, then, are relatively
       | safe to treat as pure/remove if their return value isn't used.
        
         | jfmengels1 wrote:
         | In pure functional languages (at least in Elm), what happens is
         | that you have data that describes what should be done, instead
         | of doing it directly.
         | 
         | is talk called "Effects as Data" describing that idea:
         | https://www.youtube.com/watch?v=6EdXaWfoslc
        
           | brundolf wrote:
           | I'm aware; I'm not sold on this most extreme version of pure-
           | functionalism. I think it's unintuitive, syntactically
           | clunky, and hard to reason about, despite the benefits of
           | purity. I'm a believer in the functional core/imperative
           | shell approach.
        
           | nerpderp82 wrote:
           | I often use a pattern where I do a batch and group a bunch of
           | operations baked into a tuple that describes the computation.
           | Instead of do_thing(on_data), the intermediate pass would
           | emit a data structure full of list([("operation_do_a_thing",
           | on_data), ... ] and then a command executor can do the actual
           | scheduling of the operations, serial, parallel, batched,
           | try/except with backpressure, etc.
           | 
           | Separate the thing you want done from the doing it, because
           | when you intermingle the doing-the-thing it greatly
           | complications the scheduling logic. Think of it as a working
           | programmers io monad.
        
         | Blikkentrekker wrote:
         | It seems very bad design to begin with to have a function both
         | return a value and be non-transparent in any other way than
         | what it needs in order to return said value.
         | 
         | But in some cases, this is needed for efficiency.
        
           | tsimionescu wrote:
           | I think about it the other way. If you are doing something
           | effectful, how often is it useful to also return some
           | information?
           | 
           | To me, the answer is 'extremely often', so I don't see the
           | value of what GP is proposing. It's nice to have pure
           | functions and some kind of separation, but it's not worth it
           | to give up returning values from effectful procedures.
        
             | Blikkentrekker wrote:
             | Only to return information about the effect's success,
             | which would again make it easy to see that the function
             | must have some effect, if all it returns be true or false
             | or another trivial value that could be constructed from
             | aether.
        
             | brundolf wrote:
             | It probably varies by domain - most of my deep experience
             | is on the front-end - though even outside of that, I can't
             | think of many cases outside of errors where I fundamentally
             | need to return something in tandem with a side-effect. The
             | return value is either going to pertain to the state before
             | the side-effect, the state after the side-effect, or the
             | side-effect itself. The first two could easily be broken
             | out into pure functions. The third, again, mostly comes
             | down to error reporting.
             | 
             | I guess things get more complicated when you consider I/O.
             | A function that gets a file handle or makes a GET request
             | is technically impure and mutates the state of the macro-
             | system, and yet is primarily concerned with what it
             | returns. I tend to lump these into the "plain functions"
             | bucket, but it's a gray area.
        
               | magicalhippo wrote:
               | We're in the procedural world, but a typical case we have
               | is a variation on this theme:
               | 
               | A function takes some declaration id, generates one or
               | several records based on the declaration data, commits
               | them to the database and returns the record id's so they
               | can be used in the UI or similar.
               | 
               | Either the record id's are displayed to the user
               | directly, they're used for filtering in a grid, used for
               | generating response files to external system, or
               | something else.
               | 
               | Often this would get called several times for the same
               | declaration id, usually generating new records, so it's
               | easier to have the function return the id's rather than
               | trying to reconstruct what it did.
               | 
               | Could be there's an easy way to map this to pure
               | functions, I haven't thought about it. Would be
               | interested to know though.
        
               | brundolf wrote:
               | Where my mind initially goes is to have a pure function
               | going from id -> records, and then pass those records off
               | to a separate procedure that persists them to the
               | database. The same records can then also be passed around
               | to the UI or whatever else.
               | 
               | > Often this would get called several times for the same
               | declaration id, usually generating new records, so it's
               | easier to have the function return the id's rather than
               | trying to reconstruct what it did.
               | 
               | Are the "new" records meaningfully different in some way,
               | or are they just new instances in memory? In the former
               | case, there's state somewhere that's causing each one to
               | be different (there is not a pure relationship from id ->
               | record set). In the latter case, you don't actually need
               | the new instances (unless you expect them to be mutated
               | by other code down the line and are guarding against
               | side-effects, in which case you could defensively clone
               | the pure function's result as needed).
               | 
               | Edit: it's possible I misunderstood, and that what's
               | happening is you're inserting data into a database which
               | has its own auto-incrementing IDs that it then returns to
               | you. If so, the coupling between value creation and
               | state/mutation exists in the database's API itself, and
               | so is outside of your control. There's not much you can
               | do in that case because you don't own that code.
        
       ___________________________________________________________________
       (page generated 2021-01-28 23:01 UTC)