[HN Gopher] Safe dead code removal in a pure functional language
___________________________________________________________________
Safe dead code removal in a pure functional language
Author : gbrown_
Score : 73 points
Date : 2021-01-28 13:19 UTC (9 hours ago)
(HTM) web link (jfmengels.net)
(TXT) w3m dump (jfmengels.net)
| imoverclocked wrote:
| Removing dead code is incredibly valuable. Even in time-consuming
| environments with highly effectful languages it can mean the
| difference between pulling in and maintaining old/archaic
| dependencies or not. I wish more of our static analysis tools for
| non-functional languages had better concepts of purity. There are
| so many places where it fails in languages without this concept.
| eg: Java annotations can change the behavior of a method from a
| simple getter to something else entirely with hooks to be
| used/referenced at runtime.
|
| One place I have found that static analysis fails is with API
| contexts. Either they get it wrong and offer to remove API
| elements or they get it less-wrong and just don't touch anything
| that can be exported. Granted getting it right would require all
| usages of the API or a blessed list of things we know we want to
| keep ... in which case we are potentially back at our source code
| as a source of truth anyway.
|
| Another interesting hole is "do we keep code that is only
| referenced by tests?" In some cases, it makes sense but it's
| definitely a knob that the user needs to adjust manually so far
| as I have seen.
| bjoli wrote:
| DCE is also extremely valuable when writing lisp macros. Not
| having to do DCE in the macro is saves a lot of code.
|
| There is a talk by Kent Dybvig called "the macro writer's bill
| of rights" on YouTube that goes into optimizations that help
| macro writers.
| Twisol wrote:
| > eg: Java annotations can change the behavior of a method from
| a simple getter to something else entirely with hooks to be
| used/referenced at runtime.
|
| To be slightly fair to Java, annotations don't themselves do
| anything to the annotated method. Calling a method directly is
| the same regardless of whatever annotations are on it -- of
| course, unless it or one of its upstreams looks at the
| annotations, but that's isomorphic to just passing the data
| along.
|
| Java annotations induce different behaviors in specific callers
| that are designed to take advantage of those annotations, which
| is arguably even worse -- like a meta analogue of the COMEFROM
| keyword.
| imoverclocked wrote:
| Totally agree. The annotations basically just mark code.
| Things like AspectJ can then place side-effects around marked
| (and unmarked!) code.
|
| Also, coming across transient runtime errors because you use
| multi-catch (a basic Java language feature) and some java
| library that injects bytecode is pretty lame. All the
| benefits of a typed+compiled language with all the benefits
| of monkey patching!
| UncleMeat wrote:
| Pure-function analysis is not that hard. If you want the easy
| one, it is a pretty straightforward local analysis. If you want
| the hard stuff you need a sound call graph and some fix-point
| computation over the whole program, but it is still achievable.
|
| The thing that is valuable here is the existence of pure
| functions, not necessarily the requirement. For a huge number of
| real programs written in java or whatever the exact same
| technique applies.
| amelius wrote:
| > but it is still achievable
|
| I would say it is undecidable, to be honest. For example
| if(undecidable_function()) return lots_of_code();
| else return 0;
|
| Can you remove lots_of_code() or not?
| choeger wrote:
| They didn't say dead code analysis was decidable, but purity
| analysis.
|
| Purity analysis is indeed trivially decidable, if you limit
| yourself to "probably always pure", not "pure when executed
| on a Tuesday". It boils down to "is this function calling
| only pure functions and not using any of the impure
| operators?". The trouble comes with dynamic languages where
| you usually don't know shit about a function's arguments,
| though. In that case you need a control flow Analysis of the
| whole program.
| shwestrick wrote:
| IMO one of the most valuable things I've learned about static
| analysis is that the halting problem is overly pessimistic.
| It's easy to prove very interesting things about real-world
| code.
|
| Things like dead-code removal are a pretty standard for
| modern optimizing compilers!
| AlexCoventry wrote:
| You can get 99% of the way there without solving the halting
| problem, though. :)
| jfmengels1 wrote:
| Sure, but under some assumptions. Assumptions like that
| some piece of code won't be transformed by a macro, that
| reasonably looking code like `a.b` doesn't have side
| effects, etc.
|
| It's not that it's impossible, but that 1% (or 5%, 10%,
| ???% depending on how the project is structured and whether
| it uses a lot of side-effects or dynamic properties) can be
| enough make your program crash if the assumption turns out
| to be wrong.
|
| Another example that you could go for, is can you determine
| whether a property of an object is used or not. If you have
| a language like JavaScript where you can do
| `object[propertyName]`, that turns out to be very hard. In
| a pure functional language, that is comparatively pretty
| straightforward to detect.
| wolfadex wrote:
| The javascript example gets further compounded by doing
| something like `object["prefix" + variable]`.
|
| I've seen some very large companies abuse JS object
| proxies so that code is nearly impossible to follow even
| with the best IDEs and someone who's excellent at
| grepping.
| mrkeen wrote:
| > Pure-function analysis is not that hard.
|
| Awesome! My colleagues won't come to Haskell. But I can stop
| asking them if Java can tell pure from impure.
| notretarded wrote:
| And if I have a function executed from user input or defined from
| configuration.
| pbiggar wrote:
| In Dark, we label (either manually for stdlib, or automatically
| for other functions) whether they are pure or not. Pure functions
| (most of them) do not need their results stored in traces (Dark
| has an always-on-debugger that uses "traces" saved from real
| inputs). Making a (mostly) pure functional language enables so
| much!
| zepatrik wrote:
| It would actually make sense to have linter rules in javascript
| to report impure functions and avoid them as much as possible.
| But as the article said, JS is kinda broken already as you can
| have proxies and stuff that possibly adds side effects to the
| most basic expressions.
| jfmengels1 wrote:
| The article author wrote this ESLint plugin to do just that
| https://github.com/jfmengels/eslint-plugin-fp
|
| then he (I) happily switched to Elm, because even with a lot of
| static analysis rules, it's really hard to avoid impure
| functions when a lot of the libraries and existing code use
| that, and sometimes even require that.
| brundolf wrote:
| This is one reason I go out of my way to divide functions between
| those that _return_ something and those that _do_ something. In
| JS I use class methods as a shorthand for the latter: "this will
| probably mutate something". Plain functions, then, are relatively
| safe to treat as pure/remove if their return value isn't used.
| jfmengels1 wrote:
| In pure functional languages (at least in Elm), what happens is
| that you have data that describes what should be done, instead
| of doing it directly.
|
| is talk called "Effects as Data" describing that idea:
| https://www.youtube.com/watch?v=6EdXaWfoslc
| brundolf wrote:
| I'm aware; I'm not sold on this most extreme version of pure-
| functionalism. I think it's unintuitive, syntactically
| clunky, and hard to reason about, despite the benefits of
| purity. I'm a believer in the functional core/imperative
| shell approach.
| nerpderp82 wrote:
| I often use a pattern where I do a batch and group a bunch of
| operations baked into a tuple that describes the computation.
| Instead of do_thing(on_data), the intermediate pass would
| emit a data structure full of list([("operation_do_a_thing",
| on_data), ... ] and then a command executor can do the actual
| scheduling of the operations, serial, parallel, batched,
| try/except with backpressure, etc.
|
| Separate the thing you want done from the doing it, because
| when you intermingle the doing-the-thing it greatly
| complications the scheduling logic. Think of it as a working
| programmers io monad.
| Blikkentrekker wrote:
| It seems very bad design to begin with to have a function both
| return a value and be non-transparent in any other way than
| what it needs in order to return said value.
|
| But in some cases, this is needed for efficiency.
| tsimionescu wrote:
| I think about it the other way. If you are doing something
| effectful, how often is it useful to also return some
| information?
|
| To me, the answer is 'extremely often', so I don't see the
| value of what GP is proposing. It's nice to have pure
| functions and some kind of separation, but it's not worth it
| to give up returning values from effectful procedures.
| Blikkentrekker wrote:
| Only to return information about the effect's success,
| which would again make it easy to see that the function
| must have some effect, if all it returns be true or false
| or another trivial value that could be constructed from
| aether.
| brundolf wrote:
| It probably varies by domain - most of my deep experience
| is on the front-end - though even outside of that, I can't
| think of many cases outside of errors where I fundamentally
| need to return something in tandem with a side-effect. The
| return value is either going to pertain to the state before
| the side-effect, the state after the side-effect, or the
| side-effect itself. The first two could easily be broken
| out into pure functions. The third, again, mostly comes
| down to error reporting.
|
| I guess things get more complicated when you consider I/O.
| A function that gets a file handle or makes a GET request
| is technically impure and mutates the state of the macro-
| system, and yet is primarily concerned with what it
| returns. I tend to lump these into the "plain functions"
| bucket, but it's a gray area.
| magicalhippo wrote:
| We're in the procedural world, but a typical case we have
| is a variation on this theme:
|
| A function takes some declaration id, generates one or
| several records based on the declaration data, commits
| them to the database and returns the record id's so they
| can be used in the UI or similar.
|
| Either the record id's are displayed to the user
| directly, they're used for filtering in a grid, used for
| generating response files to external system, or
| something else.
|
| Often this would get called several times for the same
| declaration id, usually generating new records, so it's
| easier to have the function return the id's rather than
| trying to reconstruct what it did.
|
| Could be there's an easy way to map this to pure
| functions, I haven't thought about it. Would be
| interested to know though.
| brundolf wrote:
| Where my mind initially goes is to have a pure function
| going from id -> records, and then pass those records off
| to a separate procedure that persists them to the
| database. The same records can then also be passed around
| to the UI or whatever else.
|
| > Often this would get called several times for the same
| declaration id, usually generating new records, so it's
| easier to have the function return the id's rather than
| trying to reconstruct what it did.
|
| Are the "new" records meaningfully different in some way,
| or are they just new instances in memory? In the former
| case, there's state somewhere that's causing each one to
| be different (there is not a pure relationship from id ->
| record set). In the latter case, you don't actually need
| the new instances (unless you expect them to be mutated
| by other code down the line and are guarding against
| side-effects, in which case you could defensively clone
| the pure function's result as needed).
|
| Edit: it's possible I misunderstood, and that what's
| happening is you're inserting data into a database which
| has its own auto-incrementing IDs that it then returns to
| you. If so, the coupling between value creation and
| state/mutation exists in the database's API itself, and
| so is outside of your control. There's not much you can
| do in that case because you don't own that code.
___________________________________________________________________
(page generated 2021-01-28 23:01 UTC)