https://papl.cs.brown.edu/2018/func-as-data.html#%28part._.A_.Little_.Calculus%29

V  Programming and Programming Languages
 1  Introduction
 2  Basic Data and Expressions
 3  From Repeated Expressions to Functions
 4  Introduction to Tabular Data
 5  From Tables to Lists
 6  Processing Lists
 7  Introduction to Structured Data
 8  Collections of Structured Data
 9  Recursive Data
10  Tree-Shaped Data [EMPTY]
11  Interactive Games as Reactive Systems
12  Examples, Testing, and Program Checking
13  Functions as Data
14  Predicting Growth
15  Sets Appeal
16  [EMPTY]
17  Halloween Analysis
18  Sharing and Equality
19  Graphs
20  State, Change, and More Equality
21  Algorithms That Exploit State
22  [EMPTY]
23  Processing Programs:   Parsing
24  Processing Programs:   A First Look at Interpretation
25  Interpreting Conditionals
26  Interpreting Functions
27  Reasoning about Programs:   A First Look at Types
28  Safety and Soundness
29  Parametric Polymorphism
30  Type Inference
31  Mutation:   Structures and Variables
32  Objects:   Interpretation and Types
33  Control Operations
34  Pyret for Racketeers and Schemers
35  Glossary
> 13  Functions as Data
13.1  A Little Calculus
13.2  A Helpful Shorthand for Anonymous Functions
13.3  Streams From Functions
13.4  Combining Forces:   Streams of Derivatives
On this page:
13.1 A Little Calculus
13.2 A Helpful Shorthand for Anonymous Functions
13.3 Streams From Functions
13.4 Combining Forces:   Streams of Derivatives
    - prev  up  next - 

13 Functions as Data

    13.1 A Little Calculus

    13.2 A Helpful Shorthand for Anonymous Functions

    13.3 Streams From Functions

    13.4 Combining Forces: Streams of Derivatives


It's interesting to consider how expressive the little programming
we've learned so far can be. To illustrate this, we'll work through a
few exercises of interesting concepts we can express using just
functions as values. We'll write two quite different things, then
show how they converge nicely.

13.1 A Little Calculus

If you've studied the differential calculus, you've come across
curious sytactic statements such as this:

\begin{equation*}{\frac{d}{dx}} x^2 = 2x\end{equation*}

Let's unpack what this means: the \(d/dx\), the \(x^2\), and the \(2x
\).

First, let's take on the two expressions; we'll discuss one, and the
discussion will cover the other as well. The correct response to
"what does \(x^2\) mean?" is, of course, an error: it doesn't mean
anything, because \(x\) is an unbound identifier.

So what is it intended to mean? The intent, clearly, is to represent
the function that squares its input, just as \(2x\) is meant to be
the function that doubles its input. We have nicer ways of writing
those:

fun square(x :: Number) -> Number: x * x end
fun double(x :: Number) -> Number: 2 * x end


and what we're really trying to say is that the \(d/dx\) (whatever
that is) of square is double.We're assuming functions of arity one in
the variable that is changing.

So now let's unpack \(d/dx\), starting with its type. As the above
example illustrates, \(d/dx\) is really a function from functions to
functions. That is, we can write its type as follows:

d-dx :: ((Number -> Number) -> (Number -> Number))


(This type might explain why your calculus course never explained
this operation this way--though it's not clear that obscuring its true
meaning is any better for your understanding.)

Let us now implement d-dx. We'll implement numerical differentiation,
though in principle we could also implement symbolic differentiation--
using rules you learned, e.g., given a polynomial, multiply by the
exponent and reduce the exponent by one--with a representation of
expressions (Representing Arithmetic).

In general, numeric differentiation of a function at a point yields
the value of the derivative at that point. We have a handy formula
for it: the derivative of \(f\) at \(x\) is

\begin{equation*}\frac{f(x + \epsilon) - f(x)}{\epsilon}\end
{equation*}

as \(\epsilon\) goes to zero in the limit. For now we'll give the
infinitesimal a small but fixed value, and later [Combining Forces:
Streams of Derivatives] see how we can improve on this.

epsilon = 0.001


Let's now try to translate the above formula into Pyret:

fun d-dx(f :: (Number -> Number)) -> (Number -> Number):
  (f(x + epsilon) - f(x)) / epsilon
end


    Do Now!

        What's the problem with the above definition?

If you didn't notice, Pyret will soon tell you: x isn't bound.
Indeed, what is x? It's the point at which we're trying to compute
the numeric derivative. That is, d-dx needs to return not a number
but a function (as the type indicates) that will consume this x:

fun d-dx(f :: (Number -> Number)) -> (Number -> Number):
  lam(x :: Number) -> Number:
    (f(x + epsilon) - f(x)) / epsilon
  end
end


Sure enough, this definition now works. We can, for instance, test it
as follows (note the use of num-floor to avoid numeric precision
issues from making our tests appear to fail):

d-dx-square = d-dx(square)

check:
  ins = [list: 0, 1, 10, 100]
  for map(n from ins):
    num-floor(d-dx-square(n))
  end
  is
  for map(n from ins):
    num-floor(double(n))
  end
end


Now we can return to the original example that launched this
investigation: what the sloppy and mysterious notation of math is
really trying to say is,

d-dx(lam(x): x * x end) = lam(x): 2 * x end


or, in the notation of A Notation for Functions,

\begin{equation*}{\frac{d}{dx}} [x \rightarrow x^2] = [x \rightarrow
2x]\end{equation*}

Pity math textbooks for not wanting to tell us the truth!

13.2 A Helpful Shorthand for Anonymous Functions

Pyret offers a shorter syntax for writing anonymous functions.
Though, stylistically, we generally avoid it so that our programs
don't become a jumble of special characters, sometimes it's
particularly convenient, as we will see below. This syntax is

{(a): b}


where a is zero or more arguments and b is the body. For instance, we
can write lam(x): x * x end as

{(x): x * x}


where we can see the benefit of brevity. In particular, note that
there is no need for end, because the braces take the place of
showing where the expression begins and ends. Similarly, we could
have written d-dx as

fun d-dx-short(f):
  {(x): (f(x + epsilon) - f(x)) / epsilon}
end


but many readers would say this makes the function harder to read,
because the prominent lam makes clear that d-dx returns an
(anonymous) function, whereas this syntax obscures it. Therefore, we
will usually only use this shorthand syntax for "one-liners".

13.3 Streams From Functions

People typically think of a function as serving one purpose: to
parameterize an expression. While that is both true and the most
common use of a function, it does not justify having a function of no
arguments, because that clearly parameterizes over nothing at all.
Yet functions of no argument also have a use, because functions
actually serve two purposes: to parameterize, and to suspend
evaluation of the body until the function is applied. In fact, these
two uses are orthogonal, in that one can employ one feature without
the other. In Sugaring Over Anonymity we see one direction of this:
parameterized functions that are used immediately, so that we employ
only abstraction and not delay. Below, we will see the other: delay
without abstraction.

Let's consider the humble list. A list can be only finitely long.
However, there are many lists (or sequences) in nature that have no
natural upper bound: from mathematical objects (the sequence of
natural numbers) to natural ones (the sequence of hits to a Web
site). Rather than try to squeeze these unbounded lists into bounded
ones, let's look at how we might represent and program over these
unbounded lists.

First, let's write a program to compute the sequence of natural
numbers:

fun nats-from(n):
  link(n, nats-from(n + 1))
end


    Do Now!

        Does this program have a problem?

While this represents our intent, it doesn't work: running it--e.g., 
nats-from(0)--creates an infinite loop evaluating nats-from for every
subsequent natural number. In other words, we want to write something
very like the above, but that doesn't recur until we want it to,
i.e., on demand. In other words, we want the rest of the list to be 
lazy.

This is where our insight into functions comes in. A function, as we
have just noted, delays evaluation of its body until it is applied.
Therefore, a function would, in principle, defer the invocation of 
nats-from(n + 1) until it's needed.

Except, this creates a type problem: the second argument to link
needs to be a list, and cannot be a function. Indeed, because it must
be a list, and every value that has been constructed must be finite,
every list is finite and eventually terminates in empty. Therefore,
we need a new data structure to represent the links in these lazy
lists (also known as streams):
<stream-type-def> ::=
  data Stream<T>:

    | lz-link(h :: T, t :: ( -> Stream<T>))

  end

where the annotation ( -> Stream<T>) means a function from no
arguments (hence the lack of anything before ->), also known as a 
thunk. Note that the way we have defined streams they must be
infinite, since we have provided no way to terminate them.

Let's construct the simplest example we can, a stream of constant
values:

ones = lz-link(1, lam(): ones end)


Pyret will actually complain about this definition. Note that the
list equivalent of this also will not work:

ones = link(1, ones)


because ones is not defined at the point of definition, so when Pyret
evaluates link(1, ones), it complains that ones is not defined.
However, it is being overly conservative with our former definition:
the use of ones is "under a lam", and hence won't be needed until
after the definition of ones is done, at which point ones will be
defined. We can indicate this to Pyret by using the keyword rec:

rec ones = lz-link(1, lam(): ones end)


To understand more about recursive definitions, see Recursive
Functions. Note that in Pyret, every fun implicitly has a rec beneath
it, which is why we can create recursive functions with aplomb.

    Exercise

        Earlier we said that we can't write

        ones = link(1, ones)


        What if we tried to write

        rec ones = link(1, ones)


        instead? Does this work and, if so, what value is ones bound
        to? If it doesn't work, does it fail to work for the same
        reason as the definition without the rec?

Henceforth, we will use the shorthand [A Helpful Shorthand for
Anonymous Functions] instead. Therefore, we can rewrite the above
definition as:

rec ones = lz-link(1, {(): ones})


Notice that {(): ...} defines an anonymous function of no arguments.
You can't leave out the ()! If you do, Pyret will get confused about
what your program means.

Because functions are automatically recursive, when we write a
function to create a stream, we don't need to use rec. Consider this
example:

fun nats-from(n :: Number):
  lz-link(n, {(): nats-from(n + 1)})
end


with which we can define the natural numbers:

nats = nats-from(0)


Note that the definition of nats is not recursive itself--the
recursion is inside nats-from--so we don't need to use rec to define 
nats.

    Do Now!

        Earlier, we said that every list is finite and hence
        eventually terminates. How does this remark apply to streams,
        such as the definition of ones or nats above?

The description of ones is still a finite one; it simply represents
the potential for an infinite number of values. Note that:

 1. A similar reasoning doesn't apply to lists because the rest of
    the list has already been constructed; in contrast, placing a
    function there creates the potential for a potentially unbounded
    amount of computation to still be forthcoming.

 2. That said, even with streams, in any given computation, we will
    create only a finite prefix of the stream. However, we don't have
    to prematurely decide how many; each client and use is welcome to
    extract less or more, as needed.

Now we've created multiple streams, but we still don't have an easy
way to "see" one. First we'll define the traditional list-like
selectors. Getting the first element works exactly as with lists:

fun lz-first<T>(s :: Stream<T>) -> T: s.h end


In contrast, when trying to access the rest of the stream, all we get
out of the data structure is a thunk. To access the actual rest, we
need to force the thunk, which of course means applying it to no
arguments:

fun lz-rest<T>(s :: Stream<T>) -> Stream<T>: s.t() end


This is useful for examining individual values of the stream. It is
also useful to extract a finite prefix of it (of a given size) as a
(regular) list, which would be especially handy for testing. Let's
write that function:

fun take<T>(n :: Number, s :: Stream<T>) -> List<T>:
  if n == 0:
    empty
  else:
    link(lz-first(s), take(n - 1, lz-rest(s)))
  end
end


If you pay close attention, you'll find that this body is not defined
by cases over the structure of the (stream) input--instead, it's
defined by the cases of the definition of a natural number (zero or a
successor). We'll return to this below (<lz-map2-def>).

Now that we have this, we can use it for testing. Note that usually
we use our data to test our functions; here, we're using this
function to test our data:

check:
  take(10, ones) is map(lam(_): 1 end, range(0, 10))
  take(10, nats) is range(0, 10)
  take(10, nats-from(1)) is map((_ + 1), range(0, 10))
end


The notation (_ + 1) defines a Pyret function of one argument that
adds 1 to the given argument.

Let's define one more function: the equivalent of map over streams.
For reasons that will soon become obvious, we'll define a version
that takes two lists and applies the first argument to them
pointwise:
<lz-map2-def> ::=
  fun lz-map2<A, B, C>(

      f :: (A, B -> C),

      s1 :: Stream<A>,

      s2 :: Stream<B>):

    lz-link(

      f(lz-first(s1), lz-first(s2)),

      {(): lz-map2(f, lz-rest(s1), lz-rest(s2))})

  end

Now we can see our earlier remark about the structure of the function
driven home especially clearly. Whereas a traditional map over lists
would have two cases, here we have only one case because the data
definition (<stream-type-def>) has only one case! What is the
consequence of this? In a traditional map, one case looks like the
above, but the other case corresponds to the empty input, for which
it produces the same output. Here, because the stream never
terminates, mapping over it doesn't either, and the structure of the
function reflects this.This raises a much subtler problem: if the
function's body doesn't have base- and inductive-cases, how can we
perform an inductive proof over it? The short answer is we can't: we
must instead use  coinduction.

Why did I define lz-map2 instead of lz-map? Because it enables us to
write the following:

rec fibs =
  lz-link(0,
    {(): lz-link(1,
          {(): lz-map2({(a :: Number, b :: Number): a + b},
                fibs,
            lz-rest(fibs))})})


from which, of course, we can extract as many Fibonacci numbers as we
want!

check:
  take(10, fibs) is [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
end


    Exercise

        Define the equivalent of map, filter, and fold for streams.

Streams and, more generally, infinite data structures that unfold on
demand are extremely valuable in programming. Consider, for instance,
the possible moves in a game. In some games, this can be infinite;
even if it is finite, for interesting games the combinatorics mean
that the tree is too large to feasibly store in memory. Therefore,
the programmer of the computer's intelligence must unfold the game
tree on demand. Programming it by using the encoding we have
described above means the program describes the entire tree, lazily,
and the tree unfolds automatically on demand, relieving the
programmer of the burden of implementing such a strategy.

In some languages, such as Haskell, lazy evaluation is built in by
default. In such a language, there is no need to use thunks. However,
lazy evaluation places other burdens on the language [REF].

13.4 Combining Forces: Streams of Derivatives

When we defined d-dx, we set epsilon to an arbitrary, high value. We
could instead think of epsilon as itself a stream that produces
successively finer values; then, for instance, when the difference in
the value of the derivative becomes small enough, we can decide we
have a sufficient approximation to the derivative.

The first step is, therefore, to make epsilon some kind of parameter
rather than a global constant. That leaves open what kind of
parameter it should be (number or stream?) as well as when it should
be supplied.

It makes most sense to consume this parameter after we have decided
what function we want to differentiate and at what value we want its
derivative; after all, the stream of epsilon values may depend on
both. Thus, we get:

fun d-dx(f :: (Number -> Number)) ->
    (Number -> (Number -> Number)):
  lam(x :: Number) -> (Number -> Number):
    lam(epsilon :: Number) -> Number:
      (f(x + epsilon) - f(x)) / epsilon
    end
  end
end


with which we can return to our square example:

d-dx-square = d-dx(square)


Note that at this point we have simply redefined d-dx without any
reference to streams: we have merely made a constant into a
parameter.

Now let's define the stream of negative powers of ten:

tenths = block:
  fun by-ten(d):
    new-denom = d / 10
    lz-link(new-denom, lam(): by-ten(new-denom) end)
  end
  by-ten(1)
end


so that

check:
  take(3, tenths) is [list: 1/10, 1/100, 1/1000]
end


For concreteness, let's pick an abscissa at which to compute the
numeric derivative of square--say 10:

d-dx-square-at-10 = d-dx-square(10)


Recall, from the types, that this is now a function of type (Number
-> Number): given a value for epsilon, it computes the derivative
using that value. We know, analytically, that the value of this
derivative should be 20. We can now (lazily) map tenths to provide
increasingly better approximations for epsilon and see what happens:

lz-map(d-dx-square-at-10, tenths)


Sure enough, the values we obtain are 20.1, 20.01, 20.001, and so on:
progressively better numerical approximations to 20.

    Exercise

        Extend the above program to take a tolerance, and draw as
        many values from the epsilon stream as necessary until the
        difference between successive approximations of the
        derivative fall within this tolerance.

    - prev  up  next -