\count100= 1
\count101= 37

\input satmacros.tex
%\draft  \def\updated{November 2, 2004}
   %% This needs to be defined.


\title{The Weierstrass Approximation Theorems}
\author{Allan Pinkus}

\def\shorttitle{The Weierstrass Approximation Theorems}
\def\shortauthor{Allan Pinkus}


\def\alp{\alpha}                \def\Alp{\Alpha}
\def\bet{\beta}
\def\gam{\gamma}                \def\Gam{\Gamma}
\def\del{\delta}                \def\Del{\Delta}
\def\eps{\varepsilon}
\def\tet{\theta}                \def\Tet{\Theta}
\def\lam{\lambda}               \def\Lam{\Lambda}
\def\sig{\sigma}                \def\Sig{\Sigma}
\def\ome{\omega}                \def\Ome{\Omega}
\def\rchi{\raise 2pt\hbox{$\chi$}}

\def\bfa{{\bf a}}               \def\bfA{{\bf A}}
\def\bfb{{\bf b}}               \def\bfB{{\bf B}}
\def\bfc{{\bf c}}               \def\bfC{{\bf C}}
\def\bfd{{\bf d}}               \def\bfD{{\bf D}}
\def\bfe{{\bf e}}               \def\bfE{{\bf E}}
\def\bff{{\bf f}}               \def\bfF{{\bf F}}
\def\bfg{{\bf g}}               \def\bfG{{\bf G}}
\def\bfh{{\bf h}}               \def\bfH{{\bf H}}
\def\bfi{{\bf i}}               \def\bfI{{\bf I}}
\def\bfj{{\bf j}}               \def\bfJ{{\bf J}}
\def\bfk{{\bf k}}               \def\bfK{{\bf K}}
\def\bfl{{\bf l}}               \def\bfL{{\bf L}}
\def\bfm{{\bf m}}               \def\bfM{{\bf M}}
\def\bfn{{\bf n}}               \def\bfN{{\bf N}}
\def\bfo{{\bf o}}               \def\bfO{{\bf O}}
\def\bfp{{\bf p}}               \def\bfP{{\bf P}}
\def\bfq{{\bf q}}               \def\bfQ{{\bf Q}}
\def\bfr{{\bf r}}               \def\bfR{{\bf R}}
\def\bfs{{\bf s}}               \def\bfS{{\bf S}}
\def\bft{{\bf t}}               \def\bfT{{\bf T}}
\def\bfu{{\bf u}}               \def\bfU{{\bf U}}
\def\bfv{{\bf v}}               \def\bfV{{\bf V}}
\def\bfw{{\bf w}}               \def\bfW{{\bf W}}
\def\bfx{{\bf x}}               \def\bfX{{\bf X}}
\def\bfy{{\bf y}}               \def\bfY{{\bf Y}}
\def\bfz{{\bf z}}               \def\bfZ{{\bf Z}}

\def\CC{{\rlap {\raise 0.4ex \hbox{$\scriptscriptstyle |$}}
\hskip -0.1em C}}
\def\FF{{I\!\!F}}
\def\NN{{I\!\!N}}
\def\PP{{I\hskip-2pt P}}
\def\QQ{{\rlap {\raise 0.4ex \hbox{$\scriptscriptstyle |$}}
\hskip -0.1em Q}}
\def\RR{{I\!\!R}}
\def\ZZ{{Z\!\!\! Z}}
\def\AA{{\hskip -3pt  A}}

\def\all{\forall}
\def\func#1#2#3{#1\colon \,#2\rightarrow #3} %function: from ... to ...
\def\incl{\subseteq}
\def\isom{\cong}
\def\nek{,\ldots,}
\def\onto{\mapsto}
\def\union{\bigcup}

\def\sqr#1#2{{\vcenter{\hrule height.#2pt\hbox{\vrule
width.#2pt height#1pt \kern#1pt \vrule width.#2pt}\hrule
height.#2pt}}}
\def\square{\mathchoice\sqr56\sqr56\sqr{3.2}3\sqr{2.3}3}
\def\span{{\rm span}}
\def\\{{\backslash}}
\def\tilC{{\widetilde C}}
\def\tilf{{\widetilde f}}
\def\oA{{\overline A}}

\def\W{{Weierstrass}}

\overfullrule=0pt

\abstract {This is a survey of the {\W} Approximation
Theorems and their various proofs.}

\sect{Introduction}
This survey is about the Weierstrass Approximation Theorems. We
consider these results within a historical context and also
discuss in detail many of the subsequent proofs. This is a shorter
version of the paper Pinkus [2000] with some alterations.

The Weierstrass Approximation Theorems are two theorems that
Weierstrass (1815--1897) published in 1885 in Weierstrass [1885]
when he was 70 years old. They prove the density of algebraic
polynomials in the space of continuous real-valued functions on a
finite interval in the uniform norm, and the density of
trigonometric polynomials in the space of $2\pi$-periodic
continuous real-valued functions on $\RR$ in the uniform norm.
These theorems did not arise from nowhere. They were born within a
historical context and it is of some interest to try to understand
their origins and their impact.

It has been said that two main themes stand out in {\W}' work. The first
is called the {\it arithmetization of analysis}. This was a program to
separate the calculus from geometry and to provide it with a proper
solid analytic foundation. Providing a logical basis for the real numbers,
for functions and for calculus was a necessary stage in the development of
analysis. {\W} was one of the leaders of this movement in his lectures
and in his papers. He not only brought a new standard of rigour to his
own mathematics, but attempted to do the same to much of mathematical
analysis.

The second theme which is ever-present in {\W}' work is that of
power series (and function series). {\W} is said to have stated
that his own work in analysis was ``nothing but power series'',
see Bell [1936, p.~462]. In fact we will see how Weierstrass
perceived his approximation theorems as theorems on convergent
series. These approximation theorems were also a counterbalance to
{\W}' famous example of a continuous nowhere differentiable
function. It is a generally accepted fact that the existence of
continuous nowhere differentiable functions was known and lectured
upon by {\W} in 1861. The approximation theorems are in a sense
its converse. Every continuous function on $\RR$ is a limit not
only of infinitely differentiable or even analytic functions, but
in fact of polynomials. Furthermore, this limit is uniform if we
restrict the approximation to any finite interval. Thus the set of
continuous functions contains very, very non-smooth functions, but
they can each be approximated arbitrarily well by the ultimate in
smooth functions. It is this dichotomy which very much lies at the
heart of approximation theory.


\sect{The Fundamental Theorems of Approximation Theory}
In this section we review the contents of {\W}' [1885] and
its variants. We first fix some notation. $C(\RR)$ will denote the
class of continuous real-valued functions on all of $\RR$, $C[a,b]$,
$-\infty<a<b<\infty$, the class of continuous real-valued functions on
the closed interval $[a,b]$, and $\tilC[a,b]$ the class of functions in
$C[a,b]$ satisfying $f(a)=f(b)$. ($\tilC[a,b]$ may, and sometimes
should, be considered as the restriction to $[a,b]$ of functions in
$C(\RR)$ which are $(b-a)$-periodic.)

The paper stating and proving what we, in approximation theory, call
``the'' Weierstrass theorems, i.e., those that prove the density of
algebraic polynomials in the space $C[a,b]$ (for every
$-\infty<a<b<\infty$) and trigonometric polynomials in $\tilC[0,2\pi]$,
is Weierstrass [1885]. It was published when {\W} was 70 years
old!! This is one paper, but it appeared in two parts. It seems that
the significance of the paper was immediately appreciated, as the
paper appeared in translation (in French) one year later in {\W}
[1886]. Again it was published in two parts under the same title
(but in different issues, which is somewhat confusing). The paper was
``reprinted'' in {\W}' collected works (Mathematische Werke). It
is contained in Volume 3 that appeared in 1903, although parts of
Volume 3 including, it seems, this paper, were edited by {\W} himself a
few years previously. Here the two parts do appear as one paper. In
addition, some changes were made. A half page was added at the
beginning, ten pages of material were appended to the end of the paper,
and some other minor changes were made. We will return to these
additions later.

{\W} had an abiding interest in complex function theory and in
representing functions by power series. The results he obtained in
this paper should definitely be viewed from that perspective. In
fact the title of this paper emphasizes this viewpoint. The paper
is titled {\it On the possibility of giving an analytic
representation to an arbitrary function of a real variable}. In
this section we review what {\W} did in this paper.

{\W} starts his original paper with the statement
that if $f$ is continuous and bounded on all of $\RR$ then, as is known,
$$\lim_{k\to 0^+} {1\over {k\sqrt{\pi}}} \int_{-\infty}^\infty f(u)
e^{-({{u-x}\over k})^2} \dd u = f(x). $$ He then immediately notes
that this may be generalized to any kernel $\psi$ that is
continuous, nonnegative, integrable and even on $\RR$. For such
$\psi$ he sets
$$F(x, k) = {1\over {2k\omega}} \int_{-\infty}^\infty f(u)
\psi\left({{u-x}\over k}\right)\dd u,$$ where
$$\omega =\int_0^\infty \psi(x)\dd x,$$
and proves that
$$\lim_{k\to 0^+} F(x, k) = f(x)$$
for each $x$. He not only proves pointwise convergence, but also {\it uniform
convergence on any finite interval}. The proof is standard. We will not
repeat it here. {\W} also notes that there are entire $\psi$, as above,
for which $F(\cdot, k)$ is entire for every $k>0$. He explicitly states
that $\psi(x)=e^{-x^2}$ is an example thereof. The consequence of the
above is the following.
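
As a quick consistency check (our own computation, not taken from
{\W}' paper), consider the Gaussian kernel $\psi(x)=e^{-x^2}$. Then
$$\omega =\int_0^\infty e^{-x^2}\dd x = {{\sqrt{\pi}}\over 2},
\qquad\hbox{so that}\qquad {1\over {2k\omega}} = {1\over {k\sqrt{\pi}}},$$
and $F(x,k)$ reduces to the integral with which {\W} starts, namely
$$F(x,k) = {1\over {k\sqrt{\pi}}} \int_{-\infty}^\infty f(u)
e^{-({{u-x}\over k})^2} \dd u.$$
More generally, for any admissible $\psi$ and for $f\equiv 1$, the
substitution $u=x+kv$ together with the evenness of $\psi$ gives
$${1\over {2k\omega}} \int_{-\infty}^\infty
\psi\left({{u-x}\over k}\right)\dd u = {1\over {2\omega}}
\int_{-\infty}^\infty \psi(v)\dd v = 1$$
for every $k>0$, which explains the normalizing factor $2k\omega$.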

\proclaim Theorem A.  Let $f$ be continuous and bounded on $\RR$. Then
there exists a sequence of entire functions $F(x,k)$ (as functions of $x$
for each positive $k$) such that for each $x$
$$\lim_{k\to 0^+} F(x, k) = f(x).$$

\smallskip
{\W} seems very much taken with this result that every bounded
continuous function on $\RR$ is a pointwise limit of entire functions.
In fact he prefaces Theorem A with the statement that this theorem ``strikes
me as remarkable and fruitful''. For unknown reasons this sentence,
and only this sentence, was deleted from the paper when it was
reprinted in {\W}' Mathematische Werke.

As mentioned, on any finite interval, one may obtain uniform
convergence. Furthermore, since $F(\cdot,k)$ is entire, the truncated
power series of $F(\cdot,k)$ uniformly converges to $F(\cdot,k)$ on any
finite interval. Each of the above statements is easily proved and gives:

\proclaim Theorem B. Let $f$ be continuous and bounded on $\RR$. Given
a finite interval $[a,b]$ and an $\eps>0$, there exists an algebraic
polynomial $p$ for which
$$|f(x)-p(x)|<\eps$$
for all $x\in [a,b]$.\nopf

Throughout the first part of {\W} [1885] and for much of the second part,
{\W} is concerned with functions defined on all of $\RR$. However
later in the second part he does
note that given any $f\in C[a,b]$, $-\infty<a<b<\infty$, we can define $f$
to equal $f(a)$ on $(-\infty, a)$, and to equal $f(b)$ on $(b, \infty)$.
We can then apply the above Theorem B to obtain what is technically
never explicitly stated, but nonetheless very implicitly stated, and
what is today considered as the main result of this paper.

\proclaim Fundamental Theorem of Approximation Theory.
Let $f\in C[a,b]$ where $-\infty<a<b<\infty$. Given $\eps>0$,
there exists an algebraic polynomial $p$ for which
$$|f(x)-p(x)|<\eps$$
for all $x\in [a,b]$.\nopf

Returning to {\W} [1885], and bounded $f\in C(\RR)$, {\W} considers
two sequences of positive values
$\{c_n\}$ and $\{\eps_n\}$, for which $\lim_{n\to\infty}
c_n=\infty$, and $\sum_{n=1}^\infty \eps_n<\infty$. From Theorem B it
follows that for $f$ as above there exists a polynomial $p_n$ such that
$$|f(x)-p_n(x)|< \eps_n$$
on $[-c_n,c_n]$.

Set $q_0=p_1$ and $q_m=p_{m+1}-p_m$, $m=1,2,\ldots$ . Then
$$\sum_{m=0}^n q_m(x)=p_{n+1}(x)$$
and, thus, in a pointwise sense
$$f(x)=\sum_{m=0}^\infty  q_m(x).\eqno(2.1)$$
Furthermore, let $[a,b]$ be a finite interval. Then for all
$m$ sufficiently large
$$|f(x)-p_m(x)|< \eps_m$$
for all $x\in [a,b]$, implying also
$$|q_m(x)| < \eps_m + \eps_{m+1}$$
for all $x\in [a,b]$. Thus for some $M$
$$\sum_{m=M}^\infty |q_m(x)| < 2\sum_{m=M}^\infty \eps_m$$
for all $x\in [a,b]$ and the series
$$\sum_{m=0}^\infty q_m(x)$$
therefore converges absolutely and uniformly to $f$ on $[a,b]$. This
{\W} states as Theorem C. That is,

\proclaim Theorem C. Let $f$ be continuous and bounded on $\RR$. Then
$f$ may be represented, in  many ways, by an infinite series of
polynomials. This series converges absolutely for every value of $x$,
and uniformly in every finite interval.\nopf
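
To make the estimate preceding Theorem C concrete, here is one
admissible choice of the two sequences (ours, for illustration only):
$c_n=n$ and $\eps_n=2^{-n}$. If $M$ is chosen so that
$[a,b]\subseteq [-c_M,c_M]$, then the tail bound becomes
$$\sum_{m=M}^\infty |q_m(x)| < 2\sum_{m=M}^\infty 2^{-m} = 2^{2-M}$$
for all $x\in [a,b]$, while the partial sums satisfy
$$\left|f(x)-\sum_{m=0}^n q_m(x)\right| = |f(x)-p_{n+1}(x)| < 2^{-(n+1)}$$
for all $x\in [a,b]$ and all $n\ge M-1$. Any other summable sequence
$\{\eps_n\}$ would serve equally well, which is one reason why the
representation in Theorem C is far from unique.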

{\W} and subsequent authors would often phrase or rephrase these
approximation or density results (in this case Theorem B) in terms
of infinite series. It was only many years later that this
equivalent form went out of fashion. In fact such a phrasing was
at the time significant. One should also recall that it was only a
few years earlier that du Bois-Reymond had constructed a
continuous function whose Fourier series diverged at a point, see
du Bois-Reymond [1876]. {\W}' theorem was considered by many,
including {\W} himself, to be a ``representation theorem''. The
theorem was seen as a means of reconciling the ``analytic'' and
``synthetic'' viewpoints that had divided late 19th century
mathematics, see Gray [1984] and also Siegmund-Schultze [1988].
Much of the remainder of {\W} [1885] is concerned with the
construction (in some sense) of a good polynomial approximant or a
good representation for $f$ (as in (2.1)). {\W} was well aware
that he could not possibly construct a good power series
representation for $f$, but he did find, in some sense, a
reasonable expansion of $f$ in terms of Legendre polynomials.

In the latter part of {\W} [1885], {\W} proves the density
of trigonometric polynomials in $\tilC[0,2\pi]$.
His proof is interesting and proceeds as follows using complex function theory.

Let $\psi$ be an entire function that is
nonnegative, integrable and even on $\RR$ and has the following
property. Given an $f\in \tilC[0,2\pi]$, the functions
$$F(z, k) = {1\over {2k\omega}} \int_{-\infty}^\infty f(u)
\psi\left({{u-z}\over k}\right)\dd u,$$ where
$$\omega =\int_0^\infty \psi(x)\dd x,$$
are entire for each $k>0$ (as a function of $z\in \CC$) and satisfy
$$\lim_{k\to 0^+} F(x, k) = f(x)$$
uniformly on $[0,2\pi]$. {\W} notes that such functions $\psi$ exist,
e.g., $\psi(u)= e^{-u^2}$.

Since $f$ is $2\pi$-periodic so is $F$, i.e.,
$$F(z+2\pi,k)= F(z,k)$$
for all $z\in \CC$ and $k>0$. For each fixed $k>0$, set
$$G(z, k) = F({{\log z}\over i}, k).$$
In general, since $\log z$ is a multiple-valued function, $G$ would also
be a multiple-valued function. However from the $2\pi$-periodicity of
$F$, it follows that $G$ is single-valued and thus is an analytic function
on $\CC \\ \{0\}$. Consequently, $G$ has a Laurent series expansion of the
form
$$G(z, k) = \sum_{n=-\infty}^{\infty} c_{n,k} z^n$$
which converges absolutely and uniformly to $G$ on every domain bounded
away from $0$ and $\infty$. We will consider this expansion on the unit
circle $|z|=1$. Setting $z=e^{ix}$, it follows that
$$F(x,k) = \sum_{n=-\infty}^{\infty} c_{n,k} e^{inx}$$
where the series converges absolutely and uniformly to $F(x,k)$ for all
real $x$. (In fact, it may be shown that if $\psi(u)= e^{-u^2}$, then
$c_{n,k} = c_n e^{-n^2 k^2/4}$, where the $\{c_n\}$ are the Fourier
coefficients of $f$.) In other words, {\W} has given a proof of
the fact that for $F(x, k)$ $2\pi$-periodic and entire, its Fourier
series converges absolutely and uniformly to $F(x, k)$ on $\RR$. We now
truncate this series to get an arbitrarily good approximant to $F(x,k)$
which itself, by a suitable choice of $k$, was an arbitrarily good
approximant to $f$. The truncated series is a trigonometric polynomial.
This completes {\W}' proof, the result of which we now formally state.
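
The parenthetical formula for $c_{n,k}$ can be checked directly. The
following is a sketch of one way to do so. It assumes the
normalization
$$c_n = {1\over {2\pi}}\int_0^{2\pi} f(x)e^{-inx}\dd x,$$
which is not fixed in the text above. With $\psi(u)=e^{-u^2}$ we have
$2k\omega=k\sqrt{\pi}$, and the substitution $u=x+v$ gives
$$F(x,k) = {1\over {k\sqrt{\pi}}} \int_{-\infty}^\infty f(x+v)
e^{-v^2/k^2} \dd v.$$
Since $f$ is bounded, we may interchange the order of integration to
obtain
$$c_{n,k} = {1\over {2\pi}}\int_0^{2\pi} F(x,k)e^{-inx}\dd x
= {1\over {k\sqrt{\pi}}} \int_{-\infty}^\infty e^{-v^2/k^2}
\left[{1\over {2\pi}}\int_0^{2\pi} f(x+v)e^{-inx}\dd x\right] \dd v.$$
By the $2\pi$-periodicity of $f$ the inner integral equals
$c_n e^{inv}$, and therefore
$$c_{n,k} = {{c_n}\over {k\sqrt{\pi}}} \int_{-\infty}^\infty
e^{-v^2/k^2}e^{inv} \dd v = c_n e^{-n^2k^2/4}.$$
The last equality is the classical Gaussian integral
$\int_{-\infty}^\infty e^{-v^2/k^2}e^{inv} \dd v =
k\sqrt{\pi}\, e^{-n^2k^2/4}$. The rapid decay of the factor
$e^{-n^2k^2/4}$ in $n$ makes the absolute and uniform convergence of
the series for $F(x,k)$ evident in this case.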

\proclaim Second Fundamental Theorem of Approximation Theory.
Let $f\in \tilC[0,2\pi]$. Given $\eps>0$,
there exists a trigonometric polynomial $t$ for which
$$|f(x)-t(x)|<\eps$$
for all $x\in [0, 2\pi]$.\nopf

As we stated at the beginning of this section, when {\W} [1885]
was reprinted in {\W}' Mathematische Werke there were two
notable additions. These are of interest and worth mentioning. We
recall that while this reprint appeared in 1903 there is reason to
assume that {\W} himself edited this paper.

The first addition was a short (half page) ``introduction''. We quote it
(verbatim in meaning if not in fact).

\noindent
{\it The main result of this paper, restricted to the one variable case,
can be summarized as follows:

Let $f\in C(\RR)$. Then there exists a sequence $f_1, f_2, \ldots$ of
entire functions for which
$$f(x)=\sum_{i=1}^\infty f_i(x)$$
for each $x\in \RR$. In addition the convergence of the above sum is
uniform on every finite interval.}

We can assume that this is the emphasis which {\W} wished to give his
paper. It is a repeat of Theorem C (although the boundedness
condition on $f$ seems to have been overlooked)
and curiously without mention of the fact that the
$f_i$ may be assumed to be polynomials.

The second addition is 10 pages appended to the end of the paper. In these
10 pages {\W} shows how to extend the results of this paper (or, to be
more precise, the results concerning algebraic polynomials) to
approximating continuous functions of several variables. He does this by
setting $F(x_1\nek x_n, k)$ equal to
$$  {1\over {2^nk^n\omega^n}} \int_{-\infty}^\infty
\!\cdots \!\int_{-\infty}^\infty \!\!\!f(u_1\nek u_n) \psi({{u_1-x_1}\over
{k}}) \cdots \psi({{u_n-x_n}\over {k}}) \dd u_1\cdots
\dd u_n$$
and then essentially mimicking the proofs of Theorems A and B. However
Picard [1891a] published already in 1891 an alternative proof of {\W}'
theorems and showed how to extend the results to functions of several
variables. As such, {\W}' priority to this result is somewhat in question.

\sect{Additional Proofs of the Fundamental Theorems}
In this section we present various alternative proofs of {\W}'
theorems on the density of algebraic and trigonometric polynomials
on finite intervals in $\RR$. We believe that
these proofs have an abiding value. Some of the papers we cite
contain additional results or emphasize other points of
view. We ignore such digressions. The proofs we present divide
roughly into three groups. The first group contains proofs that,
in one form or another, are based on singular integrals. The
proofs of {\W}, Picard, Fej\'er, Landau, and de la Vall\'ee
Poussin belong here. The second group of proofs is based on the
idea of approximating a particular function. In this group we find
the proofs of Runge/Phragm\'en, Lebesgue, Mittag-Leffler, and
Lerch. Finally, there is the third group that contains the proofs
which do not quite belong to either of the above groups. Here we
find proofs due to Lerch, Volterra  and Bernstein. These are what
we term the ``early proofs''. They all appeared prior to 1913.
Note the pantheon of names that were drawn to this theorem. The
main focus of these proofs is the Weierstrass theorems themselves
rather than any far-reaching generalizations thereof. There are
later proofs coming from different and broader formulations.
However we discuss only one of these later proofs. It is that due to Kuhn
which we consider to be wonderfully elegant and simple. For
historical consistency we have chosen to present these proofs
in more or less chronological order. This lengthens the paper, but
we hope the advantages of this approach offset the deficiencies.

We start by formally stating certain facts which will be obvious to most
readers, but perhaps not to everyone. The first two statements
follow from a change of variables, and are stated without proof.

\proclaim Proposition 1. Algebraic polynomials are dense in $C[a,b]$
iff they are dense in $C[0,1]$.\nopf

Analogously we have the less used:

\proclaim Proposition 2. The trigonometric polynomials
$$\span\{ 1, \sin x, \cos x, \sin 2x, \cos 2x,\ldots\}$$
are dense in $\tilC[0,2\pi]$ iff
$$\span\{ 1, \sin {{2\pi x}\over {b-a}}, \cos {{2\pi x}\over {b-a}},
\sin 2{{2\pi x}\over {b-a}}, \cos 2{{2\pi x}\over {b-a}},\ldots\}$$
are dense in $\tilC[a,b]$.\nopf
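
For completeness, here is a sketch of the change of variables behind
Proposition 1 (the notation $F$ and $P$ is ours). Given $f\in C[a,b]$,
set
$$F(y) = f(a+(b-a)y),\qquad 0\le y\le 1,$$
so that $F\in C[0,1]$. If $P$ is an algebraic polynomial satisfying
$|F(y)-P(y)|<\eps$ for all $y\in [0,1]$, then
$$p(x) = P\left({{x-a}\over {b-a}}\right)$$
is an algebraic polynomial of the same degree satisfying
$|f(x)-p(x)|<\eps$ for all $x\in [a,b]$. The converse direction
follows by inverting the substitution. Proposition 2 is obtained in
the same way from the substitution $y=2\pi (x-a)/(b-a)$, together with
the addition formulae for sine and cosine, which show that each
$\sin ny$ and $\cos ny$ is a linear combination of
$\sin (2\pi n x/(b-a))$ and $\cos (2\pi n x/(b-a))$.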

We now show that the density of algebraic polynomials in $C[a,b]$,
and trigonometric polynomials in $\tilC[0,2\pi]$, are in fact equivalent
statements. That is, we prove that each of the fundamental theorems
follows from the other, see also Natanson [1964, pp.~16--19].

\proclaim Proposition 3. If trigonometric polynomials are dense in
$\tilC[0,2\pi]$, then algebraic polynomials are dense in $C[a,b]$.

\pf We present two proofs of this result. The first proof may
be found in Picard [1891a].

Assume, without loss of generality, that $0\le a< b<2\pi$. Extend $f\in
C[a,b]$ to some $\tilf\in \tilC[0,2\pi]$. Since trigonometric
polynomials are dense in $\tilC[0,2\pi]$, there exists a trigonometric
polynomial $t$ that is arbitrarily close to $\tilf$ on $[0,2\pi]$, and
thus to $f$ on $[a,b]$. Every trigonometric polynomial is a finite linear
combination
of $\sin nx$ and $\cos nx$. As such each is an entire function. Thus $t$ is
an entire function having an absolutely and uniformly convergent power
series expansion. By suitably truncating this power series we obtain an
algebraic polynomial that is arbitrarily close to $t$, and thus
ultimately to $f$.

A slight variant on the above bypasses the need to extend $f$ to
$\tilf$. Assume $f\in C[0,2\pi]$, and define
$$g(x)=f(x) + {{f(0) - f(2\pi)}\over {2\pi}} x.$$
Then $g\in \tilC[0,2\pi]$. We now apply the reasoning of the
previous paragraph to obtain an algebraic polynomial $p$ arbitrarily
close to $g$ on $[0,2\pi]$, whence it follows that
$$p(x) - {{f(0) - f(2\pi)}\over {2\pi}} x$$
is arbitrarily close to $f$ on $[0,2\pi]$.

A different and more commonly quoted proof is the following which does not
depend upon the truncation of a power series. According
to de la Vall\'ee Poussin [1918], [1919], the idea in this proof is due
to Bernstein.

Given $f\in C[-1,1]$, set
$$g(\tet) = f(\cos \tet),\qquad -\pi\le \tet\le \pi.$$
Then $g\in \tilC[-\pi,\pi]$ and $g$ is even.
As such given $\eps>0$ there exists a
trigonometric polynomial $t$ for which
$$|g(\tet)-t(\tet)|<\eps$$
for all $\tet\in [-\pi,\pi]$. We divide $t$ into its even and odd parts,
i.e.,
$$t_e(\tet) = {{t(\tet) + t(-\tet)}\over 2}$$
$$t_o(\tet) = {{t(\tet) - t(-\tet)}\over 2}$$
and note that $t_e$ and $t_o$ are also trigonometric polynomials.
(Equivalently, $t_e$ is composed of the cosine terms of $t$, while $t_o$
is composed of the sine terms of $t$.)

Since $g$ is even we have
$$\max  \{ |(g-t)(\tet)|,  |(g-t)(-\tet)|\}$$
$$=  \max  \{ |(g-t_e)(\tet)-t_o(\tet)|,  |(g-t_e)(\tet) + t_o(\tet)|\}
\ge  |(g-t_e)(\tet)|,$$ and, thus,
$$|g(\tet) - t_e(\tet)|< \eps$$
for all $\tet\in [-\pi,\pi]$. In other words, since $g$ is even we may
assume that $t$ is even.

Let
$$t(\tet) = \sum_{m=0}^n a_m \cos m\tet.$$
Each $\cos m\tet$ is a polynomial of exact degree
$m$ in $\cos \tet$. In fact
$$\cos m\tet = T_m(\cos \tet)$$
where the $T_m$ are the Chebyshev polynomials (see e.g., Rivlin [1974]).
Setting
$$p(x) = \sum_{m=0}^n a_m T_m(x),$$
we have
$$|f(x)-p(x)| <\eps$$
for all $x\in [-1,1]$. \eop
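
As a small illustration of the last step of this proof (our own
example), recall that $T_0(x)=1$, $T_1(x)=x$ and $T_2(x)=2x^2-1$, the
last being a restatement of the identity
$\cos 2\tet = 2\cos^2\tet -1$. Thus for $n=2$ the even trigonometric
polynomial
$$t(\tet) = a_0 + a_1\cos \tet + a_2 \cos 2\tet$$
corresponds to the algebraic polynomial
$$p(x) = (a_0-a_2) + a_1 x + 2a_2 x^2,$$
and $t(\tet)=p(\cos \tet)$ for all $\tet$. As $\tet$ ranges over
$[-\pi,\pi]$ the variable $x=\cos \tet$ ranges over all of $[-1,1]$,
so a bound on $|g-t|$ transfers directly to a bound on $|f-p|$.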

\proclaim Proposition 4. If algebraic polynomials are dense in
$C[a,b]$, then trigonometric polynomials are dense in $\tilC[0,2\pi]$.

\pf The first proof of this fact was the one given by {\W} in
Section 2. To our surprise (and chagrin) we have essentially found only one
other proof of this result, and it is not simple. The proof we
give here is de la Vall\'ee Poussin's [1918], [1919] variation on a proof in
Lebesgue [1898].

Let $f\in \tilC[0,2\pi]$ and consider $f$ as being defined on all of
$\RR$. Set
$$g(\tet)={{f(\tet)+f(-\tet)}\over 2}$$
and
$$h(\tet)={{f(\tet)-f(-\tet)}\over 2}\sin \tet.$$
Both $g$ and $h$ are continuous even functions of period $2\pi$.

Define
$$\phi(x) = g(\arccos x),\qquad
\psi(x) = h(\arccos x).$$ These are well-defined functions in
$C[-1,1]$. Thus, given $\eps>0$ there exist algebraic polynomials
$p$ and $q$ for which
$$|\phi(x) - p(x)|< {\eps \over 4},\qquad |\psi(x) - q(x)|< {\eps
\over 4}$$
for all $x\in [-1,1]$. As $g$ and $h$ are even, it follows
that
$$|g(\tet) - p (\cos \tet)|<{\eps \over 4},\qquad
|h(\tet) - q (\cos \tet)|<{\eps \over 4}$$
for all $\tet$.
From the definition of $g$ and $h$, we obtain
$$\left|f(\tet)\sin^2\tet -\left[p(\cos \tet)\sin^2\tet + q(\cos \tet)\sin
\tet\right]\right|<{\eps\over 2}$$
for all $\tet$.
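
The step ``from the definition of $g$ and $h$'' may be spelled out as
follows (a sketch of the omitted computation). We have
$$g(\tet)\sin^2\tet + h(\tet)\sin \tet =
{{f(\tet)+f(-\tet)}\over 2}\sin^2\tet + {{f(\tet)-f(-\tet)}\over
2}\sin^2\tet = f(\tet)\sin^2\tet,$$
while, since $|\sin\tet|\le 1$,
$$\left|g(\tet)-p(\cos\tet)\right|\sin^2\tet +
\left|h(\tet)-q(\cos\tet)\right| |\sin\tet| < {\eps\over 4} +
{\eps\over 4}.$$
The triangle inequality then gives the bound just displayed.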

We apply this same analysis to the function $f(\tet+ \pi/2)$ to obtain
algebraic polynomials $r$ and $s$ for which
$$\left|f(\tet+{\pi\over 2})\sin^2\tet -\left[r(\cos \tet)\sin^2\tet + s
(\cos \tet)\sin \tet\right]\right|<{\eps\over 2}$$
for all $\tet$. Replacing $\tet$ by $\tet - \pi/2$, and using
$\sin(\tet-\pi/2)=-\cos \tet$ and $\cos(\tet-\pi/2)=\sin \tet$, gives
$$\left|f(\tet)\cos^2\tet -\left[r(\sin \tet)\cos^2\tet -
s(\sin \tet)\cos \tet\right]\right|<{\eps\over 2}.$$

Thus the trigonometric polynomial
$$p(\cos \tet)\sin^2\tet + q(\cos \tet)\sin \tet + r(\sin \tet)\cos^2\tet -
s(\sin \tet)\cos \tet$$
is an $\eps$-approximant to $f$. \eop

After these preliminaries we can now look at the inherent methods and
ideas used in various alternative proofs of either of the two
Weierstrass fundamental theorems of approximation theory. We present
these proofs in more or less the order in which they appeared in print.

\medskip\noindent
{\bf Picard.}
\'Emile Picard (1856--1941) (Hermite's son-in-law) had an abiding
interest in {\W}' theorems and in Picard [1891a] gave the first in a
series of different proofs of the Weierstrass theorems. This proof also
appears in Picard's famous textbook [1891b]. Later editions of this
textbook expanded upon this, often including other methods of proof, but
not always with complete references. Picard's proof, like that of
Weierstrass, is based on a smoothing procedure using singular integrals.
Picard, however, chose to use the Poisson integral. His proof proceeds as follows.

Assume $f\in \tilC [0,2\pi]$. As $f$ is continuous and $2\pi$-periodic on
$\RR$, it is
uniformly continuous thereon. As such, given $\eps>0$ there exists a
$\del>0$ such that for $|x-\tet|<\del$  we have $|f(x)-f(\tet)|<\eps$.
Let
$$P(r, \tet) = {1\over {2\pi}} \int_0^{2\pi} {{1-r^2}\over { 1-2r\cos
(x-\tet) +r^2}} f(x)\dd x$$ denote the Poisson integral of $f$.

We claim that, with the above notation,
$$|P(r,\tet)-f(\tet)| < \eps + {{\|f\|(1-r^2)}\over {r(1-\cos \del)}}$$
for all $\tet$. This may be explicitly proven as follows.
$$P(r,\tet)-f(\tet) =   {1\over {2\pi}} \int_0^{2\pi} {{1-r^2}\over { 1-2r\cos
(x-\tet) +r^2}} [f(x)-f(\tet)]\dd x$$
$$\!\!\!={1\over {2\pi}} \int_{|x-\tet|<\del} {{1-r^2}\over { 1-2r\cos
(x-\tet) +r^2}} [f(x)-f(\tet)]\dd x$$
$$\phantom{123}+ {1\over {2\pi}} \int_{\del\le |x-\tet|\le \pi} {{1-r^2}\over { 1-2r\cos
(x-\tet) +r^2}} [f(x)-f(\tet)]\dd x.$$ Now
$${1\over {2\pi}} \int_{|x-\tet|<\del} {{1-r^2}\over { 1-2r\cos
(x-\tet) +r^2}}|f(x)-f(\tet)|\dd x$$
$$ < {\eps \over {2\pi}} \int_0^{2\pi}
{{1-r^2}\over { 1-2r\cos(x-\tet) +r^2}}\dd x = \eps.$$ In addition
$${1\over {2\pi}} \int_{\del\le|x-\tet|\le \pi} {{1-r^2}\over { 1-2r\cos
(x-\tet) +r^2}} |f(x)-f(\tet)|\dd x$$
$$\le 2\|f\|
{1\over {2\pi}} \int_{\del\le|x-\tet|\le \pi} {{1-r^2}\over {
1-2r\cos (x-\tet) +r^2}}\dd x \le {{\|f\|(1-r^2)}\over {r(1-\cos
\del)}}.$$ This last inequality is a consequence of
$$1-2r\cos(x-\tet) +r^2\ge 2r -2r\cos \del =2r(1-\cos \del)$$
which holds for all $x,\tet$ satisfying $\del \le |x-\tet| \le \pi$.

As a function of $r$,
$${{\|f\|(1-r^2)}\over {r(1-\cos \del)}}$$
decreases to zero as $r$ increases to $1$. Choose some $r_1<1$ for which
$${{\|f\|(1-r_1^2)}\over {r_1(1-\cos \del)}}< \eps.$$
Thus
$$|f(\tet)-P(r_1,\tet)|<2\eps$$
for all $\tet$.

Let
$$a_0/2 +\sum_{n=1}^\infty \left[ a_n \cos nx + b_n \sin
nx\right]$$
denote the Fourier series of $f$. Recall that the Fourier series of
$P(r, \tet)$ is given by
$$a_0/2 +\sum_{n=1}^\infty r^n\left[ a_n \cos n\tet + b_n \sin
n\tet\right].$$ Since the $a_n$ and $b_n$ are uniformly bounded, the
above Fourier series converges absolutely, and uniformly converges
to $P(r,\tet)$ for each $r<1$. Thus there exists an $m$ for which
$$\left| P(r_1,\tet) -  \left[a_0/2 +\sum_{n=1}^m r_1^n
(a_n \cos n\tet + b_n \sin n\tet)\right]\right|<\eps$$
for all $\tet$. Set
$$g(\tet) = a_0/2 +\sum_{n=1}^m r_1^n(a_n \cos n\tet + b_n \sin n\tet).$$
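
For the reader who wishes to see where these formulae come from, we
sketch the standard computation. It assumes the usual normalization
$$a_n={1\over\pi}\int_0^{2\pi}f(x)\cos nx\dd x,\qquad
b_n={1\over\pi}\int_0^{2\pi}f(x)\sin nx\dd x,$$
which is not spelled out above. For $0\le r<1$ the Poisson kernel has
the expansion
$${{1-r^2}\over {1-2r\cos \phi +r^2}} = 1 + 2\sum_{n=1}^\infty
r^n\cos n\phi,$$
which converges uniformly in $\phi$. Integrating term by term against
$f$, with $\phi = x-\tet$, yields the series for $P(r,\tet)$, and
taking $f\equiv 1$ shows that the kernel has mean value $1$ over a
period. Moreover $|a_n|\le 2\|f\|$ and $|b_n|\le 2\|f\|$, so that
$$\left|\sum_{n=m+1}^\infty r_1^n (a_n\cos n\tet + b_n\sin
n\tet)\right| \le 4\|f\|\sum_{n=m+1}^\infty r_1^n =
{{4\|f\|r_1^{m+1}}\over {1-r_1}},$$
and any $m$ for which the right-hand side is less than $\eps$ will do.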

We have ``constructed'' a trigonometric polynomial satisfying
$$|f(\tet)-g(\tet)|<3\eps$$
for all $\tet$.
In other words we have proven that in the uniform norm, trigonometric
polynomials are dense in the space of continuous $2\pi$-periodic
functions.

As noted in the proof of Proposition 3, Picard then proves the {\W}
theorem for algebraic polynomials based on the above result.
Picard ends his paper by noting that the same procedure can be used to
obtain parallel results for continuous functions of many variables. He
was the first to publish an extension of the {\W} theorems to several
variables.

As Picard [1891a] states, this proof is based on an inequality
obtained by H.~A.~Schwarz in his
well-known paper Schwarz [1871]. In fact, as Cakon [1987] points
out, almost the entire Picard proof can be found in Schwarz
[1871]. What is perhaps surprising is that Weierstrass did not
notice this connection.

\medskip\noindent {\bf Lerch I.}
M.~Lerch (1860--1922) was a Czech mathematician of some renown
(see Skrasek [1960] and MacTutor [2004]) who attended some of
Weierstrass' lectures. Lerch wrote two papers, Lerch [1892] and Lerch
[1903], that included proofs of the Weierstrass theorem for algebraic
polynomials. Unfortunately the paper Lerch [1892] is in Czech, difficult
to procure, and I have found no reference to it anywhere in the
literature except in Lerch [1903] and in a footnote in Borel [1905] (but
Borel did not see the paper). Subsequent authors mentioned in this work
were seemingly totally ignorant of this paper. Many of these authors
quote Volterra [1897], although Lerch [1892] contains a similar
proof with the same ideas. It is for the reader to decide whether, in
these circumstances, Lerch deserves prominence or only precedence.

We here explain the proof as is essentially contained in Lerch
[1892]. We defer the discussion of Lerch [1903] to a more appropriate
place. Let $f\in C[a,b]$. Since $f$ is uniformly continuous on $[a,b]$,
it can be uniformly approximated thereon by a polygonal (piecewise
linear) line. Lerch notes that every polygonal line $g$ may be
uniformly approximated by a Fourier cosine series of the form
$${{a_0}\over 2} + \sum_{n=1}^\infty a_n \cos{{x-a}\over {b-a}}n\pi,$$
where
$$a_n = {2\over {b-a}} \int_a^b g(x)\cos{{x-a}\over {b-a}}n\pi \dd x.$$
It was, at the time, well-known to any mathematician worth his salt that
the Fourier cosine series of a continuous function with a finite number
of maxima and minima uniformly converges to the function. This result
goes back to Dirichlet [1829], see e.g. Sz.-Nagy [1965, p.~399].
Alternatively it is today a standard result contained in every Fourier
series text that if the derivative of a continuous function is piecewise
continuous with one-sided derivatives at each point, then its Fourier
cosine series converges uniformly. Both these results follow from the
analogous results for periodic functions and the usual Fourier series.
Both these results hold for our polygonal line. As this Fourier cosine
series converges uniformly to our polygonal line we may truncate it to
obtain a trigonometric polynomial (but not a trigonometric polynomial as
in Proposition 2) which approximates our polygonal line
arbitrarily well. Finally, as the trigonometric polynomial is an entire
function we can suitably truncate its power series expansion to obtain
our desired algebraic polynomial approximant.

\medskip\noindent {\bf Volterra.}
The next published proof of {\W}' theorems is due to Volterra [1897].
V.~Volterra (1860--1940) proved only the density of trigonometric
polynomials in $\tilC[0,2\pi]$. As he was aware of Picard [1891a], this
should not detract from his proof.

Volterra was unaware of Lerch [1892], but his proof is much the same.
Let $f\in \tilC[0,2\pi]$. Since $f$ is continuous on a closed interval,
it is also uniformly continuous thereon. As such, it is possible to find
a polygonal line that approximates $f$ arbitrarily well. One can also
assume that the polygonal line is $2\pi$-periodic. It thus suffices to
prove that one can arbitrarily well approximate any continuous,
$2\pi$-periodic, polygonal line by trigonometric polynomials. As stated
in the proof of Lerch, the Fourier series of the polygonal line
uniformly converges to the function. We now suitably truncate
the Fourier series to obtain the desired approximation.

\medskip
C.~Runge (1856--1927), E.~Phragm\'en (1863--1937),
H.~Lebesgue (1875--1941) and G.~Mittag-Leffler (1846--1927)
all contributed proofs of
the Weierstrass approximation theorems, and their proofs are related
both in character and idea. What did each do?

Mittag-Leffler, in 1900, was the last of the above four to publish
on this subject. However he seems to have been the first to
point out, in print, Runge and Phragm\'en's contributions. As such we
start this story with Mittag-Leffler. The paper Mittag-Leffler [1900] is
an ``extract from a letter to E.~Picard''. This was, at the
time, a not uncommon format for an article. Journals were still in their
infancy, but were replacing correspondence as the primary mode of
dissemination of mathematical research, hence this combination of the
two forms. The article came in response to what Picard had written in
his ``Lectures on Mathematics'' given at the Decennial Celebration at
Clark University, Picard [1899].
In this grand review Picard mentions the importance, in the development of
the understanding of functions, of Weierstrass' example of a continuous
nowhere differentiable function, and of {\W}' theorem on the
representation of every continuous function on a finite interval as an
absolutely and uniformly convergent series of
polynomials. Picard then goes on to mention his own proof and that of
Volterra [1897]. Mittag-Leffler [1900] points out that
Weierstrass' theorem also follows from work of Runge
[1885, 1885/86] although, as he notes, it is not explicitly contained
anywhere in either of these two papers. He then explains his own proof,
to which we shall return later. How did Mittag-Leffler know about {\W}'
theorem following from the work of Runge?
Firstly, Mittag-Leffler was the editor of Acta Mathematica and, as he
writes, he was the one who published Runge's paper.
(Mittag-Leffler founded Acta Mathematica in 1882 and was its editor for
45 years.) Moreover in the paper of Mittag-Leffler [1900] there is a
very interesting long footnote which seems to have been somewhat
overlooked. It starts as follows: {\it I found on this subject among my
papers an article of Phragm\'en, from the year 1886, which goes thus}.
What follows is two pages where Phragm\'en (who was 23 years
old at the time) explains how Weierstrass' theorem can follow from Runge's
work, Phragm\'en's simplification thereof, and also how to get from this
the Weierstrass theorem on the density of trigonometric polynomials in
$\tilC[0,2\pi]$ (with some not
insignificant additional work). Before we explain this in
detail, let us start with the general idea behind these various proofs.

Let $f\in C[0,1]$. Since $f$ is continuous on a closed interval, it is
also uniformly continuous thereon. As Lerch and Volterra pointed out,
it is thus
possible to find a polygonal line $g$ (which today we might also call a
spline of degree 1 with simple knots) that approximates $f$ uniformly
to within any given $\eps>0$, i.e., for which
$$|f(x)-g(x)|<\eps,$$
for all $x\in [0,1]$.
This polygonal line is the first idea in these proofs. The second idea
is to show that there is an arbitrarily good polynomial approximant to
the relatively ``simpler'' $g$. This will then suffice to prove that we
can find a polynomial that approximates our original $f$ arbitrarily
well. The third and more fundamental idea is to reduce the problem of
finding a good polynomial approximant to $g$ (which depends upon $f$) to
that of finding a good polynomial approximant to one and only one
function, independent of $f$. Each of Runge, Mittag-Leffler and
Lebesgue does this in a different way.

\medskip\noindent {\bf Runge/Phragm\'en.} We first fix some notation.
Let $0=x_0<x_1<\cdots<x_m=1$ be the abscissae (knots) of the polygonal
line $g$. There are various ways of writing $g$. One elementary way
is:
$$g(x)=g_1(x) + \sum_{i=1}^{m-1} \left[ g_{i+1}(x) - g_i(x)\right]
h(x-x_i)\eqno(3.1)$$
where $g_i$ is the linear polynomial agreeing with $g$ on $[x_{i-1},
x_i]$ and
$$h(x) =\cases{ 1,& $x\ge 0$\cr 0,& $x<0$\cr}.$$
$g_i$ may be explicitly given as
$$g_i(x)= y_{i-1} + \left({{x-x_{i-1}}\over {x_i-x_{i-1}}}\right)
(y_i-y_{i-1})$$
where $y_j=g(x_j)$, $j=0,1\nek m$.

What Runge did in his 1885/86 paper is the following. He considered the
function
$$\phi_n(x) = {1\over {1+x^{2n}}}$$
which has the property that
$$\lim_{n\to\infty} \phi_n(x)=\cases{ 1,& $|x|< 1$\cr
1/2, & $|x|=1$\cr 0, & $|x|> 1$\cr}.$$ Set $\psi_n(x) =
1-\phi_n(1+x)$. Then restricted to $[-1,1]$ we have
$$\lim_{n\to\infty} \psi_n(x)=\cases{ 1,& $0<x\le 1$\cr
1/2, & $x=0$\cr 0, & $-1\le x<0 $\cr}.$$ Since each $\psi_n$ is
increasing on $[-1,1]$, and $\psi_{n+1}(x)> \psi_n(x)$ for $x\in
(0,1]$, while $\psi_{n+1}(x)< \psi_n(x)$ for $x\in (-1,0)$, it
follows that the functions $\psi_n$ are bounded on $[-1,1]$ and
converge uniformly to the function $h$ on
$[-1,-\del] \union [\del,1]$ for any given $\del\in (0,1)$.

Since the linear polynomial $g_{i+1} - g_i$ vanishes at $x_i$, a
short calculation verifies that for each $x_i\in (0,1)$
$$\left[ g_{i+1}(x) - g_i(x)\right] \psi_n(x-x_i)$$
uniformly converges to
$$\left[ g_{i+1}(x) - g_i(x)\right] h(x-x_i)$$
on $[0,1]$. Replacing the $h$ in (3.1) by $\psi_n$ we obtain a sequence of
functions which uniformly approximate $g$.

These functions
$$\Psi_n(x)=g_1(x) + \sum_{i=1}^{m-1} \left[ g_{i+1}(x) - g_i(x)\right]
\psi_n(x-x_i)$$
are not polynomials or entire functions. But they are rational
functions. Thus any continuous function on a finite real interval can be
uniformly approximated by rational functions.
This is the main result of Runge [1885/86]. It was published the
same year as Weierstrass' paper.
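
As a numerical illustration of this construction, the following Python
sketch evaluates the rational functions $\Psi_n$ for one sample polygonal
line and measures the sup-norm error on a grid. The knots, the values at
the knots and the grid size are assumptions chosen only for the example;
they do not come from Runge's paper.

```python
# Hypothetical sample polygonal line on [0,1]: knots xs, values ys (assumed data).
xs = [0.0, 0.3, 0.7, 1.0]
ys = [0.0, 1.0, -0.5, 0.5]

def g_piece(i, x):
    """Linear polynomial g_i agreeing with g on [x_{i-1}, x_i], i = 1..m."""
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])

def g(x):
    """The polygonal line itself, evaluated piece by piece."""
    for i in range(1, len(xs)):
        if x <= xs[i]:
            return g_piece(i, x)
    return ys[-1]

def psi(n, x):
    """Runge's psi_n(x) = 1 - 1/(1 + (1+x)^(2n)), used here for x in (-1,1)."""
    return 1.0 - 1.0 / (1.0 + (1.0 + x) ** (2 * n))

def Psi(n, x):
    """Formula (3.1) with the step function h replaced by psi_n."""
    total = g_piece(1, x)
    for i in range(1, len(xs) - 1):
        total += (g_piece(i + 1, x) - g_piece(i, x)) * psi(n, x - xs[i])
    return total

def sup_error(n, points=2001):
    """Largest value of |g - Psi_n| over an equally spaced grid on [0,1]."""
    grid = [k / (points - 1) for k in range(points)]
    return max(abs(g(x) - Psi(n, x)) for x in grid)

for n in (10, 50, 250):
    print(n, sup_error(n))
```

The exponent $2n$ is kept moderate so that $(1+x)^{2n}$ stays within
floating point range; the printed errors should decrease as $n$ grows.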

Runge also discussed what could be said in the case of continuous
functions on all of $\RR$. In that context he noted that from one of his
results in Runge [1885] one could always replace $\Psi_n$ by another
rational function, real on $\RR$, with exactly two conjugate poles.

Phragm\'en in the above-mentioned footnote in Mittag-Leffler [1900] (but
according to Mittag-Leffler written in 1886), remarks that apparently
Runge overlooked in Runge [1885/86] (or did not think important) the fact
that he could replace rational functions by polynomials. Runge quite
explicitly had the tools to do this from Runge [1885].

What is the relevant result from Runge [1885]? It is the following,
which we state in an elementary form. Assume $D$ is a compact set and
$\CC\setminus D$ is connected. Let $R$ be a rational function
with poles outside $D$.  Then given any point
$w\in \CC\setminus D$ there are rational functions, with only the one pole
$w$, that approximate $R$ arbitrarily well on $D$. This is not a
difficult result to prove. Here, essentially, is Runge's proof. The
rational function $R$ can be decomposed as $R=\sum_{j=1}^n R_j$ where
each $R_j$ is a rational function with only one pole $w_j$. We now show
how to move each $w_j$ to $w$ in a finite number of steps. For each $j$
we choose $a_0\nek a_m$, where $a_0=w_j$ and $a_m=w$, and the $a_i$ are
chosen so that
$$|a_{i-1}-a_i|< |z-a_i |,\qquad i=1\nek m$$
for all $z\in D$. This can be done. At each stage we will construct a
rational function $G_i$ ($G_0=R_j$) whose only pole is $a_i$, and
such that $G_i$ is arbitrarily close to $G_{i-1}$. This follows from the
fact that for given $k\in \NN$ the function
$${1\over {(z-a_{i-1})^k}}$$
can be arbitrarily well approximated on $D$ by
$$\left[ {1\over {(z-a_{i-1})}} \left[ 1 - \left({{a_{i-1}-a_i}\over {z-a_i}}
\right)^n\right]\right]^k$$ by taking $n$ sufficiently large. Note
that the latter is a rational function with a pole only at $a_i$.
Runge further noted that by a linear fractional transformation
(and a bit of care) the pole could be shifted to $\infty$, whence
the rational function becomes a polynomial. As Phragm\'en points
out, if the function $f$ to be approximated on $[0,1]$ is real, we
can replace the polynomial approximant $G$ obtained above by ${\sl
Re}\,G$ on $[0,1]$ which is also a polynomial and which better
approximates $f$ thereon. Thus {\W}' theorem is proved.
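
A single pole-shifting step can be checked numerically. The Python
sketch below takes $D=[0,1]$ and moves a pole of order $k$ from
$a_{i-1}$ to $a_i$; the two pole locations, the order $k$ and the grid
on $D$ are assumptions made for illustration. It also evaluates the
approximant through the finite geometric sum
$(z-a_i)^{-1}\sum_{j=0}^{n-1} q^j$, with $q=(a_{i-1}-a_i)/(z-a_i)$,
which makes it visible that the only pole is at $a_i$.

```python
# Assumed illustration: D = [0,1], one step moving a pole from a_prev to a_next.
a_prev = 0.5 + 0.4j
a_next = 0.5 + 1.0j
k = 2  # order of the pole being moved (assumed)

D = [j / 200 for j in range(201)]

# Runge's step needs |a_prev - a_next| < |z - a_next| for every z in D.
ratio = max(abs(a_prev - a_next) / abs(z - a_next) for z in D)

def target(z):
    """The function 1/(z - a_prev)^k that is to be approximated on D."""
    return 1.0 / (z - a_prev) ** k

def approx(z, n):
    """The approximant in the form displayed in the text."""
    q = (a_prev - a_next) / (z - a_next)
    return ((1.0 - q ** n) / (z - a_prev)) ** k

def approx_single_pole(z, n):
    """Same function written as a geometric sum; its only pole is a_next."""
    q = (a_prev - a_next) / (z - a_next)
    return (sum(q ** j for j in range(n)) / (z - a_next)) ** k

def sup_error(n):
    """Largest value of |target - approx| over the grid on D."""
    return max(abs(target(z) - approx(z, n)) for z in D)

print("ratio", ratio)
for n in (5, 20, 40):
    print(n, sup_error(n))
```

Since the ratio printed first is below 1, the error decays geometrically
in $n$, in line with the argument above.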

Phragm\'en also notes that it is really not necessary to use the results
of Runge [1885]. If we go back to Runge [1885/86] and consider his
construction therein, we see that each of the rational approximants is
real on $[0,1]$, and has denominator $1+ (1+x)^{2n}$ for some $n$. Any such
$R$ may be decomposed as
$$R= g + r_1+ r_2$$
where $g$ is a polynomial, $r_1$ is a rational function, all of whose
poles lie in the upper half-plane, and $r_2(z)={\overline {r_1(\overline{z})}}$ is a
rational function, all of whose poles are conjugate to the poles of $r_1$ and
lie in the lower half-plane.
It is possible to choose a point $z_1$ in the lower half plane
such that there exists a circle centered at $z_1$ containing
$[0,1]$, but not containing any poles of $r_1$. As such the Taylor
series of $r_1$ about $z_1$ converges uniformly to $r_1$ in $[0,1]$.
Truncate it to obtain a polynomial $p_1$ that approximates $r_1$
arbitrarily well on $[0,1]$. It follows that $p_2(z)={\overline {p_1(\overline{z})}}$ has
the corresponding property with respect to $r_2$. As such
$$P=g+p_1+p_2$$
is a real polynomial that can be chosen to approximate $f$ arbitrarily
well.

Another simple option, not mentioned by Phragm\'en, is simply to use the result
of Runge [1885], to move the poles of any rational approximant away from
$[0,1]$ so that a circle can be put about $[0,1]$ which does not contain any
poles, and then use the truncated power series as above. Phragm\'en's
proof of the density of trigonometric polynomials in $\tilC[0,2\pi]$ is
more complicated and we will not present it here.

In any case, as we have seen, the algebraic Weierstrass theorem is a fairly simple
consequence of Runge's [1885] and [1885/86] results. It is unfortunate and
somewhat astonishing that Runge did not think of it.

\medskip\noindent {\bf Lebesgue.}
Let us now give Lebesgue's proof of {\W}' theorem as found in Lebesgue
[1898]. This is one of the more elegant and cited proofs of {\W}'
theorem. It
is interesting to note that this was Lebesgue's first published paper. He
was, at the time of publication, a 23 year old student at the \'Ecole
Normale Sup\'erieure. He obtained his doctorate in 1902.

A more ``modern'' form of writing the $g$ of (3.1) is as a spline.
That is,
$$g(x) = ax+b +\sum_{i=1}^{m-1} c_i(x-x_i)^1_+$$
where
$$x^1_+ =\cases{ x,& $x\ge 0$\cr 0,& $x<0$\cr}$$
and $ax+b=g_1(x)$. (This easily follows from the form (3.1). As
$g_{i+1}(x)-g_i(x)$ is a linear polynomial that vanishes at $x_i$, it
is necessarily of the form $c_i(x-x_i)$ for some constant $c_i$.) Since
$$2x^1_+= |x| + x$$
the above form of $g$ may also be rewritten as
$$g(x) = Ax+B +\sum_{i=1}^{m-1} C_i|x-x_i|\eqno(3.2)$$
for some real constants $A$, $B$, and $C_i$.

Lebesgue [1898] considers the form (3.2) of $g$, and argues as
follows. To approximate $g$ arbitrarily well by a polynomial it suffices
to be able to approximate $|x|$ arbitrarily well by a polynomial in
$[-1,1]$ (or in fact in any neighbourhood of the origin). If for given
$\eta>0$ there exists a polynomial $p$ satisfying
$$\left| |x|-p(x)\right|<\eta$$
for all $x\in [-1,1]$, then
$$\left| |x-x_i|-p(x-x_i)\right|<\eta$$
for all $x\in [0,1] \subset [x_i-1, x_i+1]$ (since $0\le x_i\le 1$). By
a judicious choice of $\eta$, depending on the predetermined constants
$C_i$ in (3.2), it then follows that
$$\left| g(x) -\left[ Ax+B +\sum_{i=1}^{m-1} C_ip(x-x_i)\right]\right|
<\eps$$
for all $x\in [0,1]$.

Thus our problem has been reduced to that of approximating just the one
function $|x|$. How can this be done? As Lebesgue explains, one can write
$$|x|=\sqrt{x^2}=\sqrt{1-(1-x^2)}= \sqrt{1-z}$$
where $z=1-x^2$, and then expand the above radical by the binomial
formula to obtain a power series in $z=1-x^2$ which converges uniformly
to $|x|$ in $[-1,1]$. One finally just truncates the power series.

To be more explicit, we have
$$(1-z)^{1/2} = \sum_{n=0}^\infty {{1/2}\choose n} (-z)^n$$
where
$${{1/2}\choose n} = {{{1\over 2}({1\over 2}-1)\cdots ({1\over 2} - n+1)}
\over {n!}} = {{(-1)^{n-1} {1\over 2}{1\over 2}{3\over 2}\cdots
{{2n-3}\over 2}}\over {n!}}.$$ Thus
$$(1-z)^{1/2} = 1 -\sum_{n=1}^\infty a_nz^n$$
with $a_1= 1/2$, and
$$a_n = {{(2n-3)!}\over {2^{2n-2} n! (n-2)!}},\qquad n=2,3,\ldots$$
This power series converges absolutely and uniformly to $(1-z)^{1/2}$ in
$|z|\le 1$. It is easily checked that the radius of convergence of this
power series is 1. An application of Stirling's formula shows that
$$a_n = {1\over {2\sqrt{\pi}}}{1\over {n^{3/2}}}(1+ o(1))$$
so that the series also has the correct convergence properties for
$|z|=1$. A different proof of this same fact may be found in Todd [1961,
p.~11]. This finishes Lebesgue's proof.
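
The behaviour of the truncated series can be observed numerically. The
Python sketch below generates the coefficients through the recurrence
$a_{n+1}=a_n(2n-1)/(2n+2)$, $a_1=1/2$, which follows from the definition
of the binomial coefficient, and evaluates the truncated series at
$z=1-x^2$. The truncation lengths and the grid on $[-1,1]$ are
assumptions made for the example.

```python
import math

def coefficients(N):
    """Return [unused, a_1, ..., a_N] with a_1 = 1/2, a_{n+1} = a_n (2n-1)/(2n+2)."""
    a = [0.0, 0.5]
    for n in range(1, N):
        a.append(a[n] * (2 * n - 1) / (2 * n + 2))
    return a

def abs_approx(x, a):
    """Truncated series 1 - sum_{n=1}^{N} a_n z^n at z = 1 - x^2 (Horner form)."""
    z = 1.0 - x * x
    s = 0.0
    for n in range(len(a) - 1, 0, -1):
        s = (s + a[n]) * z
    return 1.0 - s

def sup_error(N, points=401):
    """Largest value of | |x| - polynomial | over an equally spaced grid on [-1,1]."""
    a = coefficients(N)
    grid = [-1.0 + 2.0 * k / (points - 1) for k in range(points)]
    return max(abs(abs(x) - abs_approx(x, a)) for x in grid)

for N in (10, 100, 1000):
    print(N, sup_error(N))

# Growth of the coefficients: n^(3/2) a_n for one large n.
a = coefficients(1000)
print(1000 ** 1.5 * a[1000], 1.0 / (2.0 * math.sqrt(math.pi)))
```

The error is largest at $x=0$, that is at $z=1$, and decays only like
$N^{-1/2}$; this slow decay reflects the delicate behaviour of the
series on $|z|=1$.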

An alternative argument (see Ostrowski [1951, p.~168]
or Feinerman, Newman [1974, p.~5]) gets
around the more delicate analysis at $|z|=1$ by noting that
$(1-z)^{1/2}$ may be uniformly approximated on $[0,1]$ by $(1-\rho z)^{1/2}$
as $\rho\uparrow 1$. (In fact it is easily checked that for $0<\rho<1$
$$| (1- z)^{1/2} - (1-\rho z)^{1/2} | \le (1-\rho)^{1/2}$$
for all $z\in [0,1]$.) Now the power series for $(1-\rho z)^{1/2}$,
namely
$$(1-\rho z)^{1/2} = 1 -\sum_{n=1}^\infty a_n\rho^n z^n,$$
converges absolutely in $|z|< \rho^{-1}$, and thus absolutely and uniformly in
$|z|\le 1$.

Bourbaki [1949, p.~55] (see also Dieudonn\'e [1969, p.~137]) presents an
ingenious argument to obtain a sequence of polynomials
which uniformly approximate $|x|$.
For $t\in [0,1]$ define a sequence of polynomials recursively as follows.
Let $p_0(t)\equiv 0$ and
$$p_{n+1}(t)= p_n(t) +{1\over 2} (t- p^2_n(t)),$$
$n=0,1,2,\ldots$. It is readily verified that for each fixed $t\in [0,1]$,
$\{p_n(t)\}$ is an increasing sequence bounded above by $\sqrt{t}$. The
former is a consequence of the latter which is proven as follows. Assume
$0\le p_n(t) \le \sqrt{t}$. Then
$$\eqalign{ \sqrt{t} - p_{n+1}(t) = & \sqrt{t} - p_{n}(t) -{1\over 2}
(t-p_n^2(t))\cr
=& (\sqrt{t} - p_{n}(t)) (1- {1\over 2}(\sqrt{t} + p_{n}(t)))\cr
\ge & 0\cr}$$
since $\sqrt{t} + p_{n}(t) \le 2\sqrt{t} \le 2$ for $t\in [0,1]$. Thus for
each $t\in [0,1]$
$$\lim_{n\to\infty} p_n(t) = p(t)$$
exists. Since $p(t)$ is nonnegative and satisfies
$$p(t)= p(t) + {1\over 2} (t-p^2(t))$$
we have $p(t)=\sqrt{t}$.
The $\{p_n\}$ are real-valued continuous functions (polynomials) which
increase, and converge pointwise to a continuous function $p$. This
implies that the convergence is uniform (Dini's theorem).
Let $q_n(x)=p_n(x^2)$ for $x\in [-1,1]$. Then the polynomials $\{q_n\}$
converge uniformly to $\sqrt{x^2}= |x|$ on $[-1,1]$.
A similar and equivalent proof may be found in Sz.-Nagy [1965, p.~77].
(Sz.-Nagy attributes his procedure to C.~Visser.)
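
The recursion is easy to run. The Python sketch below evaluates
$p_n(t)$ pointwise (rather than expanding it as a polynomial in $t$) and
measures how well $q_n(x)=p_n(x^2)$ approximates $|x|$ on a grid; the
grid and the values of $n$ are assumptions made for the example.

```python
import math

def p(n, t):
    """n-th iterate of p_{k+1} = p_k + (t - p_k^2)/2 starting from p_0 = 0."""
    value = 0.0
    for _ in range(n):
        value += 0.5 * (t - value * value)
    return value

def q(n, x):
    """q_n(x) = p_n(x^2), the polynomial approximant to |x| on [-1,1]."""
    return p(n, x * x)

def sup_error(n, points=401):
    """Largest value of | |x| - q_n(x) | over an equally spaced grid on [-1,1]."""
    grid = [-1.0 + 2.0 * k / (points - 1) for k in range(points)]
    return max(abs(abs(x) - q(n, x)) for x in grid)

for n in (10, 100, 1000):
    print(n, sup_error(n))
```

Each $p_n$ is a polynomial of degree $2^{n-1}$ in $t$ for $n\ge 1$, so
the degree grows quickly; the sketch avoids this by iterating on
numerical values only.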

\medskip\noindent {\bf Mittag-Leffler.}
The proof by Mittag-Leffler as given in Mittag-Leffler [1900] is the following. He
also considers the $g$ as given in (3.1), and sets
$$\rchi_n(x)= 1- 2^{1-(1+x)^n}.$$
It is easily checked that
$$\lim_{n\to \infty} \rchi_n(x) =\cases{ 1,& $0<x\le 1$\cr
0, & $x=0$\cr -1, & $-1\le x<0$\cr}.$$ Furthermore, since each
$\rchi_n$ is increasing on $[-1,1]$, and $\rchi_{n+1}(x)>
\rchi_n(x)$ for $x\in (0,1]$, while $\rchi_{n+1}(x)< \rchi_n(x)$
for $x\in (-1,0)$, it follows that given $\del>0$, small, the
function $\rchi_n$ uniformly converges to 1 on $[\del,1]$ and to
$-1$ on $[-1,-\del]$. Thus the functions
$$h_n = {{\rchi_n+1}\over 2}$$
are bounded on $[-1,1]$ and uniformly approximate the function $h$ of
(3.1) on $[-1,-\del] \union [\del,1]$ for any given $\del$. Furthermore
the $\rchi_n$ and thus the $h_n$ are entire (analytic) functions.

As previously, since $g_{i+1} - g_i$ is a linear
polynomial vanishing at $x_i$, a
short calculation verifies that for each $x_i\in (0,1)$
$$\left[ g_{i+1}(x) - g_i(x)\right] h_n(x-x_i)$$
uniformly converges to
$$\left[ g_{i+1}(x) - g_i(x)\right] h(x-x_i)$$
on $[0,1]$. Replacing the $h$ in (3.1) by $h_n$ we obtain a sequence of
functions $\{H_n\}$ that uniformly approximate $g$. Finally, since $h_n$ is
an entire function, each of the functions $H_n$ is
an entire function.  As such they may be approximated arbitrarily well by a
truncation of their power series. This again proves Weierstrass' theorem.

\medskip\noindent {\bf Fej\'er.}
L.~Fej\'er (1880--1959) was a student of H.~A.~Schwarz.
What we will report on here is taken from Fej\'er [1900]
(he had just turned 20 when the paper appeared). This fundamental paper
formed the basis for Fej\'er's doctoral thesis obtained in 1902 from the
University of Budapest. The paper contains what is today described as the
``classic'' theorem on  Ces\`aro ($C,1$) summability of Fourier series.
As we are interested in {\W}' theorem, we will restrict ourselves, a
priori, to
$f\in \tilC[0,2\pi]$, and prove that the Ces\`aro sum of the Fourier
series of any such $f$ converges uniformly to $f$. Note that this is the
first proof of {\W}' theorem (in the trigonometric polynomial case) that
actually provides, by a linear process, a sequence of easily
calculated approximants.

Let $\sig_0(x)=1/2$, and
$$\sig_m(x) = {1\over 2} + \cos x +\cos 2x + \cdots + \cos mx$$
for $m=1,2,\ldots\,$. Set
$$G_n(x) = {{\sig_0(x)+\cdots + \sig_{n-1}(x)}\over n}.$$
A calculation shows that
$$G_n(x) = {1\over {2n}} {{1-\cos nx}\over {1-\cos x}} =
{1\over {2n}} \left[ {{\sin\left({{nx}\over 2}\right)}\over
{\sin\left({{x}\over 2}\right)}}\right]^2.$$ Furthermore it is
easily seen that
$${1\over \pi} \int_0^{2\pi} G_n(x)\dd x = 1.$$
$G_n$ is a nonnegative kernel that integrates to 1 (and, as we
shall show, approaches the Dirac delta function at $0$ as $n$ tends to
infinity, i.e., convolution against $G_n$ approaches the identity
operator).

Assume $f\in \tilC[0,2\pi]$. Let
$${{a_0}\over 2} + \sum_{k=1}^\infty a_k \cos kx + b_k \sin kx$$
denote the Fourier series of $f$. Let $s_0(x) = a_0/2$, and
$$s_m(x) ={{a_0}\over 2} + \sum_{k=1}^m a_k \cos kx + b_k \sin kx$$
denote the partial sums of the Fourier series of $f$. The functions
$s_m$ do not necessarily converge uniformly, or pointwise,
to $f$ as $m\to\infty$. This is a well-known result of du
Bois-Reymond [1876].
However let us now set
$$S_n(x) ={{s_0(x)+\cdots + s_{n-1}(x)}\over n}
= {1\over \pi} \int_0^{2\pi} f(y) G_n(y-x) \dd y .$$ Explicitly
the $S_n$ are given by
$$S_n(x) = {{a_0}\over 2} + \sum_{k=1}^{n-1} \left(1 - {k\over
n}\right) \left[a_k \cos kx + b_k \sin kx\right].$$ Surprisingly,
the $S_n$ always converge uniformly to $f$.
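
Both the closed form of $G_n$ and the behaviour of the means $S_n$ can
be checked on an example. The Python sketch below uses the test
function $f(x)=|x-\pi|$ on $[0,2\pi]$, extended periodically; this
choice, and the grid, are assumptions made for illustration. For this
$f$ one has $a_0=\pi$, $a_k=4/(\pi k^2)$ for odd $k$, $a_k=0$ for even
$k\ge 2$, and all $b_k=0$.

```python
import math

def kernel_average(n, x):
    """G_n(x) computed as the average of sigma_0, ..., sigma_{n-1}."""
    total = 0.0
    for m in range(n):
        total += 0.5 + sum(math.cos(j * x) for j in range(1, m + 1))
    return total / n

def kernel_closed(n, x):
    """Closed form (1/(2n)) (sin(nx/2)/sin(x/2))^2, for sin(x/2) != 0."""
    return (math.sin(n * x / 2) / math.sin(x / 2)) ** 2 / (2 * n)

def f(x):
    """Assumed test function |x - pi| on [0, 2 pi], extended periodically."""
    return abs((x % (2 * math.pi)) - math.pi)

def a(k):
    """Cosine coefficients of the test function (the sine coefficients vanish)."""
    if k == 0:
        return math.pi
    return 4.0 / (math.pi * k * k) if k % 2 == 1 else 0.0

def partial_sum(m, x):
    """s_m(x), the m-th partial sum of the Fourier series."""
    return a(0) / 2 + sum(a(k) * math.cos(k * x) for k in range(1, m + 1))

def cesaro_mean(n, x):
    """S_n(x) = a_0/2 + sum_{k<n} (1 - k/n) a_k cos kx."""
    return a(0) / 2 + sum((1 - k / n) * a(k) * math.cos(k * x) for k in range(1, n))

def sup_error(n, points=361):
    """Largest value of |f - S_n| over an equally spaced grid on [0, 2 pi]."""
    grid = [2 * math.pi * j / (points - 1) for j in range(points)]
    return max(abs(f(x) - cesaro_mean(n, x)) for x in grid)

for n in (10, 50, 200):
    print(n, sup_error(n))
```

For this particular $f$ the partial sums $s_m$ themselves converge
uniformly, so the example only illustrates the formulas; it does not
exhibit a case where $s_m$ fails and $S_n$ succeeds.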

\proclaim Theorem 5. For each $f\in \tilC[0,2\pi]$, the trigonometric
polynomials $S_n$ converge uniformly to $f$ as $n\to\infty$.

\pf From the above
$$S_n(x) = {1\over \pi} \int_0^{2\pi} f(y) G_n(y-x) \dd y
= {1\over {2n\pi}} \int_0^{2\pi} f(y) {{1-\cos n(y-x)}\over
{1-\cos (y-x)}}\dd y .$$ Since $f\in\tilC[0,2\pi]$, $f$ may be
considered to be uniformly continuous on all of $\RR$. Thus given
$\eps>0$ there exists a $\del>0$ such that if $|x-y|<\del$, then
$$|f(x)-f(y)|< {\eps\over 2} .$$
In what follows we assume $\del< \pi/2$.

Since $G_n$ integrates to 1 we have
$$S_n(x)-f(x)= {1\over {\pi}} \int_0^{2\pi} [f(y)-f(x)]
G_n(y-x) \dd y $$
$$= {1\over {\pi}} \int_{|y-x|<\del} [f(y)-f(x)]
G_n(y-x) \dd y + {1\over {\pi}} \int_{\del\le |y-x|\le \pi}
[f(y)-f(x)] G_n(y-x)\dd y.$$ We estimate each of the above two
integrals.

On $|y-x|<\del$ we have $|f(x)-f(y)|< {\eps\over 2}$. Thus
$$\left| {1\over {\pi}} \int_{|y-x|<\del} [f(y)-f(x)]
G_n(y-x) \dd y\right| < {\eps\over 2}
{1\over {\pi}} \int_{|y-x|<\del}
G_n(y-x) \dd y$$
$$< {\eps\over 2}
{1\over {\pi}} \int_0^{2\pi} G_n(y-x) \dd y= {\eps\over 2}.$$ We have
here used the crucial fact that $G_n$ is nonnegative and
integrates to 1 over any interval of length $2\pi$.

From the explicit form of $G_n$ and the inequality $|f(y)-f(x)|\le
2\|f\|$ we have
$$\left| {1\over {\pi}} \int_{\del\le |y-x|\le \pi} [f(y)-f(x)]
G_n(y-x) \dd y\right| \le {{2\|f\|}\over {2n\pi}}
 \int_{\del\le |y-x|\le \pi} {{1-\cos n(y-x)}\over {1-\cos (y-x)}}\dd y.$$
Now $|1-\cos n(y-x)|\le 2$, while on $\del\le |y-x|\le \pi$ we have
$1-\cos(y-x) \ge 1-\cos \del$. Thus
$$\left| {1\over {\pi}} \int_{\del\le |y-x|\le \pi} [f(y)-f(x)]
G_n(y-x) \dd y\right| \le {{2\|f\|}\over {2n\pi}}{2\over {1-\cos
\del}}2\pi= {{4\|f\|}\over {n(1-\cos\del)}}.$$ For $n$
sufficiently large
$${{4\|f\|}\over {n(1-\cos\del)}}<{\eps\over 2}.$$
Thus for such $n$
$$|S_n(x)-f(x)|<\eps.\meop$$

Applying the method of the (second) proof of Proposition 3 to the above we
see that to each $f\in C[-1,1]$ we may obtain a sequence of algebraic
polynomials
$$p_n(x) = {{a_0}\over 2} + \sum_{k=1}^{n-1} \left(1 - {k\over
n}\right) a_k T_k(x)$$
where
$$a_k = {2\over \pi} \int^1_{-1} {{f(x)T_k(x)}\over {\sqrt{1-x^2}}}\dd x,$$
$k=0,1,\ldots$. These explicitly defined $p_n$ (each of degree at most
$n-1$) uniformly approximate $f$.

\medskip\noindent {\bf Lerch II.}
The paper Lerch [1903] contains yet another proof of the density of
algebraic polynomials in $C[0,1]$. In his previous proof, in Lerch
[1892], Lerch had used general properties of Fourier series to prove the
{\W} theorem for algebraic polynomials. His proof here is different in
that while the same general scheme is used, he only needs to consider
the Fourier series of two specific functions, and their properties. In
this sense it is more elementary than his previous proof.

We recall from Lerch [1892] that it suffices to be able to arbitrarily
approximate the polygonal line $g$ as given in (3.1). Lerch rewrites
(3.1) in the form
$$g(x) = \sum_{i=1}^m \ell_i(x)$$
where
$$\ell_i(x) =\cases{ 0, & $x< x_{i-1}$\cr
y_{i-1} + \left({{x-x_{i-1}}\over { x_i-x_{i-1}}}\right) (y_i-y_{i-1}), &
$x_{i-1}\le x <  x_{i}$\cr
0, & $x_{i}\le x$\cr}$$
(when defining $\ell_m$ we should, for precision, define it to equal
$y_m$ at $x_m=1$).

As we mentioned, Lerch bases his proof on quite explicit Fourier series.
It is well known and easily checked that
$${1\over 2} - x = \sum_{n=1}^\infty {{\sin 2n\pi x}\over {n\pi}},\qquad
0<x<1, \eqno(3.3)$$
while
$$x^2-x+{1\over 6} = \sum_{n=1}^\infty {{\cos 2n\pi x}\over {n^2\pi^2}},\qquad
0\le x \le 1. \eqno(3.4)$$ There is a problem with the convergence
of the Fourier series in (3.3). This series converges uniformly to
$1/2 - x$ on any $[a,b]$, $0<a<b<1$, but does not converge
uniformly in any neighbourhood of $x=0$ or $x=1$. (In fact its
value at $x=0$ and $x=1$ is 0.) However the series in (3.4) does
converge absolutely and uniformly to the given function on all of
$[0,1]$. It is also readily checked, using the 1-periodicity of
the Fourier series, that the function
$${1\over 2} (x_i-x_{i-1})(y_i+y_{i-1}) + \sum_{n=1}^\infty
{{y_{i-1}\sin 2n\pi (x-x_{i-1})-y_{i}\sin 2n\pi (x-x_{i})}\over
{n\pi}}$$
$$ - {1\over 2}{{(y_i-y_{i-1})}\over {(x_i-x_{i-1})}} \sum_{n=1}^\infty
{{\cos 2n\pi (x-x_{i-1})-\cos 2n\pi (x-x_{i})}\over
{n^2\pi^2}}$$
is the Fourier series of $\ell_i$ and that there is uniform convergence
of this series to $\ell_i$ on any compact subset of $[0,1]$ not
containing $x_{i-1}$ and $x_i$.

Thus
$${1\over 2} \sum_{i=1}^m (x_i-x_{i-1})(y_i+y_{i-1}) + \sum_{n=1}^\infty
{{y_{0}\sin 2n\pi x-y_{m}\sin 2n\pi (x-1)}\over
{n\pi}}$$
$$ - {1\over 2}\sum_{i=1}^m  {{(y_i-y_{i-1})}\over {(x_i-x_{i-1})}}
\sum_{n=1}^\infty {{\cos 2n\pi (x-x_{i-1})-\cos 2n\pi (x-x_{i})}\over
{n^2\pi^2}}$$
is the Fourier series of $g$. Note that this series converges uniformly to $g$
also at $x_1\nek x_{m-1}$. There remains the problem of convergence at
$x_0=0$ and $x_m=1$. (However if $g\in \tilC[0,1]$, i.e., $g$ is
1-periodic, then $y_0=y_m$ and the problematic term has disappeared.
In this case, we have constructed the Fourier series of $g$ which
converges absolutely and uniformly to $g$ on $[0,1]$. Truncate
this Fourier series to obtain a trigonometric polynomial which
approximates $g$ arbitrarily well. This proves the density of
trigonometric polynomials.) If $y_0\ne y_m$ then we may, as does Lerch,
again apply (3.3) to obtain
$${1\over 2} \sum_{i=1}^m (x_i-x_{i-1})(y_i+y_{i-1}) + (y_0 -
y_m)({1\over 2}-x)$$
$$ - {1\over 2}\sum_{i=1}^m  {{(y_i-y_{i-1})}\over {(x_i-x_{i-1})}}
\sum_{n=1}^\infty {{\cos 2n\pi (x-x_{i-1})-\cos 2n\pi
(x-x_{i})}\over {n^2\pi^2}}.$$ (Alternatively, just shift $g$ by a
polynomial so that the new $g$ satisfies $g(0)=g(1)$.) This series
converges absolutely and uniformly to $g$ on all of $[0,1]$.
Truncating this infinite series we obtain an entire function
(trigonometric polynomial) that approximates $g$ arbitrarily well.
We now appropriately truncate the power series of this entire
function to obtain the desired algebraic polynomial.

Unfortunately there is no indication, in Lerch [1903], that he
was aware of any of the other published proofs of the {\W}
theorem. A careful reading of this proof shows that it is
essentially a quasi-constructive version of Volterra's proof.

\medskip\noindent {\bf Landau.}
The proof of E.~Landau (1877--1938) in Landau [1908] follows
the tradition of the proofs of Weierstrass, Picard and Fej\'er in that
the essential underlying mechanism in his proof is a singular integral.
However it is more direct than the first two in its judicious choice of
the kernel. Let $f\in C[a,b]$ where, without loss of generality, it will
be assumed that $0<a<b<1$. Extend $f$ to be a continuous function on all
of $[0,1]$.

Define
$$k_n = \int_{-1}^1 (1-u^2)^n \dd u$$
and set
$$p_n(x) = {1\over {k_n}} \int_0^1 f(y) \left[1 -(x-y)^2\right]^n \dd y.$$
Note that $p_n$ is a polynomial of degree at most $2n$ in $x$. What Landau
proves is that the sequence of polynomials $\{p_n\}$ converges uniformly to
$f$ on $[a,b]$. Landau's polynomial approximants differ from
those of the previous proofs (except for Fej\'er's proof) in that they are
explicitly given, and in that they are obtained via a linear method.

We first present Landau's original proof.
In this proof we will use the following estimates. For every $0<\del<1$,
$$\int_{\del\le |u|\le 1} (1-u^2)^n \dd u \le \int_{\del\le |u|\le 1}
(1-\del^2)^n \dd u < 2 (1-\del^2)^n.$$
Similarly
$$k_n = \int_{-1}^1 (1-u^2)^n \dd u \ge \int_{|u|\le 1/\sqrt{n}} (1-u^2)^n \dd u
\ge \int_{|u|\le 1/\sqrt{n}} \left(1-{1\over n}\right)^{n} \dd u$$
$$= {2\over
{\sqrt{n}}}\left(1-{1\over n}\right)^{n}.$$
Thus
$${1\over {k_n}} \int_{\del\le |u|\le 1} (1-u^2)^n \dd u \le \sqrt{n}
(1-\del^2)^n \left(1-{1\over n}\right)^{-n}.$$
Note that for every fixed $\del \in (0,1)$ we have
$$\lim_{n\to \infty} \sqrt{n} (1-\del^2)^n \left(1-{1\over
n}\right)^{-n}=0.$$

Now choose $\eps> 0$. Since $f$ is uniformly continuous on $[0,1]$ there
exists a $\del>0$ such that if $x,y\in [0,1]$ satisfy $|x-y|<\del$,
then
$$|f(x)-f(y)|<\eps/3.$$
Assume $0<\del< \min\{a,1-b\}$.
Choose $N$ so that for all $n\ge N$
$$2\|f\| \sqrt{n} (1-\del^2)^n \left(1-{1\over n}\right)^{-n} < \eps/3.$$

For every $x\in [a,b]$,
$$|p_n(x)-f(x)| = \left|{1\over {k_n}}
\int_0^1 f(y) \left[1 -(x-y)^2\right]^n \dd y -f(x)\right|$$
$$\le {1\over {k_n}}
\int_0^1 |f(y)-f(x)| \left[1 -(x-y)^2\right]^n \dd y + |f(x)|
\left| 1 - {1\over {k_n}} \int_0^1  \left[1 -(x-y)^2\right]^n \dd y \right|.$$
We bound the integral
$${1\over {k_n}} \int_0^1 |f(y)-f(x)| \left[1 -(x-y)^2\right]^n \dd y$$
by considering separately integration over $\{y: |x-y|<\del\}$ and over
$\{ y:\del\le |x-y|\}$ for $y\in [0,1]$.

Now
$${1\over {k_n}} \int_{|x-y|<\del} |f(y)-f(x)| \left[1 -(x-y)^2\right]^n \dd y$$
$$\phantom{1234}< {\eps\over 3}{1\over {k_n}} \int_{|x-y|<\del} \left[1 -(x-y)^2\right]^n \dd y
< {\eps\over 3}.$$
Furthermore
$${1\over {k_n}} \int_{{\del\le |x-y|}\atop {0\le y\le 1}}
|f(y)-f(x)| \left[1 -(x-y)^2\right]^n \dd y
\le {{2\|f\|}\over {k_n}} \int_{\del\le |u|\le 1}
[1 -u^2]^n \dd u $$
$$\le 2\|f\| \sqrt{n} (1-\del^2)^n \left(1-{1\over n}\right)^{-n} < \eps/3.$$
Finally
$$\!\!\! |f(x)| \left| 1 - {1\over {k_n}} \int_0^1  \left[1 -(x-y)^2\right]^n \dd y
\right|$$
$$\phantom{1234} \le {\|f\|\over {k_n}} \left| \int_{-1}^1 [1 -u^2]^n \dd u -
\int_{-x}^{1-x} [1 -u^2]^n \dd u \right|.$$
Since $x\in [a,b]$ and $ \del< \min\{a,1-b\}$, we have
$${\|f\|\over {k_n}} \left| \int_{-1}^1 [1 -u^2]^n \dd u -
\int_{-x}^{1-x} [1 -u^2]^n \dd u \right| \le {\|f\|\over {k_n}}
\int_{\del\le |u|\le 1} [1 -u^2]^n \dd u$$
$$\le
\|f\| \sqrt{n} (1-\del^2)^n \left(1-{1\over n}\right)^{-n} < \eps/3.$$
This proves the result.\eop

\medskip
For completeness and as a matter of interest, it easily follows from
integration by parts that
$$k_n= \int_{-1}^1 [1 -u^2]^n \dd u = {{2^{2n+1} (n!)^2}\over {(2n+1)!}}.$$
Applying Stirling's formula it may be shown that
$$\lim_{n\to\infty} \sqrt{n} k_n = \sqrt{\pi}.$$
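
Both statements can be verified by machine. The Python sketch below
computes $k_n$ exactly by expanding $(1-u^2)^n$ with the binomial
theorem and integrating term by term, compares the result with the
closed form in rational arithmetic, and then evaluates $\sqrt{n}\,k_n$
for large $n$ through the logarithm of the gamma function. The function
names and the chosen values of $n$ are assumptions made for the example.

```python
import math
from fractions import Fraction

def k_exact(n):
    """Integral of (1-u^2)^n over [-1,1] via the binomial expansion, as a Fraction."""
    return 2 * sum(
        Fraction((-1) ** j * math.comb(n, j), 2 * j + 1) for j in range(n + 1)
    )

def k_closed(n):
    """Closed form 2^(2n+1) (n!)^2 / (2n+1)!, as a Fraction."""
    return Fraction(2 ** (2 * n + 1) * math.factorial(n) ** 2,
                    math.factorial(2 * n + 1))

def sqrt_n_times_k(n):
    """sqrt(n) k_n computed with log-gamma so that no huge factorial is formed."""
    log_k = ((2 * n + 1) * math.log(2)
             + 2 * math.lgamma(n + 1) - math.lgamma(2 * n + 2))
    return math.sqrt(n) * math.exp(log_k)

for n in (1, 2, 5):
    print(n, k_exact(n), k_closed(n))
for n in (10, 1000, 100000):
    print(n, sqrt_n_times_k(n), math.sqrt(math.pi))
```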

The following is a variation on and simplification of Landau's proof. It
is due to Jackson [1934]. As above, assume $f\in C[a,b]$ with $0<a<b<1$.
Extend $f$ to be a continuous function on all of $\RR$ which also vanishes
identically off $[0,1]$. This latter fact, together with a change of variable
argument, gives
$$\eqalign{p_n(x) =& {1\over {k_n}}\int^1_0 f(y) \left[1 -
(x-y)^2\right]^n \dd y\cr
=& {1\over {k_n}}\int_{-1}^1 f(x+u)(1-u^2)^n \dd u\cr}$$
and thus we get the simpler
$$p_n(x)-f(x) = {1\over {k_n}}\int_{-1}^1
\left[f(x+u)-f(x)\right] (1-u^2)^n \dd u.$$
Let $\eps$ and $\del$ be as above. For $|u|\ge \del$, we have
$$|f(x+u) - f(x)| \le 2\|f\| \le {{2\|f\| u^2}\over {\del^2}},$$
while for $|u|<\del$ we have
$$|f(x+u) - f(x)| < {\eps\over 3}.$$
Thus
$$|f(x+u) - f(x)| <  {\eps\over 3}+  {{2\|f\| u^2}\over {\del^2}}$$
for all $x\in [0,1]$ and $u\in [-1,1]$. Substituting, it follows that
$$|p_n(x)-f(x)| < {1\over {k_n}}\int_{-1}^1
{\eps\over 3} (1-u^2)^n \dd u + {1\over {k_n}}\int_{-1}^1
{{2\|f\| u^2}\over {\del^2}} (1-u^2)^n \dd u$$
$$ =  {\eps\over 3} + {{2\|f\|}\over {\del^2 k_n}}  \int_{-1}^1
u^2(1-u^2)^n \dd u.$$
Set
$$j_n= \int_{-1}^1 u^2(1-u^2)^n \dd u.$$
Integration by parts yields
$$j_n = {{-u(1-u^2)^{n+1}}\over {2(n+1)}}\Big|_{-1}^1 + \int_{-1}^1
{{(1-u^2)^{n+1}}\over {2(n+1)}} \dd u = {{k_{n+1}}\over {2(n+1)}}.$$
Since $(1-u^2)\le 1$ on $[-1,1]$ we also have $k_{n+1}\le k_n$. Thus
$$j_n \le {{k_n}\over {2(n+1)}}.$$
Substituting we obtain
$$|p_n(x)-f(x)| < {\eps \over 3} + {\|f\| \over {\del^2(n+1)}}.$$
We now choose $n$ sufficiently large so that
$$| p_n(x)-f(x) | < \eps$$
for all $x\in [0,1]$ and thus on $[a,b]$.
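
The Landau polynomials can be evaluated numerically from their integral
definition. The Python sketch below does so with a midpoint rule for a
test function that vanishes at $0$ and $1$, so that its extension by
zero is continuous on $\RR$ as Jackson's argument requires. The test
function (a tent with peak at $1/2$), the number of quadrature nodes and
the evaluation grid are all assumptions made for illustration.

```python
import math

def f(y):
    """Assumed test function: a tent on [0,1], equal to zero off [0,1]."""
    return max(0.0, min(y, 1.0 - y))

def k(n):
    """k_n = 2^(2n+1) (n!)^2 / (2n+1)!, evaluated through log-gamma."""
    log_k = ((2 * n + 1) * math.log(2)
             + 2 * math.lgamma(n + 1) - math.lgamma(2 * n + 2))
    return math.exp(log_k)

def landau(n, x, nodes=2000):
    """p_n(x) = (1/k_n) int_0^1 f(y) (1 - (x-y)^2)^n dy, by the midpoint rule."""
    h = 1.0 / nodes
    total = 0.0
    for j in range(nodes):
        y = (j + 0.5) * h
        total += f(y) * (1.0 - (x - y) ** 2) ** n
    return total * h / k(n)

def sup_error(n, points=21):
    """Largest value of |f - p_n| over an equally spaced grid on [0,1]."""
    grid = [j / (points - 1) for j in range(points)]
    return max(abs(f(x) - landau(n, x)) for x in grid)

for n in (25, 100, 400):
    print(n, sup_error(n))
```

For a function with a corner, such as this tent, the observed error
decays roughly like $n^{-1/2}$, so the convergence, while uniform, is
slow.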

For much more concerning the ``Landau'' polynomials, see Butzer, Stark
[1986], and the many references therein.

\smallskip
A few months after the appearance of Landau [1908], Lebesgue
``responded'' with Lebesgue [1908] which appeared in the same
journal and is an ``extract from a letter addressed to
E.~Landau''. Despite Lebesgue's flowery opening {\it Je me
f\'elicite de m'\^etre rencontr\'e avec vous sur un point
particulier ...}, Lebesgue then goes on to inform Landau that he
actually had the same proof for more than two years, but his
manuscript was not yet ready (he is probably referring to his
treatise Lebesgue [1909]). But since Landau did publish,
Lebesgue feels called upon to tell Landau (and the world) about
some of his reflections on this matter. Aside from the
entertainment value of this exchange between two stars, Lebesgue
does make two valid points. The first has less to do with Landau's
particular proof than with the proofs of {\W}, Picard, Fej\'er,
and Landau. Lebesgue notes that these proofs can and should be
considered within the general context of integral convolutions
with sequences of non-negative kernels, where the convolution
approaches the identity. This was subsequently elaborated upon in
Lebesgue [1909]. Furthermore in the latter half of this short
paper Lebesgue goes on to ask questions about the order of
approximation. This is a clear indication that the subject is
evolving.


\medskip\noindent {\bf De la Vall\'ee Poussin.}
The treatise de la
Vall\'ee Poussin [1908] also contains a proof of {\W}' theorem using this
exact same integral. In fact Ch.~J.~de la Vall\'ee Poussin (1866--1962)
devotes over 30 pages of his paper to a study of its various approximation properties
(and not only the question of density). A footnote on p.~197 therein
states that de la Vall\'ee Poussin was made aware of Landau's paper
only while editing his own paper. (Landau's paper appeared in January of 1908.)
So it seems that three outstanding mathematicians almost simultaneously
discovered this method of proving {\W}' theorem. As Landau states,
this integral had in fact already been introduced by
Stieltjes in a letter to Hermite dated September 12, 1893
(see Baillaud, Bourget [1905]).

In addition, de la Vall\'ee Poussin introduced, in the second half of de la Vall\'ee
Poussin [1908], what he regarded as the periodic analogues of the
Landau polynomials. These are
$$I_n(x) = {1\over {h_n}} \int_{-\pi}^\pi f(y) \left[ \cos
\left({{y-x}\over 2}\right)\right]^{2n} \dd y$$
where
$$h_n= \int_{-\pi}^\pi \left[ \cos
\left({y\over 2}\right)\right]^{2n} \dd y = {{\pi (2n)!}\over {2^{2n-1}
(n!)^2}}.$$
$I_n$ is a trigonometric polynomial of degree at most $n$. The proof of
the fact that the $I_n$ uniformly converge to $f$ for $f\in
\tilC[-\pi,\pi]$ is very similar to the proof of the analogous result
for the Landau polynomials. We will not repeat the proof here. For more
concerning this proof, this paper, and de la Vall\'ee Poussin's other
contributions to approximation theory, we recommend Butzer, Nessel
[1993].
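
One way to see the degree claim (a short sketch using only the
half-angle formula) is to write
$$\left[\cos\left({{y-x}\over 2}\right)\right]^{2n} =
\left[{{1+\cos(y-x)}\over 2}\right]^n,$$
which is a polynomial of degree $n$ in $\cos(y-x) = \cos y\cos x + \sin
y\sin x$, and hence a trigonometric polynomial of degree at most $n$ in
$x$ for each fixed $y$. Integrating against $f(y)$ preserves this. As a
check on the normalization, for $n=1$
$$h_1 = \int_{-\pi}^\pi {{1+\cos y}\over 2} \dd y = \pi,$$
in agreement with $\pi\, 2!/[2^{1}(1!)^2]=\pi$.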

\medskip\noindent {\bf Bernstein.}
What we will arbitrarily call the last of the early proofs of the {\W}
theorems is due to S.~N.~Bernstein (1880--1968) and appeared in
Bernstein [1912/13]. (The thesis advisor of Bernstein's first doctorate was
Picard.) This paper is reproduced in Stark [1981].
A translation into Russian appears in his somewhat
more accessible collected works. This proof is very different from
the previous proofs, and has had a profound impact in various areas. It
is here that Bernstein introduces what we today call Bernstein
polynomials.

The Bernstein polynomial of $f\in C[0,1]$ is defined by
$$B_n(x) =\sum_{m=0}^n f\left({m \over n}\right) {n \choose m} x^m
(1-x)^{n-m}.$$
Bernstein demonstrates, using probabilistic ideas, that the $B_n$ converge
uniformly to $f$ on $[0,1]$. The proof of this fact, as generally given
today, is slightly different from Bernstein's original proof and has the
added advantage of providing ``error estimates''. We will here present
Bernstein's original proof, although it is somewhat overinvolved.

Since $f\in C[0,1]$, given $\eps>0$ there exists a $\del>0$ such that
$$|x-y|<\del$$
implies
$$|f(x)-f(y)|<{\eps \over 2}$$
for all $x,y\in [0,1]$. Set
$${\overline f}(x) = \max \{ f(y): y\in [x-\del,x+\del]\cap [0,1]\}$$
and
$${\underline f}(x) = \min \{ f(y): y\in [x-\del,x+\del]\cap [0,1]\}.$$
Thus for each $x\in [0,1]$
$$0\le {\overline f}(x)- f(x) < {\eps\over 2},$$
and
$$0\le f(x) - {\underline f}(x) < {\eps\over 2}.$$

For fixed $\del>0$ as above, set
$$\eta_n(x) = \sum_{\{m: |x-(m/n)|>\del\}} {n \choose m} x^m
(1-x)^{n-m}.$$
From the decomposition
$$B_n(x) = \sum_{m=0}^n f\left({m \over n}\right) {n \choose m} x^m
(1-x)^{n-m}$$
$$\!\!\!\! =\sum_{\{m: |x-(m/n)|\le \del\}} f\left({m \over n}\right) {n \choose m} x^m
(1-x)^{n-m} $$
$$\phantom{12345}+
\sum_{\{m: |x-(m/n)|>\del\}} f\left({m \over n}\right) {n \choose m} x^m
(1-x)^{n-m},$$
it easily follows that
$${\underline f}(x)[1-\eta_n(x)] - \|f\|\eta_n(x) \le B_n(x) \le
{\overline f}(x)[1-\eta_n(x)] + \|f\|\eta_n(x).$$
Bernstein then states that according to Bernoulli's theorem there
exists an $N$ such that for all $n>N$ and all $x\in [0,1]$ we have
$$\eta_n(x)< {\eps \over {4\|f\|}}.$$
Thus as a consequence of
$$f(x) + [{\underline f}(x)- f(x)] -\eta_n(x)[\|f\| + {\underline f}(x)]
\le B_n(x)$$
and $$ B_n(x) \le
f(x) +[{\overline f}(x)-f(x)] +\eta_n(x)[\|f\|- {\overline f}(x)],$$
we obtain
$$f(x) - {\eps\over 2} - {\eps\over {4\|f\|}} 2\|f\| < B_n(x) < f(x)
+ {\eps\over 2} + {\eps\over {4\|f\|}}2\|f\|,$$
which gives
$$|B_n(x)-f(x)|<\eps$$
for all $x\in [0,1]$.

For completeness we now verify Bernstein's statement regarding $\eta_n(x)$.
(For a probabilistic explanation of this quantity and estimate,
see e.~g.~Levasseur [1984].) To this end, one first verifies that
$$\sum_{m=0}^n {n \choose m} x^m (1-x)^{n-m} = 1,$$
$$\sum_{m=0}^n {m\over n} {n \choose m} x^m (1-x)^{n-m} = x,$$
and
$$\sum_{m=0}^n {{m^2}\over {n^2}} {n \choose m} x^m (1-x)^{n-m} =
x^2 + {{x(1-x)}\over n}.$$
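(A sketch of one standard verification, with binomial coefficients
having a negative lower index read as zero: the first identity is the
binomial theorem applied to $[x+(1-x)]^n$. For the second,
${m\over n}{n\choose m} = {{n-1}\choose {m-1}}$ for $m\ge 1$, so that
$$\sum_{m=0}^n {m\over n}{n\choose m}x^m(1-x)^{n-m} =
x\sum_{m=1}^n {{n-1}\choose {m-1}} x^{m-1}(1-x)^{n-m} = x.$$
For the third, applying the same relation twice gives
$${{m^2}\over {n^2}}{n\choose m} = {{n-1}\over n}{{n-2}\choose {m-2}} +
{1\over n}{{n-1}\choose {m-1}},$$
and summing against $x^m(1-x)^{n-m}$ yields
$${{n-1}\over n}\,x^2 + {x\over n} = x^2 + {{x(1-x)}\over n}.)$$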
Then
$$\eqalign{\eta_n(x) & = \sum_{\{m: |x-(m/n)|>\del\}} {n \choose m} x^m
(1-x)^{n-m}\cr
& \le \sum_{\{m: |x-(m/n)|>\del\}} \left({{x- {m\over n}}\over
\del}\right)^2 {n \choose m} x^m (1-x)^{n-m}\cr
& \le {1\over {\del^2}}\sum_{m=0}^n \left(x- {m\over n}\right)^2
{n \choose m} x^m (1-x)^{n-m}\cr
& = {1\over {\del^2}} \left[ x^2 - 2x\cdot x + x^2 +
{{x(1-x)}\over n} \right]\cr
& = {{x(1-x)}\over {n\del^2}}\cr
& \le {1\over {4n\del^2}}\cr}$$
for all $x\in [0,1]$. Thus for each fixed $\del>0$ we can in fact choose
$N$ such that for all $n\ge N$ and all $x\in [0,1]$
$$\eta_n(x) < {\eps \over {4\|f\|}}.$$
This ends Bernstein's proof.
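
Two remarks may make this concrete; both are illustrations based on the
estimates above. First, since $\eta_n(x)\le 1/(4n\del^2)$, it suffices
to take any
$$N > {{\|f\|}\over {\eps\del^2}},$$
because then $1/(4n\del^2) < \eps/(4\|f\|)$ for all $n\ge N$. Second, for
$f(x)=x^2$ the third identity above gives
$$B_n(x) = x^2 + {{x(1-x)}\over n},$$
so that
$$\max_{x\in[0,1]} |B_n(x)-f(x)| = {1\over {4n}}.$$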

\medskip
Bernstein's proof is beautiful and elegant! It constructs in a simple, linear
(but unexpected) manner
a sequence of approximating polynomials depending explicitly on the
values of $f$ at rational points. No further information regarding
$f$ is used. This was not the first attempt to find a proof of the
Weierstrass theorem using a suitable partition of unity.
In Borel [1905, p.~79--82], which seems to have been the
first textbook devoted mainly to approximation theory, we find the
following formula for constructing a sequence of polynomials approximating
every $f\in C[0,1]$.

E.~Borel (1871--1956) proved that the sequence of polynomials
$$p_n(x) = \sum_{m=0}^n f\left({m\over n}\right) q_{n,m}(x)$$
uniformly approximates $f$ where the $q_{n,m}$ are fixed polynomials
independent of $f$. His $q_{n,m}$ are constructed as follows. Set
$$g_{n,m}(x) =\cases{0, & $\left|x- {m\over n}\right| > {1\over n}$\cr
nx-(m-1), & ${{m-1}\over n} \le x\le {m\over n}$\cr
-nx+(m+1), & ${{m}\over n} \le x\le {{m+1}\over n}.$\cr}$$
Note that the $g_{n,m}$ are non-negative, sum to 1, and
$g_{n,m}(m/n)=1$. Let (by the {\W} theorem) $q_{n,m}$ be any polynomial
satisfying
$$|g_{n,m}(x)-q_{n,m}(x)| < {1\over {n^2}}$$
for all $x\in [0,1]$. It is now not difficult to verify that the $p_n$
do approximate $f$. However the Bernstein polynomials are so much more
satisfying in so many ways.
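
The verification that Borel's $p_n$ approximate $f$ can be sketched as
follows, writing
$\omega(f;t)=\max\{|f(x)-f(y)|: x,y\in[0,1],\ |x-y|\le t\}$
(a modulus of continuity, notation introduced only for this remark).
Since the $g_{n,m}$ are non-negative, sum to $1$ on $[0,1]$, and
$g_{n,m}(x)=0$ unless $|x-(m/n)|<1/n$,
$$\left|f(x)-\sum_{m=0}^n f\left({m\over n}\right) g_{n,m}(x)\right|
\le \sum_{m=0}^n \left|f(x)-f\left({m\over n}\right)\right| g_{n,m}(x)
\le \omega(f;1/n),$$
while the choice of the $q_{n,m}$ gives
$$\left|\sum_{m=0}^n f\left({m\over n}\right)
\left[g_{n,m}(x)-q_{n,m}(x)\right]\right| \le {{(n+1)\|f\|}\over {n^2}}.$$
Hence $\|f-p_n\|\le \omega(f;1/n) + (n+1)\|f\|/n^2$, and both terms tend to
zero as $n\to\infty$.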

\medskip\noindent {\bf Kuhn's Proof.}
There are many elegant and simple proofs of {\W}' theorem. But
perhaps the most elementary proof (of which we are aware) is the
following due to Kuhn [1964]. Kuhn's proof uses one basic
inequality, namely Bernoulli's inequality
$$(1+h)^n \ge 1+ nh$$
which is valid for $h\ge -1$ and $n\in \NN$.
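(For the reader's convenience, here is a sketch of the usual induction
argument. The case $n=1$ is an equality. If the inequality holds
for some $n$ then, multiplying by $1+h\ge 0$,
$$(1+h)^{n+1} \ge (1+nh)(1+h) = 1+(n+1)h+nh^2 \ge 1+(n+1)h,$$
which is the inequality for $n+1$.)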

We present Kuhn's proof except that we save a step by
recalling (see (3.1)) that we need only approximate
continuous polygonal lines which we can write as
$$g(x) = g_1(x) +\sum_{i=1}^{m-1} [ g_{i+1}(x)-g_i(x)]h(x-x_i)$$
where the $0=x_0<x_1<\cdots<x_m=1$ are the abscissae of the polygonal line
$g$, each $g_i$ is linear, $g_{i+1}-g_i$ vanishes at $x_i$, and
$$h(x) =\cases{1,& $x\ge 0$\cr 0, & $x<0$\cr}.$$
This form was used in the proofs of Runge/Phragm\'en, of Mittag-Leffler
and of Lebesgue. In fact, in the first two of these proofs it was noted
that it suffices to find a sequence of polynomials bounded on $[-1,1]$
and approximating $h$ uniformly on $[-1,-\del]\cup [\del,1]$, for any
given $\del>0$.
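
As a small illustration (an example chosen here, not taken from the
proofs cited), take $g(x)=|x-{1\over 2}|$ on $[0,1]$, so that $m=2$,
$x_1=1/2$, $g_1(x)={1\over 2}-x$ and $g_2(x)=x-{1\over 2}$. Then
$g_2-g_1=2x-1$ vanishes at $x_1$, and
$$g(x) = \left({1\over 2}-x\right) + (2x-1)\,h\left(x-{1\over 2}\right).$$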

Kuhn simply writes down such a sequence of polynomials, namely
$$p_n(x) =\left[ 1 - \left({{1-x}\over 2}\right)^n\right]^{2^n}.$$
(Note that the polynomials $\{x[2p_n(x)-1]\}$ uniformly converge to
$|x|$ on $[-1,1]$. See Lebesgue's proof.)

It is more convenient to consider the simpler
$$q_n(x) = (1-x^n)^{2^n},$$
which is just a shift and rescale of $p_n$. On $[0,1]$ the $q_n$ are
decreasing and satisfy $q_n(0)=1$, $q_n(1)=0$. The requisite facts
concerning the $p_n$ therefore reduce to showing
$$\lim_{n\to\infty} q_n(x) = \cases{1,& $0\le x < 1/2$ \cr 0, & $1/2<x\le
1$ \cr}.$$
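Explicitly (spelling out the change of variable, with the auxiliary
name $t$ introduced only here), $p_n(x) = q_n(t)$ with
$t=(1-x)/2$. This map takes $[-1,1]$ onto $[0,1]$, with $x>0$
corresponding to $t<1/2$ and $x<0$ to $t>1/2$, so the limit above reads
$$\lim_{n\to\infty} p_n(x) = \cases{1,& $0< x \le 1$ \cr 0, & $-1\le
x<0$ \cr},$$
that is, $p_n\to h$ pointwise away from $0$.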

Let $x\in [0,1/2)$. Then from Bernoulli's inequality
$$1 \ge q_n(x) = (1-x^n)^{2^n} \ge 1 - (2x)^n.$$
Since $0\le 2x <1$, we have
$$\lim_{n\to\infty} q_n(x)=1.$$
Let $x\in (1/2, 1)$. Then using Bernoulli's inequality we obtain
$${1\over {q_n(x)}} = {1\over {(1-x^n)^{2^n}}}=
\left( 1+ {{x^n}\over {1-x^n}}\right)^{2^n} \ge
1+ {{(2x)^n}\over {1-x^n}} > (2x)^n$$
and thus
$$0< q_n(x) < {1\over {(2x)^n}}.$$
As $2x>1$, it follows that
$$\lim_{n\to\infty} q_n(x)=0.$$
The monotonicity of the $q_n$ implies that this approximation
is appropriately uniform. This ends Kuhn's proof.
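
To spell out the uniformity (a sketch of the step left to the reader,
writing $t$ for the variable of $q_n$), fix $0<\del<1$.
Since $q_n$ is decreasing on $[0,1]$, the two estimates above give
$$1\ge q_n(t) \ge q_n\left({{1-\del}\over 2}\right) \ge 1-(1-\del)^n,
\qquad 0\le t\le {{1-\del}\over 2},$$
and
$$0\le q_n(t) \le q_n\left({{1+\del}\over 2}\right) < {1\over
{(1+\del)^n}}, \qquad {{1+\del}\over 2}\le t\le 1.$$
Both bounds are independent of $t$ and tend to $1$ and $0$,
respectively. With $t=(1-x)/2$ this is uniform convergence of $p_n$ to
$h$ on $[-1,-\del]\cup[\del,1]$, and $0\le p_n\le 1$ on $[-1,1]$ gives the
required boundedness.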


\References


\refB Baillaud, B., Bourget, H.;  Correspondance d'Hermite et de
Stieltjes; Tome I, Gauthier-Villars (Paris); 1905 ;

\refB Bell, E.~T.; Men of Mathematics; Scientific Book Club
(London); 1936;

\refJ Bernstein, S.~N.; D\'emonstration du th\'eor\`eme de
Weierstrass fond\'ee sur le calcul des probabilit\'es; Comm.\
Soc.\ Math.\ Kharkow; 13; 1912/13; 1--2; Also appears in Russian
translation in Bernstein's Collected Works.

\refJ du Bois-Reymond, P.; Untersuchungen \"uber die Convergenz
und Divergenz der Fourierschen Darstellungsformeln; Abhandlungen
der Mathematisch-Physicalischen Classe der K.\ Bayerische Akademie
der Wissenschaften; 12; 1876; 1--13;

\refB Borel, \'E.; Le\c cons sur les Fonctions de Variables R\'eelles
et les D\'eveloppe\-ments en S\'eries de Polyn\^omes;
Gauthier-Villars (Paris); 1905; (2nd edition, 1928).

\refB Bourbaki, N.; Topologie G\'en\'erale (Livre III). Espaces
Fonctionnels Dictionnaire (Chapitre X); Hermann \& Cie (Paris);
1949;

\refJ Butzer, P.~L., Nessel, R.~J.; Aspects of de la Vall\'ee
Poussin's work in approximation and its influence; Archive Hist.\
Exact Sciences; 46; 1993; 67--95;

\refQ Butzer, P.~L., Stark, E.~L.;  The singular integral of
Landau alias the Landau polynomials - Placement and impact of
Landau's article ``\"Uber die Approximation einer stetigen
Funktion durch eine ganze rationale Funktion''; (Edmund Landau,
Collected Works, Volume 3), P.~T.~Bateman, L.~Mirsky,
H.~L.~Montgomery, W.~Schall, I.~J.~Schoenberg, W.~Schwarz,
H.~Wefelscheid (eds.), Thales-Verlag (Essen); 1986; 83--111;

\refD Cakon, R.; Alternative Proofs of Weierstrass Theorem of
Approximation: An Expository Paper; Master's Thesis, Department of
Mathematics, The Pennsylvania State University; 1987;

\refB Dieudonn\'e, J.; Foundations of Modern Analysis; Academic
Press (New York); 1969;

\refJ Dirichlet, L.; Sur la convergence des s\'eries
trigonom\'etriques qui servent \`a repr\'esenter une fonction
arbitraire entre des limites donn\'ees; J.\ f\"ur Reine und
Angewandte Math.;  4; 1829; 157--169;

\refB Feinerman, R.~P., Newman, D.~J.; Polynomial Approximation;
Will\-iams and Wilkins Co. (Baltimore); 1974;

\refJ Fej\'er, L.; Sur les fonctions born\'ees et int\'egrables;
Comptes Rendus Acad.\ Sci.\ Paris; 131; 1900; 984--987;

\refJ Gray, J.~D.; The shaping of the Riesz representation
theorem: A chapter in the history of analysis; Arch.\ Hist.\ Exact
Sciences; 31; 1984; 127--187;

\refJ Jackson, D.; A proof of Weierstrass's theorem; Amer.\ Math.\
Monthly; 41; 1934; 309--312;

\refJ Kuhn, H.; Ein elementarer Beweis des Weierstrassschen
Approximationssatzes; Arch.\ Math.; 15; 1964; 316--317;

\refJ Landau, E.; \"Uber die Approximation einer stetigen Funktion
durch eine ganze rationale Funktion; Rend.\ Circ.\ Mat.\ Palermo;
25; 1908; 337--345;

\refJ Lebesgue, H.; Sur l'approximation des fonctions; Bull.\
Sciences Math.; 22; 1898; 278--287;

\refJ Lebesgue, H.; Sur la repr\'esentation approch\'ee des
fonctions; Rend.\ Circ.\ Mat.\ Palermo; 26; 1908; 325--328;

\refJ Lebesgue, H.; Sur les int\'egrales singuli\`eres; Ann.\
Fac.\ Sci.\ Univ.\ Toulouse; 1; 1909; 25--117;

\refJ Lerch, M.; O hlavni vete theorie funkci vytvorujicich (On
the main theorem on generating functions); Rozpravy Ceske Akademie
v.~Praze; 1; 1892; 681--685;

\refJ Lerch, M.; Sur un point de la th\'eorie des fonctions
g\'en\'eratrices d'Abel; Acta Math.; 27; 1903; 339--351;

\refJ Levasseur, K.~M.; A probabilistic proof of the Weierstrass
approximation theorem; Amer.\ Math.\ Monthly; 91; 1984; 249--250;

\refX MacTutor. [2004] {\tt
http://www-groups.dcs.st-and.ac.uk/$\sim$history}

\refJ Mittag-Leffler, G.; Sur la repr\'esentation analytique des
fonctions d'une variable r\'eelle; Rend.\ Circ.\ Mat.\ Palermo;
14; 1900; 217--224;

\refB Natanson, I.~P.; Constructive Function Theory, Volume I;
Frederick Ungar (New York); 1964;

\refB Ostrowski, A.;  Vorlesungen \"uber Differential-und
Integralrechnung, Volume II; Birkh\"auser (Zurich); 1951;

\refJ Picard, E.; Sur la repr\'esentation approch\'ee des
fonctions; Comptes Rendus Acad.\ Sci.\ Paris; 112; 1891a;
183--186;

\refB Picard, E.; Trait\'e D'Analyse; Tome I, Gauthier-Villars
(Paris); 1891b;  (Many subsequent editions followed).

\refQ Picard, E.; Lectures on Mathematics; (Clark University
1889--1899 Decennial Celebration), W.~E.~Story and L.~N.~Wilson
(eds.), Norwood Press (Norwood, Mass.); 1899; 207--259;

\refJ Pinkus, A.; Weierstrass and Approximation Theory; J.\
Approx.\ Theory;  107; 2000; 1--66;

\refB Rivlin, T.~J.; The Chebyshev Polynomials; John Wiley (New
York); 1974;

\refJ Runge, C.; Zur Theorie der eindeutigen analytischen
Functionen; Acta Math.; 6; 1885; 229--244;

\refJ Runge, C.; \"Uber die Darstellung willk\"urlicher
Functionen; Acta Math.; 7; 1885/86; 387--392;

\refJ Schwarz, H.~A.; Zur Integration der partiellen
Differentialgleichung $\partial^2 u/ \partial x^2$  $+ \partial^2
u/ \partial y^2= 0$; J.\ f\"ur Reine und Angewandte Math.; 74;
1871; 218--253;

\refJ Siegmund-Schultze, R.; Der Beweis des Weierstrasschen
Approximationssatzes 1885 vor dem Hintergrund der Entwicklung der
Fourieranalysis;  Historia Math.; 15; 1988; 299--310;

\refJ Skrasek, J.; Le centenaire de la naissance de Matyas Lerch;
Czech.\ Math.\ J.; 10; 1960; 631--635;

\refQ Stark, E.~L.; Bernstein-Polynome, 1912--1955; (Functional
Analysis and Approximation), P.~L.~Butzer, B.~Sz.-Nagy, and
E.~G\"orlich (eds.), ISNM 60, Birkh\"auser (Basel); 1981;
443--461;

\refB Sz.-Nagy, B.; Introduction to Real Functions and Orthogonal
Expansions; Oxford Univ.~Press (New York); 1965;

\refB Todd, J.; Introduction to the Constructive Theory of
Functions; CalTech Lecture Notes (); 1961;

\refJ de la Vall\'ee Poussin, Ch.~J.; Sur l'approximation des
fonctions d'une variable r\'eelle et leurs d\'eriv\'ees par des
polynomes et des suites limit\'ees de Fourier; Bull.\ Acad.\
Royale Belgique; 3; 1908; 193--254;

\refJ de la Vall\'ee Poussin, Ch.~J.; L'approximation des
fonctions d'une variable r\'eelle; L'Enseign.\ Math.; 20 ; 1918 ;
5--29;

\refB de la Vall\'ee Poussin, Ch.~J.; Le\c cons sur
L'Approximation des Fonctions d'une Variable R\'eelle ;
Gauthier-Villars (Paris); 1919;  Also in ``L'Approximation'',
Chelsea, New York, 1970.

\refJ Volterra, V.; Sul principio di Dirichlet; Rend.\ Circ.\
Mat.\ Palermo; 11; 1897; 83--86;

\refJ Weierstrass, K.; \"Uber die analytische Darstellbarkeit
sogenannter will\-k\"ur\-li\-cher Functionen einer reellen
Ver\"anderlichen; Sitzungsberichte der Akademie zu Berlin; ; 1885;
633--639 and 789--805; (This appeared in two parts. An expanded
version of this paper with ten additional pages also appeared in
Weierstrass' {\sl Mathematische Werke}, {\bf Vol.~3}, 1--37, Mayer \& M\"uller,
Berlin, 1903.)

\refJ Weierstrass, K.; Sur la possibilit\'e d'une repr\'esentation
analytique des fonctions dites arbitraires d'une variable
r\'eelle; J.\ Math.\ Pure et Appl.; 2; 1886; 105--113 and 115--138;
(This is a translation of Weierstrass [1885] and, as the original,
it appeared in two parts and in subsequent issues, but under the
same title. This journal was, at the time, called {\sl Journal de
Liouville}.)


{

\bigskip\obeylines
Allan Pinkus
Department of Mathematics
Technion, I.~I.~T.
Haifa, 32000
Israel
{\tt pinkus@tx.technion.ac.il}
{\tt http://www.math.technion.ac.il/\~{}pinkus}

}

\end
