https://sidsite.com/posts/fourier-nets/

The Fourier transform is a neural network

Published: Apr 29, 2021

"Can neural networks learn the Fourier transform?"

We can consider the discrete Fourier transform (DFT) to be an artificial neural network: it is a single layer network, with no bias, no activation function, and particular values for the weights. The number of output nodes is equal to the number of frequencies we evaluate.

Here is the DFT:

\[y_k = \displaystyle \sum_{n=0}^{N-1} x_n \cdot \exp(-i\frac{2 \pi k}{N}n) \tag{1}\]

where $k$ is the number of cycles per $N$ samples, $x_n$ is the signal's value at sample $n$, and $N$ is the length of the signal.

(A signal can be written as the sum of sinusoids. $y_k$ is a complex value that gives us information about the sinusoid of frequency $k$ in signal $\boldsymbol{x}$; from $y_k$ we can compute the amplitude and phase (i.e. location) of that sinusoid.)

We can rewrite (1) using matrix multiplication:

\[y_k = [x_0, x_1, \dots, x_{N-1}]
\begin{bmatrix}
\exp(0) \\
\exp(-i\frac{2 \pi k}{N}) \\
\exp(-i\frac{2 \pi k}{N}2) \\
\exp(-i\frac{2 \pi k}{N}3) \\
\exp(-i\frac{2 \pi k}{N}4) \\
\vdots \\
\exp(-i\frac{2 \pi k}{N}(N-1))
\end{bmatrix}\]

This gives us the Fourier value for a particular $k$. However, we most commonly want to compute the full frequency spectrum, i.e. values of $k$ from $[0, 1, \dots, N-1]$. We can use a matrix for this ($k$ is incremented column-wise, and $n$ row-wise):

$$
[y_0 \dots y_{N-1}] = [x_0 \dots x_{N-1}]
\begin{bmatrix}
\exp(0) & \exp(0) & \exp(0) & \dots & \exp(0) \\
\exp(0) & \exp(-i\frac{2\pi}{N}) & \exp(-i\frac{2\pi 2}{N}) & \dots & \exp(-i\frac{2\pi (N-1)}{N}) \\
\exp(0) & \exp(-i\frac{2\pi}{N}2) & \exp(-i\frac{2\pi 2}{N}2) & \dots & \exp(-i\frac{2\pi (N-1)}{N}2) \\
\exp(0) & \exp(-i\frac{2\pi}{N}3) & \exp(-i\frac{2\pi 2}{N}3) & \dots & \exp(-i\frac{2\pi (N-1)}{N}3) \\
\vdots & \vdots & \vdots & & \vdots \\
\exp(0) & \exp(-i\frac{2\pi}{N}(N-1)) & \exp(-i\frac{2\pi 2}{N}(N-1)) & \dots & \exp(-i\frac{2\pi (N-1)}{N}(N-1))
\end{bmatrix}
$$

More concisely:

\[\boldsymbol{y} = \boldsymbol{x} W_{\text{Complex Fourier}} \tag{2}\]

This should look familiar, because it is a neural network layer with no activation function and no bias. The matrix of exponentials contains our weights, which we'll call "complex Fourier weights". Usually we don't know the weights of our neural networks in advance, but in this case we do.
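As a quick numerical check of expression (2) (this snippet is an addition to the post, and the variable names are just illustrative), we can build the complex weight matrix directly in NumPy and compare the matrix product against np.fft.fft:

```python
import numpy as np

# Build the complex Fourier weights of expression (2):
# W[n, k] = exp(-2j * pi * k * n / N), with n row-wise and k column-wise.
N = 8
n = np.arange(N).reshape([-1, 1])
k = np.arange(N).reshape([1, -1])
W_complex_fourier = np.exp(-2j * np.pi * k * n / N)

# A random signal, multiplied by the weight matrix, gives the full spectrum:
x = np.random.random([1, N]) - 0.5
y = np.matmul(x, W_complex_fourier)

print(np.allclose(y, np.fft.fft(x)))  # True
```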
Non-complex

However, we typically don't use complex numbers for neural networks. To fit in with this, we'll double the size of our matrix: the left half will contain the real values, and the right half the imaginary values. Recall that:

\[e^{-i\theta} = \cos(\theta) - i\sin(\theta)\]

Substituting in $\theta = \frac{2 \pi k}{N}n$ from our DFT expression (1), we get:

\[\exp(-i\frac{2 \pi k}{N}n) = \cos(\frac{2 \pi k}{N}n) - i\sin(\frac{2 \pi k}{N}n)\]

Then we use the real part (the $\cos$ term) for the left half of our matrix, and the imaginary part (the $\sin$ term) for the right half:

$$
\hat{\boldsymbol{y}} = [x_0 \dots x_{N-1}]
\begin{bmatrix}
\cos(0) & \cos(0) & \dots & \cos(0) & -\sin(0) & -\sin(0) & \dots & -\sin(0) \\
\cos(0) & \cos(\frac{2\pi}{N}) & \dots & \cos(\frac{2\pi(N-1)}{N}) & -\sin(0) & -\sin(\frac{2\pi}{N}) & \dots & -\sin(\frac{2\pi(N-1)}{N}) \\
\cos(0) & \cos(\frac{2\pi}{N}2) & \dots & \cos(\frac{2\pi(N-1)}{N}2) & -\sin(0) & -\sin(\frac{2\pi}{N}2) & \dots & -\sin(\frac{2\pi(N-1)}{N}2) \\
\cos(0) & \cos(\frac{2\pi}{N}3) & \dots & \cos(\frac{2\pi(N-1)}{N}3) & -\sin(0) & -\sin(\frac{2\pi}{N}3) & \dots & -\sin(\frac{2\pi(N-1)}{N}3) \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
\cos(0) & \cos(\frac{2\pi}{N}(N-1)) & \dots & \cos(\frac{2\pi(N-1)}{N}(N-1)) & -\sin(0) & -\sin(\frac{2\pi}{N}(N-1)) & \dots & -\sin(\frac{2\pi(N-1)}{N}(N-1))
\end{bmatrix}
$$

More concisely:

\[\hat{\boldsymbol{y}} = \boldsymbol{x} W_{\text{Fourier}} \tag{3}\]

We'll call $W_{\text{Fourier}}$ the "Fourier weights". Note that $\hat{\boldsymbol{y}}$ and $\boldsymbol{y}$ effectively contain the same information, but $\hat{\boldsymbol{y}}$ doesn't use complex numbers, so it is twice as long. In other words, we could compute amplitudes and phases using either $W_{\text{Fourier}}$ or $W_{\text{Complex Fourier}}$, but we probably want to use $W_{\text{Fourier}}$ in order to avoid complex numbers.

We are now equipped to add Fourier layers to our networks, if we so choose (a sketch of this appears at the end of the next section).

Computing the Fourier transform using Fourier weights

Let's implement the method shown above (non-complex, expression 3), and check that it is correct using the fast Fourier transform.

```python
import numpy as np

def create_fourier_weights(signal_length):
    "Create weights, as described above."
    k_vals, n_vals = np.mgrid[0:signal_length, 0:signal_length]
    theta_vals = 2 * np.pi * k_vals * n_vals / signal_length
    return np.hstack([np.cos(theta_vals), -np.sin(theta_vals)])

# Generate data:
signal_length = 64
x = np.random.random(size=[1, signal_length]) - 0.5

# Compute Fourier transform using the method described above:
W_fourier = create_fourier_weights(signal_length)
y = np.matmul(x, W_fourier)

# Compute Fourier transform using the fast Fourier transform:
fft = np.fft.fft(x)
y_fft = np.hstack([fft.real, fft.imag])

# Compare the results:
print('rmse: ', np.sqrt(np.mean((y - y_fft)**2)))
```

rmse:  1.5429611622665193e-14

The tiny error confirms that we are computing what we expect. Another way of checking is to reconstruct the signal:

```python
import matplotlib.pyplot as plt

y_real = y[:, :signal_length]
y_imag = y[:, signal_length:]
tvals = np.arange(signal_length).reshape([-1, 1])
freqs = np.arange(signal_length).reshape([1, -1])
arg_vals = 2 * np.pi * tvals * freqs / signal_length
sinusoids = (y_real * np.cos(arg_vals) - y_imag * np.sin(arg_vals)) / signal_length
reconstructed_signal = np.sum(sinusoids, axis=1)

print('rmse:', np.sqrt(np.mean((x - reconstructed_signal)**2)))

plt.subplot(2, 1, 1)
plt.plot(x[0, :])
plt.title('Original signal')
plt.subplot(2, 1, 2)
plt.plot(reconstructed_signal)
plt.title('Signal reconstructed from sinusoids after DFT')
plt.tight_layout()
plt.show()
```

rmse: 2.3243522568191728e-15

[Figure: original signal, and signal reconstructed from sinusoids after the DFT.]
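As noted earlier, we could also drop these weights straight into a network as a fixed layer. Here is a minimal sketch of that idea (this block is an addition, not from the original post; the FourierLayer name is purely illustrative), assuming TensorFlow/Keras and the create_fourier_weights function defined above:

```python
import numpy as np
import tensorflow as tf

class FourierLayer(tf.keras.layers.Layer):
    """A fixed (non-trainable) layer whose weights are the Fourier weights."""

    def __init__(self, signal_length):
        super().__init__()
        # create_fourier_weights is defined in the section above.
        self.W = tf.constant(create_fourier_weights(signal_length), dtype=tf.float32)

    def call(self, x):
        # x: [batch, signal_length] -> [batch, 2 * signal_length] (real | imaginary parts).
        return tf.matmul(x, self.W)

# The layer reproduces the NumPy result above, up to float32 precision
# (x and y are the arrays from the previous code block):
layer = FourierLayer(signal_length)
print(np.allclose(layer(tf.constant(x, dtype=tf.float32)).numpy(), y, atol=1e-4))
```

Because the weight matrix is stored as a tf.constant rather than a variable, the layer contributes no trainable parameters; it simply projects the signal onto its real and imaginary Fourier components.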
Learning the Fourier transform via gradient descent

We don't need to pre-compute the weights as we did above; we can instead learn them. First, similarly to endolith (see the GitHub Gist in the references), we will use the FFT to train a network to carry out the DFT:

```python
import tensorflow as tf

signal_length = 32

# Initialise the weight matrix to train:
W_learned = tf.Variable(np.random.random([signal_length, 2 * signal_length]) - 0.5)

# Expected weights, for comparison:
W_expected = create_fourier_weights(signal_length)

losses = []
rmses = []

for i in range(1000):
    # Generate a random signal each iteration:
    x = np.random.random([1, signal_length]) - 0.5

    # Compute the expected result using the FFT:
    fft = np.fft.fft(x)
    y_true = np.hstack([fft.real, fft.imag])

    with tf.GradientTape() as tape:
        y_pred = tf.matmul(x, W_learned)
        loss = tf.reduce_sum(tf.square(y_pred - y_true))

    # Train weights, via gradient descent:
    W_gradient = tape.gradient(loss, W_learned)
    W_learned = tf.Variable(W_learned - 0.1 * W_gradient)

    losses.append(loss)
    rmses.append(np.sqrt(np.mean((W_learned - W_expected)**2)))
```

Final loss value: 1.6738563548424711e-09
Final weights' rmse value: 3.1525832404710523e-06

[Figures omitted.]

This confirms that neural networks are capable of learning the discrete Fourier transform. However, this example is contrived: if we are going to train a single layer to learn the Fourier transform, we might as well use create_fourier_weights directly (or tf.signal.fft, etc.).

Learning the Fourier transform via reconstruction

We used the FFT above to teach the network to perform the Fourier transform. Let's not use it anymore, and instead learn the DFT by training the network to reconstruct the input signal (similar to autoencoders).

```python
W_learned = tf.Variable(np.random.random([signal_length, 2 * signal_length]) - 0.5)

tvals = np.arange(signal_length).reshape([-1, 1])
freqs = np.arange(signal_length).reshape([1, -1])
arg_vals = 2 * np.pi * tvals * freqs / signal_length
cos_vals = tf.cos(arg_vals) / signal_length
sin_vals = tf.sin(arg_vals) / signal_length

losses = []
rmses = []

for i in range(10000):
    x = np.random.random([1, signal_length]) - 0.5

    with tf.GradientTape() as tape:
        y_pred = tf.matmul(x, W_learned)
        y_real = y_pred[:, 0:signal_length]
        y_imag = y_pred[:, signal_length:]
        sinusoids = y_real * cos_vals - y_imag * sin_vals
        reconstructed_signal = tf.reduce_sum(sinusoids, axis=1)
        loss = tf.reduce_sum(tf.square(x - reconstructed_signal))

    W_gradient = tape.gradient(loss, W_learned)
    W_learned = tf.Variable(W_learned - 0.5 * W_gradient)

    losses.append(loss)
    rmses.append(np.sqrt(np.mean((W_learned - W_expected)**2)))
```

Final loss value: 4.161919455121241e-22
Final weights' rmse value: 0.20243339269590094

[Figures omitted.]

And we pretty much learn the Fourier transform. The learned weights don't get as close to the Fourier weights as they did in the example above, but the reconstructed signals are perfect.
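An aside on the training loop (this block is an addition, not from the original post): the loops above apply gradient descent by re-creating the tf.Variable each step. An equivalent, more idiomatic TensorFlow 2.x pattern keeps a single variable and lets a Keras optimizer update it in place; here is the reconstruction loop sketched that way, with the same step size.

```python
import numpy as np
import tensorflow as tf

signal_length = 32
W_learned = tf.Variable(np.random.random([signal_length, 2 * signal_length]) - 0.5)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.5)  # plain gradient descent

tvals = np.arange(signal_length).reshape([-1, 1])
freqs = np.arange(signal_length).reshape([1, -1])
arg_vals = 2 * np.pi * tvals * freqs / signal_length
cos_vals = tf.cos(arg_vals) / signal_length
sin_vals = tf.sin(arg_vals) / signal_length

for i in range(10000):
    x = np.random.random([1, signal_length]) - 0.5
    with tf.GradientTape() as tape:
        y_pred = tf.matmul(x, W_learned)
        y_real = y_pred[:, :signal_length]
        y_imag = y_pred[:, signal_length:]
        reconstructed_signal = tf.reduce_sum(y_real * cos_vals - y_imag * sin_vals, axis=1)
        loss = tf.reduce_sum(tf.square(x - reconstructed_signal))
    gradients = tape.gradient(loss, [W_learned])
    optimizer.apply_gradients(zip(gradients, [W_learned]))  # updates W_learned in place
    # (Loss/rmse bookkeeping from the original loop omitted for brevity.)
```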
Using amplitude + phase reconstruction

Let's do the same as above, but using amplitude and phase:

```python
W_learned = tf.Variable(np.random.random([signal_length, 2 * signal_length]) - 0.5)

losses = []
rmses = []

for i in range(10000):
    x = np.random.random([1, signal_length]) - 0.5

    with tf.GradientTape() as tape:
        y_pred = tf.matmul(x, W_learned)
        y_real = y_pred[:, 0:signal_length]
        y_imag = y_pred[:, signal_length:]
        amplitudes = tf.sqrt(y_real**2 + y_imag**2) / signal_length
        phases = tf.atan2(y_imag, y_real)
        sinusoids = amplitudes * tf.cos(arg_vals + phases)
        reconstructed_signal = tf.reduce_sum(sinusoids, axis=1)
        loss = tf.reduce_sum(tf.square(x - reconstructed_signal))

    W_gradient = tape.gradient(loss, W_learned)
    W_learned = tf.Variable(W_learned - 0.5 * W_gradient)

    losses.append(loss)
    rmses.append(np.sqrt(np.mean((W_learned - W_expected)**2)))
```

Final loss value: 2.2379359316633115e-21
Final weights' rmse value: 0.2080118219691059

[Figures omitted.]

Once again we get perfect reconstructions, so we know that the amplitudes and phases must be accurate. However, as before, the learned weights aren't exactly the same as the Fourier weights (though they are pretty close). So we can surmise that there is some flexibility in the weights that result in the Fourier transform, and that gradient descent takes us to a local optimum.

So, to conclude: neural networks can learn the Fourier transform. This post leaves the following questions unaddressed:

* What is the explanation for the flexibility in the weights? How much flexibility is there?
* Do we ever benefit from explicitly putting Fourier layers into our models? Or are we better off using good architectures and assuming the model will learn what it needs to?

References

* Discrete Fourier transform: https://en.wikipedia.org/wiki/Discrete_Fourier_transform
* endolith, Training neural network to implement discrete Fourier transform (DFT/FFT), 2018: https://gist.github.com/endolith/98863221204541bf017b6cae71cb0a89
* Discrete Fourier Transform Computation Using Neural Networks, 2008: https://publik.tuwien.ac.at/files/PubDat_171587.pdf