[HN Gopher] The Fourier transform is a neural network
___________________________________________________________________
The Fourier transform is a neural network
Author : montebicyclelo
Score : 255 points
Date : 2021-04-29 12:15 UTC (10 hours ago)
(HTM) web link (sidsite.com)
(TXT) w3m dump (sidsite.com)
| cl3misch wrote:
| > We can consider the discrete Fourier transform (DFT) to be
| an artificial neural network: it is a single layer network, with
| no bias, no activation function, and particular values for the
| weights.
|
| So this post essentially shows that the Fourier transform is a
| linear combination.
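|
| For example, a minimal NumPy sketch of that view (illustrative,
| not the article's code; the real and imaginary parts come out of
| two bias-free linear "layers" with fixed weights):
|
|     import numpy as np
|
|     N = 64
|     x = np.random.randn(N)                  # input vector
|     n = np.arange(N)
|     k = n.reshape(-1, 1)                    # output node index
|     W_re = np.cos(2 * np.pi * k * n / N)    # fixed weights, real part
|     W_im = -np.sin(2 * np.pi * k * n / N)   # fixed weights, imag part
|
|     # one linear "layer" each: no bias, no activation
|     re, im = W_re @ x, W_im @ x
|     assert np.allclose(re + 1j * im, np.fft.fft(x))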
| mhh__ wrote:
| Hammer meet "nail"
| threatripper wrote:
| The Fourier transformation is a special case of a linear
| transformation. Linear neural networks can represent any linear
| transformation. Both can also be expressed as matrix
| multiplication.
|
| It's not really all that surprising if you know some of the math
| behind neural networks, matrix algebra, linear transformations or
| the Fourier transformation.
| amelius wrote:
| Yes but this article shows that the matrix coefficients of the
| Fourier transform can be learned.
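|
| Something like this toy NumPy version (a sketch, not the
| article's code) recovers the DFT matrix from input/output pairs
| with plain gradient descent:
|
|     import numpy as np
|
|     N = 16
|     rng = np.random.default_rng(0)
|     W = np.zeros((N, N), dtype=complex)   # weights to be learned
|     lr = 0.1
|     for step in range(2000):
|         x = rng.standard_normal((N, 32))  # batch of random signals
|         y = np.fft.fft(x, axis=0)         # targets from the FFT
|         err = W @ x - y
|         W -= lr * (err @ x.T) / x.shape[1]   # mean-squared-error step
|
|     n = np.arange(N)
|     F = np.exp(-2j * np.pi * np.outer(n, n) / N)  # true DFT matrix
|     print(np.abs(W - F).max())            # should be close to 0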
| threatripper wrote:
| This is also not really surprising for a single layer neural
| network.
| visarga wrote:
| It might not be surprising when they implemented it by
| supervised learning, but when they also solved it by
| reconstructing the input it was like a rediscovery.
| madhadron wrote:
| That's because "learning" in neural networks is a fancy way
| of saying "curve fitting."
| amcoastal wrote:
| I don't think it's surprising but it's definitely a cool little
| exercise. Somewhat aside from the content of the post, I could
| see some use cases where specifically implementing DFT layers
| in your architectures could lead to improved performance over
| using just the raw inputs or activations. Noisy datasets, or
| synthetic-data-based transfer learning, come to mind as
| potential uses for the DFT as a step in your ML pipeline.
| threatripper wrote:
| There is the FFT in TensorFlow already:
| https://www.tensorflow.org/api_docs/python/tf/signal/fft
| amelius wrote:
| And FFT ( _Fast_ Fourier Transform) is more efficient than
| matrix multiplication (EDIT: by a matrix with Fourier
| elements).
| bigbillheck wrote:
| You know, I'll go out on a limb and say (unhelpfully)
| that an FFT is more efficient than any general dense
| matvec.
| touisteur wrote:
| Bah, when you've maxxed out your cuda cores, who's to
| prevent you from reaching for those sweet tensor core
| 100+ tflops mat-mul now?
| mochomocha wrote:
| This is analogous to saying "an umbrella is better than a
| toaster". They are two different things.
| Joker_vD wrote:
| Why do people insist on comparing only things that are the
| _same_? The same things are the same, there's no point
| whatsoever in comparing them: they're the same! No, you
| should only compare things that are different; that way you
| learn more about their differences.
|
| And what's better, an umbrella or a toaster, depends on
| one's goals: if one'd like to fry some bread, the toaster
| is more useful; if one'd like to take cover from the rain,
| the umbrella is far superior, although the toaster could be
| used too; if one'd like to drive a nail into the wall, the
| toaster again is better; etc.
| mochomocha wrote:
| My statement was as approximate as the one I was replying
| to. If the original statement had been "you can use FFT
| to implement some operations involving matrix
| multiplication faster" I wouldn't have taken issue with
| it. But saying "FFT is faster than matrix multiplication"
| has little meaning.
| vladTheInhaler wrote:
| The naive implementation of the DFT in terms of a matrix-
| vector multiplication has time complexity O(n^2) [1]. The
| point is that the FFT approach is much faster than that
| implementation.
|
| [1] https://en.wikipedia.org/wiki/DFT_matrix
| amelius wrote:
| Not if you perform these operations in order to obtain a
| Fourier transform.
| ok123456 wrote:
| You can implement a DFT with matrix multiplication using the
| Vandermonde matrix built from the N-th roots of unity (the
| roots of x^N - 1) as the basis.
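|
| Concretely, a small NumPy sketch of that construction
| (illustrative):
|
|     import numpy as np
|
|     N = 8
|     w = np.exp(-2j * np.pi / N)     # primitive N-th root of unity
|     V = np.vander(w ** np.arange(N), N, increasing=True)
|
|     x = np.random.randn(N)
|     assert np.allclose(V @ x, np.fft.fft(x))  # V is the DFT matrix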
| cdavid wrote:
| The Fourier transform, more exactly the DFT, is a special
| case of matrix multiplication, when seen as a linear
| operator. The fact that you can do the operation faster
| than the usual O(N^2) matrix / vector product is due to the
| specific form of the underlying matrix (Van Der Mont
| matrix).
|
| So they are not that different: the specific structure of
| the Fourier basis allows for more efficient matrix *
| vector operation.
| layoutIfNeeded wrote:
| Vandermonde. He was not Dutch, he was French.
| https://en.wikipedia.org/wiki/Alexandre-
| Theophile_Vandermond...
| [deleted]
| hamiltonkibbe wrote:
| They are different things, but you can use both to
| protect you from the rain, and an umbrella does that more
| efficiently than a toaster. The drawback is that the
| umbrella doesn't generalize to the broader field of bagel
| toasting
| amatic wrote:
| On the other hand, if you don't yet know a lot of NN math, like
| me, it is super surprising. Definitely a cool way to connect
| FFT and NNs.
| thearn4 wrote:
| I've come to internalize the Fourier transform as a type of
| sparse matrix algorithm. I.e. something that implements a
| particular matrix-vector product without requiring explicit
| construction of said matrix.
| enchiridion wrote:
| Do you mean the FFT? I've been trying to wrap my head around
| the Fourier transform lately, and I can see the matrix
| connection to an FFT, but not the Fourier transform in
| general.
| bonoboTP wrote:
| The Fourier transform performs a change of basis into the
| eigenbasis of convolution.
|
| It's linear and so in the discrete case can be expressed as
| matrix multiplication (every discrete linear operation is
| expressible as matmul).
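|
| A quick numerical check of the eigenbasis claim (a sketch using
| SciPy; the DFT matrix diagonalizes any circular-convolution
| matrix, and the eigenvalues are the DFT of the kernel):
|
|     import numpy as np
|     from scipy.linalg import circulant, dft
|
|     N = 8
|     h = np.random.randn(N)      # convolution kernel
|     C = circulant(h)            # circular convolution as a matrix
|     F = dft(N)                  # DFT matrix
|
|     D = F @ C @ np.linalg.inv(F)
|     assert np.allclose(D - np.diag(np.diag(D)), 0, atol=1e-9)
|     assert np.allclose(np.diag(D), np.fft.fft(h))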
| srean wrote:
| And pushing the parentheses around to factor out common
| subexpressions. Essentially exploiting the distributive law.
| peter_retief wrote:
| https://math.stackexchange.com/questions/2569814/fourier-tra...
| layoutIfNeeded wrote:
| Ummmm... sure. A layer of a neural network is a linear operator
| followed by an arbitrary (nonlinear in practice) mapping. Which
| means that neural networks can trivially implement linear
| operators (like the Fourier transform) by making the second
| mapping the identity. QED
| Vhano wrote:
| This is why linear dynamics should be a basic course before
| ML/DL. Next week OP discovers the Kalman filter... and
| surprise, surprise!
| ajross wrote:
| Not to be too glib, but as a non-ML person with decent numerics
| ability:
|
| ... did this post just spend a ton of boring math explaining that
| a neural network has the ability to detect whether an image is
| bumpy or smooth?
|
| I mean, Fourier analysis has a very straightforward "intuitive"
| interpretation. That's the point of its value in applied domains,
| after all. Straightforward intuitive things are "neural" by their
| nature, no?
| cblconfederate wrote:
| If the Fourier transform is written as two real parts, the
| universal approximation theorem says it can be used to
| approximate any function, the same way NNs do
| cantagi wrote:
| Great post!
|
| I once had to port a Siamese neural network from TensorFlow to
| Apple's CoreML to make it run on an iPhone. Siamese neural
| networks have a cross convolution step which wasn't something
| CoreML could handle. But CoreML could multiply two layer outputs
| element-wise.
|
| I implemented it using a Fourier transform (not a fast Fourier
| transform), with separate real and imaginary parts, since a
| Fourier transform is just a matrix multiplication, and
| convolution is element-wise multiplication in the Fourier
| domain. Unsurprisingly it was very slow.
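|
| For the record, the trick itself is easy to sketch in NumPy
| (illustrative only, not the actual CoreML port; this is the
| circular-convolution version): two matmuls, an element-wise
| product, and a matmul back:
|
|     import numpy as np
|
|     N = 128
|     a, b = np.random.randn(N), np.random.randn(N)
|
|     n = np.arange(N)
|     F = np.exp(-2j * np.pi * np.outer(n, n) / N)  # DFT as a matmul
|     F_inv = np.conj(F) / N                        # inverse DFT
|
|     # circular convolution = element-wise product in Fourier domain
|     via_fourier = (F_inv @ ((F @ a) * (F @ b))).real
|
|     direct = np.array([np.sum(a * np.roll(b[::-1], i + 1))
|                        for i in range(N)])
|     assert np.allclose(via_fourier, direct)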
| Gravityloss wrote:
| Of course. What about the Haar transform? Or digital filters?
| It's very much all about the same thing.
|
| You compare the original signal with some reference signals, look
| at how much it correlates. Then you save those correlations only.
| With the correlations and the reference signals, you can roughly
| reproduce the original signal. (Extremely paraphrased.)
| 6gvONxR4sf7o wrote:
| > [with regards to y = A x:] This should look familiar, because
| it is a neural network layer with no activation function and no
| bias.
|
| This line makes me think this might be satire. Are there really
| people who see y = A x and think neural networks?
|
| Otherwise, a better title would be "The Fourier transform is a
| linear operation. (neural networks can do linear operations too)"
| ... which makes it pretty boring.
| zeroxfe wrote:
| Taking this one more step forward, generating spectrograms with
| neural networks: https://0xfe.blogspot.com/2020/03/generating-
| spectrograms-wi...
| photonemitter wrote:
| I'll leave these here:
| https://en.wikipedia.org/wiki/Multiresolution_analysis
| https://en.wikipedia.org/wiki/Discrete_wavelet_transform
|
| And as a lot of people have mentioned in here, the DFT is
| pretty much implicit in neural networks already because of the
| mathematics (especially in convolutional/correlational neural
| networks, which often make use of the convolution theorem
| (which is "just" Fourier coefficient multiplication) to do the
| convolution).
|
| Extending this post, it seems more interesting to look more
| generally at the correspondence with wavelet transforms.
| phonebucket wrote:
| That was a fun read. While it might be unsurprising to some, it's
| a testament to the breadth of applications of modern machine
| learning frameworks.
|
| I enjoyed this sentence in particular.
|
| > This should look familiar, because it is a neural network layer
| with no activation function and no bias.
|
| I thought it should look familiar because it's matrix
| multiplication. That it looks like a neural network layer first
| and foremost to some is maybe a sign of the times.
| 6gvONxR4sf7o wrote:
| > it's a testament to breadth of applications of modern machine
| learning frameworks.
|
| More like a testament to the breadth of applications of
| linear algebra. It is absolutely remarkable what we're able to
| compute and analyze in the form of y = A x (Hilbert spaces are
| a wild invention).
|
| But it really isn't a testament to modern ML frameworks in any
| way. The Fourier transform has been easy to compute/fit in this
| exact way (Fourier = linear problem + solving linear problems
| by optimization) by modern-at-the-time frameworks for over two
| centuries.
| slver wrote:
| Technically physical reality is also a neural network.
| ktln2 wrote:
| Just another example of
|
| https://en.wikipedia.org/wiki/Universal_approximation_theore...
| [deleted]
| throwawayffffas wrote:
| I have always thought of it the other way around, neural networks
| are a kind of transform similar to the Fourier transform. But you
| know, instead of a sum of sinusoid components you get a sum of
| weighted <insert your activation function>.
| RocketSyntax wrote:
| I think the author's point is that every data transformation can
| be represented as a neural network. Neural nets are just
| high-degree polynomials.
| TaupeRanger wrote:
| "Multiplication of matrices is a neural network"
| omarhaneef wrote:
| Here is the special sauce: "We can consider the discrete
| Fourier transform (DFT) to be an artificial neural network: it is
| a single layer network, with no bias, no activation function, and
| particular values for the weights. The number of output nodes is
| equal to the number of frequencies we evaluate."
|
| A single layer neural network is a sum of products, the basic
| Fourier equation is a sum of products.
|
| In this view there are lots of single layer neural networks out
| there. For me, it's the training algorithm (backprop) that sets
| apart the neural net.
| nerdponx wrote:
| In my mind, the _layers_ make the network "non-trivial". A
| Fourier transform is only a neural network in a trivial,
| definitional sense.
| hyperman1 wrote:
| The activation function is what keeps layers separated. When
| it isn't there, a pair of layers devolves to a matrix
| multiplication, which can be replaced by its resulting
| matrix, a single layer.
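|
| In NumPy terms, a trivial sketch of that collapse:
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     W1 = rng.standard_normal((32, 64))  # linear layer 1, no bias
|     W2 = rng.standard_normal((16, 32))  # linear layer 2, no bias
|     x = rng.standard_normal(64)
|
|     two_layers = W2 @ (W1 @ x)          # "deep" but activation-free
|     one_layer = (W2 @ W1) @ x           # same thing, single matrix
|     assert np.allclose(two_layers, one_layer)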
| omarhaneef wrote:
| That is a thoughtful distinction, too. I think you're right.
| 29athrowaway wrote:
| Biological neural networks do not use backpropagation, or at
| least, there's no evidence of that yet.
|
| Backpropagation is a placeholder for stuff we don't understand.
| etienne618 wrote:
| It seems that backpropagation might be a good abstraction for
| biological learning rules after all: "local predictive coding
| converges asymptotically (and in practice rapidly) to exact
| backprop gradients on arbitrary computation graphs using only
| local learning rules" from https://arxiv.org/abs/2006.04182
| ta988 wrote:
| Not exactly the same kind of backpropagation but signals fly
| the other way in neurons as well
| https://en.wikipedia.org/wiki/Neural_backpropagation
| [deleted]
| cl3misch wrote:
| Isn't "backpropagation" just a synonym for the partial
| derivative of the scalar cost function with respect to the
| weights? And in the mathematical formulation of biological
| neural networks these derivatives can't be computed
| analytically. Your comment sounds like "backpropagation" is
| some kind of natural phenomenon.
| 29athrowaway wrote:
| > Your comment sounds like "backpropagation" is some kind
| of natural phenomenon.
|
| No it's not, read again.
| leecarraher wrote:
| Alt title: can a neural network learn the fast Fourier transform?
| SuchAnonMuchWow wrote:
| the discrete Fourier transform, not the fast Fourier
| transform: the learned algorithm runs in O(n^2) and not
| O(n log n)
| leecarraher wrote:
| Oh, then this is just approximating the values of the linear
| transform. I figured at least they'd use an MLP to learn each
| layer of the FFT butterfly network, and the corresponding
| weights. What would be of interest is whether you could
| extend this to a length-n FFT using an RNN.
| mikewarot wrote:
| I have zero doubt that given sufficient resources ANY function
| can be learned by a neural network.
|
| Why not just initialize part of the layers in a network to do the
| FFT or DFT when you're setting up a system to learn, so that it
| doesn't have to waste megawatt-hours of power? Consider it a form
| of discretized / explicit transfer learning.
|
| Why not use the fact that you could just set the coefficients in
| a neural network to do DFTs or FFTs to utilize neural network
| chips as ersatz DSP chips, without learning?
|
| Just like when GPUs turned out to be useful for neural networks,
| the TPUs will turn out to be useful for DSP and other things.
| These neural network chips showing up can do a lot of cool things
| unrelated to AI, if we use them properly.
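|
| Setting the weights by hand really is all it takes. A NumPy
| sketch of the weight matrix you would drop into a bias-free
| dense layer to get a DFT (and, say, a power spectrum) with no
| training at all (illustrative, not tied to any particular chip;
| the helper name is just for this sketch):
|
|     import numpy as np
|
|     def dft_layer_weights(N):
|         # (2N, N) real matrix: first N rows give the real part of
|         # the DFT, last N rows the imaginary part
|         k = np.arange(N).reshape(-1, 1)
|         n = np.arange(N)
|         return np.vstack([np.cos(2 * np.pi * k * n / N),
|                           -np.sin(2 * np.pi * k * n / N)])
|
|     N = 32
|     W = dft_layer_weights(N)        # no learning, just set weights
|     x = np.random.randn(N)
|     out = W @ x                     # what the "layer" would output
|     assert np.allclose(out[:N] + 1j * out[N:], np.fft.fft(x))
|     power = out[:N] ** 2 + out[N:] ** 2   # ersatz-DSP power spectrum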
| ellisv wrote:
| Neural networks can learn many things but I can't imagine a
| practical use-case of this.
| SuchAnonMuchWow wrote:
| Here is an alternative idea around this:
|
| The Winograd transform can be seen as a depthwise convolution
| with fixed weights.
|
| And similarly to the DFT, the Winograd transform can be used to
| speed up convolutions: a 3x3 convolution can be implemented with
| a Winograd transform + a 1x1 convolution + an inverse Winograd
| transform. This roughly halves the number of operations
| performed.
|
| Now, if a neural network is able to learn the weights of a
| depthwise convolution to match the coefficients of a Winograd
| transform, that means you should be able to do this
| transformation when describing the architecture of your
| network. That way, you don't rely on your deep learning
| framework to do the transformation for you.
|
| And more importantly, you might end up with something more
| generic: perhaps the weights in the depthwise convolutions will
| not converge to the weights of a Winograd transform, but to
| weights better suited to what you are trying to learn.
|
| This is the same idea which led to the development of CNNs:
| replace the fixed weights of the convolutions used in
| traditional computer vision algorithms with weights that are
| learned by the network.
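|
| For the curious, the 1D building block (Winograd F(2,3): two
| outputs of a 3-tap filter from four inputs with four multiplies)
| is small enough to check directly; a sketch with the standard
| transform matrices, where the element-wise product in the middle
| plays the role of the 1x1 convolution:
|
|     import numpy as np
|
|     B_T = np.array([[1,  0, -1,  0],
|                     [0,  1,  1,  0],
|                     [0, -1,  1,  0],
|                     [0,  1,  0, -1]], dtype=float)
|     G = np.array([[1.0,  0.0, 0.0],
|                   [0.5,  0.5, 0.5],
|                   [0.5, -0.5, 0.5],
|                   [0.0,  0.0, 1.0]])
|     A_T = np.array([[1, 1,  1,  0],
|                     [0, 1, -1, -1]], dtype=float)
|
|     g = np.random.randn(3)       # 3-tap kernel
|     d = np.random.randn(4)       # 4 input samples
|
|     y_winograd = A_T @ ((G @ g) * (B_T @ d))
|     y_direct = np.array([g @ d[0:3], g @ d[1:4]])  # sliding window
|     assert np.allclose(y_winograd, y_direct)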
| killerstorm wrote:
| I guess it is mostly an illustration that time-domain =>
| frequency-domain transformation is trivial for ANNs. And e.g.
| it won't make any sense to pre-process your signal with an FFT
| before training an ANN, since the network will do it itself if
| needed.
| raphaelj wrote:
| Well, you can still probably reduce the training
| computational cost by feeding the additional FFT features to
| your NN.
| ellisv wrote:
| The Fourier transform layer might not be the immediate
| hidden layer.
| ellisv wrote:
| That's a fair point. My assumption was that the neural
| network isn't going to be as efficient at the FFT but the
| article doesn't include that type of assessment.
| meiji163 wrote:
| > What is the explanation for the flexibility in the weights? How
| much flexibility is there?
|
| most of the terms are on the order of 1/N, so these would be
| negligible
| jhidding wrote:
| Next exercise: implement the _fast_ Fourier transform as a multi-
| layered network.
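|
| Roughly what that would look like (a plain-Python sketch: an
| input permutation followed by log2(N) sparse butterfly "layers",
| each a fixed linear map that touches every element once; the
| function name is just for this sketch):
|
|     import numpy as np
|
|     def fft_stages(x):
|         x = np.array(x, dtype=complex)
|         N = len(x)
|         assert N and (N & (N - 1)) == 0   # power of two
|         # input "layer": bit-reversal permutation
|         j = 0
|         for i in range(1, N):
|             bit = N >> 1
|             while j & bit:
|                 j ^= bit
|                 bit >>= 1
|             j |= bit
|             if i < j:
|                 x[i], x[j] = x[j], x[i]
|         # log2(N) butterfly "layers"
|         m = 2
|         while m <= N:
|             w_m = np.exp(-2j * np.pi / m)
|             for k in range(0, N, m):
|                 w = 1.0
|                 for t in range(m // 2):
|                     u, v = x[k + t], w * x[k + t + m // 2]
|                     x[k + t], x[k + t + m // 2] = u + v, u - v
|                     w *= w_m
|             m *= 2
|         return x
|
|     x = np.random.randn(16)
|     assert np.allclose(fft_stages(x), np.fft.fft(x))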
| mensetmanusman wrote:
| This is a great idea. I bet one could find a slightly faster
| FFTW that would be close enough for many applications, but it
| would apply a statistical approach to each weight.
| wiggle_woof wrote:
| Nice idea. Turns out, doing this works well and has some nice
| properties https://dawn.cs.stanford.edu/2019/06/13/butterfly/
| threatripper wrote:
| This could be highly interesting. Especially when starting with
| the FFT butterfly structure for connections and the
| corresponding weights as network initialization and then
| adapting that to real data via training.
| Aardwolf wrote:
| So is the identity transform...
| srean wrote:
| I wish your comment were voted up. Where do we go next? -- look,
| addition is a neural network; look, a weighted average is a
| neural network; look, linear regression, logistic regression,
| Poisson regression are neural networks ...
| Certhas wrote:
| I have definitely seen regressions referred to as linear
| machine learning. Not even kidding.
| bonoboTP wrote:
| So? Machine learning is not some buzzword fad. Regression
| is part of ML for sure.
|
| Some people seem to think ML means fancy deep learning
| GPT-3 transformer running on a TPU farm. Actually ML is a
| discipline that has existed for several decades and has
| various theoretical results too, including VC theory etc.
|
| It is also not the same as statistics. They are adjacent
| but different fields.
| Certhas wrote:
| Yeah, but no. It definitely is a buzzword that randomly
| gets attached to every method of data analysis these
| days. If you disagree, then please give me a definition
| that is both distinct from statistics but also means that
| regression is an ML technique.
|
| BTW, that doesn't mean there isn't also very substantial
| work done under the umbrella term.
|
| Regression analysis was first introduced by Legendre in
| the early 19th century. I never would have dreamed of
| pretending it is Machine Learning 10 years ago when the term
| ML was still used somewhat more sparingly, and carried a
| bit more meaning...
|
| But "I will use ML to analyse the data" gets more funding
| than "I will run a regression on the data".
| nikk1 wrote:
| I agree ML is a buzzword and using a sentence like that
| would sound silly but I can elaborate. First, though,
| regression does not need a separate ML definition to make
| it distinct from statistics... there's tons of overlap in
| the field of mathematics.
|
| Machine learning is the study of computer algorithms that
| improve automatically through experience and by the use
| of data.
|
| This definition would include something as simple as
| linear regression.
|
| Linear regression attempts to model the relationship
| between two variables by fitting a linear equation to
| observed data.
|
| The purpose of a neural network is exactly the same as a
| line of best fit. That is, you are just approximating an
| unknown function based on input/output data. The only
| difference is that a NN can better approximate a
| nonlinear function.
| nonameiguess wrote:
| The generally accepted definition in the community is Tom
| Mitchell's:
|
| > A computer program is said to learn from experience E
| with respect to some task T and performance measure P, if
| its performance at task T, as measured by P, improves
| with experience E.
|
| Statistical estimation methods are one way to achieve
| this, but not the only way, especially when an exact
| function can be learned, i.e. a typical layer 2 switch is
| a learning device. You don't program in the mapping from
| connected device MAC addresses to switch port, the switch
| itself learns this from receiving a message from a MAC on
| a port and then records the mapping. That is a very
| simple form of non-statistical machine learning.
|
| I'm not really sure how you can start here but then say
| regression is not a form of machine learning.
| "Regression" is a pretty broad class of techniques that
| just means using labeled examples to estimate a function
| with a continuous response, typically contrasted with
| "classification," where the response is discrete. The
| method you use to do the function approximation may or
| may not be statistical. A genetic algorithm is not, for
| instance. I'm not sure least squares, which is what
| Legendre invented, should really be considered
| statistical, either. The original purpose was for
| approximating the solution to an overdetermined system of
| equations, before statistics was even formalized. It
| certainly became a preferred method in mathematical
| statistics, but mathematical statistics wasn't even
| formalized until later. It didn't start being called
| "regression" until Galton published his paper showing
| children of exceptionally tall or short people tended to
| regress to the mean, which was 80 years later. But you're
| performing regression analysis whether you use the normal
| equations, LU decomposition, QR decomposition, a genetic
| algorithm, gradient descent, stochastic gradient descent,
| stochastic gradient descent with dropout. Doesn't matter.
| As long as you're doing function approximation with a
| continuous response, it's still regression. Whether or
| not it can also be considered "machine" learning just
| depends on whether you're doing it by hand or via
| machine.
|
| Though sure, typically people tend to imagine the more
| exotic and newer techniques that scale to very large data
| sets and reduce overfitting and deal with noise
| automatically and involve hyperparameters, i.e. not least
| squares.
| bonoboTP wrote:
| I won't type up a chapter about this now but ML is about
| learning from data. It is concerned with
| training/validation/test data, the generalization error
| etc.
|
| Statisticians care more about "modeling", actually
| estimating parameters that stand in for something real,
| to check assumptions and hypotheses. The cultures are
| very different. What makes total sense to one may baffle
| the other as lunacy (there's more convergence now though,
| by realizing the complementary nature of the two
| approaches). The terminology is different too: in stats,
| they call logistic regression regression, while in ML it
| would have been called classification (but the name is
| now stuck). ML also tends to be more Bayesian than
| frequentist, as opposed to classical stats.
|
| I could write more but having taken ML courses before the
| big hype times, I can assure you ML doesn't just mean
| hyped up startup fluff vaporware.
|
| Open up Kevin Murphy's ML book and compare the TOC with a
| Statistics textbook. There is overlap but it's certainly
| different in its goals and mentality.
|
| It seems like some bitter bickering from the side of
| stats people that they didn't manage to hype up their
| field as much. Yeah they did have many of the same tools
| at their disposal but didn't use them in this way.
| 6gvONxR4sf7o wrote:
| You're setting up a strawman of statistics. You're
| describing a small part of a large field. Stats has been
| more than what you're describing for longer than ML has
| been a phrase.
|
| The only real useful definition these days is that stats
| is learning from data that happens in the stats
| department and ML is learning from data that happens in
| the CS department.
| bonoboTP wrote:
| I'm fine with that. I would also hope for more collab
| cross-departments, but the incentives in academia are
| unfortunately not exactly set up the right way. People
| become protective of "their turf".
|
| I wrote a bit more here:
| https://news.ycombinator.com/item?id=26984234
|
| I think it's good if people are aware of the origins of
| the methods they use. Regression wasn't invented under
| the umbrella of ML, but is analyzed from a particular
| angle in ML more than in stats.
| 6gvONxR4sf7o wrote:
| > Regression is part of ML for sure.
|
| > ML is a discipline that has existed for several decades
|
| If you're going to say regression is part of ML, you
| can't say ML has only existed for decades. If you define
| it with that expansive scope, it's existed for centuries.
| bonoboTP wrote:
| ML uses regression in a particular way with particular
| aims and has developed prior approaches like OLS and
| Ridge regression, lasso, kriging etc. into a particular
| theoretical framework and analyses them from the angle of
| learning, i.e. heavy focus on generalization from
| training data to test data.
|
| The truth is, these are tools that many communities have
| used in parallel and these communities are in constant
| flux regarding what promising directions they find and
| exploit and when they make a big splash in one place,
| others take notice and incorporate those ideas etc. There
| are no rigidly predefined disciplines and fields over
| long timescales.
|
| When electrical engineers and signal processing
| communities worked on similar things they called it
| "pattern recognition".
|
| Machine learning as a field started out with more
| ambition than mere regression and classification (and
| indeed covers a lot more ground), but it turns out that
| supervised learning has had the most practical success.
| But it's a research program and community that goes
| beyond that.
|
| Similarly, there are parallels and similar equations
| between control theory and reinforcement learning. And
| indeed some controllers can be expressed as reinforcement
| learning agents. But the aims of the two communities are
| not the same.
|
| Maybe people would be happier if "statistical learning"
| (which is also used) was used more instead of "machine
| learning"? But indeed as another comment points out, ML
| as a paradigm does not necessarily require learning of a
| statistical kind.
|
| Labels grow and wane in popularity, it doesn't mean it's
| the same thing repackaged, rather that the aims and the
| focus changes depending on what we find most productive
| and fruitful.
|
| For example many of these things were also called "soft
| computing" a few years ago, but that term is rarely seen
| nowadays.
| dunefox wrote:
| You mean as in linear and logistic regression, two popular
| classical ML models whose parameters can be optimised using
| gradient descent?
| aulin wrote:
| why linear though
| nikk1 wrote:
| Regression is an approach to machine learning
|
| https://en.wikipedia.org/wiki/Machine_learning#Regression_a
| n...
| visarga wrote:
| > Where do we go next?
|
| Neural nets are all you need[*].
|
| [*] if what you need is non-robust black boxes
| enriquto wrote:
| Of course (it's a single fully-connected layer, aka a linear
| map), but more interestingly, the FFT algorithm can be
| implemented as a very sparse network of logarithmic depth.
| ondrysak wrote:
| Given the fact that the Fourier basis can be seen as polynomials
| on the unit circle, this may be interesting for some of you.
|
| https://arxiv.org/abs/2008.07669
| AndrewGYork wrote:
| One of the questions raised at the end of the article:
|
| "Do we ever benefit from explicitly putting Fourier layers into
| our models?"
|
| Has a simple, partial answer here:
|
| https://docs.scipy.org/doc/scipy/reference/generated/scipy.l...
|
| "...multiplying a vector by the matrix returned by dft is
| mathematically equivalent to (but much less efficient than) the
| calculation performed by scipy.fft.fft."
|
| fft-as-a-matrix-multiply is _much_ slower to compute than a
| standard fft, especially on large input.
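|
| In code, the comparison from that doc page (a sketch):
|
|     import numpy as np
|     from scipy.fft import fft
|     from scipy.linalg import dft
|
|     n = 1024
|     x = np.random.randn(n)
|     F = dft(n)                    # dense n x n "layer weights"
|     assert np.allclose(F @ x, fft(x))
|     # The matmul costs O(n^2) flops and O(n^2) memory for F;
|     # fft(x) gives the same answer in O(n log n).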
| modeless wrote:
| You can do the linear parts of neural nets in the frequency
| domain, but AFAIK you can't do the nonlinearity, so you have to
| inverse transform back to the spatial domain for that. The
| nonlinearity is an absolutely essential part of pretty much
| every neural net layer, so there is no big win to be had
| unfortunately. For convolutional nets in particular there are
| other ways of going faster than a naive matrix multiply, e.g.
| winograd convolution.
| aesthesia wrote:
| But if your network contains a layer whose linear part
| performs an (approximate) DFT, you will get an efficiency
| gain by replacing it with an exact FFT.
|
| You wouldn't want to use an FFT for most CNNs anyway because
| the kernels have very small support. Convolution with them is
| O(n) in the spatial domain as long as you recognize the
| sparsity.
| evancox100 wrote:
| Why can't you apply the nonlinear activation functions in the
| frequency domain? What's stopping this or making it not work?
| pontus wrote:
| Is there anything deep here? There's a parametrized
| representation of a function and it's being interpreted as a
| neural network; why is this surprising? It's like saying that
| Newton's second law is a neural network: F=ma can be written
| log(F) = log(m) + log(a). Aha, this is a neural network! The
| inputs are m and a, the first layer is sparsely connected with
| log activation functions, the second layer is fully connected
| with an exponential activation function:
|
| F = exp(c1 * o1 + c2 * o2)
| o1 = log(c3 * a + c4 * m)
| o2 = log(c5 * a + c6 * m)
|
| If you feed it enough data you'll find c1=c2=c3=c6=1 and c4=c5=0.
|
| But saying that Newton's second law is a Neural Network, while
| correct, seems a bit deceptive in that it's not a deep idea at
| all.
| korijn wrote:
| Perhaps it is the compsci glasses talking, but this is just one
| very specific instance where someone figured out a way to map
| the DFT problem to a neural net. I agree that it is unfortunate
| that it is being presented as some kind of big discovery, and
| that the fundamental lesson is either still undiscovered to the
| author or just unclearly communicated, but there is still good
| intention in there (sharing something you've learned and
| eliciting feedback).
| haecceity wrote:
| "Can be represented as a neural network" is not the same as "is
| a neural network"??
| candiodari wrote:
| Well the point is that it does not make sense to DFT data
| before feeding it into a multilayered neural network (or
| calculating the force generated by mass and acceleration as you
| point out). Those formulas make no sense: the network can just
| learn them on the fly.
|
| In fact you'll find that this does not just work for the
| Fourier transform, but for any FIR filter (and some other
| classes), and therefore neural networks can deal with signals
| and construct low-pass filters, high-pass filters, bandgap
| filters, ... as required for the task at hand without the
| network (or its designer) having any idea at all what is
| happening.
|
| I mean there's some basic assumptions these reasonings make
| (main one is that you need to feed many discretized values from
| a time window).
|
| Of course, a problem remains: local optima. Just because a
| neural network _can_ construct a filterbank or do a DFT,
| doesn't mean that it _will_ actually do it when the situation
| warrants it. If there's a local optimum without filters ...
| well, you may get unlucky. If there are many local optima
| without filters ... sucks to be you.
| ska wrote:
| > Those formulas make no sense: the network can just learn
| them on the fly.
|
| > Just because [...] can construct a filterbank or do a DFT,
| doesn't mean that it will actually do it when the situation
| warrants it.
|
| These statements seem in conflict, no?
| nonameiguess wrote:
| I don't think I can agree with that. You do a Fourier
| transform when the data you're working with doesn't easily (or
| at all) admit certain operations in the time domain but does in
| the frequency domain. If you already know that to be the
| case, preprocessing with a FFT is a better idea than hoping a
| neural network with enough layers uses a few of those to much
| less efficiently perform a DFT. Always take advantage of pre-
| existing knowledge of structure in your data. With the FFT
| especially, depending on how your data is being ingested, you
| might be able to use specialized DSPs that implement the FFT
| directly in hardware. These are cheap and easy to find since
| they're used in frequency-division multiplexing.
| dkarras wrote:
| Practically, yes, absolutely! Theoretically, we want the
| "black box" learning system to do such feature extraction
| itself. At least, that is the goal.
| whimsicalism wrote:
| > does not make sense to DFT data before feeding it into
| a multilayered neural network
|
| is false.
|
| "It would be nice if we didn't have to DFT data before
| feeding it into a multilayered neural network" is true,
| but a completely different statement.
| monocasa wrote:
| Which is weird, because biological neural nets seem to have
| some evolutionary pressure to do hardware fourier transforms
| before neurons even get involved.
|
| You can see this most clearly in the auditory system where
| the incoming signal is transformed into the frequency domain
| by the cochlea before the signal is received by the
| epithelial cells.
|
| Neurons absolutely love working in the frequency domain, but
| they seem to prefer to not be the ones to do the binning in
| the first place.
| philipswood wrote:
| This is probably because the neurons involved can't switch
| fast enough. Our neural hardware is simply too slow to work
| in the time domain directly.
| monocasa wrote:
| They do just fine with acoustic frequency ranges.
| philipswood wrote:
| Firing rates for neurons are on the order of 10 Hz - under
| the minimum acoustic frequencies of about 20 Hz.
|
| Peak neuron firing rates basically only touch the lower
| bounds of frequencies that need to be processed.
|
| Without the "hardware acceleration" of the cochlea that
| preprocesses the time domain signals to frequency domain
| first, basically our whole audio sensorium is out of
| processing range.
|
| Normal nervous signal transmission speeds are on the
| order of the speed of sound, switching rates are in the Hz
| range. Voltages are in the tens-of-millivolts range.
| nomel wrote:
| Are you suggesting that anything over 10Hz is perceived
| with the representation of a rate, rather than direct
| stimulus?
| monocasa wrote:
| Neuron spiking isn't rate encoded, it's temporal encoded.
| Ie. the distance from the next expected regular interval
| is the signal, not the spiking interval itself.
|
| Additionally there's large variations in neuron spiking
| rates.
| saltcured wrote:
| I think I understand your sentiment, but consider our
| tactile sense of vibrations. We are able to distinguish
| different frequencies with the nerves in the skin,
| without cochlea doing the frequency domain transform. I
| think there are very different spectral limits and
| aliasing artifacts, but it's not like we can only sense a
| 5 Hz vibration either...
| windsignaling wrote:
| It's a stretch to call it a neural network. It was already
| well-known that the Fourier transform can be seen as a matrix
| multiply which minimizes some least squares problem.
|
| This can be found somewhere in S.M. Kay's Fundamentals of
| Statistical Signal Processing: Estimation Theory (Vol 1),
| Detection Theory (Vol 2).
| montebicyclelo wrote:
| OP here, IMO the "deepest" bit is [1] - the network learns the
| DFT in order to reconstruct the signal, and is not explicitly
| trained on the DFT values from the FFT.
|
| Admittedly, I should have mentioned that any linear transform
| can be considered to be a single layer neural network (if you
| want to see the world through a neural network lens), and will
| add this to the post at some point.
|
| In fact, I have a series of posts planned, which will reveal
| that well known algorithms/models are actually neural
| networks...
|
| [1] https://sidsite.com/posts/fourier-nets/#learning-the-
| fourier...
| mochomocha wrote:
| You might be interested in this line of work:
| https://eng.uber.com/neural-networks-jpeg/
|
| Training straight from DCT coefficients to avoid spending
| time learning a similar representation in the bottom layers
| of the net. I've personally toyed with something similar on
| GANs to gauge the computational benefits of not doing
| convolutions in the bottom layers of a net but learning
| directly in a FFT-like compressed space instead.
| touisteur wrote:
| Isn't there something like Fouriernets and spdnets already,
| and stuff on Riemannian manifolds and my head is spinning?
| Followed Daniel Brook's work for some time, e.g.
| https://hal.archives-ouvertes.fr/hal-02290838
| hctaw wrote:
| > I should have mentioned that any linear transform can be
| considered to be a single layer neural network
|
| I think this should be turned around. A single layer neural
| network can be considered a linear mapping (but not
| necessarily an orthogonal transform or change of basis, like
| the DFT).
|
| A more clear example of this are adaptive filters, which are
| trained in real time using gradient descent.
|
| This is an important distinction because thinking of "X is a
| Neural Net" doesn't provide meaningful insight, whereas
| "Neural Nets with X properties are a case of linear dynamic
| systems, here's an example of how we can equate one linear
| transformation to a neural net" leads you to deeper
| conclusions on the analysis and synthesis of ANNs in the
| context of dynamics - which encompasses a much larger surface
| area than the DFT.
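|
| On the adaptive-filter point, a tiny NumPy sketch (LMS system
| identification; the "network" is one linear tap-weight vector
| and the training is online gradient descent; the numbers are
| made up):
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     true_h = np.array([0.3, -0.5, 0.2, 0.1])  # unknown FIR system
|     taps, mu = len(true_h), 0.05              # filter length, step
|     w = np.zeros(taps)                        # adaptive weights
|
|     x = rng.standard_normal(5000)             # input signal
|     d = np.convolve(x, true_h)[:len(x)]       # desired output
|
|     for n in range(taps, len(x)):
|         u = x[n - taps + 1:n + 1][::-1]       # newest sample first
|         e = d[n] - w @ u                      # prediction error
|         w += mu * e * u                       # LMS update
|
|     print(np.round(w, 3))                     # ~ true_h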
| hcrisp wrote:
| I've sometimes wondered how many signal processing kernels
| or filter architectures you could learn via SGD given the
| right data and search strategy. Maybe using something like
| Neural Architecture Search, so not just the weights /
| coefficients but the architecture too.
|
| Could it discover an interpolation filter like upfirdn? An
| IIR Butterworth filter (it would have to be recurrent)? A
| frequency transfer function derived from two signals?
|
| I imagined that the solutions it found wouldn't be "clean"
| but have other non-essential operators bloating them. Could
| it find new architectures that haven't been created from
| first principles?
| hctaw wrote:
| I think a big question here is, "what is the goal?" A key
| distinction is whether the NN is intended to handle the
| signal directly (the NN _is_ the filter) or if it 's just
| a technique for finding filter coefficients (the NN
| _designs_ the filter for the signal).
|
| For some tasks, like system identification (used in echo
| cancellation), the NN _is the filter_ - aforementioned
| adaptive filters are used for this case right now. It can
| also be used for black box modeling, which has numerous
| application in real time or otherwise.
|
| For others like Butterworth (and other classic designs)
| there's not really a good reason to use an NN. Butterworth
| (and Chebyshev I/II, elliptical, optimum-L, and others)
| are filter design formulae with a closed form (for a
| given filter order) that yield roots of transfer
| functions that have desirable properties - I'm not sure
| how a learning approach can beat formulae that are
| derived from the properties of the filter they design.
|
| There are iterative design algorithms that do not have a
| closed form, like Parks-McClellan. It is however _quite_
| good at what it does - it would be interesting to compare
| these methods against some design tricks to reduce filter
| order.
|
| There are some applications of filter design that NNs can
| do that we don't have good solutions for with traditional
| algorithms, like system identification; another is
| optimization (in terms of filter order) and iteratively
| designing stable IIR filters to fit arbitrary frequency
| curves (for FIR it's a solved problem, at significant
| extra cost compared to IIR).
|
| As for topologies of the filter itself that is an
| interesting angle. Topologies affect quantization effects
| (if you wanted an FPGA to realize the filter with the
| fewest number of bits, topologies matter) and for time
| variant filters (like an adaptive IIR) there are
| _drastic_ differences between the behavior of various
| topologies. I'm not sure how it would manifest with an NN
| designing the filter.
| hcrisp wrote:
| I meant learning the topology (I called it architecture)
| from scratch, or minimally from heuristics. What would I
| discover? Something that we recognize as an IIR filter?
| Probably not, but if it did, what would it discover that
| we didn't recognize over years of academic thought but
| _could_ have from first principles? To me that's the
| intriguing angle.
| hctaw wrote:
| I don't think I follow. These designs _do_ come from
| first principles. In fact the reason we've had so much
| academic thought put into it is because 100 years ago we
| designed systems empirically and a few smart people went
| back to first principles to find the fundamental
| constraints on filter networks.
| suvakov wrote:
| Actually, "learns" here means to fit reverse linear
| transformation a.k.a. inverse matrix. He defines reverse FT
| matrix and using gradient descent numerically converges to
| inverse matrix. Nothing more than inefficient way to solve
| system of linear equations.
| aesthesia wrote:
| As far as I can tell, you're still using a fixed inverse DFT
| as the reconstruction layer, so it's not just rediscovering
| the DFT on its own. Instead of learning a linear
| transformation from input-output pairs, it's learning the
| inverse of a linear transformation when that transformation
| is given as an oracle. It's not terribly surprising that this
| works, although there are probably some interesting issues of
| numerical conditioning in the general case.
| volta83 wrote:
| > But saying that Newton's second law is a Neural Network,
| while correct, seems a bit deceptive in that it's not a deep
| idea at all.
|
| I guess the point is that neither is the idea of a Neural
| Network.
| Nasrudith wrote:
| A neural network can also be set up to unconditionally return
| a fixed value with no learning feedback. I don't think
| lower bounds on capabilities are very informative. So could
| many arbitrarily complex arrangements that do massive amounts
| of work only to discard it and return a constant. An upper
| bound of what an approach is capable of is more useful. Say,
| no matter how vast a lookup table is, it will never return a
| different value for the same input regardless of prior
| sequence.
| Imnimo wrote:
| I think this is the sort of thing that is very obvious if you
| are already comfortable with neural networks and the FFT. But
| if you're only comfortable with neural networks, and the FFT
| feels like arcane magic, this exercise might be very
| instructive.
| xyzzy21 wrote:
| The "Deep" part ia to realize what this really means in terms
| of limitations of ML/NN!
| akrymski wrote:
| I'm really interested in training NNs in the frequency domain:
|
| - a ton of data (JPEGs, MPEGs, MFCCs) exists in the compressed
| form of FFT-ed features
|
| - FFTs are hardware encoded/decoded & optimised
|
| - FFTs are great at separating signal from noise, leading to a
| huge reduction in input features (eg JPEG encoded) as compared to
| raw pixel values
|
| - convolutions in the time domain are just multiplications in the
| frequency domain
|
| - converting to log polar coordinates turns them into additions
|
| - distance between two log polar coordinates could be the loss
| function
|
| However, I've not had much luck getting a frequency domain
| network like this to converge. Anyone tried something similar
| with success?
| ashertrockman wrote:
| The choice of activation function isn't entirely clear to me,
| but I think it's definitely possible to make a network that
| operates entirely in the frequency domain. It would probably be
| pretty easy to start experimenting with such a thing with the
| nice complex number and FFT support in PyTorch 1.8. :)
|
| Like you said, there's already a significant connection between
| convolutional networks and the Fourier domain (the convolution
| theorem).
|
| Tangentially, I've recently worked on a project that focused on
| implementing convolution in the Fourier domain, and how that
| allows one to control useful properties of convolutions (like
| "smoothness" and orthogonality).
|
| I made a demonstration of convolution in the Fourier domain in
| PyTorch, which you might find interesting:
| https://nbviewer.jupyter.org/github/locuslab/orthogonal-conv...
|
| More generally, you could look here for more code and the
| corresponding paper: https://github.com/locuslab/orthogonal-
| convolutions
| akrymski wrote:
| I don't see why processing JPEG coefficients directly as input
| features shouldn't have similar accuracy to a much larger NN
| using raw pixel data.
|
| In particular I'm interested in the efficiency gains from
| avoiding convolutions and the possibility of running a
| compressed frequency-domain NN on CPUs.
| mochomocha wrote:
| I have trained GANs on raw JPEG coefficients with moderate
| success as a pet project. Read the raw DCT coefficients without
| decompressing and train in this space. JPEG-decompress the
| output of the net to reconstruct an image in the pixel space.
| There are a few papers doing similar things for supervised
| learning tasks iirc
| akrymski wrote:
| Using a standard loss function like MSE?
|
| Yeah it kinda works when you feed JPEG coefficients into a
| typical time-domain CNN, but mathematically it seems that if
| you're using frequencies as inputs, your convolutions should
| become simple multiplications. Am I wrong?
| mochomocha wrote:
| Yep, you can successfully do it without convolutions. Here
| is a pointer if you want to dig deeper:
| https://eng.uber.com/neural-networks-jpeg/ (there's prior
| art to that, but this one is well written)
| [deleted]
| jl2718 wrote:
| A 'neural' network is tautologically nonlinear.
| master_yoda_1 wrote:
| My fart molecules also have nonlinear interactions with other
| atoms. So my farting is a neural network too. #bullshitArticle
| turtletontine wrote:
| I don't see anything here beyond "neural networks generalize
| linear operators." (But of course they're more interesting for
| nonlinear applications)
|
| I think neural nets are such a generalized thing that tons of
| stuff can be shown to map to them. I saw a talk once about
| evolution being equivalent to neural networks: consider a genome
| as a set of neuron weights, and your iterative refinement as new
| generations in evolution. Maybe there's something deep there, but
| maybe if you make your model arbitrarily complicated enough you
| can make it look like any system you want. Which is the whole
| idea with neural nets.
| chobytes wrote:
| ...This could be a brilliant technique for collecting funding.
| srean wrote:
| Mono layer DNNs with a block-chained backpropagated backend
| n_kr wrote:
| People may find this interesting too:
| https://ocw.mit.edu/courses/mechanical-engineering/2-71-opti...
| cochne wrote:
| This is kind of silly. Neural networks are universal function
| approximators, meaning any function "is" (more accurately, "can
| be represented by") a neural network. In the DFT case, it is a
| linear function so we could get an exact representation. Though
| there is also nothing stopping you from just saying, f(x) is a
| neural network, because I can choose my activation function to be
| f.
| TheRealPomax wrote:
| If silly but interesting things weren't allowed, there would be
| no point to having a website like this. The article is a neat
| exploration of two things that many, _many_ people don't
| realise are related because they simply don't know the maths
| involved in either of the two subjects.
| vletal wrote:
| Maybe, if the title were not so overblown, the pushback against
| the article's premise would not be so hard.
| visarga wrote:
| > Neural networks are universal function approximators
|
| What about discontinuous functions?
| hyperbovine wrote:
| Continuous functions are dense in <pick your favorite
| function space> so yes.
| estebarb wrote:
| Every function can be made continuous if you add one
| dimension to its output for defined/not defined.
| volta83 wrote:
| > Neural networks are universal function approximators,
|
| Do we have tight error-bound proofs for neural networks as
| approximators?
| ska wrote:
| https://en.wikipedia.org/wiki/Universal_approximation_theore.
| ..
| swiley wrote:
| Anything can be expressed by a neural network to some arbitrary
| degree of precision. That's the whole point. The hard part isn't
| expressing the thing but getting the machine to figure out how to
| express it correctly on its own. There's also no way to prove
| your learning algorithm actually works outside of toy problems
| like this which is why ML is really only good for data mining
| rather than steering a car.
| eigenket wrote:
| *anything continuous
| mabbo wrote:
| I love this kind of post.
|
| While it's all very posh to scoff while saying "well,
| obviously!", the author has explained it, demonstrated it,
| documented it. It's not about proving something new, it's about
| sharing something neat with the world. It's also well written and
| has a bit of fun in the telling.
|
| I'm reminded of xkcd's "Ten Thousand" principle[0]: (Paraphrased)
| "For each thing that 'Everybody knows', every day there are, on
| average, 10,000 people in the US learning it for the first time".
| Good writing makes the experience better for those learning it
| for the first time today.
|
| [0]https://xkcd.com/1053/
| 6gvONxR4sf7o wrote:
| The problem isn't an "everybody knows" kind of problem. People
| (myself included) are rolling our eyes because it's the most
| rube goldbergian way to say it.
|
| It's analogous to saying that subtraction is a neural network.
| Addition and negation are core elements of modern state of the
| art neural networks, and you can express subtraction as a
| combination of those two modern neural network techniques.
| Therefore subtraction is a neural network.
|
| The line in the article about the formula y = A x is
| illustrative:
|
| > This [y = A x] should look familiar, because it is a neural
| network layer with no activation function and no bias.
|
| Imagine that as:
|
| > This [y = a + b] should look familiar, because it is a neural
| network layer with no activation function and no bias and no
| multiplication. Meanwhile, this [y = -x] should also look
| familiar, because it is a neural network layer with no
| activation function, no bias, no addition, and only negation.
|
| You'd expect that kind of explanation of subtraction if someone
| had never learned about addition and negation outside of neural
| networks, but you'd roll your eyes and say it's just such a
| bizarre way to talk about addition, negation, subtraction, and
| neural networks.
| Certhas wrote:
| But it explains the wrong thing. It's a fundamentally
| ridiculous framing. A basic neural network layer is a linear
| transformation + constant offset + activation function. Linear
| transformations are the elementary building block. And the
| Fourier transform is a quintessential example of a linear
| transform. But the article doesn't actually discuss that at all.
| If you don't understand this, how will you ever deal with more
| complicated NN architectures?
| albertzeyer wrote:
| Also interesting/related:
|
| In speech recognition, we usually use log Mel features, or MFCC
| features, which do a Fourier transformation on the raw audio
| frames.
|
| You can also train neural networks directly on the raw audio
| features. When you do so, you can inspect the weights of the
| first layer (convolutional layer), and you see that it pretty
| much learned a short-time Fourier transform.
|
| https://www-i6.informatik.rwth-aachen.de/publications/downlo...
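|
| For reference, the fixed version of what such a first layer
| computes: an STFT written as framing plus one matmul with
| windowed-sinusoid kernels (a sketch; the window/hop sizes are
| just typical values, not taken from the paper):
|
|     import numpy as np
|
|     sr, win, hop = 16000, 400, 160       # 25 ms window, 10 ms hop
|     audio = np.random.randn(sr)          # 1 s of fake audio
|
|     frames = np.stack([audio[i:i + win]
|                        for i in range(0, len(audio) - win + 1, hop)])
|
|     window = np.hanning(win)
|     k = np.arange(win // 2 + 1).reshape(-1, 1)   # rfft bins
|     n = np.arange(win)
|     kernels = window * np.exp(-2j * np.pi * k * n / win)
|
|     stft = frames @ kernels.T            # one matmul, no training
|     assert np.allclose(stft, np.fft.rfft(frames * window, axis=1))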
| bigbillheck wrote:
| You can't just point at things and call them 'a neural network'.
|
| Linear regression's just learning an affine transform but I think
| it's misleading and unhelpful to call that a neural network
| either.
___________________________________________________________________
(page generated 2021-04-29 23:01 UTC)