[HN Gopher] Statistical vs. Deep Learning forecasting methods
___________________________________________________________________
Statistical vs. Deep Learning forecasting methods
Author : maxmc
Score : 138 points
Date : 2022-12-01 16:29 UTC (6 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| Xcelerate wrote:
| I wish we could start moving to better approaches for evaluating
| time series forecasts. Ideally, the forecaster reports a
| probability distribution over time series, then we evaluate the
| predictive density with regard to an error function that is
| optimal for the intended application of the forecast at hand.
| 1980phipsi wrote:
| You mean I can't just go on CNBC and say my forecast is X?
| graycat wrote:
| I can have some interest in, hope for, etc. _machine learning_.
| One reason is, for the _curve fitting_ methods of classic
| statistics, i.e., versions of _regression_ , the math assumptions
| that give some hope of some good results are essentially
| impossible to verify and look like they will hold closely only
| rarely. So, even when using such _statistics_ , good advice is to
| have two steps, (1) apply the statistics, i.e., fit, using half
| the data and then (2) verify, test, check using the other half.
| But, gee, those two steps are also common in _machine learning_.
| Sooo, if we can't find much in classic math theorems and proofs
| to support machine learning, then we are just put back into the
| two steps statistics has had to use anyway.
|
| So, if we have to use the two steps anyway, then the possible
| advantages of non-linear fitting have some promise.
|
| So, to me, a larger concern comes to the top: In my experience in
| such things, call it statistics, optimization, data analysis,
| whatever, a huge advantage is bringing to the work some
| _understanding_ that doesn't come with the data and/or really
| needs a human. The understanding might be about the real problem
| or about some mathematical methods.
|
| E.g., once some guys had a problem in optimal allocation of some
| resources. They had tried _simulated annealing_ , run for days,
| and quit without knowing much about the _quality_ of the results.
|
| I took the problem as 0-1 integer linear programming, a bit
| large, 600,000 variables, 40,000 constraints, and in 900 seconds
| on a slow computer, with Lagrangian relaxation, got a feasible
| solution guaranteed, from the bounding, to be within 0.025% of
| optimality. The big advantages were understanding the 0-1
| program, seeing a fast way to do the primal-dual iterations, and
| seeing how to use Lagrangian relaxation. My guess is that it
| would be tough for some very general _machine learning_ to
| compete much short of _artificial general intelligence_.
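|
| To make the bounding idea concrete, here is a toy sketch (a
| single-constraint 0-1 problem, nothing like the scale above):
| relax the constraint with a multiplier, solve the now-separable
| problem per variable to get an upper bound, repair to a feasible
| point for a lower bound, and report the gap between the two.
|
|   import numpy as np
|
|   rng = np.random.default_rng(0)
|   n = 1000
|   c = rng.uniform(1, 10, n)   # maximize c @ x
|   a = rng.uniform(1, 10, n)   # subject to a @ x <= b
|   b = 0.3 * a.sum()           # x in {0, 1}^n
|
|   lam, best_lb, best_ub = 0.0, -np.inf, np.inf
|   for k in range(200):
|       # Relaxed problem separates per variable
|       x = (c - lam * a > 0).astype(float)
|       best_ub = min(best_ub, c @ x - lam * (a @ x - b))
|       # Repair to feasibility by dropping worst-ratio items
|       xf = x.copy()
|       for i in np.argsort(c / a):
|           if a @ xf <= b:
|               break
|           xf[i] = 0.0
|       best_lb = max(best_lb, c @ xf)
|       # Subgradient step on the dualized constraint
|       g = a @ x - b
|       lam = max(0.0, lam + g / ((k + 1) * (abs(g) + 1e-9)))
|
|   print(f"gap <= {(best_ub - best_lb) / best_ub:.4%}")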
|
| One way to describe the problem with the simulated annealing was
| that it was just too general, didn't exploit what a human might
| understand about the real problem and possible solution methods
| selected for that real problem.
|
| I have a nice collection of such successes where the keys were
| some insight into the specific problems and some math techniques,
| that is, some human abilities that machine learning would seem
| to need _artificial general intelligence_ to compete with.
| With lots of data, lots of computing, and the advantages of non-
| linear operations, at times machine learning might be the best
| approach even now.
|
| Net, still, in many cases, human intelligence is tough to beat.
| uoaei wrote:
| A point about gradient-free methods such as simulated annealing
| and genetic algorithms: the transition (sometimes called
| "neighbor") function is the most important part by far. The
| most important insight is the most obvious one in some way: if
| your task is to search a problem space efficiently for an
| optimal solution, it pays to know exactly how to move from
| where you are to where you want to be in that problem space. To
| that point, (the structure of) transitions between successive
| state samples should be tailored to your specific problem and
| encoding of the domain in order to be useful in any reasonable
| amount of time.
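|
| As a minimal sketch of that point: a generic annealing loop where
| all of the problem knowledge lives in the neighbor and cost
| functions (the 2-opt move is just one illustrative choice):
|
|   import math, random
|
|   def anneal(state, neighbor, cost, T0=1.0, alpha=0.999,
|              steps=20000):
|       # Generic loop: the domain only enters via neighbor/cost.
|       best = cur = state
|       T = T0
|       for _ in range(steps):
|           cand = neighbor(cur)
|           d = cost(cand) - cost(cur)
|           if d < 0 or random.random() < math.exp(-d / T):
|               cur = cand
|               if cost(cur) < cost(best):
|                   best = cur
|           T *= alpha
|       return best
|
|   # For a tour-ordering problem, a domain-aware neighbor reverses
|   # a segment (2-opt) rather than swapping two random positions:
|   def two_opt_neighbor(tour):
|       i, j = sorted(random.sample(range(len(tour)), 2))
|       return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]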
| clircle wrote:
| What is the point of this kind of comparison? It is completely
| dependent on the 3000 datasets they chose to use. You're not
| going to find that one method is better than another in-general
| or find some type of time series for which you can make a
| specific methodological recommendation (unless that series is
| specifically constructed with a mathematical feature, like
| stationarity).
|
| What matters is "which method is better for _MY_ data?" but
| that's not something an academic can study. You just have to
| test a few different things.
| MrMan wrote:
| so your corollary to the No Free Lunch theorem is "Lunch Is
| Impossible"?
| tomrod wrote:
| My thoughts exactly. Unless the method can be shown to be
| inferior in certain or all dimensions, it is a meaningless
| comparison.
| stefanpie wrote:
| Timeseries data can sometimes be deceptive, depending on what you
| are trying to model.
|
| I have been hacking on a personal research project to forecast
| hurricane tracks using deep learning. Given only track and
| intensity data at different points in time (every 6 hours) and
| some simple feature engineering, you will not get results
| anywhere close to the official NHC forecast, no matter what
| model you use.
|
| In hindsight, this is a little obvious. Hurricane forecasting
| time series models depend more on other factors than time itself.
| A sales forecast can depend on seasonal trends and key events in
| time, but a hurricane forecast is much more dependent on long-
| range spatial data, like the state of the atmosphere and ocean,
| that is very non-trivial to model using just track data.
|
| However, deep learning models and techniques in this scenario
| are helpful because they can allow you to integrate multiple
| modalities like images, graphs, and volumetric data into one
| model, which may not be possible with statistical models alone.
| jwilber wrote:
| Seems like these guys just wasted $11k to erroneously claim,
| "deep learning bad! Simple is better!"
|
| There's definitely use for these classical, model-based methods,
| for sure. But a contrived comparison claiming they're king is
| just misinformation.
|
| Eg, here are a number of issues with classical techniques where
| dl succeeds ('they' here refers to classical techniques):
|
| - they often don't support missing/corrupt data
|
| - they focus on linear relationships and not complex joint
| distributions
|
| - they focus on fixed temporal dependence that must be diagnosed
| and specified a priori
|
| - they take as input univariate, not multivariate, data
|
| - they focus on one-step forecasts, not long time horizons
|
| - they're highly parameterized and rigid to assumptions
|
| - they fail for cold start problems
|
| A more nuanced comparison would do well to mention these.
| srean wrote:
| > they often don't support missing/corrupt data
|
| You gotta be kidding right, that's one thing that they do well.
| brrrrrm wrote:
| I'm heavily involved in this area of research (getting deep
| learning competitive with computationally efficient statistical
| methods), and I'd like to note a couple things I've found:
|
| 1. Deep learning doesn't require thorough understanding of priors
| or statistical techniques. This opens the door to more
| programmers in the same way high level languages empower far more
| people than pure assembly. The tradeoffs are analogous - high
| human efficiency, loss of compute efficiency.
|
| 2. Near-CPU deep learning accelerators are making certain classes
| of models far easier to run efficiently. For example, an M1 chip
| can run matrix multiplies (DL primitive composed of floating
| point operations) 1000x faster than individual instructions
| (2TFlops vs 2GHz). This really changes the game, since we're now
| able to compare 1000 floating point multiplications with a single
| if statement.
| zmachinaz wrote:
| Regarding 1)
|
| I am not sure if you are not trading "high human efficiency"
| against increased risk of blowing up at some point. Good luck
| doing forecasting without thorough understanding of priors and
| statistics in general.
| brrrrrm wrote:
| that's a good point. I guess as an addendum it's not just
| compute efficiency but also "statistical efficiency" (if that
| has any meaning?)
| singhrac wrote:
| I think that term already has usage as a proxy for "lowest
| sampling variance"; for example the Gauss Markov theorem
| shows that OLS is the most efficient unbiased linear
| estimator.
|
| I guess this is echoing your point 2, but I would have
| generally said that "principled" statistical models are
| less efficient these days than DL (see: HMC being much
| slower than variational Bayes). Priors are usually
| overrated but I think the risk is more that basic mistakes
| are made because people don't understand what assumptions
| go into "basic" machine learning ideas like train/test
| splits or model selection. I'm not sure it warrants a lot
| of panic though.
| epgui wrote:
| Agreed, I see the "lower barrier to entry" in this particular
| case as coming with potentially huge risks. IMO, statistics
| is vastly, vastly, vastly under-appreciated and under-
| estimated.
| PaulHoule wrote:
| Something that bothers me about the ML literature is that papers
| frequently present a large number of evaluation results, such as
| precision and AUC, but these are not qualified by error bars.
| Typically they make a table which has different algorithms on one
| side and different problems on the other side and the highest
| score for a given problem gets bolded.
|
| I know if you did the experiment over and over again with
| different splits you'd get slightly different scores, so I'd
| like to see some guidance as to significance in terms of (1)
| statistical significance, and (2) whether it is significant on a
| business level. Would customers notice the difference? Would it
| lead to better decisions that move the needle for revenue or
| other business metrics?
|
| This study is an example where a drastically more expensive
| algorithm seems to produce a practically insignificant
| improvement.
| zone411 wrote:
| Every researcher would love to include error bars but it's a
| matter of limited computing resources at universities. Unless
| you're training on a tiny dataset like MNIST, these training
| runs get expensive. Also, unless you parallelize from the start
| and risk wasting a lot of resources if something goes wrong, it
| could take longer to get the results.
| PaulHoule wrote:
| Using the bootstrap and/or repeated runs is a great way to get
| error bars, and there are low-cost ways to do it.
|
| For instance they estimate error bars on public opinion polls
| based on simple formulas and not redoing the poll a large
| number of times.
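|
| A minimal sketch of the low-cost version (resampling the fixed
| test set, no retraining; the names are placeholders):
|
|   import numpy as np
|
|   def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, seed=0):
|       # Percentile bootstrap CI for a held-out-set metric.
|       rng = np.random.default_rng(seed)
|       n = len(y_true)
|       scores = []
|       for _ in range(n_boot):
|           idx = rng.integers(0, n, n)   # resample with replacement
|           scores.append(metric(y_true[idx], y_pred[idx]))
|       return np.percentile(scores, [2.5, 97.5])
|
|   # accuracy = lambda t, p: np.mean(t == p)
|   # lo, hi = bootstrap_ci(y_test, preds, accuracy)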
| nequo wrote:
| If you don't have an analytical expression for your
| asymptotic variance, you do have to use bootstrap though.
|
| For public opinion polls, the estimator is simple (i.e., a
| sample mean), so we have an analytical expression for its
| asymptotic variance.
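|
| (Concretely: for a sample proportion, SE = sqrt(p(1-p)/n); with
| p ~ 0.5 and n = 1000 that is about 1.6 points, i.e. the familiar
| +/- 3 point margin of error at 95%.)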
| [deleted]
| time_to_smile wrote:
| Simple formulas only work because the models themselves for
| those polls are incredibly simple and adding a bit more
| complexity requires a lot of tools to compute these
| uncertainties (this is part of the reason you see
| probabilistic programming so popular for people doing non-
| trivial polling work).
|
| There are no simple approximations for a range of even
| slightly complex models. Even some nice computational
| tricks like the Laplace approximation don't work on models
| with high numbers of parameters (since you need to compute
| the diagonal of the Hessian).
|
| A good overview of the situation is covered in Efron &
| Hastie's "Computer Age Statistical Inference".
| [deleted]
| maxmc wrote:
| Thanks for the comment!
|
| In the Machine Learning literature, the variance of accuracy
| measurements originates from different network parameter
| initializations. Since the deep learning ensembles already use
| an aggregate computation in the hundreds of days, computing the
| variance would push the computational time into thousands of
| days.
|
| In contrast, statistical methods that we report optimize convex
| objectives; their optimal parameters are deterministic.
|
| That being said, we like the idea of including cross-validation
| with different splits for future experiments.
| igorkraw wrote:
| This is one of my default suggestions when I act as a reviewer:
| t-test with Bonferroni correction, please. ML, ironically, has
| absolutely horrible practices in terms of distinguishing signal
| from noise (which at least is partially offset by the social
| pressure to share code, but still).
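|
| A minimal sketch of what that request looks like (per-seed scores
| of a new method vs. a baseline across several datasets; `datasets`,
| `scores_new` and `scores_base` are placeholders):
|
|   from scipy import stats
|
|   alpha, m = 0.05, len(datasets)   # m comparisons in total
|   for d in datasets:
|       t, p = stats.ttest_rel(scores_new[d], scores_base[d])
|       verdict = "significant" if p < alpha / m else "n.s."
|       print(d, round(p, 4), verdict)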
| maxmc wrote:
| Bonferroni's correction on hold-out data is an excellent
| suggestion. To adapt it into time series forecasting, one
| could perform temporal cross-validation with rolling windows
| and follow the performance's variance through time.
|
| Unfortunately, the computational time would explode if the ML
| method's optimization is performed naively. Precise measurement
| of statistical significance would crowd out all researchers
| except Big Tech.
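|
| One low-tech way to get the rolling windows mentioned above is
| sklearn's splitter; a rough sketch (`fit_method` and `score` are
| placeholders):
|
|   from sklearn.model_selection import TimeSeriesSplit
|
|   tscv = TimeSeriesSplit(n_splits=5, test_size=12)  # 12-step horizon
|   errors = []
|   for train_idx, test_idx in tscv.split(y):  # y: a single series
|       model = fit_method(y[train_idx])
|       errors.append(score(model.forecast(12), y[test_idx]))
|   # the spread of `errors` tracks performance variance over time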
| mattkrause wrote:
| Bonferroni is probably not the right choice because it can
| be overly conservative, especially if the tests are
| positively-correlated.
|
| Holm-Sidak would be better--but something like false
| discovery rate might be easier to interpret.
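|
| Both are one-liners with statsmodels, for what it's worth
| (`p_values` here stands for the raw per-comparison p-values):
|
|   from statsmodels.stats.multitest import multipletests
|
|   reject_hs, p_hs, _, _ = multipletests(p_values, alpha=0.05,
|                                         method="holm-sidak")
|   reject_fdr, p_fdr, _, _ = multipletests(p_values, alpha=0.05,
|                                           method="fdr_bh")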
| tomrod wrote:
| Question: why do we care about the Bonferroni correction if
| the model being reviewed shows high performance on
| holdout/test samples?
|
| I mean, it's nice to know that the p-values of coefficients
| on models you are submitting for publication are
| appropriately reported under the conservative approach
| Bonferroni applies, but I would think making it a _default_
| is an inappropriate forcing function when the performance on
| holdout is more appropriate. Data leakage would be a much,
| much larger concern IMHO. Variance of the performance metrics
| is also important.
|
| What am I missing?
| mattkrause wrote:
| The test sample is just a small, arbitrary sample from a
| universe of similar data.
|
| You (probably) don't care about test-set performance _per
| se_ but instead want to be able to claim that one model
| works better _in general_ than another. For that, you need
| to bust out the tools of statistical inference.
| igorkraw wrote:
| Because the variance can be uniformly high, making it
| difficult to properly judge the improvement of one method
| vs the baseline method: did you actually improve, or did
| you just get a few lucky seeds? It's much harder to get a
| paper debunking new "SotA" methods so I default to showing
| a clear improvement over a good baseline. Simply looking at
| the performance is also not enough because a task can look
| impressive, but be actually quite simple (and vice versa),
| so using these statistical measures makes it easy to
| distinguish good models on hard tasks from bad models on
| easy tasks.
|
| I should also note 1) this is about testing whether the
| performance of a model is meaningfully different from
| another, not the coefficient of the models 2) I don't
| _reject_ papers just because they lack this, or if they
| fail to achieve a statistical significance, I just want it
| in the paper so the reader can use that to judge (and it
| also helps suss out cherry picked results)
| tomrod wrote:
| Thanks, that makes sense. I was confused about where and
| how you were applying the Bonferroni correction yardstick.
| goosedragons wrote:
| You'd want to do some sort of test because it can help
| assess whether your method did better than the alternatives
| by chance. For example can you really say Method A is
| better than B if A got 88% accuracy on the holdout set and
| B got 86% accuracy? Would that be true of all possible
| datasets?
|
| t-test with Bonferroni isn't necessarily the best test for
| all metrics either.
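|
| For that paired setup (both methods scored on the same holdout
| examples), one standard choice is McNemar's test on the
| disagreement counts; the counts below are made up but match the
| 88% vs. 86% example:
|
|   from statsmodels.stats.contingency_tables import mcnemar
|
|   # rows: A correct / wrong, cols: B correct / wrong
|   table = [[840, 40],    # both right | only A right
|            [ 20, 100]]   # only B right | both wrong
|   print(mcnemar(table, exact=True).pvalue)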
| hulalula wrote:
| Would this work for every kind of data? I imagine maybe not?
| PaulHoule wrote:
| See https://en.wikipedia.org/wiki/Bonferroni_correction
| tylerneylon wrote:
| What would be a better method for machine learning folks to
| take? As a sincere curiosity / desire to learn, not meant as
| a rhetorical implication that I disagree.
|
| I interpret your criticism to mean that ML folks tend to re-
| use a test set multiple times without worrying that doing so
| reduces the meaning of the results. If that's what you mean,
| then I do agree.
|
| Informally, some researchers are aware of this and aim to use
| a separate validation data set for all parameter tuning, and
| would like to use a held out test set as few times as
| possible -- ideally just once. But it gets more complicated
| than that because, for example, different subsets of the data
| may not really be independent samples from the run-time
| distribution (example: data points = medical data about
| patients who lived or died, but only from three hospitals;
| the model can learn about different success rates per
| hospital successfully but it would not generalize to other
| hospitals). In other words, there are a lot of subtle ways in
| which a held out test set can result in overconfidence, and I
| always like to learn of better ways to resist that
| overconfidence.
| igorkraw wrote:
| Ben Recht actually has a line of work showing that we
| aren't overfitting the validation/test set for now
| (amazingly...). What I mean is, by chasing higher and
| higher SotA with more and more money and compute, whole
| fields can go "improving" only for papers like
| https://arxiv.org/abs/2003.08505 or "Implementation matters
| in deep RL" to come out and show that what's going on is
| different from the literature consensus. The standards for
| showing improvement are low, while standards for negative
| results are high (I'm a bit biased because I have a
| rejected paper trying to show empirically some deep RL work
| didn't add marginal value but I think the case still
| holds). Everyone involved is trying their best to do good
| science but unless someone like me asks for it, there
| simply isn't a value add for your career to do exhaustive
| checking.
|
| A concrete improvement would be only being allowed to
| change 1 thing at a time per paper, and measure the impact
| of changing that one thing. But then you couldn't
| realistically publish _anything_ outside of megacorps.
| Another solution might be banning corporate papers, or at
| least making a separate track...from reviewing papers, it
| seems like single authors or small teams in academia need
| to compete with Google where multiple teams might share
| aspects of a project, one doing the architecture, the other
| a new training algorithm, etc., which won't be disclosed;
| you'll just read a paper where for _some reason_ a novel
| architecture is introduced using a baseline which is a bit
| exotic but _also_ used in another paper that came out close
| to this one, and a regulariser which was introduced just
| before that...
|
| If you limit the pools, you can put much higher standards
| on experiments in the corporate track, where you have the
| budget, while giving academia more points for novelty and
| creativity.
| IfOnlyYouKnew wrote:
| The test sets are large enough to render this moot, as the
| confidence intervals are almost certainly smaller than the
| precision typically reported, i.e. 0.1%.
| PaulHoule wrote:
| I've worked on commercial systems where N<=10,000 in the
| evaluation set, and the confidence interval there is probably
| not as good as 0.1%. For instance, there is a lot of
| work on this data set (which we used to tune up a search
| engine)
|
| https://ir-datasets.com/gov2.html
|
| and sometimes it is as bad as N=50 queries with judgements. I
| don't see papers that are part of TREC or based on TREC data
| dealing with sampling errors in any systematic way.
| jll29 wrote:
| NIST's TREC workshop series uses Cyril Cleverdon's
| methodology ("Cranfield paradigm") from the 1960s, and more
| could surely be done at the evaluation front:
|
| - systematically addressing sampling error;
|
| - more than 50 queries;
|
| - more/all QRELs;
|
| - full evaluation instead of system pooling;
|
| - studying IR not just of the English language (this has been
| picked up by CLEF and NTCIR in Europe and Japan,
| respectively);
|
| - devising metrics that take energy efficiency into account;
|
| - ...
|
| At the same time, we have to be very grateful to NIST/TREC
| for executing an international (open) benchmark annually,
| which has moved the field forward a lot in the last 25
| years.
| MrMan wrote:
| why are middle-ground (but SOTA) techniques like Gaussian
| processes and GBM regression not in this comparo
| maxmc wrote:
| A lot of the M3 datasets we use are high-frequency, with large
| seasonal inputs. Considering that Gaussian Process (GP)
| complexity is O(N^3), a careful study of their performance
| would be challenging.
|
| Also... I'm not aware of any efficient GP Python
| implementations.
| thanatropism wrote:
| Just write your GP model in Pyro or something like that.
| vladf wrote:
| GPs over time series can leverage low-dimensional index sets
| for O(N lg N) fitting and inference. This can be done by
| interpolating the inputs onto a regular grid which admits
| Toeplitz kernels. See https://arxiv.org/abs/1503.01057.
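|
| For reference, GPyTorch ships this structured-kernel-interpolation
| idea; a minimal model sketch (not tuned for anything in this
| thread):
|
|   import gpytorch
|
|   class SKIGP(gpytorch.models.ExactGP):
|       def __init__(self, train_x, train_y, likelihood):
|           super().__init__(train_x, train_y, likelihood)
|           self.mean_module = gpytorch.means.ConstantMean()
|           base = gpytorch.kernels.ScaleKernel(
|               gpytorch.kernels.RBFKernel())
|           # inducing grid -> Toeplitz structure -> ~O(N log N)
|           Grid = gpytorch.kernels.GridInterpolationKernel
|           self.covar_module = Grid(base, grid_size=200, num_dims=1)
|
|       def forward(self, x):
|           return gpytorch.distributions.MultivariateNormal(
|               self.mean_module(x), self.covar_module(x))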
| kgarten wrote:
| Nice article and interesting comparison. Yet, I have a minor
| issue with the title: Deep Learning methods are also statistical
| methods... "univariate models vs. " would be a better title.
| nerdponx wrote:
| You could argue that deep learning is not a statistical method
| in the traditional sense, in that a typical neural network
| model is not a probability model, and some neural networks are
| well known to produce specifically bad probability models,
| requiring some amount of post processing in order to produce
| correctly "calibrated" probability predictions.
|
| However I don't like that there is often a strict dichotomy
| presented between "deep learning" and "statistics". There is a
| whole world of gray areas and hybrid techniques, which tend to
| be more accessible, easier to reason about, and more
| effective in practice, especially on smaller "tabular"
| datasets. What about generalized additive models, random
| forests, gradient boosted trees, etc.?
|
| The author of the document I'm sure is aware of these
| techniques, and I assume they are left out because they didn't
| perform well enough to be considered here. But I don't think it
| does the discourse any favors to promulgate the false
| dichotomy.
| fedegr wrote:
| Co-author here: all in due time. Next iteration we will
| include LightGBM, XGBoost, and newer DL models like TFT and
| NHiTS.
| uoaei wrote:
| Statistical models and probabilistic models are not
| synonymous.
|
| Vanilla deep learning models are _statistical_ models (a la
| linear regression) and not _probabilistic_ models (a la
| Gaussian mixture). It is important to maintain the
| distinction.
|
| But to your point about the dichotomy between deep learning
| and more "traditional" statistical methods: this confusion in
| common parlance clearly has negative effects on model-
| building among engineers. You are right that when people
| think "deep learning" they think of very specific
| architectures with very specific features, and don't seem to
| conceive of the possibility that automatic differentiation
| techniques mean you can incorporate all sorts of new model
| components that blur the line between deep learning and older
| methods. For instance, you could feed the results of a kernel
| SVM to an ARIMA model in such a way that the whole thing is
| end-to-end differentiable. In fact, the great benefit of deep
| learning long-term is (in my opinion) that the ability to
| build these compositional models means you can bake in that
| much more inductive bias into the models you build, meaning
| they can be smaller and more stable in training.
| salty_biscuits wrote:
| "Vanilla deep learning models are statistical models (a la
| linear regression) and not probabilistic models (a la
| Gaussian mixture). It is important to maintain the
| distinction."
|
| Isn't this just a matter of interpretation of the models?
| You can interpret linear regression in a Bayesian way and
| say that the prediction of the linear model is the MAP of
| the mean, and you can also calculate the variance; the l2
| norm objective says the errors are normally distributed, l2
| regularisation is a normal prior on the coefficients, etc.,
| etc. All the same stuff can be applied to deep learning
| models.
|
| Maybe I don't understand your distinction between
| statistical and probabilistic though?
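|
| (Spelling out the standard correspondence: with
| y | X, w ~ N(Xw, sigma^2 I) and prior w ~ N(0, tau^2 I),
|
|   argmax_w  log p(y|X,w) + log p(w)
|     = argmin_w  ||y - Xw||^2 + (sigma^2/tau^2) ||w||^2
|
| i.e. ridge regression with lambda = sigma^2/tau^2 -- and the same
| reading applies to weight decay on a neural network.)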
| uoaei wrote:
| > Isn't this just a matter of interpretation of the
| models?
|
| Not really. This is the classic frequentist vs Bayesian
| debate. In frequentist-land, you are computing point
| estimates of the model parameters. In Bayesian-land, you
| are computing distribution estimates of the model
| parameters. It is true that there is a difference in
| interpretation of the _generative process_ but the two
| choices demand fundamentally different models because of
| the decision about which of the parameters or data are
| considered "real" and which are considered "generated".
|
| I think a more abstract/general way to put it is:
| "statistics" is concerned with statistical _summary
| values_ (i.e. mean-field estimates over measures) while
| "probability" is concerned more with _distributions_
| (i.e., topologies of measures). I 'm not sure this is a
| rigorously correct way to characterize it, but it
| illustrates the intuition I'm trying to convey.
| dumb1224 wrote:
| I have very limited statistical background, but doesn't
| variational inference applied to neural networks make
| them probabilistic models? The modelling definitely seems
| so because the math in those papers doesn't even specify
| whether it's a network (it implies that it can be any
| model).
| uoaei wrote:
| Yes indeed. This synthesis of concepts is a great
| illustration of moving beyond hardened dichotomies in
| this research space and I believe similar approaches will
| be fruitful in the years to come.
| stellalo wrote:
| They are all univariate models: some are trained offline on a
| bunch of different series before being applied (deep learning,
| "global" models), others are applied directly to each series to
| forecast ("statistical", "local" models), but the task is the
| same univariate time series prediction for every model there.
| maxmc wrote:
| Comparison of several Deep Learning models and ensembles to
| classical statistical models for the 3,003 series of the M3
| competition.
| macrolime wrote:
| What deep learning could instead be used for in this case is to
| incorporate more data, like text describing events that affect
| macroeconomics, when doing macroeconomic predictions.
| em500 wrote:
| The conclusion, that a low-complexity statistical ensemble is
| almost as good as a (computationally) complex Deep Learning
| model, should not come as a surprise, given the data.
|
| The dataset[1] used here are 3003 time series from the M3
| competition run by the International Journal of Forecasting.
| Almost all of these are sampled at the yearly, quarterly or
| monthly frequency, each with typically 40 to 120 observations
| ("samples" in Machine Learning lingo), and the task is to
| forecast a few months/quarters/years out of sample. Most
| experienced Machine Learners will realize that there is probably
| limited value in fitting a high-complexity n-layer Deep Learning
| model to 120 data points to try to predict the next 12. If you
| have daily or intraday (hourly/minutely/secondly) time series,
| more complex models might become more worthwhile, but such series
| are barely represented in the dataset.
|
| To me, the most surprising result was just how badly AutoARIMA
| performed. Seasonal ARIMA was one of the traditional go-to
| methods for this kind of data.
|
| [1] https://forecasters.org/resources/time-series-
| data/m3-compet...
| tylerneylon wrote:
| This readme lands to me like this: "People say deep learning
| killed stats, but that's not true; in fact, DL can be a huge
| mistake."
|
| Ok, I fully agree with their foundational premise: Start simple.
|
| But: They've overstated their case a bit. Saying that deep
| learning will cost $11,000 and need 14 days on this data set is
| not reasonable. I believe you can find some code that will cost
| that much. The readme suggests that this is typical of deep
| learning, which is not true. DL models have enormous variety. You
| can train a useful, high-performance model on a laptop CPU in a
| seconds-to-minutes timeframe; examples include multilayer
| perceptrons for simple classification, a smaller-scale CNN, or a
| collaborative filtering model.
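|
| For a sense of scale, a throwaway sketch (sklearn on its bundled
| digits data; this runs in a handful of seconds on a laptop CPU):
|
|   from sklearn.datasets import load_digits
|   from sklearn.model_selection import train_test_split
|   from sklearn.neural_network import MLPClassifier
|
|   X, y = load_digits(return_X_y=True)
|   Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
|   clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300,
|                       random_state=0).fit(Xtr, ytr)
|   print(clf.score(Xte, yte))   # roughly 0.97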
|
| While I don't endorse all details of their argument, I do think
| the culture of applied ML/data science has shifted too far toward
| default-DL. The truth is that many problems faced by real
| companies can be solved with simple techniques or pre-trained
| models.
|
| Another perspective: A DL model is a spacecraft (expensive,
| sophisticated, powerful). Simple models like logistic regression
| are bikes and cars (affordable, efficient, less powerful). Using
| heuristics is like walking. Often your goal is just a few blocks
| away, in which case it would be inefficient to use a spacecraft.
| sigmoid10 wrote:
| >They've overstated their case a bit. Saying that deep learning
| will cost $11,000 and need 14 days on this data set is not
| reasonable.
|
| After glancing at the paper they're criticising, I really
| wonder how they arrived at these insane figures. From what I
| saw, they were mostly using stuff like MLPs with a handful of
| layers at O(100) neurons at most. Yeah, if you put a hundred
| million parameter transformer in there you will train forever
| (and waste tons of compute since that would be complete
| overkill), but not with simple perceptrons. I don't know the
| extent of the data, but given these architectures I very much
| doubt a practical model would take this long to train - even on
| a CPU - given that you could run a statistical ensemble in 5
| minutes.
___________________________________________________________________
(page generated 2022-12-01 23:00 UTC)