[HN Gopher] Do Machine Learning Models Memorize or Generalize?
___________________________________________________________________
Do Machine Learning Models Memorize or Generalize?
Author : 1wheel
Score : 344 points
Date : 2023-08-10 13:56 UTC (9 hours ago)
(HTM) web link (pair.withgoogle.com)
(TXT) w3m dump (pair.withgoogle.com)
| lewhoo wrote:
| So, the TLDR could be: they memorize at first and then
| generalize?
| mjburgess wrote:
| Statistical learning can typically be phrased in terms of k
| nearest neighbours
|
| In the case of NNs we have a "modal knn" (memorising) going to a
| "mean knn" ('generalising') under the right sort of training.
|
| I'd call both of these memorising, but the latter is a kind of
| weighted recall.
|
| Generalisation as a property of statistical models (ie., models
| of conditional freqs) is not the same property as generalisation
| in the case of scientific models.
|
| In the latter a scientific model is general because it models
| causally necessary effects from causes -- so, _necessarily_ if X
| then Y.
|
| Whereas generalisation in associative stats is just about whether
| you're drawing data from the empirical freq. distribution or
| whether you've modelled first. In all automated stats the only
| diff between the "model" and "the data" is some sort of weighted
| averaging operation.
|
| So in automated stats (ie., ML,AI) it's really just whether the
| model uses a mean.
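|
| A minimal NumPy sketch of that contrast (illustrative 1-D data,
| nothing from the article): a "modal" kNN just recalls the value
| of the nearest stored point, while a "mean" kNN returns a
| distance-weighted average of the k nearest stored values.
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     X_train = rng.uniform(0, 10, size=50)    # stored inputs
|     y_train = np.sin(X_train) + 0.1 * rng.normal(size=50)
|
|     def modal_knn(x):
|         # "memorising": recall the closest stored target
|         return y_train[np.argmin(np.abs(X_train - x))]
|
|     def mean_knn(x, k=5):
|         # "weighted recall": inverse-distance-weighted mean
|         d = np.abs(X_train - x)
|         idx = np.argsort(d)[:k]
|         w = 1.0 / (d[idx] + 1e-9)
|         return np.sum(w * y_train[idx]) / np.sum(w)
|
|     x = 3.3
|     # the mean variant smooths out the stored noise
|     print(modal_knn(x), mean_knn(x), np.sin(x))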
| autokad wrote:
| I disagree, it feels like you are just fussing over words and
| not what's happening in the real world. If you were right, a
| human wouldn't learn anything either, they would just memorize.
|
| You can look at it by results: I give these models inputs
| they've never seen before, but they give me outputs that are
| correct / acceptable.
|
| You can look at it in terms of data: we took petabytes of data,
| and with an 8GB model (Stable Diffusion) we can output an image
| of anything. That's an unheard-of level of compression, only
| possible if it's generalizing - not memorizing.
| bippihippi1 wrote:
| It's been shown that all models learned by gradient descent are
| approximately equivalent to kernel machines. Interpolation isn't
| generalization. If there's a new input sufficiently different
| from the training data, the behaviour is unknown.
| xapata wrote:
| One weird trick ...
|
| There's some fox and hedgehog analogy I've never understood.
| visarga wrote:
| but when the model trains on 13T tokens it is hard to be OOD
| ActivePattern wrote:
| I'd be curious how much of the link you read.
|
| What they demonstrate is a neural network learning an algorithm
| that approximates modular addition. The exact workings of this
| algorithm are explained in the footnotes. The learned algorithm
| is general -- it is just as valid on unseen inputs as on seen
| inputs.
|
| There's no memorization going on in this case. It's _actually_
| approximating the process used to generate the data, which just
| isn't possible using k nearest neighbors.
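|
| For a rough sense of the kind of learned algorithm the footnotes
| describe, here is a hand-written NumPy sketch (arbitrary
| frequencies, not the trained network's weights): represent a and
| b as angles, combine them with the angle-sum identities, and
| read out the class whose angle best matches the sum.
|
|     import numpy as np
|
|     P = 67                    # modulus from the article's task
|     ws = 2 * np.pi * np.array([1, 3, 5]) / P   # arbitrary frequencies
|
|     def mod_add(a, b):
|         # cos/sin of w*(a+b), built only from cos/sin of a and of b
|         ca, sa = np.cos(ws * a), np.sin(ws * a)
|         cb, sb = np.cos(ws * b), np.sin(ws * b)
|         cos_ab = ca * cb - sa * sb
|         sin_ab = sa * cb + ca * sb
|         # "logit" for candidate c is sum_w cos(w*(a+b-c)),
|         # which peaks exactly at c = (a + b) mod P
|         cs = np.arange(P)
|         logits = (cos_ab[:, None] * np.cos(ws[:, None] * cs)
|                   + sin_ab[:, None] * np.sin(ws[:, None] * cs)).sum(0)
|         return int(np.argmax(logits))
|
|     assert mod_add(40, 50) == (40 + 50) % P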
| visarga wrote:
| > Statistical learning can typically be phrased in terms of k
| nearest neighbours
|
| We have suspected that neural nets are a kind of kNN. Here's a
| paper:
|
| Every Model Learned by Gradient Descent Is Approximately a
| Kernel Machine
|
| https://arxiv.org/abs/2012.00152
| [deleted]
| xaellison wrote:
| what's the TLDR: memorize, or generalize?
| greenflag wrote:
| It seems the take-home is that weight decay induces sparsity,
| which helps learn the "true" representation rather than an
| overfit one. It's interesting that the human brain has a
| comparable mechanism prevalent in development [1]. I would love
| to know from someone in the field if this was the inspiration
| for weight decay (or presumably the more directly equivalent NN
| pruning [2]).
|
| [1] https://en.wikipedia.org/wiki/Synaptic_pruning [2]
| https://en.wikipedia.org/wiki/Pruning_(artificial_neural_net...
| tbalsam wrote:
| ML researcher here wanting to offer a clarification.
|
| L1 induces sparsity. Weight decay explicitly _does not_, as it
| is L2. This is a common misconception.
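|
| A quick scikit-learn sketch of the difference (synthetic data,
| purely illustrative): the L1-penalised fit zeroes out most of
| the uninformative coefficients, while the L2-penalised fit only
| shrinks them.
|
|     import numpy as np
|     from sklearn.linear_model import Lasso, Ridge
|
|     rng = np.random.default_rng(0)
|     X = rng.normal(size=(200, 50))
|     true_w = np.zeros(50)
|     true_w[:5] = 1.0                 # only 5 informative features
|     y = X @ true_w + 0.1 * rng.normal(size=200)
|
|     lasso = Lasso(alpha=0.1).fit(X, y)    # L1 penalty
|     ridge = Ridge(alpha=10.0).fit(X, y)   # L2 penalty
|     print("exact zeros, L1:", np.sum(lasso.coef_ == 0.0))  # most noise dims
|     print("exact zeros, L2:", np.sum(ridge.coef_ == 0.0))  # typically none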
|
| Something a lot of people don't know is that weight decay works
| because, when applied as regularization, it causes the network
| to approach the MDL (minimum description length), which reduces
| regret during training.
|
| Pruning in the brain is somewhat related, but because the brain
| uses sparsity to (fundamentally, IIRC) induce representations
| instead of compression, it's basically a different motif
| entirely.
|
| If you need a hint here on this one, think about the implicit
| biases of different representations and the downstream impacts
| that they can have on the learned (or learnable)
| representations of whatever system is in question.
|
| I hope this answers your question.
| joaogui1 wrote:
| That looks interesting, do you know what paper talks about
| the connection between MDL, regret, and weight decay?
| tbalsam wrote:
| I would start with Shannon's information theory and the
| Wikipedia page on L2/the MDL as a decent starting point.
|
| For the first, there are a few good papers that simplify
| the concepts even further.
| pcwelder wrote:
| AFAIK weight decay is inspired by L2 regularisation, which goes
| back to linear regression, where L2 regularisation is equivalent
| to having a zero-mean Gaussian prior on the weights.
|
| Note that L1 regularisation produces much more sparsity, but it
| doesn't perform as well.
| nonameiguess wrote:
| This. Weight decay is just a method of dropping most weights to
| zero, which is a standard technique statisticians have used for
| regularization purposes for decades. As far as I understand, it
| goes back at least to Tikhonov, and in the regression context it
| was mostly called ridge regression (from 1970). Ordinary least
| squares minimizes the squared L2 norm of the residuals. When a
| system is ill-conditioned or underdetermined, adding a penalty
| term (usually just a scalar multiple of an identity matrix) and
| also minimizing the L2 norm of the weights biases the model to
| produce mostly near-zero weights. This gives a better-conditioned
| model matrix that is actually possible to solve numerically
| without underflow.
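|
| As a small NumPy sketch of that penalty term (random data, only
| to show the conditioning effect): solve the penalised normal
| equations (X^T X + lambda*I) w = X^T y instead of the
| unpenalised ones.
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     X = rng.normal(size=(30, 100))    # more unknowns than equations
|     y = rng.normal(size=30)
|     lam = 1e-2
|
|     A_plain = X.T @ X                       # singular: rank at most 30
|     A_ridge = X.T @ X + lam * np.eye(100)   # full rank, well conditioned
|     w = np.linalg.solve(A_ridge, X.T @ y)   # unique ridge solution
|
|     print(np.linalg.cond(A_plain), np.linalg.cond(A_ridge))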
|
| It's kind of amazing to watch this from the sidelines, a
| process of engineers getting ridiculously impressive results
| from some combo of sheer hackery and ingenuity, great data
| pipelining and engineering, extremely large datasets,
| extremely fast hardware, and computational methods that scale
| very well, but at the same time, gradually relearning lessons
| and re-inventing techniques that were perfected by
| statisticians over half a century ago.
| whimsicalism wrote:
| This comment is so off base. First off, no, L2 does not
| encourage near-zero weights; second, they are not relearning --
| everyone already knew what L1/L2 penalties are.
| tbalsam wrote:
| L1 drops weights to zero, L2 biases towards Gaussianity.
|
| It's not always relearning lessons or people entirely
| blindly trying things either, many researchers use the
| underlying math to inform decisions for network
| optimization. If you're seeing that, then that's probably a
| side of the field where people are newer to some of the
| math behind it, and that will change as things get more
| established.
|
| The underlying mathematics behind these kinds of systems
| are what has motivated a lot of the improvements in hlb-
| CIFAR10, for example. I don't think I would have been able
| to get there without sitting down with the fundamentals,
| planning, thinking, and working a lot, and then executing.
| There is a good place for blind empirical research too, but
| it loses its utility past a certain point of overuse.
| visarga wrote:
| The inspiration for weight decay was to reduce the model's
| capacity to memorize until it perfectly fits the complexity of
| the task, no more, no less. A model more complex than the task
| is over-fitting, a less complex one is under-fitting. Got to
| balance them out.
|
| But the best cure for over-fitting is to make the dataset
| larger and ensure data diversity. LLMs have datasets so large
| they usually train for only one epoch.
| crdrost wrote:
| And there have been a lot of approaches to do this, my favorite
| being the idea that if we just randomly zap out some of the
| neurons while we train the rest (i.e., dropout), forcing the
| network to acquire that redundancy might privilege structured
| representations over memorization. It has always seemed like
| some fraternity prank: "if you REALLY know the tenets of Delta
| Mu Beta you can recite them when drunk after we spin you around
| in a circle twelve times fast!"
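|
| A minimal PyTorch sketch of what dropout does mechanically
| (layer sizes are arbitrary):
|
|     import torch
|     import torch.nn as nn
|
|     model = nn.Sequential(
|         nn.Linear(32, 128),
|         nn.ReLU(),
|         nn.Dropout(p=0.5),   # each hidden unit zeroed w.p. 0.5 in training
|         nn.Linear(128, 10),
|     )
|
|     x = torch.randn(4, 32)
|     model.train()
|     print(model(x)[0, :3])   # stochastic: random units are dropped
|     model.eval()
|     print(model(x)[0, :3])   # deterministic: dropout is a no-op at eval time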
| whimsicalism wrote:
| https://nitter.net/Yampeleg/status/1688441683946377216
| kaibee wrote:
| > But the best cure for over-fitting is to make the dataset
| larger and ensure data diversity.
|
| This is also good life advice.
| nightski wrote:
| It sounds nice in theory, but the data itself could be
| problematic. There is no temporal nature to it. You can have
| duplicate data points, many data points that are closely
| related but describe the same thing/event/etc.. So while only
| showing the model each data point once ensures you do not
| introduce any extra weight on a data point, if the dataset
| itself is skewed it doesn't help you at all.
|
| Just by trying to make the dataset diverse you could skew
| things to not reflect reality. I just don't think enough
| attention has been paid to the data, and too much to the model.
| But I could be very wrong.
|
| There is a natural temporality to the data humans receive.
| You can't relive the same moment twice. That said, human
| intelligence is on a scale too and may be affected in the
| same way.
| visarga wrote:
| > I just don't think enough attention has been paid to the
| data, and too much to the model.
|
| I wholly agree. Everyone is blinded by models - GPT4 this,
| LLaMA2 that - but the real source of the smarts is in the
| dataset. Why would any model, no matter how its architecture is
| tweaked, learn the same abilities from the same data? Why would
| all humans be able to learn the same skills when every brain is
| quite different? It was the data, not the model.
|
| And since we are exhausting all the available quality text
| online we need to start engineering new data with LLMs and
| validation systems. AIs need to introspect more into their
| training sets, not just train to reproduce them, but
| analyse, summarise and comment on them. We reflect on our
| information, AIs should do more reflection before learning.
|
| More fundamentally, how are AIs going to evolve past human
| level unless they make their own data or they collect data
| from external systems?
| Salgat wrote:
| This is definitely current models' biggest issue. You're
| training a model against millions of books worth of data
| (which would take a human tens of thousands of lifetimes)
| to achieve a superficial level of conversational ability
| to match a human, which can consume at most 3 novels a
| day without compromising comprehension. Current models
| are terribly inefficient when it comes to learning from
| data.
| whimsicalism wrote:
| You have to count the training process from the origin of
| the human brain imo, not from the birth of any individual
| human.
|
| Neural nets look much more competitive by that standard.
| imtringued wrote:
| They are inefficient by design. Gradient descent and
| backpropagation scale poorly, but they work and GPUs are
| cheap, so here we are.
| og_kalu wrote:
| Modern LLMs are nowhere near the scale of the human brain,
| however you want to slice things, so "terribly inefficient" is
| very arguable. Also, language skills seemingly take much less
| data and scale when you aren't trying to have the model learn
| the sum total of human knowledge.
| https://arxiv.org/abs/2305.07759
| Salgat wrote:
| Scale is a very subjective thing since one is analog (86B
| neurons) and one is digital (175B parameters).
| Additionally, consider how many compute hours GPT 3 took
| to train (10,000 V100s were set aside for exclusive
| training of GPT 3). I'd say that GPT 3 scale vastly
| dwarfs the human brain, which runs at a paltry 12 watts.
| ben_w wrote:
| > It was the data, not the model
|
| It's _both_.
|
| It's clearly impossible to learn how to translate Linear
| A into modern English using only content written in pure
| Japanese that never references either.
|
| Yet also, none of the algorithms before Transformers were
| able to first ingest the web, then answer a random
| natural language question in any domain -- closest was
| Google etc. matching on indexed keywords.
|
| > how are AIs going to evolve past human level unless
| they make their own data?
|
| Who says they can't make their own data?
|
| Both _a priori_ (by development of "new" mathematical
| and logical tautological deductions), and _a posteriori_
| by devising, and observing the results of, various
| experiments.
|
| Same as us, really.
| whimsicalism wrote:
| > Yet also, none of the algorithms before Transformers
| were able to first ingest the web, then answer a random
| natural language question in any domain -- closest was
| Google etc. matching on indexed keywords.
|
| Wrong, recurrent models were able to do this, just not as
| well.
| riversflow wrote:
| I see this brought up consistently on the topic of AI
| take-off/X-risk.
|
| How does an AI _language model_ devise an experiment and
| observe the results? The language model is only trained
| on what's already known, I'm extremely incredulous that
| this language model technique can actually reason a
| genuinely novel hypothesis.
|
| An LLM is a series of weights sitting in the RAM of a GPU
| cluster; it's really just a fancy prediction function. It
| doesn't have the sort of biological imperatives (a result
| of being completely independent beings) or entropy that
| drive living systems.
|
| Moreover, if we consider how it works for humans, people
| have to _think_ about problems. Do we even have a model or
| even an idea about what "thinking" is? Meanwhile, science is
| a looping process that mostly requires a physical element
| (testing/verification). So unless we make some radical
| breakthroughs in general purpose robotics, as well as
| overcome the thinking problem, I don't see how AI can do
| some sort of tech breakout/runaway.
| ben_w wrote:
| Starting with the end so we're on the same page about
| framing the situation:
|
| > I don't see how AI can do some sort of tech
| breakout/runaway.
|
| I'm expecting (in the mode, but with a wide and shallow
| distribution) a roughly 10x increase in GDP growth from
| increased automation etc., _not_ a singularity/foom.
|
| I think the main danger is bugs and misuse (both
| malicious and short-sighted).
|
| -
|
| > How does an AI language model devise an experiment and
| observe the results?
|
| Same way as Helen Keller.
|
| Same way scientists with normal senses do for data
| outside human sense organs, be that the LHC or nm/s^2
| acceleration of binary stars or gravity waves (or the
| confusingly similarly named but very different
| gravitational waves).
|
| > The language model is only trained on what's already
| known, I'm extremely incredulous that this language model
| technique can actually reason a genuinely novel
| hypothesis.
|
| Were you, or any other human, trained on things
| _unknown_?
|
| If so, how?
|
| > An LLM is a series of weights sitting in the RAM of a GPU
| cluster; it's really just a fancy prediction function. It
| doesn't have the sort of biological imperatives (a result
| of being completely independent beings) or entropy that
| drive living systems.
|
| Why do you believe that biological imperatives are in any
| way important?
|
| I can't see how any of a desire to eat, shag, fight, run
| away, or freeze up... helps with either the scientific
| method or pure maths.
|
| Even the "special sauce" that humans have over other
| animals didn't lead to _any_ us doing the scientific
| method until very recently, and _most_ of us still don
| 't.
|
| > Do we even have a model or even an idea about what
| "thinking" is?
|
| AFAIK, only in terms of output, not qualia or anything
| like that.
|
| Does it matter if the thing a submarine does is swimming,
| if it gets to the destination? LLMs, for all their
| mistakes and their... utterly inhuman minds and
| transhuman training experience... can do many things
| which would've been considered "implausible" even in a
| sci-fi setting a decade ago.
|
| > So unless we make some radical breakthroughs in general
| purpose robotics
|
| I don't think it needs to be _general_, as labs are
| increasingly automated even without general robotics.
| imtringued wrote:
| It's not just a series of weights. It is an unchanging
| series of weights. This isn't necessarily artificial
| intelligence. It is the intelligence of the dead.
| BaseballPhysics wrote:
| The human brain has synaptic pruning. The exact purpose of it
| is theorized but not actually understood, and it's a gigantic
| leap to assume some sort of analogous mechanism between LLMs
| and the human brain.
| [deleted]
| djha-skin wrote:
| How is this even a shock?
|
| Anyone who has so much as taken a class on this knows that even
| the simplest of perceptron networks, decision trees, or any form
| of machine learning model generalizes. That's why we use them.
| If they don't, it's called _overfitting_ [1], where the model is
| so accurate on the training data that its inferential ability on
| new data suffers.
|
| I know that the article might be talking about a higher form of
| generalization with LLMs or whatever, but I don't see why the
| same principle of "don't overfit the data" wouldn't apply to that
| situation.
|
| No, really: what part of their base argument is novel?
|
| 1: https://en.wikipedia.org/wiki/Overfitting
| halflings wrote:
| The interesting part is the sudden generalization.
|
| Simple models predicting simple things will generally slowly
| overfit, and regularization keeps that overfitting in check.
|
| This "grokking" phenomenon is when a model first starts by
| aggressively overfitting, then gradually prunes unnecessary
| weights until it _suddenly_ converges on the one generalizable
| combination of weights (as it 's the only one that both solves
| the training data _and_ minimizes weights).
|
| Why is this interesting? Because you could argue that this
| justifies using overparametrized models with high levels of
| regularization; e.g. models that will tend to aggressively
| overfit, but over time might converge to a better solution by
| gradual pruning of weights. The traditional approach is not to
| do this, but rather to use a simpler model (which would
| initially generalize better, but due to its simplicity might
| not be able to learn the underlying mechanism and reach higher
| accuracy).
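|
| A rough PyTorch sketch of that recipe (hyperparameters are
| illustrative, and whether the delayed jump in test accuracy
| actually shows up is sensitive to them): an over-parameterized
| MLP on modular addition, trained on a fraction of all pairs
| with strong weight decay, while watching train vs. test
| accuracy.
|
|     import torch
|     import torch.nn as nn
|     import torch.nn.functional as F
|
|     P = 67
|     pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
|     labels = (pairs[:, 0] + pairs[:, 1]) % P
|     X = torch.cat([F.one_hot(pairs[:, 0], P),
|                    F.one_hot(pairs[:, 1], P)], dim=1).float()
|     perm = torch.randperm(len(X))
|     train = perm[:len(X) // 3]             # train on ~30% of pairs
|     test = perm[len(X) // 3:]
|
|     model = nn.Sequential(nn.Linear(2 * P, 512), nn.ReLU(),
|                           nn.Linear(512, P))
|     opt = torch.optim.AdamW(model.parameters(), lr=1e-3,
|                             weight_decay=1.0)
|
|     def acc(idx):
|         with torch.no_grad():
|             return (model(X[idx]).argmax(1) == labels[idx]).float().mean()
|
|     for step in range(20000):
|         loss = F.cross_entropy(model(X[train]), labels[train])
|         opt.zero_grad()
|         loss.backward()
|         opt.step()
|         if step % 1000 == 0:
|             print(step, acc(train).item(), acc(test).item())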
| timy2shoes wrote:
| It's interesting that the researchers chose example problems
| where the minimum norm solution is the best at
| generalization. What if that's not the case?
| godelski wrote:
| It's because you over-generalized your simple understanding.
| There is a lot more nuance to that thing you are calling
| overfitting (and underfitting). We do not know why or when it
| happens in all cases. We do know cases where it does happen and
| why, but that doesn't mean we understand all the others. There
| is still a lot of interpretation left that is needed. How much
| was overfit? How much underfit? Can these happen at the same
| time? (Yes.) What layers do this, what causes this, and how can
| we avoid it? Reading the article shows you that this is far from
| a trivial task. This is all before we even introduce the concept
| of sudden generalization. Once we do that, all these questions
| start again, but now under a completely different context that
| is even more surprising. We also need to talk about new aspects
| like the rate of generalization and the rate of memorization,
| and what affects these.
|
| tldr: don't oversimplify things: you underfit
|
| P.S. please don't fucking review. Your complaints aren't
| critiques.
| tipsytoad wrote:
| Seriously, are they only talking about weight decay? Why so
| complicated?
| SimplyUnknown wrote:
| First of all, great blog post with great examples. Reminds me of
| what distill.pub used to be.
|
| Second, the article correctly states that typically L2 weight
| decay is used, leading to a lot of weights with small magnitudes.
| For models that generalize better, would it then be better to
| always use L1 weight decay to promote sparsity in combination
| with longer training?
|
| I wonder whether deep learning models that only use sparse
| fourier features rather than dense linear layers would work
| better...
| qumpis wrote:
| Slightly related, but the sparsity-inducing activation function
| ReLU is often used in neural networks.
| medium_spicy wrote:
| Short answer: if the inputs can be represented well on the
| Fourier basis, yes. I have a patent in process on this, fingers
| crossed.
|
| Longer answer: deep learning models are usually trying to find
| the best nonlinear basis in which to represent inputs; if the
| inputs are well-represented (read that as: can be sparsely
| represented) in some basis known a-priori, it usually helps to
| just put them in that basis, e.g., by FFT'ing RF signals.
|
| The challenge is that the overall-optimal basis might not be
| the same as those of any local minima, so you've got to do some
| tricks to nudge the network closer.
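|
| A tiny NumPy illustration of the "put it in the right basis
| first" point (synthetic signal, not an RF example): a signal
| that looks dense in the time domain is just a couple of
| coefficients in the Fourier basis.
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     t = np.arange(256) / 256.0
|     signal = (np.sin(2 * np.pi * 17 * t)
|               + 0.5 * np.sin(2 * np.pi * 40 * t)
|               + 0.1 * rng.normal(size=256))
|
|     spectrum = np.abs(np.fft.rfft(signal))  # 129 frequency bins
|     top = np.sort(np.argsort(spectrum)[-2:])
|     print(top)                              # ~[17 40]: sparse here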
| superkuh wrote:
| There were no auto-discovery RSS/Atom feeds in the HTML, no links
| to the RSS feed anywhere, but by guessing at possible feed names
| and locations I was able to find the "Explorables" RSS feed at:
| https://pair.withgoogle.com/explorables/rss.xml
| flyer_go wrote:
| I don't think I have seen an answer here that actually challenges
| this question - from my experience, I have yet to see a neural
| network actually learn representations outside the range in which
| it was trained. Some papers have tried to use things like
| sinusoidal activation functions that can force a neural network
| to fit a repeating function, but on its own I would call it pure
| coincidence.
|
| On generalization - it's still memorization. I think there has
| been some proof that ChatGPT does 'try' to perform some higher
| level thinking but still has problems due to the dictionary-type
| lookup table it uses. The higher level thinking or AGI that
| people are excited about is a form of generalization that is so
| impressive we don't really think of it as memorization. But I
| actually question whether our drive to generate original thought
| is really all that separate from what we are currently seeing.
| smaddox wrote:
| > I have yet to see a neural network actually learn
| representations outside the range in which it was trained
|
| Generalization doesn't require _learning representations_
| outside of the training set. It requires learning reusable
| representations that compose in ways that enable solving unseen
| problems.
|
| > On generalization - it's still memorization
|
| Not sure what you mean by this. This statement sounds self
| contradictory to me. Generalization requires abstraction /
| compression. Not sure if that's what you mean by memorization.
|
| Overparameterized models are able to generalize (and tend to,
| when trained appropriately) because there are far more
| parameterizations that minimize loss by compressing knowledge
| than there are parameterizations that minimize loss without
| compression.
|
| This is fairly easy to see. Imagine a dataset and model such
| that the model has barely enough capacity to learn the dataset
| without compression. The only degrees of freedom would be
| through changes in basis. In contrast, if the model uses
| compression, that would increase the degrees of freedom. The
| more compression, the more degrees of freedom, and the more
| parameterizations that would minimize the loss.
|
| If stochastic gradient descent is roughly as likely to find any
| given compressed minimum as any given uncompressed one, then the
| fact that there are exponentially more compressed minima than
| uncompressed ones means it will tend to find a compressed
| minimum.
|
| Of course this is only a probabilistic argument, and doesn't
| guarantee compression / generalization. And in fact we know
| that there are ways to train a model such that it will not
| generalize, such as training for many epochs on a small dataset
| without augmentation.
| jhaenchen wrote:
| The issue is that we are prone to inflate the complexity of our
| own processing logic. Ultimately we are pattern recognition
| machines in combination with abstract representation. This
| allows us to connect the dots between events in the world and
| apply principles in one domain to another.
|
| But, like all complexity, it is reducible to component parts.
|
| (In fact, we know this because we evolved to have this
| ability.)
| agalunar wrote:
| Calling us "pattern recognition machines capable of abstract
| representation" I think is correct, but is (rather) broad
| description of what we can do and not really a comment on how
| our minds work. Sure, from personal observation, it seems
| like we sometimes overcomplicate self-analysis ("I'm feeling
| bad - why? oh, there are these other things that happened and
| related problems I have and maybe they're all manifestations
| of one or two deeper problems, &c" when in reality I'm just
| tired or hungry), but that seems like evidence we're both
| simpler than we think and also more complex than you'd expect
| (so much mental machinery for such straightforward
| problems!).
|
| I read _Language in Our Brain_ [1] recently and I was amazed
| by what we've learned about the neurological basis of
| language, but I was even more astounded at how profoundly
| _little_ we know.
|
| > But, like all complexity, it is reducible to component
| parts.
|
| This is just false, no? Sometimes horrendously complicated
| systems are made of simple parts that interact in ways that
| are intractable to predict or that defy reduction.
|
| [1] https://mitpress.mit.edu/9780262036924/language-in-our-
| brain
| huijzer wrote:
| A bit of both, but it does certainly generalize. Just look into
| the sentiment neuron from OpenAI in 2017 or come up with a
| unique question for ChatGPT.
| _ache_ wrote:
| Does anyone know how those charts are created? I bet they're
| half generated by some sort of library and then manually
| improved, but the generated animated SVGs are beautiful.
| 1wheel wrote:
| Basically just a bunch of d3 -- could be cleaned up
| significantly, but that's hard to do while iterating and
| polishing the charts.
|
| I also have a couple of little libraries for things like
| annotations, interleaving svg/canvas and making d3 a bit less
| verbose.
|
| - https://github.com/PAIR-code/ai-
| explorables/tree/master/sour...
|
| - https://1wheel.github.io/swoopy-drag/
|
| - https://github.com/gka/d3-jetpack
|
| - https://roadtolarissa.com/hot-reload/
| iaw wrote:
| I was going to ask the same question. Those are some great
| visualizations
| davidguetta wrote:
| hierarchize would be a better term than generalize
| 3cats-in-a-coat wrote:
| Generalizing is seeing common principles and patterns between
| disparate instances of a phenomenon. It's a proper word for
| this.
| Chabsff wrote:
| That's a common mechanism to achieve generalization, but the
| term is a little more general (heh) than that. It
| specifically refers to correctly handling data that lives
| outside the distribution presented by the training data.
|
| It's a description of a _behavior_ , not a mechanism. Which
| may or may not be appropriate depending on whether you are
| talking about *what* the model does or *how* it achieves it.
| 3cats-in-a-coat wrote:
| Kinda fuzzy what's "in the distribution", because it
| depends on how deeply the model interprets it. If it
| understands examples outside the distribution... that kinda
| puts them in the distribution.
|
| General understanding makes the information in the
| distribution very wide. Shallow understanding makes it very
| narrow. Like say recognizing only specific combinations of
| pixels verbatim.
| Chabsff wrote:
| I think you are misinterpreting. The distribution present
| in the training set in isolation (the one I'm referring
| to, and is not fuzzy in the slightest) is not the same
| thing as the distribution understood by the trained model
| (the one you are referring to, and is definitely more
| conceptual and hard to characterize in non-trivial
| cases).
|
| "Generalization" is simply the theoretical measure of how
| much the later extends beyond the former, regardless of
| how that's achieved.
| davidguetta wrote:
| Generalize has a tendency to imply you can extrapolate. And
| in most cases it's actually the opposite that happens: neural
| nets tend to COMPRESS the data (which in turn is a good thing
| in many cases, because the data is noisy).
| 3cats-in-a-coat wrote:
| The point of compression is to decompress after. That's
| what happens during inference, and when the extrapolation
| occurs.
|
| Let's say I tell GPT "write 8 times foobar". Will it? Well
| then it understands me and can extrapolate from the request
| to the proper response, without having specifically "write
| 8 times foobar" in its model.
|
| Most decompression algorithms focus on predicting the next
| token (byte, term, etc.), believe it or not. The more
| accurately they predict the next token, the less
| information you need to store to correct misprediction.
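|
| The arithmetic behind that (idealised entropy coding, ignoring
| coder overhead): a token predicted with probability p costs
| about -log2(p) bits, so better prediction directly means fewer
| bits to store.
|
|     import math
|
|     for p in (0.5, 0.9, 0.99):
|         print(p, -math.log2(p))   # 1.0, ~0.152, ~0.0145 bits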
| ot wrote:
| "hierarchize" only describes your own mental model of how
| knowledge organization and reasoning may work in the model, not
| the actual phenomenon being observed here.
|
| "generalize" means going from specific examples to general
| cases not seen before, which is a perfectly good description of
| the phenomenon. Why try to invent a new word?
| davidguetta wrote:
| > hierarchize" only describes your own mental model of how
| knowledge organization and reasoning may work in the model,
| not the actual phenomenon being observed here
|
| It's not true: if you look at a deep CNN, the lower layers show
| lines, the higher layers complex stuff like eyes or football
| players, etc. Hierarchisation of information actually emerges
| naturally in NNs.
|
| Generalization often implies extrapolation on new data, which
| is just not the case most of the time with NNs, and why I
| didn't like the word.
| version_five wrote:
| Anything would be better than "grokking".
|
| From what I gather they're talking about double descent which
| afaik is the consequence of overparameterization leading to a
| smooth interpolation between the training data as opposed to
| what happens in traditional overfitting. Imagine a polynomial
| fit with the same degree as the number of data points (swinging
| up and down wildly away from the data) compared with a much
| higher degree fit that could smoothly interpolate between the
| points while still landing right on them.
|
| None of this is what I would call generalization, it's good
| interpolation, which is what deep learning does in a very high
| dimensional space. It's notoriously awful at extrapolating, ie
| generalizing to anything without support in the training data.
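|
| A NumPy sketch of that polynomial picture (random points, so
| the exact numbers vary from run to run): an exact-degree fit
| versus a heavily over-parameterized minimum-norm fit through
| the same points.
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     x = np.sort(rng.uniform(-1, 1, 12))
|     y = np.sin(3 * x) + 0.1 * rng.normal(size=12)
|     grid = np.linspace(x.min(), x.max(), 200)
|
|     for degree in (11, 100):   # just-enough vs. over-parameterized
|         V = np.vander(x, degree + 1, increasing=True)
|         c = np.linalg.lstsq(V, y, rcond=None)[0]  # min-norm coefficients
|         pred = np.vander(grid, degree + 1, increasing=True) @ c
|         # how far each fit strays from the noiseless curve
|         print(degree, float(np.max(np.abs(pred - np.sin(3 * grid)))))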
| visarga wrote:
| > It's notoriously awful at extrapolating, ie generalizing to
| anything without support in the training data.
|
| Scientists are also pretty lousy at making new discoveries
| without labs. They just need training data.
| Jack000 wrote:
| double descent is a different phenomenon from grokking
| blueyes wrote:
| If your data set is too small, they memorize. If you train them
| well on a large dataset, they learn to generalize.
| visarga wrote:
| they only generalise with big datasets, that is the rule
| blueyes wrote:
| That's what I said.
| ajuc wrote:
| I was trying to make an AI for my 2d sidescrolling game with
| asteroid-like steering learn from recorded player input +
| surroundings.
|
| It generalized splendidly - its conclusion was that you always
| need to press "forward" and do nothing else, no matter what
| happens :)
| mostertoaster wrote:
| Sometimes I think the reason human memory is, in some sense, so
| amazing is that what we lack in the storage capacity machines
| have, we make up for in our ability to create patterns that
| dramatically compress the amount of information stored; then it
| is like we compress those patterns together with other patterns
| and are able to extract things from them. It is an incredibly
| lossy compression, but it gets the job done.
| tbalsam wrote:
| For more information and the related math behind associative
| memories, please see Hopfield Neural Networks.
|
| While the upper bound is technically "infinity", there is a
| tradeoff between the number of concepts stored and the
| fundamental amount of information storable per concept, similar
| to how other tradeoff principles, like the uncertainty
| principle, work.
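|
| A minimal NumPy sketch of the classical construction (binary
| Hopfield net with the Hebbian outer-product rule; sizes are
| arbitrary): store a few +/-1 patterns in the weights, then
| recall one from a corrupted cue.
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     N, n_patterns = 100, 5
|     patterns = rng.choice([-1, 1], size=(n_patterns, N))
|
|     W = (patterns.T @ patterns) / N   # Hebbian storage
|     np.fill_diagonal(W, 0)
|
|     state = patterns[0].copy()
|     state[:20] *= -1                  # corrupt 20% of the bits
|     for _ in range(10):               # synchronous sign updates
|         state = np.where(W @ state >= 0, 1, -1)
|
|     print(np.mean(state == patterns[0]))  # usually recovers pattern 0
|
| The classic rule only stores roughly 0.14*N patterns reliably,
| which is the kind of tradeoff mentioned above.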
| bobboies wrote:
| Good example: in my math and physics classes I found it really
| helpful to understand the general concepts; then instead of
| memorizing formulas I could actually derive them from other
| known (perhaps easier-to-remember) facts.
|
| Geometry is good for training in this way--and often very
| helpful for physics proofs too!
| BSEdlMMldESB wrote:
| yes, when we do this to history, it becomes filled with
| conspiracies. but is merely a process to 'understand' history
| by projecting intentionalities.
|
| this 'compression' is what 'understanding' something really
| entails; at first... but then there's more.
|
| when knowledge becomes understood it enables perception (e.g.
| we perceive meaning in words once we learn to read).
|
| when we get really good at this understanding-perception we may
| start to 'manipulate' the abstractions we 'perceive'. an
| example would be to 'understand a cube' and then being able to
| rotate it around so to predict what would happen without really
| needing the cube. but this is an overly simplistic example
| pillefitz wrote:
| That is essentially what embeddings do
| nightski wrote:
| Maybe, except from my understanding an embedding vector tends
| to be much larger than the source token (due to the high
| dimensionality of the embedding space). So it's almost like a
| reverse compression in a way. That said I know vector DBs
| have much more efficient ways of storing those vector
| embedding.
| jncfhnb wrote:
| Tokens are not 1:1 with vectors.
| bufferoverflow wrote:
| There are rare people who remember everything
|
| https://youtu.be/hpTCZ-hO6iI
| svachalek wrote:
| It's pretty fascinating to me how "normal" Marilu Henner
| seems to be. I'm getting older and my memory is not what it
| was, but when I was younger it was pretty extraordinary. I
| did really well in school and college but over time I've
| realized it was mostly due to being able to remember most
| things pretty effortlessly, over being truly "smart" in a
| classic sense.
|
| But having so much of the past being so accessible is tough.
| There are lots of memories I'd rather not have, that are
| vivid and easily called up. And still, I think it's only a
| fraction of what her memory seems to be like.
| 93po wrote:
| As someone on the other end of the spectrum, I have an
| awful memory, and don't remember most of my life aside from
| really wide, sweeping generalizations and maybe a couple
| hundred very specific memories. My way of existence is also
| very sad, and it makes me feel like I've not really lived.
| TheRealSteel wrote:
| " I did really well in school and college but over time
| I've realized it was mostly due to being able to remember
| most things pretty effortlessly"
|
| Same! They thought I was a genius in primary school but I
| ended up a loser adult with a dead end job. Turns out I
| just liked technology and was good at remembering facts and
| names for things.
| hgsgm wrote:
| Is there scientific evidence of that or just claims?
| badumtsss wrote:
| some people don't want to be studied or tested.
| ComputerGuru wrote:
| That's not exactly true, there doesn't seem to be an upper
| bound (that we can reach) on storage capacity in the brain [0].
| Instead, the brain actually works to actively distill knowledge
| that doesn't need to be memorized verbatim into its essential
| components in order to achieve exactly this "generalized
| intuition and understanding" to avoid overfitting.
|
| [0]: https://www.scientificamerican.com/article/new-estimate-
| boos...
| halflings wrote:
| > That's not exactly true [...] Instead, the brain actually
| works to actively distill knowledge that doesn't need to be
| memorized verbatim into its essential components
|
| ...but that's exactly what OP said, no?
|
| I remember attending an ML presentation where the speaker
| shared a quote I can't find anymore (speaking of memory and
| generalization :)), which said something like: "To learn is
| to forget"
|
| If we memorized everything perfectly, we would not learn
| anything: instead of remembering the concept of a "chair",
| you would remember thousands of separate instances of things
| you've seen that have a certain combination of colors and
| shapes etc
|
| It's the fact that we forget certain details (small
| differences between all these chairs) that makes us learn
| what a "chair" is.
|
| Likewise, if you remembered every single word in a book, you
| would not understand its meaning; understanding its meaning =
| being able to "summarize" (compress) this long list of words
| into something more essential: storyline, characters,
| feelings, etc.
| JieJie wrote:
| My mind is a blurry jpeg of my life.
|
| (https://www.newyorker.com/tech/annals-of-
| technology/chatgpt-...)
| WanderPanda wrote:
| Compression = Intelligence
|
| http://prize.hutter1.net/
| ComputerGuru wrote:
| > but that's exactly what OP said, no?
|
| Not precisely. We don't know if verbatim capacity is
| limited (and it doesn't seem to be) but the brain operates
| in a space-efficient manner all the same. So there isn't
| necessarily a causative relationship between "memory
| capacity" and "means of storage".
|
| > Likewise, if you remembered every single word in a book,
| you would not understand its meaning
|
| I understand your meaning but I want to clarify for the
| sake of the discussion that unlike with ML, the human brain
| can both memorize verbatim _and_ understand the meaning
| because there is no mechanism for memorizing something but
| not processing it (i.e. purely storage). The first pass(es)
| are stripped to their essentials but subsequent passes
| provide the ability to memorize the same input.
| whimsicalism wrote:
| > verbatim capacity is limited
|
| I am but a simple physicist and I can already tell you it
| is.
| SanderNL wrote:
| We know for certain it is limited. Do brains not adhere
| to physics?
| cmpalmer52 wrote:
| There's a story by Jorge Luis Borges called "Funes the
| Memorious" about a man who remembers everything, but can't
| generalize. There's a line about him not knowing if a dog
| on the square glimpsed at noon from the side is the same
| dog as the one seen from the back at 12:01 or something
| like that. Swirls of smoke from a cigarette are memorized
| forever. He mostly sits in a dark room.
| jjk166 wrote:
| Distilling knowledge is data compression.
| w10-1 wrote:
| You're conflating memorization with generalization, no?
| jjk166 wrote:
| Memorization is storing data. Generalization is
| developing the heuristics by which you compress stored
| data. To distill knowledge is to apply heuristics to
| lossily-compress a large amount of data to a much smaller
| amount of data from which you nevertheless can recover
| enough information to be useful in the future.
| downboots wrote:
| Can "distill knowledge" be made precise ?
| __loam wrote:
| Unless you know something the neuroscientists don't, it
| cannot.
| ComputerGuru wrote:
| As best as I've been able to research, it's still under
| active exploration and there are hypotheses but no real
| answers. I believe research has basically been circling
| around the recent understanding that in addition to being
| part of how the brain is wired, it is also an active,
| deliberate (if unconscious) mechanism that takes place in
| the background and is run "at a higher priority" during
| sleep (sort of like an indexing daemon running at low
| priority during waking hours then getting the bulk of
| system resources devoted to it during idle).
|
| There are also studies that show "data" in the brain isn't
| stored read-only and the process of accessing that memory
| involves remapping the neurons (which is how fake memories
| are possible) - so my take is if you access a memory or
| datum sequentially start to finish each time the brain
| knows this is to be stored verbatim for as-is retrieval but
| if you access snapshots of it or actively seek to and replay a
| certain part while trying to relate that memory to a process or
| a new task, the brain rewires the neural pathways accordingly.
| Which implies that there is an unconscious part that takes
| place globally, plus an active, modifying process where how we
| use a stored memory affects how it is stored and indexed (so
| data isn't accessed by simple fields but rather by complex
| properties or getters, in programming parlance).
|
| I guess the key difference from how machine learning works
| (and I believe an integral part of AGI, if it is even
| possible) is that inference is constant, even when you're
| only "looking up" data and you don't know the right answer
| (i.e. not training stage). The brain recognizes how the new
| query differs from queries it has been trained on and can
| modify its own records to take into account the new data.
| For example, let's say you're trying to classify animals
| into groups and you've "been trained" on a dataset that
| doesn't include monotremes or marsupials. The first time
| you come across a platypus in the wild (with its mammaries
| but no nipples, warm-blooded but lays eggs, and a single
| duct for waste and reproduction) you wouldn't just
| mistakenly classify it as a bird or mammal - you would
| actively trigger a (delayed/background) reclassification of
| all your existing inferences to account for this new
| phenomenon, _even though you don't know what the answer to
| the platypus classification question is_.
| clord wrote:
| imo, it amounts to revisiting concepts once more general
| principles are found -- and needed. For instance, you learn
| the alphabet, and it's hard. the order is tricky. the
| sounds are tricky, etc. but eventually, it get distilled to
| a pattern. But you still have to start from A to remember
| what letter 6 is, until you encounter that problem many
| times, and then the brain creates a 6=F mapping. I think of
| it in economic terms: when the brain realizes it's cheaper
| to create a generalization, it does so on the fly, and that
| generalization takes over the task.
|
| Sometimes it's almost like creating a specialist shard to take
| over the task. Driving is hard at first, with very high task
| overload, lots to pay attention to. With practice, a little
| automated part of yourself takes care of those tasks while your
| main general
| intelligence can do whatever it likes, even as the "driver"
| deals with seriously difficult tasks.
| esafak wrote:
| https://en.wikipedia.org/wiki/Rate%E2%80%93distortion_theory
| nonameiguess wrote:
| I've thought about this a lot in the context of the desire
| people seem to have to try and achieve human immortality or
| at least indefinite lifespans. If SciAm is correct here and
| the upper bound is a quadrillion bytes, we may not be able to
| hit that given the bound on possible human experiences, but
| someone who lived long enough would eventually hit that.
| After a hundred million years of life, or whatever the real
| number is, you'd either lose the ability to form new memories
| or you'd have to overwrite old ones to do so.
|
| Aside from having to eventually experience the death of all
| stars and light and the decay of most of the universe's
| baryonic matter and then face an eternity of darkness with
| nothing to touch, it's yet another reason I don't think
| immortality (as opposed to just a very long lifespan) is
| actually desirable.
| mewpmewp2 wrote:
| I imagine there would perhaps be tech or techniques letting you
| choose which memories to compress, and countless other
| techniques like extra storage that you can instantly access, so
| I don't see any of these as real arguments against becoming
| immortal. If I have to choose between being dead and memoryless
| or losing some of my memories but still being alive, why should
| I choose being dead and memoryless?
|
| And when losing memories you would first just discard some
| details, like you lose now anyway, but then you would start
| compressing centuries into rough ideas of what happened; it's
| just that the details would be a bit lacking.
|
| I don't see it being a problem at all. And if really
| something happens with the Universe, sure I can die then,
| but why would I want to die before?
|
| I want to know what happens, what gets discovered, what
| happens with humanity, how far do we reach in terms of
| understanding of what is going on in this place. Why are we
| here. Imagine dying and not even knowing why you were here.
| imtringued wrote:
| Longtermists argue that we will be harvesting Hawking radiation
| from black holes trillions of years after the heat death of the
| universe.
| __loam wrote:
| The last civilizations will be built around black holes.
| TheRealSteel wrote:
| You seem to have just re-stated what the other person said.
| whimsicalism wrote:
| Thank you, thought I was losing it for a second
| gattilorenz wrote:
| Is there a "realistic upper bound" in things that should be
| memorized verbatim? Ancient Greeks probably memorized the
| Iliad and other poems (rhyming and metre might work as a
| substitute for data compression, in this case), and many
| medieval preachers apparently memorized the whole Bible...
| [deleted]
| gorjusborg wrote:
| Grr, the AI folks are ruining the term 'grok'.
|
| It means roughly 'to understand completely, fully'.
|
| To use the same term to describe generalization... just shows you
| didn't grok grokking.
| erwald wrote:
| "Grok" in AI doesn't quite describe generalization, it's more
| specific that that. It's more like "delayed and fairly sudden
| generalization" or something like that. There was some
| discussion of this in the comments of this post[1], which
| proposes calling the phenomenon "eventual recovery from
| overfitting" instead.
|
| [1]
| https://www.lesswrong.com/posts/GpSzShaaf8po4rcmA/qapr-5-gro...
| gorjusborg wrote:
| Whoever suggested 'eventual recovery from overfitting' is a
| kindred spirit.
|
| Why throw away the context and nuance?
|
| That decision only further leans into the 'AI is magic'
| attitude.
| jeremyjh wrote:
| No, actually this is just how language evolves. I'm glad we
| have the word "car" instead of "carriage powered by internal
| combustion engine", even if it confused some people 100 years
| ago when the term became used exclusively to mean something a
| bit more specific.
|
| Of course the jargon used in a specific sub-field evolves much
| more quickly than common usage, because the intended audience
| of a paper like this is expected to be well-read and current
| in the field already.
| smolder wrote:
| Language devolves just as it evolves. We (the grand we)
| regularly introduce ambiguity -- words and meanings with
| no useful purpose, or that are worse than useless.
|
| I'm not really weighing in on the appropriateness of the
| use "grok" in this case. It's just a pet peeve of mine
| that people bring out "language evolves" as an excuse for
| why any arbitrary change is natural and therefore
| acceptable and we should go with the flow. Some changes
| are strictly bad ones.
|
| A go-to example is when "literally" no longer means
| "literally", but its opposite, or nothing at all. We
| don't have a replacement word, so now in some contexts
| people have to explain that they "literally mean
| literally".
| krapp wrote:
| Language only evolves, "devolving" isn't a thing. All
| changes are arbitrary. Language is always messy, fluid and
| ambiguous. You should go with the flow, because being a
| prescriptivist about the way other people speak is
| obnoxious and pointless.
|
| And "literally" has been used to mean "figuratively" for
| as long as the word has existed[0].
|
| [0]https://blogs.illinois.edu/view/25/96439
| mdp2021 wrote:
| > _devolving isn 't a thing_
|
| Incompetent use is devolution.
| gorjusborg wrote:
| Also being overlooked is that the nuances in what we
| accept is in large part how we define group culture.
|
| If you want to use the word 'irregardless' unironically
| there are people who will accept that. Then there are the
| rest of us.
| smolder wrote:
| I'm going to take a rosier view of prescriptivists and
| say they are a necessary part of the speaking/writing
| public, doing the valuable work of fighting entropic
| forces to prevent making our language dumb. They don't
| always need to win or be right.
|
| That's the first time I've seen literally-as-figuratively
| defended from a historical perspective. I still think
| we'd all be better off if people didn't mindlessly use it
| as a filler word or for emphasis, which is generally what
| people are doing these days that is the source of
| controversy, not reviving an archaic usage.
|
| Also, it's kind of ironic you corrected my use of
| "devolves", where many would accept it. :)
| gorjusborg wrote:
| > No, actually this is just how language evolves
|
| Stop making 'fetch' happen, it's not going to happen.
| [deleted]
| tbalsam wrote:
| Part of the issue here is posting a LessWrong post. There is
| some good in there, but much of that site is like a Flat
| Earth conspiracy theory for neural networks.
|
| Neural network training [edit: on a fixed point task, as is
| often the case {such as image->label}] is always (always)
| biphasic necessarily, so there is no "eventual recovery from
| overfitting". In my experience, it is just people newer to
| the field or just noodling around fundamentally
| misunderstanding what is happening, as their network goes
| through a very delayed phase change. Unfortunately there is a
| significant amplification to these kinds of posts and such,
| as people like chasing the new shiny of some fad-or-another-
| that-does-not-actually-exist instead of the much more
| 'boring' (which I find fascinating) math underneath it all.
|
| To me, as someone who specializes in optimizing network
| training speeds, it just indicates poor engineering to the
| problem on the part of the person running the experiments. It
| is not a new or strange phenomenon, it is a literal
| consequence of the information theory underlying neural
| network training.
| tbalsam wrote:
| To further clarify things, the reason there is no mystical
| 'eventual recovery from overfitting' is that overfitting is
| a stable bound that is approached. Adding this false
| denomination implies a non-biphasic nature to neural network
| training, and adds false information that wasn't there
| before.
|
| Thankfully things are pretty stable in the
| over/underfitting regime. I feel sad when I see ML
| misinformation propagated on a forum that requires little
| experience but has high leverage due to the rampant misuse
| of existing terms and the complete invention of an in-group
| language that has little touch with the mathematical
| foundations of what's happening behind the scenes. I've
| done this for 7-8 years at this point at a pretty deep
| level and have a strong pocket of expertise, so I'm not
| swinging at this one blindly.
| ShamelessC wrote:
| > Part of the issue here is posting a LessWrong post. There
| is some good in there, but much of that site is like a Flat
| Earth conspiracy theory for neural networks.
|
| Indeed! It's very frustrating that so many people here are
| such staunch defenders of LessWrong. Some/much of the
| behavior there is honestly concerning.
| NikkiA wrote:
| I've always taken 'grok' to be in the same sense as 'to be one
| with'
| gorjusborg wrote:
| Yeah, there is definitely irony that I'm trying to push my
| own definition of an extra-terrestrial word, complaining that
| someone is ruining it.
|
| If anyone wants to come up with their own definition, read
| Robert Heinlein's 'Stranger in a Strange Land'. There is no
| definition in there, but you build an intuition of the
| meaning by its use.
|
| One of the issues I have w/ the use in AI is that using the
| word 'grok' suggests that the machine understands (that's a
| common interpretation of the word grok, that it is an
| understanding greater than normal understanding).
|
| By using an alien word, we are both suggesting something that
| probably isn't technically true, while simultaneously giving
| ourselves a slimy out. If you are going to suggest that AI
| understands, just have the courage to say it with common
| English, and be ready for argument.
|
| Redefining a word that already exists to make the argument
| technical feels dishonest.
| snewman wrote:
| Actually the definition of 'grok' is discussed in the book;
| you can find some relevant snippets at
| https://en.m.wikipedia.org/wiki/Grok. My recollection is
| that the book says the original / literal meaning is
| "drink", but this isn't supported by the Wikipedia quotes
| and perhaps I am misremembering, it has been a long time.
| 93po wrote:
| I have heard grok used tremendously more frequently in the past
| year or two and I find it annoying because they're using it as
| a replacement for the word "understand" for reasons I don't
| "grok"
| whimsicalism wrote:
| I literally do not see the difference between the two uses that
| you are trying to make
| mxwsn wrote:
| They're just defining grokking in a different way. It's
| reasonable to me though - grokking suggests elements of
| intuitive understanding, and a sudden, large increase in
| understanding. These mirror what happens to the loss.
| thuuuomas wrote:
| "Grok" is more about in-group signaling like "LaTex
| credibility" or publishing blog posts on arxiv.
| jjk166 wrote:
| I've always considered the important part of grokking something
| to be the intuitiveness of the understanding, rather than the
| completeness.
| momirlan wrote:
| grok, implying a mystical union, is not applicable to AI
| Filligree wrote:
| Why not?
| benreesman wrote:
| Sci-Fi Nerd Alert:
|
| "Grok" was Valentine Michael Smith's rendering for human ears
| and vocal cords of a Martian word with a precise denotational
| semantic of "to drink". The connotational semantics range from
| to literally or figuratively "drink deeply" all the way up
| through to consume the absented carcass of a cherished one.
|
| I highly recommend Stranger in A Strange Land (and make sure to
| get the unabridged re-issue, 1990 IIRC).
| paulddraper wrote:
| What's the difference between understanding and generalizing?
|
| And what is the indicator for a machine understanding
| something?
| jimwhite42 wrote:
| I'm not sure if I'm remembering it right, but I think it was on a
| Raphael Milliere interview on Mindscape, where Raphael said
| something along the lines of when there are many dimensions in a
| machine learning model, the distinction between interpolation and
| extrapolation is not clear like it is in our usual areas of
| reasoning. I can't work out if this could be something similar to
| what the article is talking about.
| MagicMoonlight wrote:
| Memorise because there is no decision component. It attempts to
| just brute force a pattern rather than thinking through the
| information and making a conclusion.
| lachlan_gray wrote:
| It looks like grid cells!
|
| https://en.wikipedia.org/wiki/Grid_cell
|
| If you plot a heat map of a neuron in the hidden layer on a 2D
| chart where one axis is $a$ and the other is $b$, I think you
| might get a triangular lattice. If it's doing what I think it is,
| then looking at another hidden neuron would give a different
| lattice with another orientation + scale.
|
| Also you could make a base 67 adding machine by chaining these
| together.
|
| I also can't help the gut feeling that the relationship between
| W_in-proj's neurons and W_out-proj's neurons looks like the
| same mapping as the one between the semitone circle and the
| circle of fifths
|
| https://upload.wikimedia.org/wikipedia/commons/thumb/6/6f/Pi...
| esafak wrote:
| I haven't read the latest literature but my understanding is that
| "grokking" is the phase transition that occurs during the
| coalescing of islands of understanding (increasingly abstract
| features) that eventually form a pathway to generalization. And
| that this is something associated with over-parameterized models,
| which have the potential to learn multiple paths (explanations).
|
| https://en.wikipedia.org/wiki/Percolation_theory
|
| A relevant, recent paper I found from a quick search: _The
| semantic landscape paradigm for neural networks_
| (https://arxiv.org/abs/2307.09550)
| ComputerGuru wrote:
| PSA: if you're interested in the details of this topic, it's
| probably best to view TFA on a computer as there is data in the
| visualizations that you can't explore on mobile.
| tehjoker wrote:
| Well, they memorize points and lines (or tanh) between
| different parts of the space, right? So it depends on whether a
| useful generalization can be extracted from the line estimation
| and how dense the points on the landscape are, no?
| [deleted]
| taeric wrote:
| I'm curious how representative the target function is? I get that
| it is common for you to want a model to learn the important
| pieces of an input, but a string of bits, and only caring about
| the first three, feels particularly contrived. Literally a truth
| table on relevant parameters of size 8? And trained with 4.8
| million samples? Or am I misunderstanding something there? (I
| fully expect I'm misunderstanding something.)
| jaggirs wrote:
| I have observed this pattern before in computer vision tasks
| (train accuracy flatlining for a while before test acc starts
| to go up). The point of the simple tasks is to be able to
| interpret what could be going on behind the scenes when this
| happens.
| taeric wrote:
| No doubt. But I have also seen what people thought were
| generalized models failing on outlier, but valid, data. Quite
| often.
|
| Put another way, it isn't just how simple this task seems to
| be in the number of terms that are important, but isn't it
| also a rather dense function?
|
| Probably a better question to ask is how sensitive models that
| are looking at less dense functions are to this? (Or more
| dense.) I'm not trying to disavow the ideas.
| visarga wrote:
| Maybe humans are also failing a lot in out of distribution
| settings. It might be inherent.
| taeric wrote:
| We have names for that. :D. Stereotypes being a large
| one. Racism being motivated interpretation on the same
| ideas. Right?
| agumonkey wrote:
| They ponderize.
___________________________________________________________________
(page generated 2023-08-10 23:00 UTC)