[HN Gopher] Computer scientists prove why bigger neural networks...
___________________________________________________________________
Computer scientists prove why bigger neural networks do better
Author : theafh
Score : 165 points
Date : 2022-02-10 16:02 UTC (6 hours ago)
(HTM) web link (www.quantamagazine.org)
(TXT) w3m dump (www.quantamagazine.org)
| SpaceManNabs wrote:
| > Right now, we are routinely creating neural networks that have
| a number of parameters more than the number of training samples.
| This says that the books have to be rewritten.
|
| Confused by this statement. Double descent with
| overparameterization is exhibited in "classical settings" too and
| mentioned in older books.
|
| > In their new proof, the pair show that overparameterization is
| necessary for a network to be robust.
|
| What is important to note here is that many of the papers
| this paper cites prove or show this result for certain
| network architectures. This paper adds universality.
|
| > The proof is very elementary -- no heavy math, and it says
| something very general
|
| The most elementary part was clever use of Hoeffding's
| inequality. Some people are really fast readers haha.
|
| I don't even know how you pick up the fact that isoperimetry
| holds in manifold settings with positive curvature while also
| playing with all those norms and inequalities. A few years ago I
| mentioned on here all the maths that I knew or wanted to know to
| read more papers, and others critiqued that the list was too
| long. Well, this is why!
| aray wrote:
| > Double descent with overparameterization is exhibited in
| "classical settings" too and mentioned in older books.
|
| I'm curious for references or citations to this. When I was
| going over double descent I tried to find citations like
| this (though I only looked in a couple of places, like
| ML/stats textbooks).
| moyix wrote:
| Here's one that lists some older references:
| https://arxiv.org/abs/2004.04328
| tomrod wrote:
| There are a handful of papers in the 90s that show this, but
| it wasn't recognized for what it was. Double descent is
| REALLY crazy to me, coming from a classical background.
| pishpash wrote:
| Over-parameterization for regularization is really old. The
| pseudoinverse min-norm solution for under-determined linear
| systems even has that flavor.
| tomrod wrote:
| Sure, but those are identification approaches in
| econometrics and matrix analysis contexts. Applying them
| to neural networks is new-ish in the zeitgeist, which
| didn't exist in the 1990s the way it does today.
| throwmeawaysoon wrote:
| feketegy wrote:
| This looks interesting, I bookmarked it.
|
| My biggest blocker is the "statistics" part of ML: knowing
| which algorithms to choose for various cases.
| qorrect wrote:
| This book was a big help for me and is very well written,
| https://xcelab.net/rm/statistical-rethinking/ . You can find it
| free online ( along with video course ). The printed version is
| a very nice high quality book.
| stevofolife wrote:
| Do you know if there are any online classes that use this
| book as a reference? Or more generally, what type of courses
| teaches this subject?
| lariati wrote:
| Thanks so much. That is an amazing level of choice in the
| example code. I need this right now as a type of statistical
| strength training.
| kache_ wrote:
| check out introduction to statistical learning
| [deleted]
| stared wrote:
| I am surprised that the paper does not even cite the Lottery
| Ticket Hypothesis (https://arxiv.org/abs/1803.03635,
| https://eng.uber.com/deconstructing-lottery-tickets/).
|
| In the LTH paper (IMHO the most fundamental deep learning
| publication in the last few years), the number of tickets goes as
| layer_size^n_layers.
| gwern wrote:
| I don't see how lottery tickets yield the isoperimetry result,
| even in a heuristic or handwavy sort of way. Yes, a larger
| network is more likely to have good-scoring subnetworks; sure.
| But that's all it says. What does that tell me about how
| efficiently I can construct an adversarial example? For that, I
| need something else, like, say a geometric argument about what
| sort of network will interpolate between high-dimensional
| datapoints with properties like "not changing much in response
| to small input changes"...
| renewiltord wrote:
| Considering the subject, it is at least somewhat amusing that
| you double posted this.
| samwisedum wrote:
| Let's add more nodes so we can overfit even better!
| ogogmad wrote:
| Does this help against adversarial examples? The article seems to
| suggest so.
| prideout wrote:
| > The proof relies on a curious fact about high-dimensional
| geometry, which is that randomly distributed points placed on the
| surface of a sphere are almost all a full diameter away from each
| other.
|
| What theorem is this referring to? Sounds like something I should
| already be familiar with, but I'm not.
| grungegun wrote:
| For reference, see the book High Dimensional Probability by
| Vershynin. It's free online. See Theorem 3.1.1. It proves that
| a sub-gaussian random vector is in some sense close in norm to
| sqrt(n) where n is the number of dimensions. Most of these
| results are true up to multiplying by some unknown constant.
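That concentration is easy to see numerically. A minimal sketch (assuming a standard Gaussian vector, one flavor of sub-gaussian; the dimension 10,000 and sample count are arbitrary):

```python
import math
import random

rng = random.Random(1)
n = 10_000
ratios = []
for _ in range(10):
    # Norm of an n-dimensional standard Gaussian vector.
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in g))
    ratios.append(norm / math.sqrt(n))

avg_ratio = sum(ratios) / len(ratios)
print(round(avg_ratio, 3))  # close to 1.0: the norm concentrates near sqrt(n)
```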
| [deleted]
| [deleted]
| aix1 wrote:
| Not my area of expertise, but the quoted "fact" seems at best
| incompletely stated: surely for it to hold there must be some
| constraints on the number of points (likely as a function of
| the diameter)?
| pfortuny wrote:
| It is for VERY LARGE n, as siblings explain.
| Retric wrote:
| It's just wrong as stated, there is only one point a full
| diameter away from each point on a high dimensional sphere.
| Aka (1,0,0,0,0, ...) maps to (-1,0,0,0,0, ...) and nothing
| else. Just as (1,0) maps to (-1,0) on a unit circle and
| (1,0,0) maps to (-1,0,0) on a unit sphere.
|
| On a high dimensional sphere they should generally be close
| to square root of 2 times the radius away from each other.
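A quick Monte Carlo check of the sqrt(2) claim (a sketch, using the standard trick of normalizing i.i.d. Gaussian vectors to get uniform points on the sphere; the dimension 10,000 is arbitrary):

```python
import math
import random

def random_unit_vector(n, rng):
    # Normalizing an i.i.d. Gaussian vector gives a point
    # uniformly distributed on the unit (n-1)-sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

rng = random.Random(0)
n = 10_000
dists = [distance(random_unit_vector(n, rng), random_unit_vector(n, rng))
         for _ in range(20)]
avg = sum(dists) / len(dists)
# Pairwise distances concentrate near sqrt(2) ~ 1.414 times the
# radius, not near the diameter 2.
print(round(avg, 3))
```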
| hedora wrote:
| If the data points are in the space [0,1]^n, and your
| metric function is:
|
| d(x,y) = 0 if x == y; 1 otherwise
|
| Then all points are distance one apart. It's been proven
| that, as dimensionality increases, normal Euclidean
| distance over uniform point clouds rapidly converges to
| the same behavior as the equality metric.
|
| The proof relies on the information gained by performing
| pairwise distance calculations.
|
| In the example distance function I gave, there is zero
| information gained if you plug in two points that are known
| to be non-equal.
|
| The information gained from evaluating the Euclidean
| distance function converges to zero as the dimensionality
| of the data set increases.
|
| (Note: This does not hold for low dimensional data that's
| been embedded in a higher dimensional space.)
|
| Edit: Misread your comment. Yes, everything ends up being
| the same distance apart. More precisely, the ratio of mean
| distance / stddev distance tends to infinity. The intrinsic
| dimensionality of the data is monotonic w.r.t. that ratio.
| bick_nyers wrote:
| Euclidean distance calculations change based on number of
| dimensions, for example, in 3 dimensions it is
| sqrt(a^2+b^2+c^2).
| Retric wrote:
| Yes, that's why it's square root of 2. Consider the
| origin (0,0,0, ...) to a random point on the sphere (~0,
| ~0, ~0, ...).
|
| Distance = square root of ((X1 - X2) ^ 2 + (Y1 - Y2) ^2 +
| ...). So D = square root of ((~0-0)^2 + (~0-0)^2 +
| (~0-0)^2 + ... ), which is equal to 1 by definition of
| the unit high dimensional sphere.
|
| So distance from (1,0,0,0 ...) to (~0, ~0, ~0, ...) =
| square root of ((~0-1)^2 + (~0-0)^2 + (~0-0)^2 + ... ) ~=
| square root of 2.
| bick_nyers wrote:
| Ahh ok, for some reason I was thinking (1,1,1) would be a
| valid point in this case
| dan-robertson wrote:
| The fact should say that the expected distance between two
| random points tends to the diameter as the dimension
| increases. The intuition is that to be close you need to be
| close in a large number of coordinates and the law of large
| numbers (though coordinates aren't independent) suggests
| that is unlikely. If you fix one point on a sphere (say
| (1,0,...,0)) then, for a high dimension, most points will
| not have any extreme values in coordinates and will look
| like (~0,~0,...,~0) where ~0 means something close to zero.
| But if we sum the squares of everything apart from the
| first we get 1 - (~0)^2 ~= 1, so the distance from our
| fixed point is (1 - ~0)^2 + sum_2^n (0 - ~0)^2 ~= 1 + 1 =
| 2.
| Retric wrote:
| You forgot the square root on distance formula. Distance
| = square root of ((X1 - X2) ^ 2 + (Y1 - Y2) ^2 + ...).
|
| Consider the origin (0,0,0, ...) to a random point on the
| sphere (~0, ~0, ~0, ...). So Distance from origin =
| square root of ((~0-0)^2 + (~0-0)^2 + (~0-0)^2 + ... ),
| which sums to 1 by definition of the unit high
| dimensional sphere.
|
| Then plug in 1 vs 0 in the first place because we care
| about (1,0,0,0 ...) and you get the correct answer =
| square root of ((~0-1)^2 + (~0-0)^2 + (~0-0)^2 + ... ) ~=
| square root of 2.
|
| Edited to fix typo and add clarity.
| dan-robertson wrote:
| Wow. Can't believe I missed that.
| ravi-delia wrote:
| It should be that almost all points are _almost_ a full
| diameter away. However, it's still very striking, and an
| unintuitive fact about very high dimensional spheres.
| leto_ii wrote:
| I think it's something related to the curse of dimensionality
| [1] [2], basically just a property of high dimensional spaces
| (perhaps only certain kinds of spaces though).
|
| [1] https://en.wikipedia.org/wiki/Curse_of_dimensionality
|
| [2] http://kops.uni-
| konstanz.de/bitstream/handle/123456789/5715/...
| hedora wrote:
| The intrinsic dimensionality of a dataset is also relevant
| here.
|
| The M-Tree is one of my favorite indexes. It works with data
| that's embedded in infinite dimensional spaces (sometimes;
| it's bumping up against an impossibility result that's
| sketched in a sibling comment).
| bo1024 wrote:
| Yes.
|
| Even though almost all pairs of points are almost a full
| diameter away from each other, they are also almost all
| almost orthogonal (i.e. the angle they subtend at the
| center of the sphere is very close to 90 degrees).
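A small sketch of that near-orthogonality (same Gaussian-normalization trick for uniform sphere points; n = 10,000 is an arbitrary choice):

```python
import math
import random

def random_unit_vector(n, rng):
    # Normalizing an i.i.d. Gaussian vector gives a uniform
    # point on the unit (n-1)-sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

rng = random.Random(2)
n = 10_000
angles = []
for _ in range(20):
    p = random_unit_vector(n, rng)
    q = random_unit_vector(n, rng)
    dot = sum(a * b for a, b in zip(p, q))  # cosine of the central angle
    angles.append(math.degrees(math.acos(dot)))

avg_angle = sum(angles) / len(angles)
print(round(avg_angle, 1))  # very close to 90 degrees
```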
| bick_nyers wrote:
| My initial intuition is telling me that it would be diameter/2,
| from the perspective of a single point, the closest points
| would be near zero distance away, and the furthest points would
| be on the opposite side, a full diameter away, and I am
| assuming that there are a lot of points in a uniform
| distribution.
|
| What I have just thought about though, is what points would be
| exactly diameter/2 distance away from that point? If you have a
| circle, you might think it would be the points that form a 90
| degree triangle, but that is not the case, those points would
| be sqrt(2)*radius distance away.
|
| So while it is obvious to me that it is not diameter/2, it is
| not obvious to me why it would be diameter either, or how
| larger n converges it closer to the diameter or some other
| fixed number.
| dan-robertson wrote:
| If you consider a point on the sphere it means choosing a
| bunch of xi such that: x1^2 + x2^2 + ... +
| xn^2 = 1.
|
| Suppose wlog you pick (1,0,0,...,0). Then the distance from
| your point to a random point is: D = (x1-1)^2
| + x2^2 + ... + xn^2
|
| And from the first equation we know: x1^2 = 1
| - x2^2 - x3^2 - ... - xn^2
|
| Intuitively, your point will be far from a random
| point if x1 is close to zero, and x1 will be close to zero
| because _everything is close to zero._
|
| But we can be more mathematical about it. Our (very
| reasonable) assumption is that the volume of a n-dimensional
| disk is proportional to the nth power of its radius. The
| third equation shows that x1 is going to be big (meaning the
| distance to the chosen point above is not so close to the
| diameter) if a corresponding[1] point on the n-disk is close
| to the middle. But the distance from the origin, R, of a
| random point in the n-disk is distributed with pdf
| proportional to p(r) = r^(n-1) for r in [0,1]. So the cdf
| is just r^n, and since x1^2 = 1 - R^2, E[x1^2] = 1 -
| E[R^2] = 1 - n/(n+2) = 2/(n+2), which tends to 0 as n
| grows.
|
| Therefore we get E[D] = E[(1-x1)^2] + 1 - E[x1^2] which tends
| to 2 as n grows large.
|
| [1] the correspondence is that if I give you a point on a
| disk, you can turn it into a point on a sphere by flipping a
| coin to decide if it goes in the upper or lower hemisphere
| and then projecting up or down perpendicular to the disk from
| the point onto the sphere. But thinking a little more, I'm
| not sure this preserves the metric as it favours points on
| the sphere that correspond to the middle parts of the disk.
| So I think the actual expected value of x1 should be smaller.
| WithinReason wrote:
| Let me hijack your explanation starting from this point:
| D = (x1-1)^2 + (x2^2 + ... + xn^2)
|
| Since all the xi^2 sum to 1, as the dimensionality grows
| each individual xi must converge towards 0. Since x1 is
| almost 0, the (x1-1)^2 term will be almost 1.
|
| Since we know that sum(xi^2) = 1, and that x1^2 is almost
| 0, we also know that sum(xi^2) - x1^2 is almost 1, which
| is the 2nd half of the above expression for D. So the
| average distance converges to "almost 1 + almost 1", which
| is "almost 2", which is the diameter.
| akomtu wrote:
| "each individual x[?] will converge towards 0"
|
| I'm not sure it will. x1 is chosen randomly in the -1..1
| interval. I dont see how the million other dimensions
| would force it to stick to 0. Those N other dimensions
| shrink the stddev(xi) by sqrt(N), though.
| WithinReason wrote:
| Then try normalizing a random 1000-element vector. The
| average of the vector elements is around 0.027.
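That 0.027 figure is easy to reproduce (a sketch assuming the 1000 raw elements are drawn uniformly from [-1, 1] before normalizing, which gives an expected mean absolute element of 0.5 / sqrt(1000/3) ≈ 0.027):

```python
import math
import random

rng = random.Random(3)
n = 1000
v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
norm = math.sqrt(sum(x * x for x in v))
u = [x / norm for x in v]  # a random point on the unit (n-1)-sphere

# Every coordinate is tiny: mean absolute value ~ 0.5 / sqrt(n / 3).
mean_abs = sum(abs(x) for x in u) / n
print(round(mean_abs, 3))
```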
| Retric wrote:
| Close, the distance formula is square root of (X1^2 +
| X2^2 ...).
|
| So exactly 1 gives a distance of 1, but almost 1 + almost
| 1 gives a distance of _almost_ square root of 2.
| WithinReason wrote:
| Good point!
| adgjlsfhk1 wrote:
| I think the most intuitive way of thinking about this is
| sphere packing. Asking what percent of points in an n-ball
| of radius 1 lie within distance d of the center is
| equivalent to asking what the ratio of volumes is. For
| d<1, that ratio is d^n, which tends to 0 as n goes towards
| infinity, so almost all of the points are as far from the
| center as possible.
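The collapse of that volume ratio d^n is easy to tabulate (a tiny sketch with d = 0.9 chosen arbitrarily):

```python
# Fraction of a unit n-ball's volume lying within radius d of the
# center is d**n; for d < 1 it vanishes as n grows, so almost all
# uniformly random points sit near the boundary.
d = 0.9
for n in (2, 10, 100, 1000):
    print(n, d ** n)
```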
| bjourne wrote:
| It's just another way to state the
| https://en.wikipedia.org/wiki/Curse_of_dimensionality
| zwaps wrote:
| Can someone speak to the generality of assuming c-isoperimetry
| for the distribution of features?
|
| Without knowing anything about this in particular, this seems to
| be a rather pertinent restriction of the result related to things
| like sampling assumptions and the like.
| woopwoop wrote:
| It really depends on what we assume about "natural" data.
| If it looks "positively curved", e.g. the uniform measure
| on the boundary of a convex body, or a Gaussian, or
| something like that, this holds. But if the distribution
| exhibits a strong hierarchical structure, that's not so
| good. I think it's a plausible, if not obviously true,
| assumption.
| AmericanBlarney wrote:
| This conclusion feels like saying more CPU and memory are better.
| Seems obvious that more moves allows matching to have more
| nuance, but I guess cool that someone proved it.
| nazgul17 wrote:
| From what I understand, it says that more parameters are good.
| This wasn't obvious before this paper: you can fit a
| polynomial instead of a neural net, but adding parameters
| wouldn't help with robustness in that case: the polynomial
| would just become more and more jagged.
| ska wrote:
| > Seems obvious that more moves allows matching to have more
| nuance,
|
| This really has to be balanced against overfitting. The key
| problem in ML is generalization, and lots of things improve
| training performance while making that worse.
| amelius wrote:
| Asymptotically better? Or practically better?
| ravi-delia wrote:
| We know empirically that they get practically better, but
| theoretical intuition suggests you shouldn't see an effect
| past some point. This paper shows that this intuition is
| wrong if you want your networks to be robust. It doesn't
| guarantee large networks will be robust, though.
| kd5bjo wrote:
| Is there a corresponding result that gives the number of examples
| needed to provide a sufficient training set for a given physical
| phenomenon? I'm imagining a high-dimensional equivalent of
| Nyquist's sampling theorem.
|
| Coupled with this result, we'd then have a reasonable estimator
| of the network size required for particular tasks before even
| starting the data collection.
| pishpash wrote:
| VC dimension?
| rackjack wrote:
| Silly thought: if bigger NN's are better, shouldn't more neurons
| be better? Why aren't elephants smarter than us, despite having
| more neurons?
|
| https://en.wikipedia.org/wiki/List_of_animals_by_number_of_n...
|
| https://pubmed.ncbi.nlm.nih.gov/24971054/
| kemiller wrote:
| IANANS but my understanding is that neurons/body mass is more
| indicative. Large animals have more neurons because large
| bodies need more.
| salty_biscuits wrote:
| They talk about the encephalization quotient, which scales
| brain mass against body mass to the ~2/3 power
|
| https://en.wikipedia.org/wiki/Encephalization_quotient
| cloogshicer wrote:
| You probably already know this (since you wrote "silly
| thought"), but real-life neurons are ridiculously more complex
| than simulated "neurons" in an NN. So the analogy doesn't
| really hold.
| pishpash wrote:
| They're more complex in biological construction and in
| signaling mechanism, but no proof that they are more complex
| in function.
| mattkrause wrote:
| An individual biological neuron can compute a variety of
| functions, including max and xor, that a single perceptron
| can't (e.g.,
| https://www.science.org/doi/10.1126/science.aax6239 ). In
| general, one needs a fairly elaborate ANN to approximate
| the behavior of a single biological neuron.
|
| OTOH, a three-layer network is a universal function
| approximator and RNNs are universal dynamical systems
| approximators, so they are sort of trivially equivalent.
| XnoiVeX wrote:
| I think a lot of people on this thread are missing this
| critical insight.
| visarga wrote:
| You can simulate the data processing of a real neuron with
| 1000 digital ones, a small neural net.
|
| I think we read too much into the complexity of biological
| neurons. Remember they need to do much more than compute
| signals. They need to self assemble, self replicate and
| pass through various stages of growth. They need to
| function for 80-100 years. Many of those neurons and
| synapses exist only for redundancy and other biological
| constraints.
|
| A digital neuron doesn't care about its physical substrate
| and can be millions of times faster. They can be copied
| identically for no cost and cheaply fine-tuned for new
| tasks. Their architecture and data can evolve much faster
| than ours, and the physical implementation can remain the
| same during this process.
| juancn wrote:
| Well, biological neurons are much more complex than CS neurons
| (https://www.quantamagazine.org/how-computationally-
| complex-i...).
|
| Also, you're working under the assumption that they are
| equivalent between mammals which as far as we can tell it's not
| the case (https://www.medicalnewstoday.com/articles/why-are-
| human-neur...).
|
| So my guess is that the comparison is much more complex than
| just number of neurons.
| gfody wrote:
| are we certain they're not? i'm not sure we know how to measure
| smartness
| beebeepka wrote:
| I only stopped saying my cat is smarter than the vast
| majority of people I've met because she is no longer with us.
|
| I did, and still do, believe this to be true. Would love to
| befriend a bird
| wizzwizz4 wrote:
| You can befriend corvids. Teaching them symbolic language
| is tricky, but they can trade and socialise and solve
| puzzles (if you manage to explain the puzzle).
| peterburkimsher wrote:
| Dumbo is smarto!
|
| Elephants have bodies built like a tank (and used as such by
| Hannibal), but humans have better I/O ports.
|
| {reading, writing, listening, speaking, singing, typing,
| doing, going}
|
| Without opposable thumbs, an elephant is probably quite
| envious of human writing & typing. Let's use the privilege
| wisely to encourage one another, teach and learn from each
| other, from Donald Tusk, and give a helping hand.
| Someone wrote:
| But African elephants have quite versatile opposable
| finger-like extensions at the tip of their trunks (Asian
| elephants have only one such thing)
| alexpotato wrote:
| Because, IIRC, a lot of neurons are dedicated to
| motion/sensing.
|
| Bigger animals may require more neurons to handle moving larger
| and/or more complicated muscle groups.
|
| Interesting related point there is the encephalization quotient
| which is related to the predicted ratio of brain size to body
| mass. On the wikipedia page [0] they list the EQ for various
| animals. Humans are the highest but dolphins and ravens are not
| far behind.
| molticrystal wrote:
| To further emphasize that having neural material focused on
| the appropriate functions matters more than how much you
| have, here is a story about a guy whose brain is mostly
| hollow and filled with fluid. It probably did cause his IQ
| of 75 and the weakness in his legs, but otherwise he lives
| a more or less normal adult life.
|
| https://www.newscientist.com/article/dn12301-man-with-
| tiny-b...
| acchow wrote:
| Doesn't this demonstrate the opposite of what you were
| claiming?
| lacksconfidence wrote:
| I feel like the quotes agree with parent:
|
| > "If something happens very slowly over quite some time,
| maybe over decades, the different parts of the brain take
| up functions that would normally be done by the part that
| is pushed to the side," adds Muenke, who was not involved
| in the case.
| Ajedi32 wrote:
| Did you see the scans? The dude's head is practically
| _empty_ (brain 55-75% smaller than normal) and nobody
| even noticed until he was 44 years old and got an MRI.
| divbzero wrote:
| I think it's that _a priori_ you would expect a hollow
| brain to have a far more drastic effect and not allow for
| a mostly normal adult life.
| pishpash wrote:
| Why would you expect that, when a tiny insect can do
| pretty intelligent things? What "unexpected" things
| humans can do are probably all in the >75 IQ range.
| gwern wrote:
| Volume != neurons. In any case, 75 is awful and is usually
| considered borderline retarded. (If you're tempted to
| respond with other cases of higher IQ, note that they are
| often retracted or unconfirmed and likely fraudulent in
| some way; see https://www.gwern.net/Hydrocephalus .)
| willmw101 wrote:
| >Volume != neurons
|
| Exactly. Most of the newer research on this topic
| suggests that it's neural connection complexity, and
| specifically frontal lobe volume, rather than overall
| brain size that determines intelligence or brain power.
|
| https://neuroscience.stanford.edu/news/ask-
| neuroscientist-do...
|
| >Luckily, there is much more to a brain when you look at
| it under a microscope, and most neuroscientists now
| believe that the complexity of cellular and molecular
| organization of neural connections, or synapses, is what
| truly determines a brain's computational capacity. This
| view is supported by findings that intelligence is more
| correlated with frontal lobe volume and volume of gray
| matter, which is dense in neural cell bodies and
| synapses, than sheer brain size. Other research comparing
| proteins at synapses between different species suggests
| that what makes up synapses at the molecular level has
| had a huge impact on intelligence throughout evolutionary
| history. So, although having a big brain is somewhat
| predictive of having big smarts, intelligence probably
| depends much more on how efficiently different parts of
| your brain communicate with each other.
| mattkrause wrote:
| As a counterpoint, rats without a cortex can
| do...basically everything normal rats can do--except trim
| their toenails. The classic reference for this is
| Whishaw's 1990 chapter "The decorticate rat".
|
| This thread has links to a copy, plus a bunch of related
| studies in humans and animals. https://twitter.com/markdh
| umphries/status/107105276276554137...
| joebob42 wrote:
| Aside from other points, more neurons might be better "all else
| equal", but there are differences between our brain and an
| elephant's beyond just neuron count.
|
| It's like how just getting a bigger faster computer can help
| with your problem, but its less powerful than a new more
| efficient algorithm on the same computer.
| World_Peace wrote:
| Elephants could very well be more intelligent than us; it
| just seems that intelligence is a difficult thing to
| measure quantitatively.
| bee_rider wrote:
| In particular, a given elephant might be "more intelligent"
| than a human -- we just happen to have evolved from a
| particular niche that has rendered us bizarrely good at
| abstracting knowledge and combining it with the knowledge of
| other humans.
| notahacker wrote:
| What is "more intelligent" if not "more capable of
| abstracting, synthesizing and sharing knowledge"?
| bee_rider wrote:
| How about the ability to solve novel problems?
|
| We have very good problem solving ability of course, but
| a superpowered ability to ask others how they solved the
| problem. If we wanted to somehow define a kind of 'brain
| horsepower' type intelligence, it seems to me that the
| former is closer to it than the latter, and it doesn't
| seem obvious to me that humans would necessarily take the
| top spot. Or that there's a reasonable/ethical way to
| test it -- let's take a human, elephant, crow, and
| dolphin, raise them in total isolation from the any
| community to get a measure of their untrained
| intelligence... we might get some interesting results on
| intelligence, but mostly we will learn something about
| ballistics as some ethics review board launches us unto
| the Sun.
| jayd16 wrote:
| You'd also need the desire for such things.
| tshaddox wrote:
| It may be hard to measure and even define precisely, but I
| think it's pretty clear that if we did agree on a definition
| in the context of this conversation it would be defined in
| such a way that humans are more intelligent than elephants.
| lariati wrote:
| I have listened to Francois Chollet say that all intelligence
| is specialized intelligence.
|
| I suspect the question really doesn't make sense if that is
| true.
|
| We just have this bias/mind projection fallacy that
| intelligence is a general physical property of the brain that
| can be measured. I just suspect this is not true.
|
| Like athletic ability doesn't generalize well. Of course,
| someone not athletic at all is never going to be a great
| athlete in anything but it makes no sense to compare Lance
| Armstrong to Patrick Mahomes in some general athletic
| context. Putting a number on a general athletic ability index
| between the two would just be total nonsense.
| tshaddox wrote:
| For one thing, when the article says "bigger" it means "more
| parameters," not "more neurons."
| fabiospampinato wrote:
| I read on wikipedia [0] the other day a fairly disturbing
| statistic related to this, apparently human men have on average
| a ~10% bigger brain than women. It'd be interesting to know if
| that translates to a higher neuron count or the difference in
| volume is due to something else.
|
| [0]: https://en.wikipedia.org/wiki/Brain_size#:~:text=In%20men%
| 20....
| andrewflnr wrote:
| Probably just a consequence of overall physical size being
| larger. AFAIK there continues to be no evidence of a sex
| difference in overall intelligence, so slight difference in
| brain size is probably a red herring.
| gwern wrote:
| Density is also important. If we look at other things - some
| recent studies have been done on number-counting (https://royal
| societypublishing.org/doi/10.1098/rstb.2020.052...) or bird
| brains (https://www.gwern.net/docs/psychology/neuroscience/2020
| -herc...) - density jumps out as a major predictor. African
| elephants may have some more neurons, but the density isn't as
| great as a human where it counts, so they are remarkably
| intelligent (like ravens and crows), but still not human-level.
| There are diminishing returns in both directions. We have
| more neurons than any bird of equal or greater density, and
| more density than any elephant with as many or more
| neurons. Put that together, and we squeak across the finish
| line to being just smart enough to create civilization.
|
| An analogy: what's the difference between a supercomputer, and
| the same number of CPUs scattered across a few datacenters?
| It's that in a supercomputer, those CPUs are packed physically
| as close as possible with expensive interconnects to allow them
| to communicate as fast as possible. (For many applications, the
| supercomputer will finish long before the spread out nodes ever
| finish communicating and idling.) But you need to improve both
| or else your new super-fast CPUs will spend all their time
| waiting on Infiniband to chug through, or your fancy new
| Infiniband will be underutilized and you should've bought more
| CPUs.
| user90349032 wrote:
| And yet, no animal except humans is self aware. Really makes
| you wonder why that is.
| Swizec wrote:
| There are lots of self aware non human animals.
|
| Dolphins and elephants are famous examples, most primates
| as well. Even many birds show levels of self awareness and
| theory of mind (they know the difference between what they
| know and what others know)
| visarga wrote:
| Seems like being a social animal is necessary for self
| awareness.
| Ardon wrote:
| You might be interested in the theories on the evolution
| of human intelligence: https://en.wikipedia.org/wiki/Evol
| ution_of_human_intelligenc...
|
| This is exactly the question the field is about, and I
| find it fascinating to read about
| Swizec wrote:
| In fact there is a popular theory[1] that bird
| intelligence evolved because of the way their social
| structures work. Birds mate for life _but they cheat_.
| Every bird wants its partner to be loyal while itself
| mating with as many other birds as possible.
|
| This means birds have to keep track of who can and can't
| see them cheat, who knows and who doesn't. There's even
| evidence that they rat each other out (2nd degree info)
| if they think there's a reward to be had. All of this
| requires immense intelligence, which happens to prove
| useful in other contexts.
|
| There's also a bird species who does this with food
| caches. Easier to steal from others than to build their
| own so a plethora of deceptive tactics developed to
| ensure others can't see where you're storing those
| delicious nuts. Complete with fake caches, lying, and
| espionage.
|
| [1] I learned about it in The Genius of Birds
| attemptone wrote:
| There are also lots of non self-aware human animals :P
| q845712 wrote:
| are you sure?
| https://en.wikipedia.org/wiki/Theory_of_mind_in_animals
| dr_dshiv wrote:
| Self awareness is social self awareness. Viewing oneself as
| a social actor.
| stjohnswarts wrote:
| that is simply incorrect bonobos, orcas, elephants,
| dolphins, chimpanzees, etc have all shown degrees of self
| awareness.
| moomin wrote:
| Probably that you don't know how to measure what you're
| describing.
|
| Plenty of animals recognise themselves in the mirror, for
| instance.
| btilly wrote:
| How do you measure intelligence? Elephants have much better
| memories than we do!
|
| https://www.scientificamerican.com/article/elephants-never-f...
| Ajedi32 wrote:
| That article doesn't seem to support your claim. All of the
| feats mentioned would be entirely unremarkable in your
| average human.
| btilly wrote:
| Really? You'd immediately recognize someone you knew for a
| few weeks over 20 years ago? You wouldn't need a bit of
| time to figure out who they are?
|
| If so, then your memory is unusually good. I know that this
| is well beyond my capabilities. Nor do I have the ability
| to visit a place that I lived 40 years earlier and find my
| way around.
| Someone wrote:
| How many other elephants did these elephants see in those
| 20+ years? It wouldn't surprise me if that was fewer than
| 100. How many did they spend a few weeks or more with? It
| wouldn't surprise me if that were less than 20.
|
| There also, AFAIK, isn't evidence they remember _all_
| other elephants they've shared time with for at least few
| weeks (I certainly do not rule that out, either, given
| the low number they likely will meet in their life)
| tshaddox wrote:
| > You'd immediately recognize someone you knew for a few
| weeks over 20 years ago?
|
| Yeah? Maybe not if they were a kid 20 years ago or their
| appearance had otherwise changed significantly, but
| otherwise I don't see why not.
| Spooky23 wrote:
| I think it depends on the intensity of the experience.
|
| I recently found myself in a hotel that I stayed in as a
| 7-8 year old in the 80s for a particularly memorable
| vacation with my extended family. It was funny that I
| still remembered the unusual aspects of the layout and
| could spot many of the changes that had been made over
| the years.
|
| But if you asked me to describe someone I met for a few
| days in a business context in 2020, I'd have a hard time
| remembering detail.
| 6gvONxR4sf7o wrote:
| Off topic, but I love that they make it trivial to find a link to
| the original paper. I know not everyone loves quanta, but stuff
| like this is really refreshing.
| lordgrenville wrote:
| What do people not like about Quanta?
___________________________________________________________________
(page generated 2022-02-10 23:00 UTC)