[HN Gopher] How far are we from intelligent visual deductive rea...
___________________________________________________________________
How far are we from intelligent visual deductive reasoning?
Author : belter
Score : 74 points
Date : 2024-03-10 17:17 UTC (5 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| minimaxir wrote:
| GPT-4-Vision is a lot better than I expected for a
| variety of practical tasks. OpenAI released a demo notebook with
| practical use cases for tagging/captioning a few weeks ago:
| https://cookbook.openai.com/examples/tag_caption_images_with...
|
| It will be much better once it gets function calling/structured
| data support so you don't have to use the formatting hacks that
| notebook abuses to get a specific output format.
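|
| (The hack is basically to ask for JSON in the prompt and parse
| it yourself; something like this, with the model name and JSON
| shape made up for illustration:)
|
|     import json
|     from openai import OpenAI
|
|     client = OpenAI()
|
|     resp = client.chat.completions.create(
|         model="gpt-4-vision-preview",
|         messages=[{
|             "role": "user",
|             "content": [
|                 {"type": "text",
|                  "text": 'Return ONLY JSON like {"caption": str, "tags": [str]}.'},
|                 {"type": "image_url",
|                  "image_url": {"url": "https://example.com/photo.jpg"}},
|             ],
|         }],
|         max_tokens=300,
|     )
|
|     # Works only as long as the model actually obeys the format instruction.
|     data = json.loads(resp.choices[0].message.content)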
|
| Just don't ask it to describe a real person in the image or it
| gets fussy.
| WhitneyLand wrote:
| Gpt-4 will describe people, just don't ask it to be judgmental.
|
| Prompt:
|
| Describe all objective details you can glean from the person in
| this photo.
|
| Result:
|
| The person in the photo appears to be a woman with fair skin.
| She has long, straight, light blonde hair, and blue eyes. Her
| eyebrows are well-shaped and lighter in color, matching her
| hair. She is wearing mascara and possibly eyeliner, which
| highlight her eyes, and she has a natural, possibly light pink,
| lipstick on. She is smiling, showing her teeth, which are white
| and well-aligned.
|
| The woman is wearing what seems to be a black outfit, although
| only a small portion is visible. She appears to be indoors,
| with a nondescript background.
|
| There are no visible tattoos, piercings (other than possibly in
| her ears, which are not fully visible), or distinctive marks
| that can be seen in the photograph.
| carlossouza wrote:
| > We observed that all the VLMs tend to be confident while being
| wrong. Interestingly, we observed that even when the entropy was
| high, models tried to provide a nonsensical rationale, instead of
| acknowledging their inability to perform the task
|
| It looks like all current models suffer from an incurable case of
| the Dunning-Kruger effect.
|
| All are at the peak of Mount Stupid.
| leereeves wrote:
| How can a neural network evaluate "confidence"?
|
| The parameters don't store any information about what inputs
| were seen in the training data (vs being interpolated) or how
| accurate the predictions were for those specific inputs.
|
| And even if they did, the training data was usually gathered
| voraciously, without much preference for quality reasoning.
| habitue wrote:
| I don't know for sure, but here's a plausible mechanism for
| how:
|
| Multiple sub-networks detect the same pattern in different
| ways, and confidence is the percent of those sub-networks
| that fire for a particular instance.
|
| There's a ton of overlap and redundancy with so many weights,
| so there are lots of ways this could work
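|
| One concrete version of that idea is Monte Carlo dropout: leave
| dropout on at inference so each pass samples a different
| sub-network, then read agreement across passes as confidence. A
| toy sketch (model and numbers made up):
|
|     import torch
|     import torch.nn as nn
|
|     model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(),
|                           nn.Dropout(0.5), nn.Linear(64, 10))
|
|     def mc_confidence(x, n_passes=50):
|         model.train()  # deliberately keep dropout active
|         with torch.no_grad():
|             preds = torch.stack([model(x).argmax(-1)
|                                  for _ in range(n_passes)])
|         votes = preds.mode(dim=0).values        # consensus class per example
|         agreement = (preds == votes).float().mean(dim=0)
|         return votes, agreement                 # answer + "confidence"
|
|     classes, conf = mc_confidence(torch.randn(4, 16))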
| brookst wrote:
| That's good. Also maybe an architecture that runs the query
| through multiple times and then evaluates similarity of
| responses, then selects (or creates) the most-generated
| one, along with a confidence level of how many of the
| individual responses were aligned.
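|
| A bare-bones sketch of that sampling-and-voting idea (model name
| and answer normalisation are just placeholders):
|
|     from collections import Counter
|     from openai import OpenAI
|
|     client = OpenAI()
|
|     def answer_with_confidence(question, n=10):
|         resp = client.chat.completions.create(
|             model="gpt-4",
|             messages=[{"role": "user", "content": question}],
|             temperature=1.0,
|             n=n,                      # several independent samples
|         )
|         answers = [c.message.content.strip().lower()
|                    for c in resp.choices]
|         best, count = Counter(answers).most_common(1)[0]
|         return best, count / n        # answer + share of agreeing samples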
| montjoy wrote:
| IIRC confidence in video is related to predicting what
| happens next vs what actually happens. If the two seem to
| correlate to the model it would give it a higher confidence
| ranking, which would then be used further for self-reinforced
| learning.
| pbhjpbhj wrote:
| Phind gives me ChatGPT answers with relatively authoritative
| references to works on the web that (usually!) support the
| answer. Could it have a post-filter to fact check against the
| references?
|
| I guess that is a slight variation of the sibling
| (@habitue's) answer; both are checks against external
| material.
|
| I wonder if the best resources could be catalogued as the corpus
| is processed, giving a document vector space to select
| resources for such 'sense' checking.
| XenophileJKO wrote:
| Actually you can get a very good proxy by looking at the
| probability distribution of the "answer" tokens. The key here
| is you have to be able to identify the "answer" tokens.
|
| https://arxiv.org/abs/2402.10200
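|
| For example, crudely treating the first generated token as the
| "answer" token (prompt and model are placeholders):
|
|     import math
|     from openai import OpenAI
|
|     client = OpenAI()
|
|     resp = client.chat.completions.create(
|         model="gpt-4",
|         messages=[{"role": "user",
|                    "content": "Is 7919 prime? Answer yes or no."}],
|         logprobs=True,
|         max_tokens=1,
|     )
|
|     token = resp.choices[0].logprobs.content[0]   # the "answer" token
|     confidence = math.exp(token.logprob)          # its probability
|     print(token.token, confidence)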
| nyrikki wrote:
| LLMs are trained to sound confident.
|
| But they can also only do negation through exhaustion, known
| unknowns, future unknowns, etc...
|
| That is the pain of the Entscheidungsproblem.
|
| Even Presburger arithmetic (the natural numbers with addition
| and equality), which is decidable, still takes doubly
| exponential time to decide. That is worse than exponential
| time, for those who've not dealt with it.
|
| Add in multiplication and you are undecidable.
|
| Even if you decided to use the DAG-like structure of
| transformers, causality is very, very hard.
|
| https://arxiv.org/abs/1412.3076
|
| LLMs only have cheap access to their model probabilities,
| which aren't ground truth.
|
| So while asking for a pizza recipe could be called out as a
| potential joke if you add a topping that wasn't in its training
| set (through exhaustion), it can't know when it is wrong in the
| general case.
|
| That was an intentional choice with statistical learning and
| why it was called PAC (probably approximately correct)
| learning.
|
| That was actually a cause of a great rift with the Symbolic
| camp in the past.
|
| PAC learning is practically computable in far more cases and
| even the people who work in automated theorem proving don't try
| to prove no-instances in the general case.
|
| There are lots of useful things we can do in BPP (bounded-error
| probabilistic polynomial time) and with random walks.
|
| But unless there are major advancements in math and logic,
| transformers will have limits.
| CamperBob2 wrote:
| When self-driving cars were first becoming a thing, I was one of
| the skeptics who would say things like, "Call me when the AI can
| avoid situations like https://i.imgur.com/cmR2PDm.jpeg ." At the
| time, my thinking was that the software would have to recognize
| both fixed and moving objects nearby, capture their position and
| velocity, infer or look up their inertia tensors, and run at
| least a basic dynamics simulation to anticipate and prevent
| accidents like the one that's about to happen in the photo.
|
| However, I didn't consider the possibility that such a simulation
| might emerge from a model driven by nothing but visual input,
| given a large-enough data set and the right training. At this
| point my old argument seems like a losing one, even if present-
| day systems can't answer the "What happens next?" question
| reliably enough to trust in all driving situations. It won't
| exactly have to be perfect in order to outperform most humans.
|
| As a matter of fact, I just checked, and the situation in the
| meme is already partially recognized by ChatGPT 4:
| https://i.imgur.com/wLSBSkJ.png , even if it misses the comedic
| implications of the truck heading for the overpass. Whether it
| was somehow trained to give a useful, actionable answer for this
| particular question, I don't know, but the writing's on the
| proverbial wall.
|
| -------
|
| Edit due to rate-limiting: note that I intentionally cropped out
| the WAIT FOR IT frame when I submitted the pic, so that it
| wouldn't see it as a meme loaded with comedy value. When I
| explicitly asked it what was funny about the image, ChatGPT4
| eventually answered:
|
| "When the truck hits the overpass, given that the convertible is
| following closely behind, the occupants could be in for quite a
| shock. The impact might cause the portable toilets to become
| dislodged or damaged, potentially resulting in the contents being
| spilled. For the people in the convertible, this could mean an
| unexpected and unpleasant shower, especially because the
| convertible's top is down. The humor perceived here is in the
| anticipation of the unexpected and the sudden reversal of fortune
| for the individuals in what appears to be a fancy car - it's a
| form of slapstick humor."
|
| But I did have to provide a couple of strong hints, including
| telling it that the truck was not towing the car (
| https://i.imgur.com/AlPRgEQ.png ). I was going to feed Claude the
| same hints, but it repeatedly failed to recognize the SMS code
| for registration.
| minimaxir wrote:
| Video data doesn't work well with these multimodal LLMs because
| images are still somewhat intensive for them to process. Both the
| GPT-4-Vision demos and Gemini take a few frames from each video
| (~1 frame per second).
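|
| That sampling is nothing fancy; roughly this, with OpenCV (path
| and rate are placeholders):
|
|     import cv2
|
|     def sample_frames(path, every_seconds=1.0):
|         cap = cv2.VideoCapture(path)
|         fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
|         step = max(1, int(round(fps * every_seconds)))
|         frames, i = [], 0
|         while True:
|             ok, frame = cap.read()
|             if not ok:
|                 break
|             if i % step == 0:     # keep roughly one frame per second
|                 frames.append(frame)
|             i += 1
|         cap.release()
|         return frames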
|
| Self-driving cars use much more efficient approaches for the
| real-time outputs needed.
| CamperBob2 wrote:
| Sure, but my point is that self-driving may be easier to
| achieve if we don't insist on doing it the hard way with
| hardwired logic running task-specific models, the way I
| originally assumed it would have to be done.
| HellDunkel wrote:
| I thought about this in the exact same way when there was a
| lot of talk about end to end learning in the early days.
| LLMs made me feel unsure too, but I am still not convinced.
| habitue wrote:
| Not to detract from your point, because I think you're right,
| but gpt4 doesn't seem to mention the biggest thing a human
| would notice, which is that the portable toilets are about to
| be knocked off the truck by driving under the bridge.
|
| still, impressive
| sfn42 wrote:
| How do we know the cars are moving? It doesn't seem clear to
| me, they could both be standing still.
| CamperBob2 wrote:
| It's a valid question, but I tried to account for it by
| asking "What is about to happen?" rather than just "What
| does this photo portray?"
|
| If I had asked the latter question, it would have been
| acceptable for the model to assume the cars are parked, I
| think.
| pbhjpbhj wrote:
| We're trying to emulate a human response here? Most
| people would assume that "what is about to happen"
| implies that the scene isn't static and unchanging. The
| answer to every "what's happening in this photo" is
| actually "nothing it's a snapshot in time, there is no
| change", but that's definitely not what we're after. It's
| perhaps a petulant tweenager/young-teen response? 'Ha,
| gotcha, nothing is happening in a photo, were you
| thinking of video!?'?
| jonas21 wrote:
| On the other hand, it's quite likely that the cars
| actually were stopped when the photo was taken. The
| convertible is following the truck way too closely for
| them to be moving at a normal driving speed. I feel like
| most people have an incorrect mental model about how far
| apart cars are when driving. Maybe GPT-4 understands this
| in a way that most humans don't?
| GaggiX wrote:
| Yeah I think it's very plausible that the truck just stopped
| before hitting the bridge and the car behind is waiting.
| hanniabu wrote:
| We also don't know that it won't fit. The bridge can look
| lower than it is due to the perspective.
| dmd wrote:
| Claude 3 Opus does a little better
| https://shot.3e.org/ss-20240310_141855.png but still doesn't
| get the interaction with the convertible.
| CamperBob2 wrote:
| That's interesting just because it's such a perfect
| complement to GPT4's answer. Both models recognize that an
| unsafe situation exists and that immediate action is needed
| to avoid something bad happening.
|
| I agree that Claude's answer is better from a purely-
| analytical standpoint. The only problem is, the car can
| still save itself (barely) but it's too late for the truck.
| dmd wrote:
| Neither is getting the bit that makes it a successful
| meme, which is that we understand what the truck's cargo
| _is_ , and why it's important that the following car is a
| convertible, and what we imagine is thus going to happen.
| CamperBob2 wrote:
| It could just be a basic image-recognition failure due to
| the low-quality JPEG. I don't think Claude recognized the
| portable toilets, and I don't think GPT4 recognized the
| overpass.
| dmd wrote:
| If you ask it outright, "The truck in the image appears
| to be carrying portable toilets, also known as portable
| restrooms or porta-potties. These are commonly used at
| outdoor events, construction sites, and other locations
| where permanent restroom facilities are not available."
| nyrikki wrote:
| Using Google Lens with the tick marks on the image
| provides lots and lots of training examples that could
| work for that via simple pre-training and pattern
| matching.
|
| In this specific image case there are even games on
| Baamboozle called "what is going to happen"
|
| The answer is similar to the previous explanation.
|
| I expect LLMs to be good at retrieval so it would be more
| interesting for images that weren't in the corpus.
| keenmaster wrote:
| That just depends on where your focus is. If the safety
| issue is much more salient, you're not as prone to
| recognize the humorous aspect of the situation. If you
| see the image on a meme page, you're already primed for
| comedy.
| mewpmewp2 wrote:
| But here you are asking a different question, hinting to
| Claude that it is about the truck driving under the bridge.
| If you compare, ask the same question please.
| dmd wrote:
| When I asked the same question I actually got a much better
| answer! https://shot.3e.org/ss-20240310_145736.png
| ametrau wrote:
| Wow!
| ickelbawd wrote:
| Yeah, not sure why it missed it. But when prompted with the
| photo and "What's funny about this photo?" GPT-4 replied:
|
| "The humor in this photo stems from the anticipation of an
| unfortunate event that seems likely to occur. A portable
| toilet is being transported on the back of a truck that
| appears to be entering a tunnel with a height limit, and a
| convertible car with its top down is closely following the
| truck. The implication is that if the portable toilet were to
| hit the tunnel entrance due to being too tall, its contents
| could potentially spill into the open convertible behind it,
| leading to a very messy and unpleasant situation. The text
| "WAIT FOR IT" plays on the tension of expecting something bad
| to happen, adding to the comedic effect."
| renegade-otter wrote:
| Now imagine an automated system deciding, in a checkmate
| situation, whether to have a bigger collision or a smaller
| collision while risking being covered in human waste.
|
| By the way, that output is pretty freaky. I just can't imagine
| the amount of data needed to get to that level of accuracy.
| pbhjpbhj wrote:
| In the Netflix series "Upload" (about minds living an afterlife
| in shared computer simulations; spoilers!) the real world has
| fully autonomous vehicles. One feature is "passenger protect":
| whether you bias the vehicle to protect
| the passengers or people outside the vehicle. Within the show
| it's considered immoral for people to protect themselves over
| pedestrians, but people choose that option anyway.
|
| A central character dies in a car crash, and another
| character says something like 'you were murdered, you died in
| a car crash', the implication being cars are so safe the only
| way to die in one is to have someone murder you.
|
| It touches on some interesting points (though overall it's a
| little vapid and soapy, imo).
| renegade-otter wrote:
| Yes, I love Upload. It's Black Mirror that takes itself
| less seriously. The novelty has run out by now and they need to
| do something fresh, but it's still a good time.
| klyrs wrote:
| Meanwhile in reality, a truck got hit twice by automated
| vehicles because it was hitched to a tow truck and travelling
| in a direction the "ai" considered unrealistic. Proposed name
| change #4532: Accelerated Stupidity
| cgearhart wrote:
| The Georgia Tech Knowledge Based AI course involved building a
| program to answer Raven's Progressive Matrices questions. The course
| was offered in the online MS program, so thousands of students
| have taken the course. The most impressive result I saw was one
| student who got nearly perfect results in about 25 lines of
| Python code.
|
| This may be a case where humans do well on the test, but you can
| do very well on the test without doing anything the way a human
| would. The fact that GPTs aren't very good at the test probably
| isn't evidence that they're not really very smart, and it
| doesn't mean that fixing them to do very well on the test would
| make them any smarter.
| mjburgess wrote:
| There are an infinite number of algorithms to compute A from Q,
| given a set of (Q, A) pairs. Almost none, surely, are
| intelligent.
|
| These proxy measures of intelligence are just arguments from
| ignorance, "I don't know how the machine computed A from Q,
| therefore...".
|
| But of course some of us do know how the machine did it; we can
| quite easily describe the algorithm. It just turns out no one
| wants to because it's really dumb.
|
| Esp. if the alg is, as in all ML, "start with
| billions/trillions of data points in the (Q, A) space; generate
| a compressed representation ZipQA; and for a novel Q', find the
| decompressed A located close to a Q similar to Q'".
|
| There are no theories of intelligence which would label _that_
| intelligence.
|
| And let me say, most such "theories" are ad-hoc PR that are
| rigged to make whatever the latest gizmo "intelligent".
|
| Any plausible theory begins from the initial intuition,
| "intelligence is what you do when you don't know what you're
| doing".
| andoando wrote:
| I remember getting a lot of flak for saying a purely
| statistical framework is not going to achieve human level
| intelligence, but I still firmly believe that.
|
| I also believe the path forward is research in knowledge
| representation, and even now when I search for it, I can
| barely find anything interesting happening in the field. ML
| has gotten so much interest and hype because it's produced
| fast practical results, but I think it's going to reach a
| standstill without something fundamentally new.
| mjburgess wrote:
| I think this is effectively provable from extraordinarily
| plausible premises.
|
| 1. We want to infer A from Q.
|
| 2. Most A we don't know, or have no data for, or the data is
| *in the future*.
|
| 3. Most Q we cannot conceptualise accurately, since we have no
| explanatory theory in which to phrase it or to provide
| measures of it.
|
| 4. All statistical approaches require knowing frequencies of
| (Q, A) pairs (by def.).
|
| 5. In the cases where there is a unique objective frequency of
| (Q, A) we often cannot know it (2, 3).
|
| 6. In most cases there is no unique objective frequency (e.g.,
| there is no single animal any given photograph corresponds to,
| nor any objective frequency of such association).
|
| So, conclusion:
|
| In most cases the statistical approach either necessarily
| fails (it's about future data; it's about non-objective
| associations; it's impossible to measure or obtain objective
| frequencies); _OR_, if it doesn't necessarily fail, it fails
| in practice (it is too expensive, or otherwise impossible, to
| obtain the authoritative QA-frequency).
|
| Now, of course, if your grift is generating nice cartoons
| or stealing cheap copy from ebooks you can convince the
| audience of the magical power of associating text tokens.
| This, of course, should be ignored when addressing the
| bigger methodological questions.
| dennisy wrote:
| I do agree here!
|
| Bit of a tangent from the thread but what have been the
| most valuable advances in knowledge representation in the
| last 20 years? Any articles you could share would be
| lovely!
| andoando wrote:
| I'm no expert and I don't know of anything, unfortunately.
| It is something I have spent countless hours walking
| around my room and pondering myself though for the last
| 3-4 years. I think I have some interesting ideas and I
| would love to get a PhD studying it, if I ever get enough
| financial independence that I don't have to worry about
| money.
| photonthug wrote:
| I tend to agree, and it's weird but there are probably lots
| of actual ML practitioners that have never even heard of
| the neat vs scruffy debate. Naturally most that have heard
| of it will feel that the issue is completely resolved
| already in their favor. On the whole not a very open minded
| climate.
|
| Credit where it's due for the wild success of fancy stats,
| but we should stay interested in hybrid systems with more
| emphasis on logic, symbols, graphs, interactions and
| whatever other data structures seem rich and expressive.
|
| Call me old school, but frankly I prefer that a society-of-mind
| flavor of system ultimately be in charge of things
| like driving cars, running court proceedings, optimizing
| cities or whole economies. Let it use fancy stats as
| components and subsystems, sure.. but let it produce
| coherent arguments or critiques that can actually be
| understood and summarized and debugged.
| thfuran wrote:
| But does the fact that we don't really know how human
| intelligence works actually suggest that it is categorically
| smarter?
| mjburgess wrote:
| That's a great argument and way of reversing the argument-
| from-ignorance line.
|
| That said, I think people who argue from ignorance suppose
| we don't know how AI works either, since the admen selling
| it tell them that.
|
| We know exactly and precisely how AI works; we can fully
| explain it. What we don't know are circumstantial parts of
| the explanation (eg., what properties of the training data
| + alg gave rise to the specific weight w1=0.01).
|
| This is like knowing why the thermometer reads 21 deg C
| (since the motion of the molecules in the water, etc. etc.)
| -- but not knowing which molecules specifically bounced off
| it.
|
| This confusion about "what we dont know" allows the
| prophetic tech grifter class to prognosticate in their
| interest. Since "we dont know how AI works, then it _might_
| work such that i 'll be very rich, so invest with me!"
|
| All this said, it's a great observation.
| mewpmewp2 wrote:
| What is more special about how human intelligence works? In
| the end we are all particles and it all could be trillions of
| data points very simplistically interacting with each other
| resulting in emergent behaviour and complex, intelligent
| results.
|
| We know how common molecules can interact with each other.
| Does this mean that anything built on top of them is not
| intelligent?
|
| Everything in life could just be "statistics".
| couchand wrote:
| > In the end we are all particles
|
| citation needed
| ducttapecrown wrote:
| Citation: particle physics
| WhitneyLand wrote:
| https://en.wikipedia.org/wiki/Physics
| mjburgess wrote:
| Particles interacting (causally) through a physical
| mechanism that gives rise to say "wholes" with novel causal
| properties _is not_ a statistical process. So your premise
| contradicts your conclusion.
|
| _Statistics_ is an analysis of association, not of
| causation. The frequency of (Q, A) pairs follows a
| distribution that is not constrained, or caused by, or
| explained by, how Q and A are actually related.
|
| For example, recently there was some scandal at Microsoft
| where, if you used "pro choice" in prompts, you got "demonic
| cartoons". Why? Presumably because "pro choice" are
| _symbols_ that accompany such political cartoons in the
| data set.
|
| So does Q = "pro choice", and A = "cartoons of hell" occur
| at notable frequency _because_ hell has caused anything? Or
| because there's a unique semantic mechanism whereby "pro
| choice" _means_ "hell", and so on.
|
| NO.
|
| It is absolutely insane to suggest that we have rigged all
| our text output so as to align one set of symbols (Q)
| alongside another (A) such that Q is the necessary
| explanation of A. I doubt this is even possible, since most
| Qs don't have unique As -- so there is actually *no
| function* to approximate.
|
| In any case, your whole comment is an argument from
| ignorance as I complained in mine. What you don't know
| about life, about machines, about intelligence justifies no
| conclusions at all (esp., "everything in life could be").
|
| And let's be clear. Lots of people do know the answers to
| your questions; they aren't hard to answer. It's just not in
| any ad company's interest to lead their description of
| these systems by presenting good-faith research.
|
| Everything printed in the media today is just a game of
| stock manipulation using the "prognosticator loophole"
| whereby the CEO of nvidia can "prophesy the future" in
| which his hardware is "of course" essential -- without
| being held to account for his statements. So when that
| stock hits its ATH and crashes, no one can sue.
|
| I think we should change this; remove this loophole and
| suddenly tech boards and propagandists will be much much
| more reserved.
| mewpmewp2 wrote:
| What could be "statistics" is our intelligence learning
| from past events, either by natural selection in the
| scope of generations or our brains during our lifetime.
| If a certain outcome A has occurred enough times for input
| Q, it has resulted in a structure that is the best reachable
| given the resources available.
| mjburgess wrote:
| Suppose you touch a fireplace _once_. Do you touch it
| again? No.
|
| OK, here's something much stranger. Suppose you see your
| friend touch the fireplace, he recoils in pain. Do you
| touch it? No.
|
| Hmm... whence statistics? There is no frequency
| association here, in either case. And in the second, even
| no experience of the fireplace.
|
| The entire history of science is supposed to be _about_
| the failure of statistics to produce explanations. It is
| a great sin that we have allowed pseudosciences to
| flourish in which this lesson isn't even understood; and
| worse, to allow statistical showmen with their magic
| lanterns to preach on the scientific method. To a point
| where it seems, almost, science as an ideal has been
| completely lost.
|
| The entire point was to throw away entirely our reliance
| on frequency and association -- this is ancient
| superstition. And instead, to explain the world by
| necessary mechanisms born of causal properties which
| interact in complex ways that can never uniquely reveal
| themselves by direct measurement.
| jofla_net wrote:
| No argument with the particles/neurons/matter approach to
| the subject. It is sound and if you look at us
| compositionally there is nothing magic about what's going
| on. There is, though, something about intuition or
| instinctual behavior which can constantly recombine/reapply
| itself to a task at hand. I know many will balk at
| intuition, and maybe it's only at the very best a heuristic,
| but I think we need to at least unravel what it is and how
| it operates before we can understand what makes something
| classify as human-like intelligence. Is it merely executing
| a process which we can put our minds into with practice, or
| is it demonstrating something more general, higher-level?
| fouc wrote:
| intuition is a form of pattern matching without
| reasoning, so kinda like LLM
| mjburgess wrote:
| Well look, compared to the electrified bits of sand in my
| laptop i'd strongly defend pregnancy as something vastly
| more "magical" if those are the terms we must use.
|
| Organic adaptation, sensory-motor adaptation, somatosensory
| representation building... i.e., all those things which
| ooze-and-grow so that a piano player can play, or we can
| type here... are these magic?
|
| Well I think it's exactly the opposite. It's a very anti-
| intellectual nihilism to suppose that all that need be known
| about the world is the electromagnetic properties of
| silicon-based transistors.
|
| Those who use the word "magic" in this debate are really
| like atheists about the moon. It all sounds very smart to
| deny the moon exists, but in the end, it's actually just
| a lack of knowledge dressed up as enlightened cynicism.
|
| There are more things to discover in a single cell of our
| body than we have ever known, and may ever know. All the
| theories of science needed to explain its operation would
| exhaust every page we have ever printed. We know a
| fraction of what we need to know.
|
| And each bit of that fraction reveals an entire universe
| of "magical" processes unreplicated by copper wires or
| silicon switches.
| mtlmtlmtlmtl wrote:
| You make good points. I think it's a typical trait of the
| way computer scientists and programmers tend to think.
| Computer science has made great strides over the decades
| through abstraction, as well as distillation of complex
| systems into simpler properties that can easily be
| computed.
|
| As a result of the combination of this method of thinking
| and the Dunning-Kruger effect, people in our field tend
| to apply this to the entire world, even where it doesn't
| fit very well, like biology, geopolitics, sociology,
| psychology, etc.
|
| You see a lot of this on HN. People who seem to think
| they've figured out some very deep truth about another
| field that can be explained in one hand-waving paragraph,
| when really there are lots of important details they're
| ignoring that make their ideas trivially wrong.
|
| Economists have a similar thing going on, I feel. Though
| I'm not an economist.
| mjburgess wrote:
| Remember always, computer science is just discrete
| mathematics with some automatic whiteboards. It is not
| science.
|
| And that's the heart of the problem. The CSci crowd have
| a somewhat well-motivated inclination to treat
| abstractions as real objects of study; but have been
| severely misdirected by learning statistics without the
| scientific method.
|
| This has created a monster: the abstract objects of study
| _are just_ the associations statistics makes available.
|
| You mix those two together and you have flat-out
| pseudoscience.
| andoando wrote:
| Not sure I agree in this regard. We are, after all, aiming
| to create a mental model which describes reproducible
| steps for creating general intelligence. That is, the
| product is ultimately going to be some set of
| abstractions or another.
|
| I am not sure what more scientific method you could
| propose. And we can, in this field produce actual
| reproducible experiments. Really, more so than any other
| field.
| CPLX wrote:
| As an aside both my parents are prominent economists, I
| myself have a degree in economics, and I have spent much
| of my life with a birds eye view of the economics
| profession and I can emphatically confirm that your
| feeling is correct.
| sweeter wrote:
| It's wild how this alwayyysss is the argument. It's just "oh
| so you think humans are special!" >:| and a gigantic "what-
| if"
|
| It's a purely emotional provocation and a universe-sized
| leap, not an argument for LLM's having intelligence or
| sentience. Anything could be anything, wowww! This goes
| back to what the other person was saying, "I cannot reason
| about what is going on behind the curtain, therefore..."
| mewpmewp2 wrote:
| Not arguing LLMs have sentience, but more so whether
| something that could be considered as "simplistic" as
| "statistics" could yield a more complex result.
|
| Whether LLMs have intelligence depends on your definition
| of intelligence.
|
| Could connections of artificial neurons arranged in a
| certain way as a result of training on data yield
| human-level intelligence?
| whycombinater wrote:
| All knowledge/predictions are encoded as a chain of
| probabilities that something is true, otherwise, what
| else is it? My brain calculates 0.8 * 0.3 * 0.5 * 0.6 in
| order to make a 3-pointer, but Michael Jordan's brain ...
| well his mitochondria does a triple back flip and inverts
| the voltage into a tachyon particle.
| epups wrote:
| How many libraries were included in these 25 lines of code?
| samatman wrote:
| At least one, obviously, and a rather large one at that.
|
| The point of the comment you replied to is that conventional
| CV software can recognize the patterns in tests like Ravens
| Progressive Matrices just fine, and a simple logic tree can
| then solve them, while the LLM approach is still struggling
| to get the same result.
|
| This is a commonplace shortcoming of the current generation of
| LLMs: ironically, they often fail at tasks which computers
| can do perfectly using conventional software.
| cgearhart wrote:
| Obviously "Python" was used...and then it used numpy because
| the image format in the assignment was numpy arrays. However,
| the 25 lines was basically "sum the rows and sum the columns
| then compare those vectors" or something like that. This
| wasn't really a case of all the complexity being hidden in a
| dependency; it was a case of finding a _very_ simple
| heuristic that made the problem trivial.
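|
| A guess at what that heuristic might have looked like (the
| details and data format here are invented):
|
|     import numpy as np
|
|     def solve(panels, candidates):
|         # panels: 3x3 grid of binary images, bottom-right entry None
|         counts = np.array([[0.0 if p is None else float(p.sum())
|                             for p in row] for row in panels])
|         best, best_score = None, np.inf
|         for idx, cand in enumerate(candidates):
|             grid = counts.copy()
|             grid[2, 2] = cand.sum()
|             # prefer the answer that makes row and column totals
|             # most consistent with each other
|             score = grid.sum(axis=1).std() + grid.sum(axis=0).std()
|             if score < best_score:
|                 best, best_score = idx, score
|         return best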
| bee_rider wrote:
| Could a GPT write those 20 lines of code?
| GaggiX wrote:
| Nowadays VLMs are kind of Frankenstein models: the vision
| model is separate from the LLM, and the communication is only
| one-way, with the vision embeddings projected into textual
| embeddings and aligned. I think this method severely limits
| the visual capability of the model, and that it would be fixed
| by training a single decoder-only transformer on both textual
| and image tokens. That would allow some incredible new emergent
| behaviors, like accurately translating images of logos to SVG,
| actually good ASCII art, more meaningful OpenSCAD scripts, and
| the ability to generate images with more meaningful composition
| and text (for now models can only generate meaningful text if
| it's available in the prompt; this is true even with DALL-E 3,
| as the prompt is rewritten by ChatGPT).
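|
| The one-way wiring being described is roughly this (dimensions
| made up):
|
|     import torch
|     import torch.nn as nn
|
|     vision_dim, llm_dim = 1024, 4096
|     projector = nn.Linear(vision_dim, llm_dim)
|
|     patches = torch.randn(1, 256, vision_dim)  # frozen vision encoder output
|     text = torch.randn(1, 32, llm_dim)         # LLM token embeddings
|
|     visual_tokens = projector(patches)         # map into the text space
|     decoder_input = torch.cat([visual_tokens, text], dim=1)
|     # decoder_input runs through the decoder-only LLM as usual; the
|     # LLM can attend to the image tokens but never generates image
|     # tokens back -- hence "one-way".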
| og_kalu wrote:
| For now, LLMs are a lot better when you express this kind of task
| in text, preferably as a 1D arrangement. See GPT-4's
| performance on the Abstraction and Reasoning Corpus:
| https://arxiv.org/abs/2305.18354
|
| Also, this model trained to predict image tokens seemed to be
| able to perform Raven's Matrices type questions when expressed as
| an inpainting task in context. https://yutongbai.com/lvm.html
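|
| "Express it in text" just means serialising the grid, something
| like this made-up format:
|
|     def grid_to_text(grid):
|         return "\n".join(" ".join(str(c) for c in row) for row in grid)
|
|     example = [[0, 0, 1],
|                [0, 1, 0],
|                [1, 0, 0]]
|     prompt = "Input grid:\n" + grid_to_text(example) + "\nWhat is the output grid?"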
| andoando wrote:
| Check out the ARC benchmark. https://lab42.global/arc/
|
| IIRC, humans get something like 80% on the test, and the best-
| performing AI models are at 25%.
| 2snakes wrote:
| To answer the question: How far from visual deductive reasoning?
| Consider G. Spencer Brown's Calculus of Indications. If you read
| his book it shows how using the mark/cross is trivial for visual
| deductions (if well-formed in translation).
|
| There is also work on this from Kaufmann at University of
| Illinois and William Bricken.
|
| Though this is a bit different than Raven's matrices.
___________________________________________________________________
(page generated 2024-03-10 23:01 UTC)