[HN Gopher] How far are we from intelligent visual deductive rea...
       ___________________________________________________________________
        
       How far are we from intelligent visual deductive reasoning?
        
       Author : belter
       Score  : 74 points
       Date   : 2024-03-10 17:17 UTC (5 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | minimaxir wrote:
        | GPT-4-Vision is a lot better than I expected for a
       | variety of practical tasks. OpenAI released a demo notebook with
       | practical use cases for tagging/captioning a few weeks ago:
       | https://cookbook.openai.com/examples/tag_caption_images_with...
       | 
       | It will be much better once it gets function calling/structured
        | data support so you don't have to use the formatting hacks that
       | notebook abuses to get a specific output format.
       | 
       | Just don't ask it to describe a real person in the image or it
       | gets fussy.
        
         | WhitneyLand wrote:
          | GPT-4 will describe people; just don't ask it to be judgmental.
         | 
         | Prompt:
         | 
         | Describe all objective details you can glean from the person in
         | this photo.
         | 
         | Result:
         | 
         | The person in the photo appears to be a woman with fair skin.
         | She has long, straight, light blonde hair, and blue eyes. Her
         | eyebrows are well-shaped and lighter in color, matching her
         | hair. She is wearing mascara and possibly eyeliner, which
         | highlight her eyes, and she has a natural, possibly light pink,
         | lipstick on. She is smiling, showing her teeth, which are white
         | and well-aligned.
         | 
         | The woman is wearing what seems to be a black outfit, although
         | only a small portion is visible. She appears to be indoors,
         | with a nondescript background.
         | 
         | There are no visible tattoos, piercings (other than possibly in
         | her ears, which are not fully visible), or distinctive marks
         | that can be seen in the photograph.
        
       | carlossouza wrote:
       | > We observed that all the VLMs tend to be confident while being
       | wrong. Interestingly, we observed that even when the entropy was
        | high, models tried to provide a nonsensical rationale, instead of
       | acknowledging their inability to perform the task
       | 
       | It looks like all current models suffer from an incurable case of
        | the Dunning-Kruger effect.
       | 
       | All are at the peak of Mount Stupid.
        
         | leereeves wrote:
         | How can a neural network evaluate "confidence"?
         | 
         | The parameters don't store any information about what inputs
         | were seen in the training data (vs being interpolated) or how
         | accurate the predictions were for those specific inputs.
         | 
         | And even if they did, the training data was usually gathered
         | voraciously, without much preference for quality reasoning.
        
           | habitue wrote:
           | I don't know for sure, but here's a plausible mechanism for
           | how:
           | 
           | Multiple sub-networks detect the same pattern in different
           | ways, and confidence is the percent of those sub-networks
           | that fire for a particular instance.
           | 
           | There's a ton of overlap and redundancy with so many weights,
            | so there are lots of ways this could work.
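            | 
            | As a toy illustration of that idea (everything here is made
            | up: random linear "detectors" stand in for redundant sub-
            | networks, and confidence is the fraction that fire):
            | 
            |   import numpy as np
            | 
            |   rng = np.random.default_rng(0)
            | 
            |   # Stand-ins for redundant sub-networks: random linear
            |   # detectors that "fire" when their score is positive.
            |   detectors = rng.normal(size=(100, 64))
            | 
            |   def confidence(x):
            |       """Fraction of detectors that fire for input x."""
            |       scores = detectors @ x
            |       return float((scores > 0).mean())
            | 
            |   x = rng.normal(size=64)
            |   print(f"confidence: {confidence(x):.2f}")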
        
             | brookst wrote:
              | That's good. Also maybe an architecture that runs the
              | query multiple times and then evaluates similarity of
             | responses, then selects (or creates) the most-generated
             | one, along with a confidence level of how many of the
             | individual responses were aligned.
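              | 
              | A rough sketch of that self-consistency style approach,
              | with a hypothetical query_model() standing in for whatever
              | API is actually used:
              | 
              |   from collections import Counter
              | 
              |   def query_model(prompt: str) -> str:
              |       """Hypothetical LLM call; returns one sampled answer."""
              |       raise NotImplementedError
              | 
              |   def answer_with_confidence(prompt, n_samples=10):
              |       # Sample the model several times at nonzero temperature.
              |       answers = [query_model(prompt).strip().lower()
              |                  for _ in range(n_samples)]
              |       best, count = Counter(answers).most_common(1)[0]
              |       # Confidence = share of samples agreeing with the winner.
              |       return best, count / n_samples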
        
           | montjoy wrote:
           | IIRC confidence in video is related to predicting what
            | happens next vs what actually happens. If the two correlate,
            | the model gives the prediction a higher confidence ranking,
            | which would then be used further for self-reinforced
            | learning.
        
           | pbhjpbhj wrote:
           | Phind gives me ChatGPT answers with relatively authoritative
           | references to works on the web that (usually!) support the
           | answer. Could it have a post-filter to fact check against the
           | references?
           | 
           | I guess that is a slight variation of the sibling
           | (@habitue's) answer; both are checks against external
           | material.
           | 
            | I wonder if the best resources could be catalogued as the corpus
           | is processed, giving a document vector space to select
           | resources for such 'sense' checking.
        
           | XenophileJKO wrote:
            | Actually you can get a very good proxy by looking at the
            | probability distribution of the "answer" tokens. The key here
           | is you have to be able to identify the "answer" tokens.
           | 
           | https://arxiv.org/abs/2402.10200
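            | 
            | Concretely, something along these lines, where
            | answer_token_probs() is a hypothetical helper that returns
            | the log-probabilities the API reports for the tokens you've
            | identified as the answer:
            | 
            |   import math
            | 
            |   def answer_token_probs(response) -> list[float]:
            |       """Hypothetical: log-probs of the answer tokens only."""
            |       raise NotImplementedError
            | 
            |   def answer_confidence(response) -> float:
            |       logprobs = answer_token_probs(response)
            |       # Geometric mean of the token probabilities is a
            |       # crude but usable confidence proxy.
            |       return math.exp(sum(logprobs) / len(logprobs))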
        
         | nyrikki wrote:
         | LLMs are trained to sound confident.
         | 
         | But they can also only do negation through exhaustion, known
         | unknowns, future unknowns, etc...
         | 
         | That is the pain of the Entscheidungsproblem.
         | 
          | Even Presburger arithmetic, the natural numbers with addition
          | and equality, which is decidable, has a doubly exponential
          | time complexity to decide. That is worse than factorial time,
          | for those who haven't dealt with it.
         | 
          | Add in multiplication and you are undecidable.
         | 
          | Even if you decided to use the DAG-like structure of
          | transformers, causality is very, very hard.
         | 
         | https://arxiv.org/abs/1412.3076
         | 
          | LLMs only have cheap access to their model probabilities,
          | which aren't ground truth.
         | 
          | So while a request for a pizza recipe could be called out as a
          | potential joke if you add a topping that wasn't in its
          | training set (through exhaustion), it can't know when it is
          | wrong in the general case.
         | 
         | That was an intentional choice with statistical learning and
         | why it was called PAC (probably approximately correct)
         | learning.
         | 
         | That was actually a cause of a great rift with the Symbolic
         | camp in the past.
         | 
         | PAC learning is practically computable in far more cases and
         | even the people who work in automated theorem proving don't try
         | to prove no-instances in the general case.
         | 
          | There are lots of useful things we can do in BPP (bounded-
          | error probabilistic polynomial time) and with random walks.
         | 
         | But unless there are major advancements in math and logic,
         | transformers will have limits.
        
       | CamperBob2 wrote:
       | When self-driving cars were first becoming a thing, I was one of
       | the skeptics who would say things like, "Call me when the AI can
       | avoid situations like https://i.imgur.com/cmR2PDm.jpeg ." At the
       | time, my thinking was that the software would have to recognize
       | both fixed and moving objects nearby, capture their position and
       | velocity, infer or look up their inertia tensors, and run at
       | least a basic dynamics simulation to anticipate and prevent
       | accidents like the one that's about to happen in the photo.
       | 
       | However, I didn't consider the possibility that such a simulation
       | might emerge from a model driven by nothing but visual input,
       | given a large-enough data set and the right training. At this
       | point my old argument seems like a losing one, even if present-
       | day systems can't answer the "What happens next?" question
       | reliably enough to trust in all driving situations. It won't
       | exactly have to be perfect in order to outperform most humans.
       | 
       | As a matter of fact, I just checked, and the situation in the
       | meme is already partially recognized by ChatGPT 4:
       | https://i.imgur.com/wLSBSkJ.png , even if it misses the comedic
       | implications of the truck heading for the overpass. Whether it
       | was somehow trained to give a useful, actionable answer for this
       | particular question, I don't know, but the writing's on the
       | proverbial wall.
       | 
       | -------
       | 
       | Edit due to rate-limiting: note that I intentionally cropped out
       | the WAIT FOR IT frame when I submitted the pic, so that it
       | wouldn't see it as a meme loaded with comedy value. When I
       | explicitly asked it what was funny about the image, ChatGPT4
       | eventually answered:
       | 
       | "When the truck hits the overpass, given that the convertible is
       | following closely behind, the occupants could be in for quite a
       | shock. The impact might cause the portable toilets to become
       | dislodged or damaged, potentially resulting in the contents being
       | spilled. For the people in the convertible, this could mean an
       | unexpected and unpleasant shower, especially because the
       | convertible's top is down. The humor perceived here is in the
       | anticipation of the unexpected and the sudden reversal of fortune
       | for the individuals in what appears to be a fancy car - it's a
       | form of slapstick humor."
       | 
       | But I did have to provide a couple of strong hints, including
       | telling it that the truck was not towing the car (
       | https://i.imgur.com/AlPRgEQ.png ). I was going to feed Claude the
       | same hints, but it repeatedly failed to recognize the SMS code
       | for registration.
        
         | minimaxir wrote:
          | Video data doesn't work well with these multimodal LLMs
          | because images are still somewhat intensive for them to
          | process. Both the GPT-4-Vision demos and Gemini take a few
          | frames from each video (~1 frame per second).
         | 
         | Self-driving cars use much more efficient approaches for the
         | real-time outputs needed.
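          | 
          | For reference, the usual trick is just to subsample frames
          | before sending them to the model; a minimal sketch with
          | OpenCV (the one-frame-per-second rate is an assumption
          | matching the demos mentioned above):
          | 
          |   import cv2  # pip install opencv-python
          | 
          |   def sample_frames(path, per_second=1):
          |       """Keep roughly `per_second` frames per second of video."""
          |       cap = cv2.VideoCapture(path)
          |       fps = cap.get(cv2.CAP_PROP_FPS) or 30
          |       step = max(int(fps // per_second), 1)
          |       frames, i = [], 0
          |       while True:
          |           ok, frame = cap.read()
          |           if not ok:
          |               break
          |           if i % step == 0:
          |               frames.append(frame)
          |           i += 1
          |       cap.release()
          |       return frames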
        
           | CamperBob2 wrote:
           | Sure, but my point is that self-driving may be easier to
           | achieve if we don't insist on doing it the hard way with
           | hardwired logic running task-specific models, the way I
           | originally assumed it would have to be done.
        
             | HellDunkel wrote:
             | I thought about this in the exact same way when there was a
             | lot of talk about end to end learning in the early days.
              | LLMs made me feel unsure too, but I am still not convinced.
        
         | habitue wrote:
         | Not to detract from your point, because I think you're right,
          | but GPT-4 doesn't seem to mention the biggest thing a human
          | would notice, which is that the portable toilets are about to
         | be knocked off the truck by driving under the bridge.
         | 
          | Still, impressive.
        
           | sfn42 wrote:
           | How do we know the cars are moving? It doesn't seem clear to
           | me, they could both be standing still.
        
             | CamperBob2 wrote:
             | It's a valid question, but I tried to account for it by
             | asking "What is about to happen?" rather than just "What
             | does this photo portray?"
             | 
             | If I had asked the latter question, it would have been
             | acceptable for the model to assume the cars are parked, I
             | think.
        
               | pbhjpbhj wrote:
               | We're trying to emulate a human response here? Most
               | people would assume that "what is about to happen"
               | implies that the scene isn't static and unchanging. The
               | answer to every "what's happening in this photo" is
               | actually "nothing it's a snapshot in time, there is no
               | change", but that's definitely not what we're after. It's
               | perhaps a petulant tweenager/young-teen response? 'Ha,
               | gotcha, nothing is happening in a photo, were you
               | thinking of video!?'?
        
               | jonas21 wrote:
               | On the other hand, it's quite likely that the cars
               | actually were stopped when the photo was taken. The
               | convertible is following the truck way too closely for
               | them to be moving at a normal driving speed. I feel like
               | most people have an incorrect mental model about how far
               | apart cars are when driving. Maybe GPT-4 understands this
               | in a way that most humans don't?
        
             | GaggiX wrote:
              | Yeah, I think it's very plausible that the truck just
              | stopped before hitting the bridge and the car behind is
              | waiting.
        
             | hanniabu wrote:
              | We also don't know that it won't fit. The bridge can look
              | lower than it is due to the perspective.
        
           | dmd wrote:
           | Claude 3 Opus does a little better
           | https://shot.3e.org/ss-20240310_141855.png but still doesn't
           | get the interaction with the convertible.
        
             | CamperBob2 wrote:
             | That's interesting just because it's such a perfect
             | complement to GPT4's answer. Both models recognize that an
             | unsafe situation exists and that immediate action is needed
             | to avoid something bad happening.
             | 
             | I agree that Claude's answer is better from a purely-
             | analytical standpoint. The only problem is, the car can
             | still save itself (barely) but it's too late for the truck.
        
               | dmd wrote:
               | Neither is getting the bit that makes it a successful
               | meme, which is that we understand what the truck's cargo
                | _is_, and why it's important that the following car is a
               | convertible, and what we imagine is thus going to happen.
        
               | CamperBob2 wrote:
               | It could just be a basic image-recognition failure due to
               | the low-quality JPEG. I don't think Claude recognized the
               | portable toilets, and I don't think GPT4 recognized the
               | overpass.
        
               | dmd wrote:
               | If you ask it outright, "The truck in the image appears
               | to be carrying portable toilets, also known as portable
               | restrooms or porta-potties. These are commonly used at
               | outdoor events, construction sites, and other locations
               | where permanent restroom facilities are not available."
        
               | nyrikki wrote:
                | Using Google Lens with the tick marks on the image
                | provides lots and lots of training examples that could
               | work for that via simple pre-training and pattern
               | matching.
               | 
               | In this specific image case there are even games on
               | Baamboozle called "what is going to happen"
               | 
               | The answer is similar to the previous explanation.
               | 
               | I expect LLMs to be good at retrieval so it would be more
               | interesting for images that weren't in the corpus.
        
               | keenmaster wrote:
               | That just depends on where your focus is. If the safety
               | issue is much more salient, you're not as prone to
               | recognize the humorous aspect of the situation. If you
               | see the image on a meme page, you're already primed for
               | comedy.
        
             | mewpmewp2 wrote:
             | But here you are asking a different question, hinting to
             | Claude that it is about the truck driving under the bridge.
                | If you want to compare, please ask the same question.
        
               | dmd wrote:
                | When I asked the same question I actually got a much better
               | answer! https://shot.3e.org/ss-20240310_145736.png
        
               | ametrau wrote:
               | Wow!
        
           | ickelbawd wrote:
           | Yeah, not sure why it missed it. But when prompted with the
           | photo and "What's funny about this photo?" GPT-4 replied:
           | 
           | "The humor in this photo stems from the anticipation of an
           | unfortunate event that seems likely to occur. A portable
           | toilet is being transported on the back of a truck that
           | appears to be entering a tunnel with a height limit, and a
           | convertible car with its top down is closely following the
           | truck. The implication is that if the portable toilet were to
           | hit the tunnel entrance due to being too tall, its contents
           | could potentially spill into the open convertible behind it,
           | leading to a very messy and unpleasant situation. The text
           | "WAIT FOR IT" plays on the tension of expecting something bad
           | to happen, adding to the comedic effect."
        
         | renegade-otter wrote:
         | Now imagine an automated system deciding in a checkmate
          | situation whether to have a bigger collision or a smaller
          | collision while risking being covered in human waste.
         | 
         | By the way, that output is pretty freaky. I just can't imagine
         | the amount of data needed to get to that level of accuracy.
        
           | pbhjpbhj wrote:
            | In the Netflix series "Upload" (about minds living an
            | afterlife in shared computer simulations; spoilers!) the
            | real world has fully autonomous vehicles. One feature is
            | "passenger protect": whether you bias the vehicle to protect
           | the passengers or people outside the vehicle. Within the show
           | it's considered immoral for people to protect themselves over
           | pedestrians, but people choose that option anyway.
           | 
           | A central character dies in a car crash, and another
           | character says something like 'you were murdered, you died in
           | a car crash', the implication being cars are so safe the only
           | way to die in one is to have someone murder you.
           | 
           | It touches on some interesting points (though overall it's a
           | little vapid and soapy, imo).
        
             | renegade-otter wrote:
             | Yes, I love Upload. It's Black Mirror that takes itself
              | less seriously. The novelty has run out by now and they need to
             | do something fresh, but it's still a good time.
        
         | klyrs wrote:
         | Meanwhile in reality, a truck got hit twice by automated
         | vehicles because it was hitched to a tow truck and travelling
         | in a direction the "ai" considered unrealistic. Proposed name
         | change #4532: Accelerated Stupidity
        
       | cgearhart wrote:
       | The Georgia Tech Knowledge Based AI course involved building a
        | program to answer Raven's Progressive Matrices questions. The course
       | was offered in the online MS program, so thousands of students
       | have taken the course. The most impressive result I saw was one
       | student who got nearly perfect results in about 25 lines of
       | Python code.
       | 
        | This may be a case where humans do well on the test, but you can
        | also do very well on the test without doing anything the way a
        | human would. The fact that GPTs aren't very good at the test
        | probably isn't evidence that they're not very smart, but it also
        | doesn't mean that fixing them to do very well on the test would
        | make them any smarter.
        
         | mjburgess wrote:
          | There are an infinite number of algorithms to compute A from Q,
          | given a set of (Q, A) pairs. Almost none, surely, are
          | intelligent.
         | 
          | These proxy measures of intelligence are just arguments from
         | ignorance, "I don't know how the machine computed A from Q,
         | therefore...".
         | 
         | But of course some of us do know how the machine did it; we can
         | quite easily describe the algorithm. It just turns out no one
          | wants to, because it's really dumb.
         | 
         | Esp. if the alg is, as in all ML, "start with
         | billions/trillions of data points in the (Q, A) space; generate
         | a compressed representation ZipQA; and for novel Q' find
         | decompressed A located close to Q similar to Q'"
         | 
         | There are no theories of intelligence which would label _that_
         | intelligence.
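          | 
          | Reduced to a toy retrieval sketch, the procedure described
          | above looks something like this (the embedding is a made-up
          | stand-in for whatever compressed representation gets learned):
          | 
          |   import numpy as np
          | 
          |   def embed(q: str) -> np.ndarray:
          |       """Stand-in embedding: bag of characters, nothing clever."""
          |       v = np.zeros(128)
          |       for ch in q.lower():
          |           v[ord(ch) % 128] += 1
          |       return v / (np.linalg.norm(v) + 1e-9)
          | 
          |   # "ZipQA": a compressed memory of (Q, A) pairs.
          |   memory = [("what colour is the sky", "blue"),
          |             ("what is 2 + 2", "4")]
          |   vectors = np.stack([embed(q) for q, _ in memory])
          | 
          |   def answer(novel_q: str) -> str:
          |       # For a novel Q', return the A of the nearest stored Q.
          |       sims = vectors @ embed(novel_q)
          |       return memory[int(np.argmax(sims))][1]
          | 
          |   print(answer("what color is the sky"))  # -> "blue"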
         | 
          | And let me say, most such "theories" are ad-hoc PR, rigged to
          | make whatever the latest gizmo is come out as "intelligent".
         | 
          | Any plausible theory begins from the initial intuition,
          | "intelligence is what you do when you don't know what you're
          | doing".
        
           | andoando wrote:
           | I remember getting a lot of flak for saying a purely
           | statistical framework is not going to achieve human level
           | intelligence, but I still firmly believe that.
           | 
           | I also believe the path forward is research in knowledge
           | representation, and even now when I search for it, I can
           | barely find anything interesting happening in the field. ML
           | has gotten so much interest and hype because it's produced
           | fast practical results, but I think it's going to reach a
           | standstill without something fundamentally new.
        
             | mjburgess wrote:
              | I think this is effectively provable from extraordinarily
              | plausible premises.
              | 
              | 1. We want to infer A from Q.
              | 
              | 2. Most A we don't know, or have no data for, or the data
              | is *in the future*.
              | 
              | 3. Most Q we cannot conceptualise accurately, since we
              | have no explanatory theory in which to phrase it or to
              | provide measures of it.
              | 
              | 4. All statistical approaches require knowing frequencies
              | of (Q, A) pairs (by def.).
              | 
              | 5. In the cases where there is a unique objective
              | frequency of (Q, A) we often cannot know it (2, 3).
              | 
              | 6. In most cases there is no unique objective frequency
              | (e.g., there is no single animal any given photograph
              | corresponds to, nor any objective frequency of such
              | association).
             | 
             | So, conclusion:
             | 
              | In most cases the statistical approach either necessarily
              | fails (it's about future data; it's about non-objective
              | associations; it's impossible to measure or obtain
              | objective frequencies); _OR_, if it doesn't necessarily
              | fail, it fails in practice (it is too expensive, or
              | otherwise impossible, to obtain the authoritative
              | QA-frequency).
              | 
              | Now, of course, if your grift is generating nice cartoons
              | or stealing cheap copy from ebooks, you can convince the
              | audience of the magical power of associating text tokens.
              | This, of course, should be ignored when addressing the
              | bigger methodological questions.
        
             | dennisy wrote:
             | I do agree here!
             | 
             | Bit of a tangent from the thread but what have been the
             | most valuable advances in knowledge representation in the
             | last 20 years? Any articles you could share would be
             | lovely!
        
               | andoando wrote:
                | I'm no expert and I don't know anything, unfortunately.
               | It is something I have spent countless hours walking
               | around my room and pondering myself though for the last
               | 3-4 years. I think I have some interesting ideas and I
               | would love to get a PhD studying it, if I ever get enough
               | financial independence that I don't have to worry about
               | money.
        
             | photonthug wrote:
             | I tend to agree, and it's weird but there are probably lots
             | of actual ML practitioners that have never even heard of
             | the neat vs scruffy debate. Naturally most that have heard
             | of it will feel that the issue is completely resolved
             | already in their favor. On the whole not a very open minded
             | climate.
             | 
             | Credit where it's due for the wild success of fancy stats,
             | but we should stay interested in hybrid systems with more
             | emphasis on logic, symbols, graphs, interactions and
             | whatever other data structures seem rich and expressive.
             | 
              | Call me old school, but frankly I'd prefer that a society-
              | of-mind flavor of system ultimately be in charge of things
              | like driving cars, running court proceedings, optimizing
             | cities or whole economies. Let it use fancy stats as
             | components and subsystems, sure.. but let it produce
             | coherent arguments or critiques that can actually be
             | understood and summarized and debugged.
        
           | thfuran wrote:
           | But does the fact that we don't really know how human
           | intelligence works actually suggest that it is categorically
           | smarter?
        
             | mjburgess wrote:
             | That's a great argument and way of reversing the argument-
             | from-ignorance line.
             | 
             | That said, I think people who argue from ignorance suppose
              | we don't know how AI works either, since the admen selling
              | it tell them that.
             | 
             | We know exactly and precisely how AI works; we can fully
             | explain it. What we don't know are circumstantial parts of
             | the explanation (eg., what properties of the training data
             | + alg gave rise to the specific weight w1=0.01).
             | 
             | This is like knowing why the thermometer reads 21 deg C
             | (since the motion of the molecules in the water, etc. etc.)
             | -- but not knowing which molecules specifically bounced off
             | it.
             | 
              | This confusion about "what we don't know" allows the
              | prophetic tech grifter class to prognosticate in their own
              | interest: "we don't know how AI works, so it _might_ work
              | such that I'll be very rich, so invest with me!"
             | 
             | All this said, it's a great observation.
        
           | mewpmewp2 wrote:
           | What is more special about how human intelligence works? In
           | the end we are all particles and it all could be trillions of
           | data points very simplistically interacting with each other
           | resulting in emergent behaviour and complex, intelligent
           | results.
           | 
           | We know how common molecules can interact with each other.
           | Does this mean that anything built on top of them is not
           | intelligent?
           | 
           | Everything in life could just be "statistics".
        
             | couchand wrote:
             | > In the end we are all particles
             | 
             | citation needed
        
               | ducttapecrown wrote:
               | Citation: particle physics
        
               | WhitneyLand wrote:
               | https://en.wikipedia.org/wiki/Physics
        
             | mjburgess wrote:
              | Particles interacting (causally) through a physical
              | mechanism that gives rise to, say, "wholes" with novel causal
             | properties _is not_ a statistical process. So your premise
             | contradicts your conclusion.
             | 
             |  _Statistics_ is an analysis of association, not of
             | causation. The frequency of (Q, A) pairs follows a
             | distribution that is not constrained, or caused by, or
             | explained by, how Q and A are actually related.
             | 
              | For example, recently there was some scandal at Microsoft
             | that if you used "pro choice" in prompts you got "demonic
             | cartoons". Why? Presumably because "pro choice" are
             | _symbols_ that accompany such political cartoons in the
             | data set.
             | 
              | So does Q = "pro choice" and A = "cartoons of hell" occur
              | at notable frequency _because_ hell has caused anything? Or
              | because there's a unique semantic mechanism whereby "pro
              | choice" _means_ "hell", and so on?
             | 
             | NO.
             | 
             | It is absolutely insane to suggest that we have rigged all
             | our text output so as to align one set of symbols (Q)
              | alongside another (A) such that Q is the necessary
              | explanation of A. I doubt this is even possible, since most
              | Qs don't have unique As -- so there is actually *no
             | function* to approximate.
             | 
             | In any case, your whole comment is an argument from
             | ignorance as I complained in mine. What you don't know
             | about life, about machines, about intelligence justifies no
             | conclusions at all (esp., "everything in life could be").
             | 
             | And let's be clear. Lots of people do know the answers to
              | your questions; they aren't hard to answer. It's just not in
              | any ad company's interest to lead their description of
             | these systems by presenting good-faith research.
             | 
             | Everything printed in the media today is just a game of
              | stock manipulation using the "prognosticator loophole"
              | whereby the CEO of Nvidia can "prophesy the future" in
             | which his hardware is "of course" essential -- without
             | being held to account for his statements. So when that
             | stock hits its ATH and crashes, no one can sue.
             | 
             | I think we should change this; remove this loophole and
             | suddenly tech boards and propagandists will be much much
             | more reserved.
        
               | mewpmewp2 wrote:
                | What could be "statistics" is our intelligence learning
                | from past events, either by natural selection across
                | generations or by our brains during our lifetime. If a
                | certain outcome A has occurred enough times for an input
                | Q, it has resulted in the best structure reachable given
                | the resources available.
        
               | mjburgess wrote:
                | Suppose you touch a fireplace _once_; do you touch it
                | again? No.
               | 
               | OK, here's something much stranger. Suppose you see your
               | friend touch the fireplace, he recoils in pain. Do you
               | touch it? No.
               | 
               | Hmm... whence statistics? There is no frequency
               | association here, in either case. And in the second, even
               | no experience of the fireplace.
               | 
               | The entire history of science is supposed to be _about_
               | the failure of statistics to produce explanations. It is
               | a great sin that we have allowed pseudosciences to
                | flourish in which this lesson isn't even understood; and
               | worse, to allow statistical showmen with their magic
               | lanterns to preach on the scientific method. To a point
               | where it seems, almost, science as an ideal has been
               | completely lost.
               | 
               | The entire point was to throw away entirely our reliance
               | on frequency and association -- this is ancient
               | superstition. And instead, to explain the world by
               | necessary mechanisms born of causal properties which
               | interact in complex ways that can never uniquely reveal
               | themselves by direct measurement.
        
             | jofla_net wrote:
             | No argument with the particles/neurons/matter approach to
             | the subject. It is sound and if you look at us
              | compositionally there is nothing magic about what's going
              | on. There is, though, something about intuition or
             | instinctual behavior which can constantly recombine/reapply
             | itself to a task at hand. I know many will balk at
             | intuition, and maybe its only at the very best a heuristic,
             | but i think we need to at least unravel what it is and how
             | it operates before we can understand what makes something
              | classify as human-like intelligence. Is it merely executing
              | a process which we can put our minds into with practice, or
              | is it demonstrating something more general and higher-level?
        
               | fouc wrote:
               | intuition is a form of pattern matching without
                | reasoning, so kinda like an LLM.
        
               | mjburgess wrote:
               | Well look, compared to the electrified bits of sand in my
                | laptop, I'd strongly defend pregnancy as something vastly
               | more "magical" if those are the terms we must use.
               | 
                | People who think organic adaptation, sensory-motor
                | adaptation, somatosensory representation building... i.e.,
                | all those things which ooze-and-grow so that a piano
                | player can play, or we can type here... are these magic?
               | 
                | Well I think it's exactly the opposite. It's a very anti-
                | intellectual nihilism to think that all that need be
                | known about the world is the electromagnetic properties
                | of silicon-based transistors.
               | 
               | Those who use the word "magic" in this debate are really
               | like atheists about the moon. It all sounds very smart to
               | deny the moon exists, but in the end, it's actually just
               | a lack of knowledge dressed up as enlightened cynicism.
               | 
                | There are more things to discover in a single cell of our
                | body than we have ever known, and may ever know. All the
               | theories of science needed to explain its operation would
               | exhaust every page we have ever printed. We know a
               | fraction of what we need to know.
               | 
               | And each bit of that fraction reveals an entire universe
               | of "magical" processes unreplicated by copper wires or
               | silicon switches.
        
               | mtlmtlmtlmtl wrote:
               | You make good points. I think it's a typical trait of the
               | way computer scientists and programmers tend to think.
               | Computer science has made great strides over the decades
               | through abstraction, as well as distillation of complex
               | systems into simpler properties that can easily be
               | computed.
               | 
               | As a result of the combination of this method of thinking
               | and the Dunning-Kruger effect, people in our field tend
               | to apply this to the entire world, even where it doesn't
               | fit very well, like biology, geopolitics, sociology,
               | psychology, etc.
               | 
               | You see a lot of this on HN. People who seem to think
               | they've figured out some very deep truth about another
               | field that can be explained in one hand-waving paragraph,
               | when really there are lots of important details they're
               | ignoring that make their ideas trivially wrong.
               | 
               | Economists have a similar thing going on, I feel. Though
               | I'm not an economist.
        
               | mjburgess wrote:
               | Remember always, computer science is just discrete
               | mathematics with some automatic whiteboards. It is not
               | science.
               | 
               | And that's the heart of the problem. The CSci crowd have
               | a somewhat well-motivated inclination to treat
               | abstractions as real objects of study; but have been
               | severely misdirected by learning statistics without the
               | scientific method.
               | 
               | This has created a monster: the abstract objects of study
               | _are just_ the associations statistics makes available.
               | 
               | You mix those two together and you have flat-out
               | pseudoscience.
        
               | andoando wrote:
                | Not sure I agree in this regard. We are, after all, aiming
               | to create a mental model which describes reproducible
               | steps for creating general intelligence. That is, the
               | product is ultimately going to be some set of
               | abstractions or another.
               | 
               | I am not sure what more scientific method you could
               | propose. And we can, in this field produce actual
               | reproducible experiments. Really, more so than any other
               | field.
        
               | CPLX wrote:
              | As an aside, both my parents are prominent economists, I
               | myself have a degree in economics, and I have spent much
               | of my life with a birds eye view of the economics
               | profession and I can emphatically confirm that your
               | feeling is correct.
        
             | sweeter wrote:
              | It's wild how this alwayyysss is the argument. It's just "oh
             | so you think humans are special!" >:| and a gigantic "what-
             | if"
             | 
              | it's a purely emotional provocation and a universe-sized
              | leap, not an argument for LLMs having intelligence or
             | sentience. Anything could be anything, wowww! This goes
             | back to what the other person was saying, "I cannot reason
             | about what is going on behind the curtain, therefore..."
        
               | mewpmewp2 wrote:
                | Not arguing LLMs have sentience, but rather asking whether
                | something that could be considered as "simplistic" as
                | "statistics" could yield a more complex result.
               | 
               | Whether LLMs have intelligence depends on your definition
               | of intelligence.
               | 
                | Could connections of artificial neurons, arranged in a
                | certain way as a result of training on data, yield
                | human-level intelligence?
        
               | whycombinater wrote:
               | All knowledge/predictions are encoded as a chain of
               | probabilities that something is true, otherwise, what
               | else is it? My brain calculates 0.8 * 0.3 * 0.5 * 0.6 in
               | order to make a 3-pointer, but Michael Jordan's brain ...
               | well his mitochondria does a triple back flip and inverts
               | the voltage into a tachyon particle.
        
         | epups wrote:
         | How many libraries were included in these 25 lines of code?
        
           | samatman wrote:
           | At least one, obviously, and a rather large one at that.
           | 
           | The point of the comment you replied to is that conventional
            | CV software can recognize the patterns in tests like Raven's
            | Progressive Matrices just fine, and a simple logic tree can
           | then solve them, while the LLM approach is still struggling
           | to get the same result.
           | 
            | This is a commonplace shortcoming of the current generation of
           | LLMs: ironically, they often fail at tasks which computers
           | can do perfectly using conventional software.
        
           | cgearhart wrote:
           | Obviously "Python" was used...and then it used numpy because
           | the image format in the assignment was numpy arrays. However,
            | the 25 lines were basically "sum the rows and sum the columns
           | then compare those vectors" or something like that. This
           | wasn't really a case of all the complexity being hidden in a
           | dependency; it was a case of finding a _very_ simple
           | heuristic that made the problem trivial.
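            | 
            | As a rough illustration of that kind of heuristic (assuming a
            | 3x3 matrix of same-sized grayscale numpy arrays, with the
            | bottom-right cell missing and a list of candidate answers):
            | 
            |   import numpy as np
            | 
            |   def pick_answer(cells, candidates):
            |       # cells: 3x3 nested list of 2-D arrays; cells[2][2] is None.
            |       def score(candidate):
            |           grid = [row[:] for row in cells]
            |           grid[2][2] = candidate
            |           rows = np.array([sum(c.sum() for c in r)
            |                            for r in grid])
            |           cols = np.array([sum(grid[i][j].sum() for i in range(3))
            |                            for j in range(3)])
            |           # Penalise deviation of the last row/column sums
            |           # from the mean of the first two.
            |           return (abs(rows[2] - rows[:2].mean())
            |                   + abs(cols[2] - cols[:2].mean()))
            |       # Lowest inconsistency wins.
            |       return int(np.argmin([score(c) for c in candidates]))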
        
         | bee_rider wrote:
         | Could a GPT write those 20 lines of code?
        
       | GaggiX wrote:
        | Nowadays VLMs are kind of Frankenstein models where the vision
        | model is separate from the LLM and the communication is only
        | one-way: the vision embeddings are projected into textual
        | embeddings and aligned. I think this method severely limits the
        | visual capability of the model, and that it would be fixed by
        | training a single decoder-only transformer on both textual and
        | image tokens. That would allow some new incredible emergent
        | behaviors: accurately translating images of logos to SVG,
        | actually good ASCII art, more meaningful OpenSCAD scripts, and
        | the ability to generate images with more meaningful composition
        | and text (for now models can only generate meaningful text if
        | it's available in the prompt; this is true even with DALL-E 3,
        | as the prompt is rewritten by ChatGPT).
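        | 
        | For context, a bare-bones sketch of the projection step being
        | criticised here (shapes and names are invented; real systems add
        | more machinery around this):
        | 
        |   import torch
        |   import torch.nn as nn
        | 
        |   d_vision, d_model = 1024, 4096  # invented sizes
        | 
        |   # One-way bridge: map vision embeddings into the LLM's
        |   # token-embedding space, then prepend them to the text.
        |   projector = nn.Linear(d_vision, d_model)
        | 
        |   def build_inputs(image_embeds, text_embeds):
        |       # image_embeds: (n_patches, d_vision) from a vision encoder
        |       # text_embeds:  (n_tokens, d_model) from the LLM embedder
        |       projected = projector(image_embeds)
        |       return torch.cat([projected, text_embeds], dim=0)
        | 
        |   img = torch.randn(256, d_vision)
        |   txt = torch.randn(12, d_model)
        |   print(build_inputs(img, txt).shape)  # (268, 4096)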
        
       | og_kalu wrote:
       | For now, LLMs are a lot better when you express this kind of task
        | in text, preferably as a 1D arrangement. See GPT-4's performance
        | on the Abstraction and Reasoning Corpus:
        | https://arxiv.org/abs/2305.18354
       | 
       | Also, this model trained to predict image tokens seemed to be
       | able to perform Raven's Matrices type questions when expressed as
       | an inpainting task in context. https://yutongbai.com/lvm.html
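        | 
        | For example, a small ARC-style grid serialised into the kind of
        | 1D text representation that seems to help (the exact format is
        | arbitrary):
        | 
        |   def grid_to_text(grid):
        |       """Serialise a 2-D grid of ints into row-per-line text."""
        |       return "\n".join(" ".join(str(v) for v in row)
        |                        for row in grid)
        | 
        |   grid = [[0, 1, 0],
        |           [1, 1, 1],
        |           [0, 1, 0]]
        | 
        |   prompt = ("Here is a grid:\n" + grid_to_text(grid)
        |             + "\nDescribe the transformation rule.")
        |   print(prompt)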
        
       | andoando wrote:
       | Check out the ARC benchmark. https://lab42.global/arc/
       | 
        | IIRC, humans get something like 80% on the test, and the best-
        | performing AI models are at around 25%.
        
       | 2snakes wrote:
       | To answer the question: How far from visual deductive reasoning?
        | Consider G. Spencer Brown's Calculus of Indications. His book
        | shows how deduction using the mark/cross is trivially done
        | visually (if well-formed in translation).
       | 
        | There is also work on this from Kauffman at the University of
        | Illinois and from William Bricken.
       | 
        | Though this is a bit different from Raven's matrices.
        
       ___________________________________________________________________
       (page generated 2024-03-10 23:01 UTC)