Post Aby8TLAhpv5afsNU6C by sten@chaos.social
Post #Aby1zjCXKbPQmvjLxw by mttaggart@infosec.town
2023-11-19T14:35:32.782Z
0 likes, 0 repeats
Nope, still can't reason.

"We found that our more informative one-shot prompt improved GPT-4’s performance in the text case, but its performance remained well below that of humans and the special-purpose Kaggle-ARC program. We also found that giving minimal tasks as images to the multimodal GPT-4 resulted in substantially worse performance than in the text-only case. Our results support the hypothesis that GPT-4, perhaps the most capable “general” LLM currently available, is still not able to robustly form abstractions and reason about basic core concepts in contexts not previously seen in its training data. It is possible that other methods of prompting or task representation would increase the performance of GPT-4 and GPT-4V; this is a topic for future research."

https://arxiv.org/abs/2311.09247
Post #Aby8TLAhpv5afsNU6C by sten@chaos.social
2023-11-19T15:42:38Z
1 like, 0 repeats
@mttaggart I'm not sure what these experiments are supposed to show. Unless I deeply misunderstand how these programs are built (a distinct possibility), this kind of program will never be able to reason.
Post #Aby8WTtlqYPMboxDH6 by mttaggart@infosec.town
2023-11-19T15:48:42.139Z
0 likes, 0 repeats
@sten@chaos.social That is correct. But demonstrating that with empirical evidence has value.
Post #Aby8Xw3octv4F2LVQW by DaveMWilburn@infosec.exchange
2023-11-19T15:44:53Z
1 like, 0 repeats
@mttaggart It's disappointing that people are so credulous about the supposed reasoning ability of fancy autocomplete.

But I do think there's some limited promise in multimodal learning, or alternatively and perhaps more simply, in building interfaces between different models that specialize in different tasks. For instance, stop trying to make the LLM do math. Instead, use the LLM to recognize a math problem in the input, extract the relevant mathematical features from the unstructured natural language, coax those features into something like symbolic logic, feed that symbolic logic into an external engine that can actually do math, and then return the results to the user as text output. Like... maybe something closer to WolframAlpha's approach.

I'd still hesitate to call any of that "reasoning", but just because it isn't "reasoning" doesn't mean it can't be useful. With a bit more imagination and work, I suspect these things could be something more useful than the parlor tricks we're seeing today. I'll readily admit this is far outside my area of expertise, though.
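[The hand-off Dave describes could be sketched roughly like this. The extraction step is a hypothetical regex stand-in for the LLM (in a real system the model would do the natural-language parsing), and Python's `ast` module plays the role of the "external engine that can actually do math"; the function names are made up for illustration.]

```python
import ast
import operator
import re

# Stand-in for the LLM step: pull an arithmetic expression out of
# free-form text. This naive regex (digits, operators, parentheses)
# is a hypothetical placeholder for model-driven extraction.
def extract_expression(text: str) -> str:
    match = re.search(r"\d[\d\s\.\+\-\*/\(\)]*", text)
    if not match:
        raise ValueError("no arithmetic expression found")
    return match.group().strip()

# The external engine: a safe evaluator that walks the parsed
# syntax tree instead of calling eval() on untrusted text.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def evaluate(expr: str):
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported syntax")
    return walk(ast.parse(expr, mode="eval"))

def answer(question: str) -> str:
    # Glue: extract with the "LLM", solve externally, render back as text.
    expr = extract_expression(question)
    return f"{expr} = {evaluate(expr)}"
```

[The point of the split: the language model never produces the numeric answer itself; it only translates between free text and a form a deterministic solver accepts.]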
Post #Aby8f2KLj2aLhjSvHk by mttaggart@infosec.town
2023-11-19T15:50:14.488Z
0 likes, 0 repeats
@DaveMWilburn@infosec.exchange I think there's value in what you describe too: that's actual NLP, not just a Markov chain with a hat on it. Guess which one is harder?