(DIR) Post #B2eho6OPFPhGK5EbVA by AmericanChampion@poa.st
2026-01-25T17:33:57.459024Z
1 likes, 0 repeats
I suspect this isn't a positive signal for OpenAI's position. LLMs are getting better at coding, but they're not good enough to automate frontier research programmers' code-writing work.
(DIR) Post #B2eho788VOHCbud7mS by jmw150@poa.st
2026-01-25T17:45:00.104648Z
0 likes, 0 repeats
@AmericanChampion I wish LLMs were not just predictors on what the average coder would do. I still have to do most of my stuff. But I am glad it is automating potential losers like this guy out of the meat of it.
(DIR) Post #B2eivXwHKTkNjnBwTw by BroDrillard@nicecrew.digital
2026-01-25T18:01:05.271241Z
1 likes, 0 repeats
>I don't write code anymore
his shit falling apart in 3...2...1...
(DIR) Post #B2ejB5R54TaFyC0dgu by AmericanChampion@poa.st
2026-01-25T21:31:55.734275Z
0 likes, 0 repeats
@jmw150
> I wish LLMs were not just predictors on what the average coder would do.
They aren't, anymore. You can absolutely fine-tune an LLM to be better at programming than the average programmer. Read the GRPO paper for the basics on this; I'm sure there are thousands of pages on the little details that'll get you better performance out of the algorithm, but the tl;dr is that any programmatically evaluable task can be solved via fine-tuning.

The issue is that "write a novel ML research library that implements my 40-page thesis's core hypothesis" is not a programmatically evaluable problem. Too many requirements that can't be checked by unit tests alone.
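[Editor's note: not from the thread. A minimal Python sketch of what "programmatically evaluable" means here — a candidate program earns reward only if it passes hidden unit tests. The function name `verifiable_reward` and the `solve` convention are made up for illustration.]

```python
def verifiable_reward(candidate_code: str, tests: list[tuple[tuple, object]]) -> float:
    """Run a model-written program defining `solve`, then score it against test cases.

    Reward is all-or-nothing: 1.0 if every test passes, 0.0 otherwise.
    """
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)  # execute the model's program
        solve = namespace["solve"]
        for args, expected in tests:
            if solve(*args) != expected:
                return 0.0
        return 1.0
    except Exception:
        return 0.0  # crashes and missing definitions also score zero

# Example: a correct and an incorrect candidate for "add two numbers".
tests = [((2, 3), 5), ((0, 0), 0)]
good = "def solve(a, b):\n    return a + b"
bad = "def solve(a, b):\n    return a - b"
print(verifiable_reward(good, tests))  # 1.0
print(verifiable_reward(bad, tests))   # 0.0
```

This kind of binary, machine-checkable signal is exactly what RL fine-tuning can optimize against — and exactly what a 40-page thesis's requirements don't reduce to.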
(DIR) Post #B2ekzeDa7qaMHKMu4u by jmw150@poa.st
2026-01-25T21:52:16.026226Z
0 likes, 0 repeats
@AmericanChampion I tried using it for writing new mathematics in Rocq. Maybe it is not literally average, but I consider it average.

If you are talking about this paper, competition math is for children, not professionals. As soon as something intelligent comes out I will be one of the first to use it.
https://arxiv.org/pdf/2402.03300
(DIR) Post #B2hKNfWenhY7zoOsJE by AmericanChampion@poa.st
2026-01-27T03:38:12.211587Z
0 likes, 0 repeats
@jmw150 I'm not talking about the metric, I'm talking about the algorithm.
- GPT-3 was the last 'average internet user' LLM. Everything after it uses something in addition to the standard "run supervised learning on the training set" algorithm.
- 3.5 used RLHF, which leveraged reinforcement learning to shape the distribution of outputs, but didn't make it any smarter, because it was used to get it to talk like an HR employee instead.
- GRPO used RL to optimize towards intelligence, specifically the ability to solve any kind of computationally-verifiable problem, including coding tasks that fit in the context window.

Modern LLMs no longer represent "the average programmer", is what I'm saying.
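[Editor's note: a sketch, not thread content. The core trick of GRPO, per the DeepSeekMath paper linked above, is to sample a *group* of completions per prompt, score each with a verifiable reward, and use the group-normalized reward as the advantage — no learned value network. Function names below are illustrative.]

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Advantage of each sampled completion = (reward - group mean) / group std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        return [0.0] * len(rewards)  # all samples scored alike: no learning signal
    return [(r - mean) / std for r in rewards]

# A group of 4 sampled solutions to one prompt; two pass the tests, two fail.
# Passing samples get pushed up, failing ones pushed down.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # [1.0, -1.0, 1.0, -1.0]
```

These advantages then weight the usual policy-gradient update on each completion's tokens, which is how "optimize towards solving computationally-verifiable problems" cashes out.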
(DIR) Post #B2htIkI79V0LiuzBdA by jmw150@poa.st
2026-01-27T10:09:29.459191Z
0 likes, 0 repeats
@AmericanChampion I understood what you said.