(DIR) Post #B2eho6OPFPhGK5EbVA by AmericanChampion@poa.st
2026-01-25T17:33:57.459024Z
1 likes, 0 repeats
I suspect this isn't a positive signal for OpenAI's position. LLMs are getting better at coding, but they're not good enough to automate frontier research programmers' code-writing work.
(DIR) Post #B2eho788VOHCbud7mS by jmw150@poa.st
2026-01-25T17:45:00.104648Z
0 likes, 0 repeats
@AmericanChampion I wish LLMs were not just predictors on what the average coder would do. I still have to do most of my stuff. But I am glad it is automating potential losers like this guy out of the meat of it.
(DIR) Post #B2eivXwHKTkNjnBwTw by BroDrillard@nicecrew.digital
2026-01-25T18:01:05.271241Z
1 likes, 0 repeats
>I don't write code anymore
his shit falling apart in 3...2...1...
(DIR) Post #B2ejB5R54TaFyC0dgu by AmericanChampion@poa.st
2026-01-25T21:31:55.734275Z
0 likes, 0 repeats
@jmw150
> I wish LLMs were not just predictors on what the average coder would do.
They aren't, anymore. You can absolutely fine-tune an LLM to be better at programming than the average programmer. Read the GRPO paper for the basics on this; I'm sure there are thousands of pages on the little details that'll get you better performance out of the algorithm, but the tl;dr is that any programmatically evaluable task can be solved via fine-tuning.

The issue is that "write a novel ML research library that implements my 40-page thesis's core hypothesis" is not a programmatically evaluable problem. Too many requirements that can't be checked by unit tests alone.
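[Editor's note: not from the thread. A minimal Python sketch of what "programmatically evaluable" means here — a candidate program earns reward only if it passes hidden unit tests. The function name `verifiable_reward` and the `solve` convention are made up for illustration.]

```python
def verifiable_reward(candidate_code: str, tests: list[tuple[tuple, object]]) -> float:
    """Run a model-written program defining `solve`, then score it against test cases.

    Reward is all-or-nothing: 1.0 if every test passes, 0.0 otherwise.
    """
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)  # execute the model's program
        solve = namespace["solve"]
        for args, expected in tests:
            if solve(*args) != expected:
                return 0.0
        return 1.0
    except Exception:
        return 0.0  # crashes and missing definitions also score zero

# Example: a correct and an incorrect candidate for "add two numbers".
tests = [((2, 3), 5), ((0, 0), 0)]
good = "def solve(a, b):\n    return a + b"
bad = "def solve(a, b):\n    return a - b"
print(verifiable_reward(good, tests))  # 1.0
print(verifiable_reward(bad, tests))   # 0.0
```

This kind of binary, machine-checkable signal is exactly what RL fine-tuning can optimize against — and exactly what a 40-page thesis's requirements don't reduce to.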
(DIR) Post #B2ekzeDa7qaMHKMu4u by jmw150@poa.st
2026-01-25T21:52:16.026226Z
0 likes, 0 repeats
@AmericanChampion I tried using it for writing new mathematics in Rocq. Maybe it is not literally average, but I consider it average.

If you are talking about this paper, competition math is for children, not professionals. As soon as something intelligent comes out I will be one of the first to use it.
https://arxiv.org/pdf/2402.03300
(DIR) Post #B2hKNfWenhY7zoOsJE by AmericanChampion@poa.st
2026-01-27T03:38:12.211587Z
0 likes, 0 repeats
@jmw150 I'm not talking about the metric, I'm talking about the algorithm.
- GPT-3 was the last 'average internet user' LLM. Everything after it uses something in addition to the standard "run supervised learning on the training set" algorithm.
- 3.5 used RLHF, which leveraged reinforcement learning to shape the distribution of outputs, but didn't make it any smarter, because it was used to get it to talk like an HR employee instead.
- GRPO used RL to optimize towards intelligence, specifically the ability to solve any kind of computationally-verifiable problem, including coding tasks that fit in the context window.

Modern LLMs no longer represent "the average programmer", is what I'm saying.
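[Editor's note: a sketch, not thread content. The core trick of GRPO, per the DeepSeekMath paper linked above, is to sample a *group* of completions per prompt, score each with a verifiable reward, and use the group-normalized reward as the advantage — no learned value network. Function names below are illustrative.]

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Advantage of each sampled completion = (reward - group mean) / group std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        return [0.0] * len(rewards)  # all samples scored alike: no learning signal
    return [(r - mean) / std for r in rewards]

# A group of 4 sampled solutions to one prompt; two pass the tests, two fail.
# Passing samples get pushed up, failing ones pushed down.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # [1.0, -1.0, 1.0, -1.0]
```

These advantages then weight the usual policy-gradient update on each completion's tokens, which is how "optimize towards solving computationally-verifiable problems" cashes out.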
(DIR) Post #B2htIkI79V0LiuzBdA by jmw150@poa.st
2026-01-27T10:09:29.459191Z
0 likes, 0 repeats
@AmericanChampion I understood what you said.