[HN Gopher] Ask HN: Has ChatGPT gotten worse at coding for anyon...
       ___________________________________________________________________
        
       Ask HN: Has ChatGPT gotten worse at coding for anyone else?
        
       I used it for coding in Python, often with the python-docx library,
       about six weeks ago, and it was superb. It gave me exactly what I
       wanted, which is no mean feat for a semi-obscure little library,
       and I was delighted. Then I tried it again a few weeks ago and it
       did worse than before, but I thought maybe it was just bad luck.
       Using it today, though, it seemed really, really bad: it messed
       up some very basic Python features, like the walrus operator. It
       got so bad that I gave up on it and went back to Google and
       Stack Overflow. The performance drop is so steep that I can only
       imagine they crippled the model, probably to cope with the
       explosion in demand. Has anyone else seen the same thing?
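For readers unfamiliar with it, the "walrus operator" mentioned above is Python 3.8's assignment expression (`:=`). A minimal sketch of correct usage (the list and values are made up for illustration):

```python
# The walrus operator := (assignment expression, Python 3.8+) assigns
# a value and returns it within a single expression.

data = [3, 1, 4, 1, 5]

# Bind len(data) to n and test it in the same expression, instead of
# calling len() twice or adding a separate assignment statement.
if (n := len(data)) > 3:
    print(f"list is long: {n} items")

# Typical loop use: compute a value and test it in one place.
values = iter([2, 5, 0, 7])
while (v := next(values)) != 0:
    print("got", v)
```

The second loop reads each value and checks the sentinel in a single header line, which is the kind of idiom the OP says the model started getting wrong.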
        
       Author : Michelangelo11
       Score  : 12 points
       Date   : 2023-02-18 16:50 UTC (6 hours ago)
        
       | alar44 wrote:
       | [dead]
        
       | wwwpatdelcom wrote:
       | Is it possible that what you've been working on over the last
       | six weeks became more specialized and less common? Did you
       | start out with a prototype and then later move to pinned
       | dependencies? Had you attempted using the walrus operator
       | earlier, or only recently?
       | 
       | My contention, which I cover in the video below, is that due
       | to the statistical sampling problems inherent in RLHF-tuned
       | transformers, LLMs perform poorly on edge cases, and depending
       | on the application or language, that edge can be very wide.
       | 
       | Here's a video I created about it:
       | https://www.youtube.com/watch?v=GMmIol4mnLo
       | 
       | I didn't cover this yet, but there are these things called
       | "scaling laws," which roughly state how much raw text is
       | needed to train an LLM of a given parameter count. My current
       | mental model is that these "laws" are really economic rules of
       | thumb, much as Moore's law is really Moore's rule of thumb,
       | and there is a huge expense in sampling clean data, hence the
       | need for RLHF.
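As a rough illustration of the "scaling laws" idea, here is a sketch using the widely cited Chinchilla heuristic of about 20 training tokens per parameter; both the 20x ratio and the model size below are assumptions for illustration, not exact figures:

```python
# Sketch of a scaling-law rule of thumb: the Chinchilla result is often
# summarized as "~20 training tokens per parameter" for compute-optimal
# training. The 20x ratio is an assumed heuristic, not an exact law.

def approx_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Rough compute-optimal training-token count for a model size."""
    return n_params * tokens_per_param

# A hypothetical 70B-parameter model would want on the order of
# 1.4 trillion training tokens under this heuristic:
print(f"{approx_optimal_tokens(70e9):.1e} tokens")
```

The point of the sketch is the commenter's: the "law" is a cost-driven rule of thumb about data volume, and trillions of clean tokens are expensive to obtain.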
       | 
       | More about RLHF, if you're not familiar with the term:
       | https://huggingface.co/blog/rlhf
        
       | IronWolve wrote:
       | I noticed it sometimes uses outdated code. ChatGPT admitted it
       | gave me old, outdated, and wrong code without me telling it.
       | It knew.
       | 
       | I tried to have it write a HexChat script in Perl and in
       | Python; neither worked, apparently because it was trained on
       | old documentation.
        
         | ravi-delia wrote:
         | If you're saying it "knew" because it admitted to it after
         | you pointed it out: the character it's playing will bow and
         | scrape as long as you correct it at all, even if you just
         | straight-up lie. It knows that a correction in that context
         | ought to be followed by an apology, and OpenAI's tweaking is
         | almost certainly involved (since it uses the same words
         | every time).
        
           | GranPC wrote:
           | > without me telling it
        
       | qwertox wrote:
       | Yes, I noticed this too. I had it build a MongoDB aggregation
       | where documents would get aggregated in hourly timeslots (compute
       | temperature averages+hi+low for every hour). Two ways to do this:
       | 1) convert the datetime to a YYYY-mm-dd-hh string and use it to
       | group the documents, or 2) use a Unix timestamp and do some math
       | on it.
       | 
       | I was already using 2) in some projects, so I wanted to check if
       | it was able to do this.
       | 
       | It first suggested 1); then I told it to make it more
       | efficient by avoiding strings, so it gave me 2). Wow.
       | 
       | That was around 3-4 weeks ago. When I tried it again this
       | week, it would only output 1), and it couldn't make the move
       | to 2) anymore even when told not to use strings. It kept
       | using them.
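A sketch of the two approaches described above, assuming a hypothetical collection where each document has a Unix-seconds field `ts` and a numeric `temp` (field names are made up). Approach 2's modulo math is shown as plain Python, plus the equivalent pymongo-style `$group` stage:

```python
# Approach 2 (no strings): truncate the timestamp to its hour with
# integer math, then group on the truncated value.

def hour_bucket(unix_ts: int) -> int:
    """Round a Unix timestamp down to the start of its hour."""
    return unix_ts - (unix_ts % 3600)

# The same math as a MongoDB aggregation $group stage (pymongo-style
# dicts), computing per-hour average, high, and low temperatures:
pipeline = [
    {"$group": {
        "_id": {"$subtract": ["$ts", {"$mod": ["$ts", 3600]}]},
        "avg": {"$avg": "$temp"},
        "hi":  {"$max": "$temp"},
        "lo":  {"$min": "$temp"},
    }}
]

# Quick check: 50 minutes past an hour boundary rounds down to it.
base = 1676736000          # an exact hour boundary (divisible by 3600)
assert hour_bucket(base + 50 * 60) == base
```

Approach 1 would instead group on a `$dateToString` key like `"%Y-%m-%d-%H"`; the arithmetic version avoids building and comparing strings for every document, which is the efficiency point the commenter was probing for.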
        
       | crawsome wrote:
       | I always noticed it made up function calls or got them wrong
       | for GMS2 (GameMaker Studio 2), but the structure was often
       | correct.
        
       | speedgoose wrote:
       | If you subscribe to the paid version, you can use the slower
       | and perhaps better "legacy" model that was available a few
       | times every day until a few weeks ago.
        
         | Michelangelo11 wrote:
         | Hmm, thanks, yeah, I forgot about the paid version. They might
         | have just crippled the free version...
        
       | MrLeap wrote:
       | I asked it for an HLSL shader to raymarch a cloud and it
       | basically handed me a copy/paste of the top result off shadertoy
       | changed just enough to be broken. Kept the indentation and the
       | magic constants unchanged though!
       | 
       | The more niche the ask the less... transformative/uniquely
       | generative its model is, and the less reliable.
        
         | Michelangelo11 wrote:
         | Right, absolutely. But the level of nicheness was largely the
         | same over time or, if anything, went down (the walrus operator
         | in Python isn't very niche at all).
        
       ___________________________________________________________________
       (page generated 2023-02-18 23:02 UTC)