[HN Gopher] Ask HN: Has ChatGPT gotten worse at coding for anyon...
___________________________________________________________________
Ask HN: Has ChatGPT gotten worse at coding for anyone else?
I used it for coding in Python, often with the python-docx library,
about six weeks ago, and it was superb. It gave me exactly what I
wanted, which is no mean feat for a semi-obscure little library,
and I was delighted. Then I tried it again a few weeks ago and it
did worse than before, but I thought maybe it was just bad luck.
Using it today, though, it seemed really really bad and it messed
up some very basic Python features, like the walrus operator -- it
got so bad that I gave up on it and went back to Google and Stack
Overflow. The performance drop is so steep that I can only imagine
they crippled the model, probably to cope with the explosion in
demand. Has anyone else seen the same thing?
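For readers unfamiliar with it, the walrus operator (`:=`, added in
Python 3.8) assigns a name as part of a larger expression. A minimal
sketch of its two common uses (the data and threshold here are made
up for illustration):

```python
# The walrus operator (:=) binds a name inside an expression.
data = [4, 11, 2, 15]

# Use 1: capture a value while testing it in a condition.
if (biggest := max(data)) > 10:
    result = f"max {biggest} exceeds threshold"

# Use 2: compute a value once inside a comprehension and reuse it.
halves = [h for n in data if (h := n / 2) > 2]  # keeps 5.5 and 7.5
```

Both forms are valid on Python 3.8 and later; on 3.7 and earlier they
are syntax errors, which is one easy way for generated code to break.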
Author : Michelangelo11
Score : 12 points
Date : 2023-02-18 16:50 UTC (6 hours ago)
| wwwpatdelcom wrote:
| Is it possible that what you have been working on over the last
| six weeks became more specialized / less generalized and common?
| Did you start out with a prototype and then move later into
| pinned dependencies? Had you attempted using the walrus operator
| earlier, or only recently?
|
| My contention, which I cover in the video below, is that due to
| the statistical sampling problems inherent in RLHF-tuned
| transformers, LLMs perform poorly in edge cases, and depending on
| the application or language, that edge can be quite wide.
|
| Here's a video I created about it:
| https://www.youtube.com/watch?v=GMmIol4mnLo
|
| I didn't cover this yet, but there are these things called
| "scaling laws," which roughly state how much raw text is needed
| to train an LLM of a given parameter count. My current mental
| model is that these "laws" are really economic rules of thumb,
| much as Moore's law is actually Moore's rule of thumb, and that
| there is a huge expense in sampling clean data, hence the need
| for RLHF.
|
| More about RLHF if you're not familiar with the term yet:
| https://huggingface.co/blog/rlhf
| IronWolve wrote:
| I noticed it sometimes uses outdated code. ChatGPT admitted it
| gave me old, outdated, and wrong code without me telling it. It
| knew.
|
| Tried to have it write a HexChat script in Perl and in Python;
| neither worked, due to old documentation in the training data.
| ravi-delia wrote:
| If you're saying it "knew" because it admitted to it after you
| pointed it out, the character it's playing will bow and scrape
| so long as you correct it at all, even if you just straight up
| lie. It knows that a correction in that context ought to be
| followed by an apology, and OpenAI's tweaking is almost
| certainly involved (since it uses the same words every time).
| GranPC wrote:
| > without me telling it
| qwertox wrote:
| Yes, I noticed this too. I had it build a MongoDB aggregation
| where documents would get aggregated in hourly timeslots (compute
| temperature averages+hi+low for every hour). Two ways to do this:
| 1) convert the datetime to a YYYY-mm-dd-hh string and use it to
| group the documents, or 2) use a Unix timestamp and do some math
| on it.
|
| I was already using 2) in some projects, so I wanted to check if
| it was able to do this.
|
| It first suggested 1); then I told it to make it more efficient
| by avoiding strings, so it gave me 2). Wow.
|
| That was around 3-4 weeks ago. When I tried it again this week,
| it would only output 1), and it couldn't make the move to 2)
| anymore even when told not to use strings. It kept using them.
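Approach 2) can be sketched in plain Python; the field names and
sample readings below are hypothetical, but the hour-bucketing math
(`ts - ts % 3600`) is the same trick a Mongo pipeline would express
with `$subtract` and `$mod`:

```python
# Group Unix-timestamp temperature readings into hourly slots using
# integer math instead of YYYY-mm-dd-hh string keys (approach 2).
readings = [
    (1676728900, 3.5),  # (unix_ts, temperature) -- sample data
    (1676730100, 5.0),
    (1676733000, 4.0),
]

buckets = {}
for ts, temp in readings:
    slot = ts - ts % 3600  # truncate to the start of the hour
    buckets.setdefault(slot, []).append(temp)

# Per-hour average, high, and low, as in the comment above.
stats = {
    slot: {"avg": sum(t) / len(t), "hi": max(t), "low": min(t)}
    for slot, t in buckets.items()
}
```

In an actual aggregation pipeline, the slot would be something like
`{"$subtract": ["$ts", {"$mod": ["$ts", 3600]}]}` as the `_id` of a
`$group` stage, with `$avg`/`$max`/`$min` accumulators.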
| crawsome wrote:
| I always noticed it made up function calls, or got them wrong,
| for GMS2 (GameMaker Studio 2) all the time, but the structure
| was often correct.
| speedgoose wrote:
| If you subscribe to the paid version, you can use the slower and
| perhaps better "legacy" model that was available a few times a
| day until a few weeks ago.
| Michelangelo11 wrote:
| Hmm, thanks, yeah, I forgot about the paid version. They might
| have just crippled the free version...
| MrLeap wrote:
| I asked it for an HLSL shader to raymarch a cloud and it
| basically handed me a copy/paste of the top result off shadertoy
| changed just enough to be broken. Kept the indentation and the
| magic constants unchanged though!
|
| The more niche the ask, the less... transformative/uniquely
| generative its output is, and the less reliable.
| Michelangelo11 wrote:
| Right, absolutely. But the level of nicheness was largely the
| same over time or, if anything, went down (the walrus operator
| in Python isn't very niche at all).
___________________________________________________________________
(page generated 2023-02-18 23:02 UTC)