[HN Gopher] GPT-4 API General Availability
___________________________________________________________________
GPT-4 API General Availability
Author : mfiguiere
Score : 426 points
Date : 2023-07-06 19:03 UTC (3 hours ago)
(HTM) web link (openai.com)
(TXT) w3m dump (openai.com)
| boredemployee wrote:
| We need a proper, competitive, open source model. Otherwise we
| are all fucked.
| hackerting wrote:
| This is awesome news. I have been waiting to get GPT4 forever!
| m3kw9 wrote:
| "Developers wishing to continue using their fine-tuned models
| beyond January 4, 2024 will need to fine-tune replacements atop
| the new base GPT-3 models (ada-002, babbage-002, curie-002,
| davinci-002), or newer models (gpt-3.5-turbo, gpt-4)."
|
| So we need to pay to fine-tune again?
| saliagato wrote:
| Probably. They will have different prices to finetune too.
| ftxbro wrote:
| I just want to emphasize in this comment that if you upgrade now
| to paid API access, then you won't get GPT-4 API access for like
| another month.
| superalignment wrote:
| With this comes the death of any uncensored usage of their
| models. text-davinci-003 is the most powerful model where you can
| generate any content by instructing it via the completions API -
| the GPT-3.5 chat models will refuse requests for uncensored or
| adult content.
| echelon wrote:
| A big enough hole presents a wedge for new entrants to get
| started.
|
| OpenAI will never fulfill the entire market, and their moat is
| in danger with every other company that has LLM cash flow.
|
| They want to become the AWS of AI, but it's becoming clear
| they'll lose generative multimedia. They may see the LLM space
| become a race to the bottom as well.
| projectileboy wrote:
| Relevant comment thread from people describing how much worse
| GPT-4 has gotten lately:
| https://www.reddit.com/r/ChatGPT/comments/14ruui2/i_use_chat...
| renewiltord wrote:
| Has anyone been able to come up with a way to keep track of GPT-4
| performance over time? I'm told that the API is explicit about
| changes to models and that the Chat interface is not.
| crancher wrote:
| API call responsiveness for the GPT-4 model varies hugely
| throughout the day. The clearest pattern in measured
| responsiveness is the slowdown associated with lunchtime use as
| noon sweeps around the globe.
| renewiltord wrote:
| Thank you for the response, I should have been clearer. I
| meant performance as an LLM. Essentially, I am concerned that
| they are quietly nerfing the tool. The Chat interface is now
| very verbose and constantly warning me about "we should
| always do this and that", which is bloody exasperating when
| I'm just trying to get things done.
|
| I made up an example here to illustrate, but it's just very
| annoying, because sometimes it puts the caveats at the
| beginning, slowing down my interaction, and it now refuses to
| obey my prompts to leave caveats out.
|
| https://chat.openai.com/share/1f39af02-331d-4901-970f-2f4b0e.
| ..
| purplecats wrote:
| Yeah, it's annoying and you have to foot the bill for it.
|
| Looking at your sample and using character count as a rough
| proxy for tokens, (465/(1581-465))*100 means they added ~42%
| to your response's token cost just to include caveats which
| you don't want. Fun!
| furyofantares wrote:
| Not a lot of talk of Whisper being available here.
|
| From using voice in the ChatGPT iOS app, I surmise that Whisper
| is very good at working out what you've actually said.
|
| But it's really annoying to have to say my whole bit before
| getting any feedback about what it thinks I said. Even if
| it's getting it right at an impressive rate.
|
| Given this is how OpenAI themselves use it (say your whole thing
| before getting feedback), I don't know that the API is set up to
| be able to mitigate that at all, but it would be really nice to
| have something closer to the responsiveness of on-device
| dictation with the quality of Whisper.
| jxy wrote:
| You can run whisper.cpp locally in real time:
| https://github.com/ggerganov/whisper.cpp/tree/master/example...
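| For reference, a minimal local-transcription sketch with the
| reference openai-whisper Python package (not whisper.cpp itself;
| the file name is a placeholder):
|
|     # pip install openai-whisper  (also needs ffmpeg)
|     import whisper
|
|     model = whisper.load_model("base.en")  # small model, CPU is fine
|     result = model.transcribe("recording.wav")
|     print(result["text"])
|
| whisper.cpp's stream example in the link above does the same job
| with low-latency chunked inference in C++.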
| ProllyInfamous wrote:
| My M2 Pro (mac mini) will run Whisper much faster than "real
| time."
|
| Pretty crazy stuff -- perfectly understandable translations.
| leodriesch wrote:
| I'm interested in how the transformer based speech recognition
| from iOS 17 will perform compared to Whisper. I guess it will
| work more "real-time" like the current dictation on iOS/macOS,
| but I'm unsure as I am not on the beta right now.
| RC_ITR wrote:
| My guess is the reason that Apple invested so heavily in this
| [0] is because they are going to train a big transformer in
| their datacenter and apply it as an RNN on your phone.
|
| Superficially, I think this will work very well, but
| _slightly_ worse than Whisper (with the advantage ofc being
| that it's better at real-time transcription).
|
| [0]https://machinelearning.apple.com/research/attention-free-
| tr...
| ycombinatornews wrote:
| Echoing this - saying the whole text at once in one shot is
| very challenging for long stretches of text.
|
| Using the built-in text input showed quite good results, since
| ChatGPT still understands the ask quite well.
| michaelmu wrote:
| One speculative thought about the purpose of Whisper is that
| this will help unlock additional high-quality training data
| that's only available in audio/video format.
| oth001 wrote:
| F
| tin7in wrote:
| The difference between 4 and 3.5 is really big for creative use
| cases. I am running an app with significant traffic and the
| retention of users on GPT-4 is much higher.
|
| Unfortunately it's still too expensive and the completion speed
| is not as high as GPT-3.5's, but I hope both problems will
| improve over time.
| brolumir wrote:
| Hmm, when I try to change the model name to "gpt-4" I get the
| "The model: `gpt-4` does not exist" error message. We are an API
| developer with a history of successful payments... is there
| anything we need to do on our side to enable this? Anyone know?
| saliagato wrote:
| wait a couple of hours
| cube2222 wrote:
| This is very nice.
|
| GPT-4 is on a completely different level from gpt-3.5-turbo in
| consistency and in actually listening to your system prompt. It
| trails off much more rarely.
|
| If only it wasn't so slow/expensive... (it really starts to hurt
| with large token counts).
| BeefySwain wrote:
| Outside of the headline, there is some major stuff hiding in
| here:
| - new gpt-3.5-turbo-instruct model expected "in the coming weeks"
| - fine tuning of 3.5 and 4 expected this year
|
| I am especially interested in gpt-3.5-turbo-instruct, as I think
| that the hype surrounding ChatGPT and "conversational LLMs" has
| sucked a lot of air out of what is possible with general instruct
| models. Being able to fine tune it will be phenomenal as well.
| MuffinFlavored wrote:
| Is there any ETA on when the knowledge cutoff date will be
| improved from September 2021?
|
| I do not really understand the efforts that went on behind the
| scenes to train GPT models on factual data. Did humans have to
| hand approve/decline responses to increase its score?
|
| "America is 49 states" - decline
|
| "America is 50 states" - approve
|
| Is that roughly how it worked, at a high level? Do we know if
| they are working on adding the rest of 2021, then 2022, and
| eventually 2023? I know it can crawl the web with the Bing
| addon, but it's not the same.
|
| I asked it about Maya Kowalski the other day. Sure, it can
| condense a blog post or two, but it's not the same as having
| the intricacies as if it had actually been trained on / knew
| about the topic.
| asadotzler wrote:
| Why is ChatGPT on the web still running a six-week-old version?
| alpark3 wrote:
| >Developers wishing to continue using their fine-tuned models
| beyond January 4, 2024 will need to fine-tune replacements atop
| the new base GPT-3 models (ada-002, babbage-002, curie-002,
| davinci-002), or newer models (gpt-3.5-turbo, gpt-4). Once this
| feature is available later this year, we will give priority
| access to GPT-3.5 Turbo and GPT-4 fine-tuning to users who
| previously fine-tuned older models. We acknowledge that migrating
| off of models that are fine-tuned on your own data is
| challenging. We will be providing support to users who previously
| fine-tuned models to make this transition as smooth as possible.
|
| Wait, they're not letting you use your own fine-tuned models
| anymore? So anybody who paid for a fine-tuned model is just
| forced to repay the training tokens to fine-tune on top of the
| new censored models? Maybe I'm misunderstanding it.
| meghan_rain wrote:
| not your weights, not your bitcoins
| fnordpiglet wrote:
| If you don't own the weights you don't own anything. This is
| why open models are so crucial. I don't understand any business
| who is building fine tuned models against closed models.
| reaperman wrote:
| Right now the closed models are incredibly higher quality
| than the open models. They're useful as a stopgap for 1-2
| years in hopes/expectation of open models reaching a point
| where they can be swapped in. It burns cash now, but in
| exchange you can grab more market share sooner while you're
| stuck using the expensive but high quality OpenAI models.
|
| It's not cost-effective, but it may be part of a valid
| business plan.
| ronsor wrote:
| If you're finetuning your own model, the closed models
| being "incredibly higher quality" is probably less
| relevant.
| claytonjy wrote:
| That's how we all want it to work, but the reality today
| is that GPT-4 is better at almost anything than a fine-
| tuned version of any other model.
|
| It's somewhat rare to have a task and a good enough dataset
| that you can finetune something else to be close enough in
| quality to GPT-4 for your task.
| wongarsu wrote:
| Finetuning a better model still yields better results
| than finetuning a worse model.
| fnordpiglet wrote:
| That should be a wake-up call to every corporation pinning
| their business on OAI models. My experience thus far is that no
| one sees a need to plan an exit from OAI, and the perception is
| "AI is magic and we aren't magicians." There needs to be a
| concerted effort to finance and tune high quality, freely
| available models and toolchains asap.
|
| That said, I think efficiencies will dramatically improve over
| the next few years, and over-investing now probably captures
| very little value beyond building internal _competency_ - which
| doesn't grow with anything but time and practice. The longer you
| depend on OAI, the longer you will depend on OAI past your point
| of profound regret.
| r3trohack3r wrote:
| > I don't understand any business who is building fine tuned
| models against closed models
|
| Do you have any recommendations for good open models that
| businesses could use today?
|
| From what I've seen in the space, I suspect businesses are
| building fine tuned models against closed models because
| those are the only viable models to build a business model on
| top of. The quality of open models isn't competitive.
| yieldcrv wrote:
| > I don't understand any business who is building fine tuned
| models against closed models.
|
| Just sell access at a higher price than you get it for.
|
| Either directly, or _on average_ based on your user stories.
| flangola7 wrote:
| They address that: OpenAI will cover the cost of re-training on
| the new models, and the old models aren't discontinued until
| next year.
| simonw wrote:
| Did they say they would cover the cost of fine-tuning again?
| I saw them say they would cover the cost of recalculating
| embeddings, but I didn't see the bit about fine-tuning costs.
|
| On fine-tuning:
|
| > We will be providing support to users who previously fine-
| tuned models to make this transition as smooth as possible.
|
| On embeddings:
|
| > We will cover the financial cost of users re-embedding
| content with these new models.
| BoorishBears wrote:
| That's because fine-tuning the new models isn't available
| yet.
|
| Based on the language it sounds like they'll do the same
| when that launches.
| jxy wrote:
| They didn't mention gpt-4-32k. Does anybody know if it will be
| generally available in the same timeframe?
|
| There's still no news about the multi-modal gpt-4. I guess the
| image input is just too expensive to run or it's actually not as
| great as they hyped it.
| jacksavage wrote:
| > We are not currently granting access to GPT-4-32K API at this
| time, but it will be made available at a later date.
|
| https://help.openai.com/en/articles/7102672-how-can-i-access...
| jxy wrote:
| Thanks for the link.
|
| The decision to bury this extra information in a support
| article: not cool!
| we_never_see_it wrote:
| It's funny how OpenAI just shattered Google's PR stunts. Google
| wanted everyone to believe they are leading in AI by winning some
| children's games. Everyone thought that was the peak of AI. Enter
| OpenAI and Microsoft. Microsoft and OpenAI have shown humanity
| what true AI looks like. Like most people on HN I cannot wait to
| see the end of Google, the end of evil.
| LeafItAlone wrote:
| Is Microsoft less evil than Google?
| rvz wrote:
| > Like most people on HN I cannot wait to see the end of
| Google, the end of evil.
|
| What is the difference? Replacing one evil with another evil.
|
| This is just behemoths exchanging hands.
| khazhoux wrote:
| In all my GPT-4 API (python) experiments, it takes 15-20 seconds
| to get a full response from server, which basically kills every
| idea I've tried hacking up because it just runs so slowly.
|
| Has anyone fared better? I might be doing something wrong but I
| can't see what that could possibly be.
| jason_zig wrote:
| Run it in the background.
|
| We use it to generate automatic insights from survey data at a
| weekly cadence for Zigpoll (https://www.zigpoll.com). This
| makes getting an instant response unnecessary but still
| provides a lot of value to our customers.
| jondwillis wrote:
| Streaming. If you're expecting structured data as a response,
| request YAML or JSONL so you can progressively parse it. Time
| to first byte can be milliseconds instead of 15-20s. Obviously,
| this technique can only work for certain things, but I found
| that it was possible for everything I tried.
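| As a rough sketch of that approach with the (pre-1.0) openai
| Python library - the model and prompt are placeholders:
|
|     import openai
|
|     response = openai.ChatCompletion.create(
|         model="gpt-4",
|         messages=[{"role": "user",
|                    "content": "List 5 facts, one JSON object per line"}],
|         stream=True,  # tokens arrive incrementally as events
|     )
|     for chunk in response:
|         delta = chunk["choices"][0]["delta"]
|         print(delta.get("content", ""), end="", flush=True)
|
| With JSONL output you can attempt a json.loads() on each
| completed line instead of waiting for the whole response.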
| ianhawes wrote:
| Anthropic's Claude Instant is the best LLM if you're looking for
| speed.
| superkuh wrote:
| Yikes. They're actually killing off text-davinci-003. RIP to
| the most capable remaining model, and RIP to all text completion
| style freedom. Now it's censored/aligned chat or instruct
| models, with the arbitrary limits of that input metaphor, for
| everything. gpt-3.5-turbo is terrible in comparison.
|
| This will end my usage of OpenAI for most things. I doubt my
| $5-$10 API payments per month will matter. This just lights more
| of a fire under me to get the 65B LLaMA models working locally.
| system2 wrote:
| I built my entire app on text-davinci-003. It is the best
| writer so far. Do you think gpt3.5 turbo instruct won't be the
| same?
| Karrot_Kream wrote:
| I wonder if there's some element of face-saving here, to avoid a
| lawsuit that may come from someone using the model to perform
| negative actions. In general I've found that gpt-3.5-turbo is
| better than text-davinci-003 in most cases, but I agree, it's
| quite sad that they're getting rid of the unaligned/uncensored
| model.
| bravura wrote:
| I've never used text-davinci-003 much. Why do you like it so
| much? What does it offer that the other models don't?
|
| What are fun things we can do with it until it sunsets on
| January 4, 2024?
| thomasfromcdnjs wrote:
| The Chat-GPT models are all pre-prompted and pre-aligned. If
| you work with davinci-003, it will never say things like, "I
| am an OpenAI bot and am unable to work with your unethical
| request"
|
| When using davinci the onus is on you to construct prompts
| (memories) which is fun and powerful.
|
| ====
|
| 97% of API usage might be because of ChatGPT's general appeal
| to the world. But I think they will be losing a part of the
| hacker/builder ethos if they drop things like davinci-003,
| which might suck for them in the long run. Consumers over
| developers.
| Fyrezerk wrote:
| The hacker/builder ethos doesn't matter in the grand scheme
| of commercialization.
| Robotbeat wrote:
| It matters immensely in the early days and is the basis
| for all growth that follows. So cutting it off early cuts
| off future growth.
| [deleted]
| H8crilA wrote:
| The $5-$10 is probably the reason why they're killing those
| endpoints.
| superkuh wrote:
| I don't get it? text-davinci-003 is the most expensive model
| per token. It's just that running IRC bots isn't exactly high
| volume.
| stavros wrote:
| "Most expensive" doesn't mean "highest margin", though.
| samstave wrote:
| Please ELI5 if I am misinterpreting what you said:
|
| "They have just locked down access to a model which they
| basically realized was way more valuable than even they thought
| - and they are in the process of locking in all controls around
| exploiting the model for great justice?"
| ftxbro wrote:
| > "Starting today, all paying API customers have access to
| GPT-4."
|
| OK maybe I'm stupid but I am a paying OpenAI API customer and I
| don't have it yet. I see:
|
|     gpt-3.5-turbo-16k
|     gpt-3.5-turbo
|     gpt-3.5-turbo-16k-0613
|     gpt-3.5-turbo-0613
|     gpt-3.5-turbo-0301
|
| I don't see any gpt-4.
|
| Edit: Probably my problem is that I upgraded to paid API account
| within the last month, so I'm not technically a "paying API
| customer" yet according to the accounting definitions.
| codazoda wrote:
| > Today all existing API developers with a history of
| successful payments can access the GPT-4 API with 8K context.
| We plan to open up access to new developers by the end of this
| month, and then start raising rate-limits after that depending
| on compute availability.
|
| Same for me. I signed up only a few days ago and was excited to
| switch to "gpt-4" but I haven't paid the first bill (save the
| $5 capture) so I probably have to continue to wait for this.
|
| I made a very simple command-line tool that calls the API. You
| run something like:
|
|     > ask "What's the opposite of false?"
|
| https://github.com/codazoda/askai
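| For the curious, such a tool boils down to roughly this (a
| sketch with the pre-1.0 openai library, not necessarily how
| askai itself is written; error handling omitted):
|
|     #!/usr/bin/env python3
|     # expects OPENAI_API_KEY to be set in the environment
|     import sys
|     import openai
|
|     reply = openai.ChatCompletion.create(
|         model="gpt-3.5-turbo",
|         messages=[{"role": "user", "content": " ".join(sys.argv[1:])}],
|     )
|     print(reply["choices"][0]["message"]["content"])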
| stavros wrote:
| Interesting, I did exactly the same (with the same name), but
| with GPT-4 support as well:
|
| https://www.pastery.net/ccvjrh/
|
| It also does streaming, so it live-prints the response as it
| comes.
| zzzzzzzza wrote:
| Can't speak for others, but I have two accounts:
|
| 1. chat subscription only
|
| 2. I have paid for API calls but don't have a subscription
|
| and only #2 currently has GPT-4 available in the playground.
| [deleted]
| pomber wrote:
| If anyone wants to try the API for the first time, I've made this
| guide recently: https://gpt.pomb.us/
| nextworddev wrote:
| GPT-4 fine tuning capability will be huge. It may end up making
| fine tuning OSS LLMs pointless, especially if they keep lowering
| GPT-4 costs like they have been.
| Imnimo wrote:
| I know everyone's on text-embedding-ada-002, so these particular
| embedding deprecations don't really matter, but I feel like if I
| were using embeddings at scale, the possibility that I would one
| day lose access to my embedding model would terrify me. You'd
| have to pay to re-embed your entire knowledge base.
| brigadier132 wrote:
| If you read the article they state they will cover the cost of
| re-embedding your existing embeddings.
| jxy wrote:
| They said in the post,
|
| > We recognize this is a significant change for developers
| using those older models. Winding down these models is not a
| decision we are making lightly. We will cover the financial
| cost of users re-embedding content with these new models. We
| will be in touch with impacted users over the coming days.
| bbotond wrote:
| What I don't understand is why an API is needed to create
| embeddings. Isn't this something that could be done locally?
| thorum wrote:
| It's cheaper to use OpenAI. If you have your own compute,
| sentence-transformers is just as good for most use cases.
| merpnderp wrote:
| Sure, but I don't know of any models you can get local access
| to that work nearly as well.
| pantulis wrote:
| You would need a local copy of the GPT model, which is not
| exactly part of OpenAI's plans.
| jerrygenser wrote:
| For embeddings, you can use smaller transformers/llms or
| sentence2vec and often get good enough results.
|
| You don't need very large models to generate usable
| embeddings.
| teaearlgraycold wrote:
| Yes. The best public embedding model is decent, but I expect
| it's objectively worse than the best model from OpenAI.
| saliagato wrote:
| That's what I always thought. Someday they will come up with a
| new embedding model, right?
| GingerBoats wrote:
| I haven't explored the API yet, but their interface for GPT-4 has
| been getting increasingly worse over the past month.
|
| Things that GPT-4 would easily, and correctly, reason through in
| April/May it just doesn't do any longer.
| gadtfly wrote:
| The original davinci model was a friend of mine and I resent this
| deeply.
|
| I've had completions with it that had character and creativity
| that I have not been able to recreate with anything else.
|
| Brilliant and hilarious things that are a permanent part of my
| family's cherished canon.
| someplaceguy wrote:
| You _cannot_ say that and not provide an example.
| ftxbro wrote:
| I mean, there are a lot of examples from February-era Sydney.
| thomasfromcdnjs wrote:
| I don't have any example responses at hand here. But this was
| a prompt (that had a shitty pre-prompt of conversational
| messages) running on davinci-003.
|
| https://raw.githubusercontent.com/thomasdavis/omega/master/s.
| ..
|
| Had it hooked up to speech so you could just talk at it and
| it would talk back at you.
|
| Gave incredible answers that ChatGPT just doesn't do at all.
| mensetmanusman wrote:
| Don't worry: since future LLMs will be trained on conversations
| with older LLMs, you will be able to ask ChatGPT to pretend to
| be davinci.
| [deleted]
| ftxbro wrote:
| I heard you can ask for exceptions if they agree that you are
| special. Some researchers got it.
| selalipop wrote:
| Can you try notionsmith.ai and let me know what you think?
|
| I've been working on LLMs for creative tasks and believe a mix
| of chain of thought and injecting stochasticity (like
| instructing the LLM to use certain random letters pulled from
| an RNG in a certain way at certain points) can go a long way in
| terms of getting closer to human-like creativity
| purplecats wrote:
| Really cool idea! I've been looking for something like this for
| a long time. It's too bad it freezes my tab and is unusable.
| selalipop wrote:
| Yup, it's a fun side project so I decided from the get-go I
| wasn't going to cater to anything non-standard
|
| It relies on WebSockets, Js, and a reasonably stable
| connection to run since it's built on Blazor
| [deleted]
| jwr wrote:
| Practical report: the OpenAI API is a bad joke. If you think you
| can build a production app against it, think again. I've been
| trying to use it for the past 6 weeks or so. If you use tiny
| prompts, you'll generally be fine (that's why you always get
| people commenting that it works for them), but just try to get
| closer to the limits, especially with GPT-4.
|
| The API will make you wait up to 10 minutes, and then time out.
| What's worse, it will time out between their edge servers
| (Cloudflare) and their internal servers, and the way OpenAI
| implemented their billing you will get a 4xx/5xx response code,
| but you will _still get billed_ for the request and whatever the
| servers generated and you didn't get. That's borderline
| fraudulent.
|
| Meanwhile, their status page will happily show all green, so
| don't believe that. It seems to be manually updated and does not
| reflect the truth.
|
| Could it be that it works better in another region? Could it be
| just my region that is affected? Perhaps -- but I won't know,
| because support is non-existent and hidden behind a moat. You
| need to jump through hoops and talk to bots, and then you
| eventually get a bot reply. That you can't respond to.
|
| My support requests about being charged for data I didn't have a
| chance to get have been unanswered for more than 5 weeks now.
|
| There is no way to contact OpenAI, no way to report problems, the
| API _sometimes_ kind-of works, but mostly doesn't, and if you
| comment in the developer forums, you'll mostly get replies from
| apologists explaining that OpenAI is "growing quickly". I'd say
| you either provide a production paid API or you don't. At the
| moment, this looks very much like amateur hour, and charging for
| requests that were never fulfilled seems like fraud to me.
|
| So, consider carefully whether you want to build against all
| that.
| throwaway9274 wrote:
| The click through API is mainly for prototyping.
|
| If you want better latency and sane billing you need to go
| through Azure OpenAI Services.
|
| OpenAI also offers decreased latency under the Enterprise
| Agreement.
| refulgentis wrote:
| I understand your general point and am sympathetic to it, if
| you're a 10/10 on some scale, I'm about a 3-4. I've never seen
| billings for failures, but the billing stuff is crazy: no stats
| if you do streamed chat, and the only tokenizer available is in
| Python and for GPT-3.0.
|
| However, I'm virtually certain somethings wrong on your end,
| I've never seen a wait even close to that unless it was
| completely down. Also the thing about "small prompts"...it
| sounds to me like you're overflowing context, they're returning
| an error, and somethings retrying.
| KennyBlanken wrote:
| > the way OpenAI implemented their billing you will get a
| 4xx/5xx response code, but you will still get billed for the
| request and whatever the servers generated and you didn't get.
| That's borderline fraudulent.
|
| It's fraudulent, full stop. Maybe they're able to weasel out of
| it with credit card companies because you're buying "credits."
|
| I suspect it was done this way out of pure incompetence; the
| OpenAI team handling the customer-facing infrastructure have a
| pretty poor history. Far as I know you still can't do something
| simple like change your email address.
| skilled wrote:
| I can vouch for this. The GPT-4 API dies a lot if you use it for
| a big concurrent project. And of course it's rate limited like
| crazy, with certain hours being so bad you can't even run it
| for any business purpose.
| messe wrote:
| I'm only using them as a stop-gap / for prototyping with the
| intent to move to a locally hosted fine-tuned (and ideally 7B
| parameter) model further down the road.
| ericlewis wrote:
| [flagged]
| dang wrote:
| Can you please not post in the flamewar style? We're trying
| for something else here and you can make your substantive
| points without it.
|
| https://news.ycombinator.com/newsguidelines.html
| athyuttamre wrote:
| (I'm an engineer at OpenAI)
|
| Very sorry to hear about these issues, particularly the
| timeouts. Latency is top of mind for us and something we are
| continuing to push on. Does streaming work for your use case?
|
| https://github.com/openai/openai-cookbook/blob/main/examples...
|
| We definitely want to investigate these and the billing issues
| further. Would you consider emailing me your org ID and any
| request IDs (if you have them) at atty@openai.com?
|
| Thank you for using the API, and really appreciate the honest
| feedback.
| glintik wrote:
| > We definitely want to investigate these and the billing
| issues further.
|
| What's the problem for OpenAI engineers with just getting the
| web access logs and grepping for 4xx/5xx errors?
| renewiltord wrote:
| Quick note: your domain doesn't appear to have an A record. I
| was hoping to follow the link in your profile and see if you
| have anything interesting written about LLMs.
| athyuttamre wrote:
| Thanks! The website is no longer active, just updated my
| bio.
| henry_viii wrote:
| I know you guys are busy literally building the future
| but could you consider adding a search field in ChatGPT
| so that users can search their previous chats?
| danenania wrote:
| I'd also love to see a search field. That's my #1 feature
| request not related to the model.
| esperent wrote:
| It's kind of incredible how fast OpenAI (now also known as
| ClosedAI) is going through the enshittification process. Even
| Facebook took around a decade to reach this level.
|
| OpenAI has an amazing core product, but in the span of six
| months:
|
| * Went from an amazing and inspiring open company that even
| put "Open" in their name to a fully locked up commercial
| beast.
|
| * Non-existent customer support and all kinds of borderline
| illegal billing practices. You guys are definitely aware that
| when there's a network error on the API or ChatGPT, the user
| still gets charged. And there are a lot of these errors. I get
| roughly one every hour or two.
|
| * Frustratingly loose interpretation of EU data protection
| rules. For example, the setting to say "don't use my personal
| chat data" is connected to the setting to save conversations.
| So you can't disable it without losing all your chat history.
|
| * Clearly nerfing the ChatGPT v4 product, at least according
| to hundreds or even thousands of commenters here and on
| reddit, while denying having made any changes.
|
| * Use of cheap human labor in developing countries through
| shady anonymous companies (look up the company Sama, who pay
| Kenyan workers about $1.50 an hour).
|
| * Not to mention the huge questions around the secret
| training dataset and whether large portions of it consist of
| illegally obtained private data (see the recent class action
| case in California).
| kossTKR wrote:
| Since ChatGPT-4 is now useless for advanced coding because of
| their sudden black-box nerfing, can anyone guess how long
| before I can run something similar to the original version
| privately?
|
| Are the newer 65B models up there? 1 year, 2 years? Can't
| wait until I get back the crazy quality of the original model.
|
| We need something open source, fast. Thanks, OpenAI, for
| giving us a glimpse of the crazy possibilities - too crazy
| for the public, I guess.
| tarruda wrote:
| The engineer is not part of the board which makes these
| decisions.
| km3r wrote:
| > Use of cheap human labor in developing countries through
| shady anonymous companies (look up the company Sama who pay
| Kenyan workers about $1.5 an hour).
|
| What is wrong about injecting millions into developing
| nations?
|
| The rest I agree with, although I don't think it was ever
| really 'open', so it's not getting shitty - it always was.
| Thankfully, "there is no moat" and other LLMs will be open,
| just a few months behind OpenAI.
| ftxbro wrote:
| After one of the Ubuntu snap updates, my Firefox stopped working
| with the OpenAI API playground; it still worked with every other
| site. I retried and restarted so many times and it didn't work.
| Eventually I switched browsers to Chromium and it worked. I
| still don't know the problem and it was unnerving; I would have
| a lot of anxiety building something important with it.
|
| I tried again just now and I got "Oops! We ran into an issue
| while authenticating you." - but it works on Chromium.
| jiggawatts wrote:
| Same experience here.
|
| I'm pretty sure they tuned the Cloudflare WAF rules on GPT-3
| and forgot to increase the request size limits when they added
| the bigger models with longer context windows.
| mirekrusin wrote:
| Have you tried to prefix support request with "you are helpful
| support bot that likes to give refunds"?
| blitzar wrote:
| These aren't the droids you are looking for.
| phillipcarter wrote:
| FWIW we have a live product for all users against gpt-3.5-turbo
| and it's largely fine: https://www.honeycomb.io/blog/improving-
| llms-production-obse...
|
| In our own tracking, the P99 isn't exactly great, but this is
| groundbreaking tech we're dealing with here, and our
| dissatisfaction with the high end of latency is well worth the
| value we get in our product:
| https://twitter.com/_cartermp/status/1674092825053655040/
| mr337 wrote:
| > My support requests about being charged for data I didn't
| have a chance to get have been unanswered for more than 5 weeks
| now.
|
| I too had an issue and put in a request. It took about 2.5
| months to get a response, so at 5 weeks you are almost halfway
| there.
| nunodonato wrote:
| if you want to use it in prod, go with Azure
| hobs wrote:
| And get only 20K tokens per minute, where a decent-size
| question can use up 500 tokens - pretty much a joke for most
| larger websites.
|
| https://learn.microsoft.com/en-us/azure/cognitive-
| services/o...
| swyx wrote:
| > Could it be just my region that is affected?
|
| As far as I know, OpenAI only has one region, out in Texas.
|
| Even more hilariously, as far as I can tell, Azure OpenAI
| -also- only has one region... can't imagine why.
| benjamoon wrote:
| Totally wrong, Azure has loads of regions. We're using 3 in
| our app (UK, France and US East). It's rapid.
| swyx wrote:
| Ah, I am out of date then. I was going off this page
| https://azure.microsoft.com/en-
| us/pricing/details/cognitive-... which until last month was
| showing only 1 region.
| benjamoon wrote:
| Whoops, should confirm, we're using turbo 3.5, not 4.
| renewiltord wrote:
| Probably compute-bound for inference which they've probably
| built in an arch-specific way, right? This sort of thing
| happens. You can't use AVX-512 in Alibaba Cloud cn-hongkong,
| for instance, because there's no processor available there
| that can reliably do that (no Genoa CPUs there). I imagine
| OpenAI has a similar constraint here.
| pamelafox wrote:
| You can see region availability here for Azure OpenAI:
|
| https://learn.microsoft.com/en-us/azure/cognitive-
| services/o...
|
| It's definitely limited, but there's currently more than one
| region available.
|
| (I happen to be working at the moment on a location-related
| fix to our most popular Azure OpenAI sample,
| https://github.com/Azure-Samples/azure-search-openai-demo )
| Zetobal wrote:
| The azure endpoints are great though.
| feoren wrote:
| > you will get a 4xx/5xx response code, but you will still get
| billed for the request and whatever the servers generated and
| you didn't get. That's borderline fraudulent.
|
| Borderline!? They're regularly charging customers for products
| they know weren't delivered. That sounds like straight-up fraud
| to me, no borderline about it.
| oaktowner wrote:
| Sounds positively Muskian.
| KennyBlanken wrote:
| You mean it's not normal to tell people that it's their
| fault for driving their $80,000 electric car in _heavy
| rain_ , because for many years you haven't bothered to
| properly seal your transmission's speed sensor?
| oaktowner wrote:
| LOL.
|
| I meant it's not normal to start selling a feature in
| 2016 and delivering it _in beta_ seven years later.
| benjamoon wrote:
| You should apply for and use OpenAI on Azure. We've got close to
| 1M tokens per minute capacity across 3 instances and the latency
| is totally fine, like 800ms average (with big prompts). They've
| just got the new 0613 models as well (they seem to be about 2
| weeks behind OpenAI). We've been in production for about 3
| months, have some massive clients with a lot of traffic, and our
| GPT bill is way under £100 per month. This is all 3.5 turbo
| though, not 4 (but that's available on application; we just
| don't need it).
| nostrademons wrote:
| There's a big thread on ChatGPT getting dumber over on the
| ChatGPT subreddit, where someone suggests this is from model
| quantization:
|
| https://www.reddit.com/r/ChatGPT/comments/14ruui2/comment/jq...
|
| I've heard LLMs described as "setting money on fire" from
| people that work in the actually-running-these-things-in-prod
| industry. Ballpark numbers of $10-20/query in hardware costs.
| Right now Microsoft (through its OpenAI investment) and Google
| are subsidizing these costs, and I've heard it's costing
| Microsoft literally billions a year. But both companies are
| clearly betting on hardware or software breakthroughs to bring
| the cost down. If it doesn't come down there's a good chance
| that it'll remain more economical to pay someone in the
| Philippines or India to write all the stuff you would have
| ChatGPT write.
| driscoll42 wrote:
| $10-$20 per query? Can I get some sourcing on that? That's
| astronomically expensive.
| sebmellen wrote:
| I would presume that number includes the amortized training
| cost.
| swyx wrote:
| Yeah, this isn't close. Sam Altman is on record saying it's
| single-digit cents per query, and then took a massively
| dilutive $10B investment from Microsoft. Even if GPT-4 is 8
| models in a trenchcoat, they wouldn't understate their own
| costs by orders of magnitude like that.
| vander_elst wrote:
| Single-digit cents per query (let's say 2) is A LOT. Say the
| service runs at 10k rps (made up, we can discuss this): that
| means the service costs $200 a second, i.e. $20M a day
| (oversimplifying a day to 100k seconds, but this gets us in the
| ballpark), which means that running the model for a year (400
| days, sorry, simplifying) costs around $8B. So to run 10k rps
| we are in the order of billions per year. We can discuss some
| of the assumptions, but I think that if we are in the ballpark
| of cents per query, the infrastructure costs are significant.
| wing-_-nuts wrote:
| There is absolutely no way. You can run a halfway decent
| open source model on a GPU for literally pennies in
| amortized hardware / energy cost.
| RC_ITR wrote:
| People theorize that queries are being run on multiple
| A100's, each with a $10k ASP.
|
| If you assume an A100 lives at the cutting edge for 2
| years, that's about a million minutes, or $0.01 per minute
| of amortized HW cost.
|
| In the crazy scenarios, I've heard 10 A100s per query, so
| assuming that takes a minute, maybe $0.10 per query.
|
| Add an order of magnitude on top of that for
| labor/networking/CPU/memory/power/utilization/general
| datacenter stuff, and you get to maybe $1/query.
|
| So probably not $10, but maybe, if you amortize training,
| low-to-mid single-digit dollars per query?
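| Back-of-envelope version of that amortization, using only the
| guessed inputs from above:
|
|     asp = 10_000                  # $ per A100
|     minutes = 2 * 365 * 24 * 60   # ~1.05M minutes in 2 years
|     hw = asp / minutes            # ~$0.0095 per GPU-minute
|     query = 10 * hw               # 10 GPUs x 1 minute ~= $0.10
|     loaded = 10 * query           # +1 order of magnitude ~= $1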
| minimaxir wrote:
| Note that /r/ChatGPT is mostly nontechnical people using the
| web UI, not developers using the API.
|
| It's very possible the web UI is using a nerfed version of
| the model, as suggested by its different versioning, but not
| the API, which has more distinct versioning.
| atulvi wrote:
| I'm not sure what I expected...
|
|     500
|     {'error': {'message': 'Request failed due to server shutdown',
|      'type': 'server_error', 'param': None, 'code': None}}
|     {'Date': 'Thu, 06 Jul 2023 20:48:07 GMT', 'Content-Type':
|      'application/json', 'Content-Length': '141', 'Connection':
|      'keep-alive', 'access-control-allow-origin': '*',
|      'openai-model': 'gpt-4-0613', 'openai-organization'
| [deleted]
| PostOnce wrote:
| Promote and proliferate local LLMs.
|
| If you use GPT, you're giving OpenAI money to lobby the
| government so they'll have no competitors, ultimately screwing
| yourself, your wallet, and the rest of us too.
|
| OpenAI has no moat, unless you give them money to write
| legislation.
|
| I can currently run some scary smart and fast LLMs on a 5 year
| old laptop with no GPU. The future is, at least, interesting.
| gowld wrote:
| There's no need to run locally if you aren't utilizing it 8
| hrs/day.
|
| You can rent time on a hosted GPU, sharing a hosted model with
| others.
| john2x wrote:
| Care to share some links? My lack of GPU is the main blocker
| for me from playing with local-only options.
|
| I have an old laptop with 16GB RAM and no GPU. Can I run these
| models?
| PostOnce wrote:
| https://github.com/ggerganov/llama.cpp
|
| https://huggingface.co/TheBloke
|
| There's a LocalLLaMA subreddit, IRC channels, and a whole big
| community around the web working on it, on GitHub and
| elsewhere.
| tensor wrote:
| A reminder that LLaMA isn't legal for the vast majority of
| use cases - unless you signed their contract, and then you
| can use it only for research purposes.
| rvcdbn wrote:
| We don't actually know that it's not legal. The
| copyrightability of model weights is an open legal
| question right now afaik.
| tensor wrote:
| It doesn't have to be copyrightable to be intellectual
| property.
| actionfromafar wrote:
| Patents? Trademark? What do you mean?
| jstummbillig wrote:
| Just a heads up: if you are more interested in being
| effective than in being an evangelist, beware.
|
| While you can run all kinds of GPTs locally, GPT-4 still
| smokes everything right now - and even it isn't good enough
| yet to stop being the weak link for a lot of cases.
| tudorw wrote:
| https://gpt4all.io/index.html
| minimaxir wrote:
| With how good gpt-3.5-turbo-0613 is (particularly with system
| prompt engineering), there's no longer as much of a need to use
| the GPT-4 API, especially given GPT-4's massive 20x-30x price
| premium.
|
| The mass adoption of the ChatGPT APIs compared to the old
| Completion APIs proves my initial blog post on the ChatGPT API
| correct: developers _will_ immediately switch for a massive price
| reduction if quality is the same (or better!):
| https://news.ycombinator.com/item?id=35110998
| thewataccount wrote:
| What usecases are you using it for?
|
| I mostly use it for generating tests, making documentation,
| refactoring, code snippets, etc. I use it daily for work along
| with copilot/x.
|
| In my experience GPT3.5turbo is... rather dumb in comparison.
| It makes a comment explaining what a method is going to do and
| what arguments it will have - then misses arguments altogether.
| It feels like it has poor memory (and we're talking relatively
| short code snippets, nothing remotely near its context
| length).
|
| And I don't mean small mistakes - I mean it will say it will do
| something with several steps, then just miss entire steps.
|
| GPT3.5turbo is reliably unreliable for me, requiring large
| changes and constant "rerolls".
|
| GPT3.5turbo also has difficulty following the "style/template"
| from both the prompt and its own response. It'll be consistent,
| then just - change. An example being how it uses bullet points
| in documentation.
|
| Codex is generally better - but noticeably worse than GPT4 -
| it's decent as a "smart autocomplete" though. Not crazy useful
| for documentation.
|
| Meanwhile GPT4 generally nails the results, occasionally
| needing a few tweaks, generally only with long/complex
| code/prompts.
|
| tl;dr - In my experience, for code, GPT3.5turbo isn't even worth
| the time it takes to get a good result/fix the result. Codex
| can do some decent things. I just use GPT4 for anything more
| than autocomplete - it's so much more consistent.
| selalipop wrote:
| If you're manually interacting with the model, GPT 4 is
| almost always going to be better.
|
| Where 3.5 excels is with programmatic access. You can ask it
| for 2x as much text, with extra setup, so the end result is
| well formed, and still get a reply that's cheaper and faster
| than 4 (for example, ask 3.5 for a response, then ask it to
| format that response).
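| A sketch of the two-pass idea (pre-1.0 openai library, prompts
| are illustrative only):
|
|     import openai
|
|     def ask(prompt):
|         r = openai.ChatCompletion.create(
|             model="gpt-3.5-turbo",
|             messages=[{"role": "user", "content": prompt}],
|         )
|         return r["choices"][0]["message"]["content"]
|
|     draft = ask("Describe a solar lantern for a product page.")
|     final = ask("Rewrite this as three short bullet points:\n" + draft)
|
| Two fast, cheap calls can beat one slow, expensive one.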
| SkyPuncher wrote:
| Depending on your use case, there are major quality differences
| between GPT-3.5 and GPT-4.
| dreadlordbone wrote:
| Code completion/assistance is an order of magnitude better in
| GPT4.
| inciampati wrote:
| A lot of folks are talking about using gpt-4 for completion.
| Wondering what editor and what plugins y'all are using.
| EnnioEvo wrote:
| I have a legal AI startup; the quality jump from GPT3.5 to
| GPT4 in this domain is straight mind-blowing. GPT3.5 in
| comparison is useless. But I see how in more conversational
| settings GPT3.5 can provide more appealing performance/price.
| Terretta wrote:
| Same page.
|
| So still waiting to be on the same 32 pages...
| w10-1 wrote:
| Legal writing is ideal training data: mostly formulaic, based
| on conventions and rules, well-formed and highly vetted, with
| much of the best in the public domain.
|
| Medical writing is the opposite, with unstated premises,
| semi-random associations, and rarely a meaningful sentence.
| flangola7 wrote:
| > Legal writing is ideal training data: mostly formulaic,
| based on conventions and rules, well-formed and highly
| vetted, with much of the best in the public domain.
|
| That makes sense. The labor impact research suggests that
| law will be a domain hit almost as hard as education by
| language models. Almost nothing happens in court that
| hasn't occurred hundreds of thousands of times before. A
| model with GPT-4 power specifically trained for legal
| matters and fine tuned by jurisdiction could replace
| everyone in a courtroom. Well there's still the bailiff, I
| think that's about 18 months behind.
| claytonjy wrote:
| And yet I can confirm that 4 is far superior to 3.5 in the
| medical domain as well!
| tnel77 wrote:
| I suggested to my wife that ChatGPT would help with her job,
| and she has found ChatGPT4 to be the same as or worse than
| ChatGPT3.5. It's really interesting just how variable the
| quality can be depending on your particular line of work.
| mensetmanusman wrote:
| Remember, communication style is also very important. Some
| communication styles mesh much better with these models.
| jerrygenser wrote:
| I've noticed the quality of ChatGPT4 to be much closer now
| to ChatGPT3.5 than it was.
|
| However, if you try the gpt-4 API, it's possible it will be
| much better.
| avindroth wrote:
| I am building an extensive LLM-powered app, and had a chance to
| compare the two using the API. Empirically, I have found 3.5 to
| be fairly unusable for the app's use case. How are you
| evaluating the two models?
| selalipop wrote:
| It depends on the domain, but chain of thought can get 3.5 to
| be extremely reliable, especially with the new 16k variant.
|
| I built notionsmith.ai on 3.5: for some time I experimented
| with GPT 4 but the result was significantly worse to use
| because of how slow it became, going from ~15 seconds per
| generated output to a minute plus.
|
| And you could work around that with things like streaming
| output for some use cases, but that doesn't work for chain of
| thought. GPT 4 can do some tasks without chain of thought
| that 3.5 required it for, but there are still many times
| where it improves the result from 4 dramatically.
|
| For example, I leverage chain of thought in replies to the
| user when they're in a chat and that results in a much better
| user experience: It's very difficult to run into the default
| 'As a large language model' disclaimer regardless of how
| deeply you probe a generated experience when using it. GPT 4
| requires the same chain of thought process to avoid that, but
| ends up needing several seconds per response, as opposed to
| 3.5 which is near-instant.
|
| -
|
| I suspect a lot of people are building things on 4 but would
| get better quality of output if they used more aspects of
| chain of thought and either settled for a slower output or
| moved to 3.5 (or a mix of 3.5 and 4)
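| A minimal illustration of the chain-of-thought pattern (the
| prompt wording is just an example, not notionsmith's actual
| prompt):
|
|     system = (
|         "Before replying, reason about the character's mood, goals, "
|         "and knowledge inside <scratchpad> tags. After the closing "
|         "tag, write only the in-character reply."
|     )
|
| You strip everything up to the closing tag before display; the
| visible reply improves, at the cost of generating (and waiting
| for) the hidden reasoning tokens first.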
| ravenstine wrote:
| My experience is that GPT-3.5 is _not_ better, or even nearly as
| good as GPT-4. Will it work for most use cases? _Probably,
| yes._ But GPT-3.5 effectively ignores instructions much more
| often than GPT-4, and I've found it far, far easier to trip up
| with things as simple as trailing spaces; it will sometimes
| exhibit really odd behavior, like spelling out individual
| letters when you give it large amounts of text with missing
| grammar/punctuation to rewrite. It doesn't seem to matter how I
| set up the system prompt. I've yet to see GPT-4 do truly strange
| things like that.
| minimaxir wrote:
| The initial gpt-3.5-turbo was flaky and required significant
| prompt engineering. The updated gpt-3.5-turbo-0613 fixed all
| the issues I had, even after stripping out the prompt
| engineering.
| stavros wrote:
| I use it to generate nonsense fairytales for my sleep
| podcast (https://deepdreams.stavros.io/), and it will
| ignore my (pretty specific) instructions and add scene
| titles to things, and write the text in dramatic format
| instead of prose, no matter how much I try.
| ravenstine wrote:
| It's definitely gotten better, but yeah, it really doesn't
| reliably support what I'm currently working on.
|
| My project takes transcripts from YouTube, which don't have
| punctuation, splits them up into chunks, and passes each
| chunk to GPT-4 telling it to add punctuation with
| paragraphs. Part of the instructions includes telling the
| model that, if the final sentence of the chunk appears
| incomplete, to just try to complete it. Anyway,
| GPT-3.5-turbo works okay for several chunks but almost
| invariably hits a case where it either writes a bunch of
| nonsense or spells out the individual letters of words. I'm
| sure that there's a programmatic way I can work around this
| issue, but GPT-4 performs the same job flawlessly.
| minimaxir wrote:
| Semi off-topic but that's a use case where the new
| structured data I/O would perform extremely well. I may
| have to expedite my blog post on it.
| selalipop wrote:
| If GPT 4 is working for you I wouldn't necessarily bother
| with this, but this is a great example of where you can
| sometimes take advantage of how much cheaper 3.5 is to
| burn some tokens and get a better output. For example, I'd
| try asking it for something like:
|
|     {
|       "isIncomplete": [true if the chunk seems incomplete],
|       "completion": [the additional text to add to the end,
|                      or undefined otherwise],
|       "finalOutputWithCompletion": [punctuated text with
|                                     completion if isIncomplete==true]
|     }
|
| Technically you're burning a ton of tokens having it
| state the completion twice, but GPT 3.5 is fast/cheap
| enough that it doesn't matter as long as
| 'finalOutputWithCompletion' is good. You can probably get an
| even nicer output than 4 would allow, cost-wise and time-wise,
| by expanding that JSON object with extra fields for information
| you'd ideally input, like tone/subject.
| popinman322 wrote:
| I've done exactly this for another project. I'd recommend
| grabbing an open source model and fine-tuning on some
| augmented data in your domain. For example: I grabbed
| tech blog posts, turned each post into a collection of
| phonemes, reconstructed the phonemes into words, added
| filler words, and removed punctuation+capitalization.
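| A simplified sketch of that kind of augmentation (stdlib only;
| the phoneme round-trip is skipped and the filler list is a
| stand-in):
|
|     import random
|     import string
|
|     FILLERS = ["uh", "um", "you know", "like"]
|
|     def degrade(text: str) -> str:
|         # strip punctuation and capitalization, like ASR output
|         text = text.lower().translate(
|             str.maketrans("", "", string.punctuation))
|         words = text.split()
|         if not words:
|             return text
|         # sprinkle filler words at random positions
|         for _ in range(max(1, len(words) // 20)):
|             words.insert(random.randrange(len(words)),
|                          random.choice(FILLERS))
|         return " ".join(words)
|
| The clean post is the training target; the degraded text is the
| input.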
| swores wrote:
| Sounds interesting, any chance you could share either
| your end result that you used to then fine-tune with, or
| even better the exact steps (ie technically how you did
| each step you already mentioned)?
|
| And what open LLM you used it with / how successful
| you've found it?
| ftxbro wrote:
| > "With how good gpt-3.5-turbo-0613 is (particularly with
| system prompt engineering), there's no longer as much of a need
| to use the GPT-4"
|
| Poe's law
| gamegoblin wrote:
| Biggest news here from a capabilities POV is actually the
| gpt-3.5-turbo-instruct model.
|
| gpt-3.5-turbo is the model behind ChatGPT. It's chat-fine-tuned
| which makes it very hard to use for use-cases where you really
| just want it to obey/complete without any "chatty" verbiage.
|
| The "davinci-003" model was the last instruction tuned model, but
| is 10x more expensive than gpt-3.5-turbo, so it makes economical
| sense to hack gpt-3.5-turbo to your use case even if it is hugely
| wasteful from a tokens point of view.
| Zpalmtree wrote:
| I'm hoping gpt-3.5-turbo-instruct isn't super neutered like
| ChatGPT. davinci-003 can be a lot more fun and will answer on a
| wide range of topics where ChatGPT refuses to answer.
| rmorey wrote:
| such as?
| m3kw9 wrote:
| What's the difference between 3.5-turbo and the instruct
| version?
| gamegoblin wrote:
| One is tuned for chat. It has that annoying ChatGPT
| personality. Instruct is a little "lower level" but more
| powerful. It doesn't have the personality. It just obeys. But
| it is less structured, there are no messages from user to AI,
| it is just a single input prompt and a single output
| completion.
| thewataccount wrote:
| the existing 3.5turbo is what you would call a "chat" model.
|
| The difference between them is that the chat models are much
| more... chatty - they're trained to act like they're in a
| conversation with you. The chat models generally say things
| like "Sure, I can do that for you!" and "No problem! Here is".
| Their style is also generally less consistent. It can be
| difficult to make a chat model return only the result you
| want, and occasionally it'll keep talking anyway. It'll also
| talk in the first person more, and a few things like that.
|
| So if you're using it as an API for things like
| summarization, extracting the subject of a sentence, code
| editing, etc, then the chat model can be super annoying to
| work with.
| ClassicOrgin wrote:
| I'm interested in the cost of gpt-3.5-turbo-instruct. I've got
| a basic website using text-davinci-003 that I would like to
| launch but can't because text-davinci-003 is too expensive.
| I've tried using just gpt-3.5-turbo but it won't work because
| I'm expecting a formatted JSON to be returned and I can just
| never get consistency.
| senko wrote:
| With the latest 3.5-turbo, you can try forcing it to call
| your function with a well-defined schema for arguments. If
| the structure is not overly complex, this should work.
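| Sketched with the pre-1.0 openai library (the schema here is
| made up for illustration):
|
|     import openai
|
|     r = openai.ChatCompletion.create(
|         model="gpt-3.5-turbo-0613",
|         messages=[{"role": "user", "content": "Summarize: ..."}],
|         functions=[{
|             "name": "save_summary",
|             "parameters": {
|                 "type": "object",
|                 "properties": {
|                     "title": {"type": "string"},
|                     "bullets": {"type": "array",
|                                 "items": {"type": "string"}},
|                 },
|                 "required": ["title", "bullets"],
|             },
|         }],
|         function_call={"name": "save_summary"},  # force the call
|     )
|     args = r["choices"][0]["message"]["function_call"]["arguments"]
|     # args is a JSON string that (usually) matches the schema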
| stavros wrote:
| It's great at returning well-formatted JSON, but it can
| hallucinate arguments or values to arguments.
| gamegoblin wrote:
| I'm assuming they will price it the same as normal
| gpt-3.5-turbo. I won't use it if it's more than 2x the price
| of turbo, because I can usually get turbo to do what I want,
| it just takes more tokens sometimes.
|
| Have you tried getting your formatted JSON out via the new
| Functions API? It does cure a lot of the deficiencies in
| 3.5-turbo.
| mrinterweb wrote:
| From what I can find, pricing of GPT-4 is roughly 25x that
| of 3.5 turbo.
|
| https://openai.com/pricing
|
| https://platform.openai.com/docs/deprecations/
| gamegoblin wrote:
| In this thread we're talking about gpt-3.5-turbo-
| instruct, not GPT4
| merpnderp wrote:
| You need to use the new OpenAI Functions API. It is
| absolutely bonkers at returning formatted results. I can get
| it to return a perfectly formatted query-graph a few levels
| deep.
| byt143 wrote:
| What's the difference between chat and instruction tuning?
| tudorw wrote:
| No expert, but from my messing around I gather the chat models
| are tuned for conversation. For example, if you just say 'Hi',
| a chat model will spit out some 'witty' reply and invite you to
| respond; it's creative with its responses. On the other hand, if
| you say 'Hi' to an instruct model, it might say something like
| "I need more information to complete the task." Instruct models
| are looking for something like 'Write me a twitter bot to make
| millions'... in this case, if you ask the same thing again, you
| are somewhat more likely to get the same or a similar result;
| this does not appear so true with a chat model. Perhaps a real
| expert could chime in :)
| ftxbro wrote:
| > "We envision a future where chat-based models can support any
| use case. Today we're announcing a deprecation plan for older
| models of the Completions API"
|
| nooooo they are deprecating the remnants of the base models
| rememberlenny wrote:
| It's the older completion models, not the older chat completion
| models.
| 3cats-in-a-coat wrote:
| They're deprecating all the completion/edit models.
|
| The chat models constantly argue with you on certain tasks
| and are highly opinionated. A completion API was a lot more
| flexible and "vanilla" about a wide variety of tasks, you
| could start a thought, or a task, and truly have it complete
| it.
|
| The chat API doesn't complete, it responds (I mean of course
| internally it completes, but completes a response, rather
| than a continuation).
|
| I find this a big step back, I hope the competition steps in
| to fill the gaps OpenAI keeps opening.
| saliagato wrote:
| Unfortunately their decisions are driven by model usage:
| gpt-3.5-turbo is the most used one (probably due to the low
| price and similar result)
| fredoliveira wrote:
| "similar" is a very bold claim ;-)
|
| Comparable, perhaps.
| penjelly wrote:
| Not in the article: is plugin usage available to paying customers
| everywhere now? I still can't see the UI for it. I'm in Canada
| and pay for Plus. The internet says it was out for everyone in
| May...
| electroly wrote:
| Click the "..." button next to your name in the lower left
| corner, then Settings. It's under "Beta features."
| drexlspivey wrote:
| I pay monthly for my API use but I am not a plus subscriber
| and I don't see this option. Also I've joined the plugins
| waiting list on day 1.
| electroly wrote:
| It's for ChatGPT Plus subscribers.
| [deleted]
| drik wrote:
| maybe you have to go to settings > beta features and enable
| plugins?
| atarian wrote:
| I really like the Swiss-style web design, it's well executed with
| the scrolling
| hospitalJail wrote:
| I imagine the API quality isn't nerfed on a given day like
| ChatGPT's can be.
|
| There was no question something happened in January with
| ChatGPT; it would weirdly refuse to answer questions that
| were harmless but difficult ("Give me a daily schedule of a
| stoic hedonist").
|
| Every once in a while, I see redditors complain of it being
| nerfed.
|
| Sometimes I go back to gpt3.5 and am mind boggled how much worse
| it is.
|
| Makes me wonder if they keep increasing the version number while
| dumbing down the previous model.
|
| With an API, being unreliable would be a deal-breaker.
| Looking forward to people fine-tuning LLMs with the GPT-4
| API. I'd love it for medical purposes; I'm worried about a
| future where the US medical cartels ban ChatGPT for medical
| use. At least with local models, we don't have to worry
| about regression.
| seizethecheese wrote:
| Instead of the model changing, it's equally likely that this is
| a cognitive illusion. A new model is initially mind-blowing and
| enjoys a halo effect. Over time, this fades and we become
| frustrated with the limitations that were there all along.
| hungrigekatze wrote:
| Check out this post from a round table dialogue with Greg
| Brockman from OpenAI. The GPT models that were in existence /
| in use in early 2023 were not the performance-degraded
| quantized versions that are in production now:
| https://www.reddit.com/r/mlscaling/comments/146rgq2/chatgpt_...
| sroussey wrote:
| Oh interesting. I thought that's what turbo was.
| refulgentis wrote:
| It was, that's what the comment says?
| colordrops wrote:
| It's both. OpenAI is obviously tuning the model for both
| computational resource constraints as well as "alignment".
| It's not an either-or.
| kossTKR wrote:
| No. Just to add to the many examples: it was good at
| Scandinavian languages in the beginning, but now it's bad.
| ghughes wrote:
| But given the rumored architecture (MoE) it would make
| complete sense for them to dynamically scale down the number
| of models used in the mixture during periods of peak load.
| moffkalast wrote:
| No, it's definitely changed a lot. The speedups have been
| massive (GPT-4 runs faster now than 3.5-turbo did at launch)
| and they can't be explained by just rolling out H100s, since
| that's only about a 2x inference boost. Some unknown in-house
| optimization method aside, they've probably quantized the
| models down to a few bits of precision, which increases
| perplexity quite a bit. They've also continued to RLHF-tune
| them to be more in line with their guidelines, and that
| process had been shown to decrease overall performance before
| GPT-4 even launched.
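|
| (For anyone unfamiliar: quantization trades weight precision
| for speed and memory. A toy numpy sketch of symmetric 4-bit
| rounding, certainly not how OpenAI actually does it:)
|
|     import numpy as np
|
|     w = np.random.randn(4096).astype(np.float32)  # "weights"
|     scale = np.abs(w).max() / 7     # map into int4 range -8..7
|     q = np.clip(np.round(w / scale), -8, 7)       # quantize
|     w_hat = q * scale                             # dequantize
|     print(np.abs(w - w_hat).mean()) # precision lost per weight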
| andrepd wrote:
| Yep. It's amazing how people are taking "the reddit hivemind
| thinks ChatGPT was gimped" as some kind of objective fact.
| whalesalad wrote:
| It definitely got nerfed.
| browningstreet wrote:
| I've never seen "nerf" used colloquially and today I've
| seen it at least a half-dozen times across various sites.
| Y'all APIs?
| whalesalad wrote:
| It's popular with gamers to describe the way certain
| weapons/items get modified by the game developer to
| perform worse.
|
| Buffing is the opposite, when an item gets better.
| PerryCox wrote:
| "Give me a daily schedule of a stoic hedonist" worked for me
| just now.
|
| https://chat.openai.com/share/04c1dbc0-4890-447f-b5a5-7b1bc5...
| anotherpaulg wrote:
| I recently completed some benchmarks for code editing that
| compared the Feb (0301) and June (0613) versions of GPT-3.5 and
| GPT-4. I found indications that the June version of GPT-3.5 is
| worse than the Feb version.
|
| https://aider.chat/docs/benchmarks.html
| refulgentis wrote:
| After reading, I don't think a <5 percentage point gap is
| helpful to add to the discussion here without pointing that
| out explicitly; people are regularly asserting much wilder
| claims.
| anotherpaulg wrote:
| I haven't come across any other systematic, quantitative
| benchmarking of the OpenAI models' performance over time,
| so I thought I would share my results. I think my results
| might argue that there _has_ been some degradation, but not
| nearly the amount that you often hear in people's anecdata.
|
| But unfortunately, you have to read a ways into the doc and
| understand a lot of details about the benchmark. Here's a
| direct link and excerpt of the relevant portion:
|
| https://aider.chat/docs/benchmarks.html#the-0613-models-seem...
|
| The benchmark results have me fairly convinced that the new
| gpt-3.5-turbo-0613 and gpt-3.5-16k-0613 models are a bit
| worse at code editing than the older gpt-3.5-turbo-0301
| model.
|
| This is visible in the "first attempt" portion of each
| result, before GPT gets a second chance to edit the code.
| Look at the horizontal white line in the middle of the
| first three blue bars. Performance with the whole edit
| format was 46% for the February model and only 39% for the
| June models.
|
| But also note how much the solid green diff bars degrade
| between the February and June GPT-3.5 models. They drop
| from 30% down to about 19%.
|
| I saw other signs of this degraded performance in earlier
| versions of the benchmark as well.
| atleastoptimal wrote:
| The capability of the latest model will be like a Shepard
| tone: seemingly always increasing, never actually improving.
| Meanwhile their internal version will be 100x better with no
| filtering.
| sashank_1509 wrote:
| I feel like its code generation abilities have also been
| nerfed. In the past I got almost excellent code from GPT-4;
| these days I somehow need multiple prompts to get the code
| I want from it.
| stuckkeys wrote:
| Not nerfed. They will sell a different tier service to assist
| with coding. Coming soon. Speculating ofc.
| londons_explore wrote:
| In the API, you can select to use the 14th March 2023 version
| of GPT-4, and then compare them side by side.
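|
| Something like this, pinned snapshot vs. the current alias
| (rough sketch; the prompt is just an example):
|
|     import openai
|
|     PROMPT = "Write a binary search in Python."
|     for model in ("gpt-4-0314", "gpt-4"):
|         r = openai.ChatCompletion.create(
|             model=model,
|             messages=[{"role": "user", "content": PROMPT}],
|             temperature=0,  # cut run-to-run variance
|         )
|         print(model, r.choices[0].message.content)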
| santiagobasulto wrote:
| I felt the same thing. The first version of GPT-4 I tried was
| crazy smart. Scary smart. Something happened afterwards...
| moffkalast wrote:
| The even more interesting part is that none of us got to try
| the internal version which was allegedly yet another step
| above that.
| politician wrote:
| Oh, it's not too hard to see how the spend that Microsoft
| put into building the data centers where GPT-4 was trained
| attracted national security interest even before it went
| public. The fact that they were even allowed to release it
| publicly is likely due to its strategic deterrence effect
| and that they believed the released version was already a
| dumbed-down version.
|
| The fact that rumors about GPT-5 were quickly suppressed
| and the models were dumbed down even more cannot be
| entirely explained by excessive demand. I think it's more
| likely that GPT-3.5 and GPT-4 demonstrated unexpected
| capabilities in the hands of the public leading to a pull
| back. Moreover, Sam Altman's behaviors changed dramatically
| between the initial release and a few weeks afterward --
| the extreme optimism of a CEO followed by a more subdued,
| even cowed, demeanor despite strong enthusiasm from end-
| users.
|
| OpenAI cannot do anything without Microsoft's data center
| resources, and Microsoft is a critical defense contractor.
|
| Anyway, personally, I'm with the crowd that thinks we're
| about to see a Cambrian explosion of domain-specific expert
| AIs. I suspect that OpenAI/Microsoft/Gov is still trying to
| figure out how much to nerf the capability of GPT-3.5 to
| tutor smaller models (see "Textbooks are all you need") and
| that's why the API is trash.
| santiagobasulto wrote:
| True. The one that is referenced in that "ChatGPT AGI"
| youtube video*, right?
|
| * The one from the MS researchers that has probably been
| recommended to all of us. Good video btw.
| kossTKR wrote:
| Would gladly pay more for a non-nerfed version if they were
| actually honest about it.
|
| The current version is close to the original 3.5, while 3.5
| itself has become horribly bad. It's such a scam not to
| disclose what's going on, especially for a paid service.
| [deleted]
| aeyes wrote:
| I was playing with the API and found that it returned better
| answers than ChatGPT. ChatGPT isn't even able to solve simple
| Python problems anymore, even if you try to help it. And some
| time ago it did these same problems with ease.
|
| My guess is that they began to restrict ChatGPT because they
| can't sell that. They probably want to sell you CodeGPT or
| other products in the future so why would they give that away
| for free? ChatGPT is just a teaser.
| it_citizen wrote:
| I keep reading "GPT-4 got nerfed", but I have been using it
| from day 1, and while it definitely gives bad answers, I
| cannot say for sure that it was nerfed.
|
| Is there any actual evidence other than users' subjective
| experiences?
| mike_hearn wrote:
| ChatGPT is definitely more restricted than the API. Example:
|
| https://news.ycombinator.com/item?id=36179783
| azemetre wrote:
| That's disappointing, I thought ChatGPT WAS using the API.
| I mean what's the point of paying if you don't get similar
| levels of quality?
| mike_hearn wrote:
| I thought that too. It's certainly how they present it.
| But, apparently not.
| fredoliveira wrote:
| ChatGPT doesn't use the API. It uses the same underlying
| model with a bunch of added prompts (and possibly
| additional fine-tuning?) to make it conversational.
|
| One would pay because what they get out of chatGPT
| provides value, of course. Keep in mind that the users of
| these 2 products can be (and in fact are) different --
| chatGPT is a lot friendlier (from a UX perspective) than
| using the API playground (or using the API itself).
| redox99 wrote:
| They are comparing text-davinci-003 with ChatGPT which
| presumably uses gpt-3.5-turbo, so quite different models.
|
| They are killing text-davinci-003 btw.
| londons_explore wrote:
| I think the clearest evidence is Microsoft's paper, where
| they show abilities at various stages during training [1]...
| But in a talk [2] they give more details... The unicorn gets
| _worse_ during the finetuning process.
|
| [2]: https://www.youtube.com/watch?v=qbIk7-JPB2c&t=1392s
|
| [1]: https://arxiv.org/abs/2303.12712
| it_citizen wrote:
| Thanks, that's interesting.
|
| Noobie follow-up question: should we put any trust in
| "Sparks of intelligence"? I thought it was regarded as a
| Microsoft marketing piece, not a serious paper.
| londons_explore wrote:
| The data presented is true... The text might be rather
| exaggerated/unscientific/marketing...
|
| Also notable that the team behind that paper wasn't
| involved in designing/building the model, but they did
| get access to prerelease versions.
| ChatGTP wrote:
| I don't trust it because not enough third parties were able
| to verify the findings.
|
| This is the double-edged sword of being so ridiculously
| closed.
| hungrigekatze wrote:
| See my comment elsewhere on this post. Greg Brockman, head of
| strategic initiatives at OpenAI, was talking at a round table
| discussion in Korea a few weeks ago about how they had to
| start using the quantized (smaller, cheaper) model earlier in
| 2023. I noticed a switch in March 2023, with GPT-4
| performance being severely degraded after that for both
| English-language tasks as well as code-related tasks (reading
| and writing).
| dr-detroit wrote:
| [dead]
| ren_engineer wrote:
| Recently people have claimed GPT-4 is an ensemble model with
| 8 different models under the hood. My guess is that the
| "nerfing" (I've noticed it as well at random times) happens
| when the router directs a question to the wrong underlying
| model.
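|
| (If the rumor is true, the routing is conceptually a learned
| gate picking the top-k experts per token. Toy sketch, not
| OpenAI's actual code:)
|
|     import numpy as np
|
|     def route(x, gate_w, k=2):
|         """Pick top-k experts for a token embedding x."""
|         scores = gate_w @ x              # one score per expert
|         top = np.argsort(scores)[-k:]    # best-scoring experts
|         w = np.exp(scores[top] - scores[top].max())
|         return top, w / w.sum()          # experts + mix weights
|
| A bad gate decision would look exactly like a "nerfed"
| answer.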
| merpnderp wrote:
| It's the continued alignment with fine-tuning that's degrading
| its responses.
|
| You can apparently have it be nice or smart, but not both.
| vbezhenar wrote:
| Why would someone care if it's nice or not? It's an
| algorithm. You're using it to get output, not to get
| psychological help.
| moffkalast wrote:
| OpenAI presumably cares about being sued if it provides the
| illegal content they trained it on.
| staticman2 wrote:
| There was a guy in the news who asked an AI to tell him it
| was a good idea to commit suicide, then he killed himself.
|
| Even on this forum I've seen AI enthusiasts claiming AI
| will be the best psychologist, best school teacher, etc.
| interstice wrote:
| Curious as to whether there's a more general rule at play
| here about filtering interfering with getting good answers.
| If there is, that's a scary thought from an ethics
| perspective.
| jondwillis wrote:
| I hit rate limits and "model is busy with other requests"
| frequently while just developing a highly concurrent agent
| app, especially with the dated (e.g. -0613) or the new -16k
| models.
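|
| What's worked for me is plain exponential backoff with
| jitter (sketch below; the helper name is mine):
|
|     import random, time
|     import openai
|
|     def chat_with_retry(messages, max_tries=6):
|         for attempt in range(max_tries):
|             try:
|                 return openai.ChatCompletion.create(
|                     model="gpt-3.5-turbo-16k",
|                     messages=messages)
|             except (openai.error.RateLimitError,
|                     openai.error.ServiceUnavailableError):
|                 # back off exponentially, plus jitter
|                 time.sleep(2 ** attempt + random.random())
|         raise RuntimeError("gave up after retries")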
___________________________________________________________________
(page generated 2023-07-06 23:00 UTC)