[HN Gopher] It looks like GPT-4-32k is rolling out
___________________________________________________________________
It looks like GPT-4-32k is rolling out
Author : freediver
Score : 228 points
Date : 2023-05-06 13:59 UTC (9 hours ago)
(HTM) web link (community.openai.com)
(TXT) w3m dump (community.openai.com)
| m3kw9 wrote:
| Waiting for gpt4 turbo
| vl wrote:
| What is turbo in this context?
| m3kw9 wrote:
| GPT-3.5 Turbo was like 10x cheaper than the comparable
| version. GPT-4 is like 100x more expensive than 3.5 Turbo.
| moffkalast wrote:
| A model optimized for inference speed instead of raw
| accuracy.
| richardanaya wrote:
| Same... GPT-4 is a very harsh trade-off of time for products.
| hanoz wrote:
| The more token capacity that's added the more wasteful it seems
| to have to use this statelessly. Is there any avoiding this?
|
| Wondrous as this new tech is, it seems a bit much to be paying
| $2 a question in a conversation about a 32k token text.
| weird-eye-issue wrote:
| Handle the state on the application side...
|
| It is like complaining that HTTP is limiting because it is
| stateless. Build state on top of it.
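|
| A minimal sketch of that pattern, assuming the 2023-era
| `openai` Python package and an API key in the environment:
| the application keeps the message list and replays it on
| every call, since the API itself stores nothing.
|
|     import openai
|
|     # all "state" lives in this list, owned by the application
|     messages = [{"role": "system",
|                  "content": "You are a helpful assistant."}]
|
|     def ask(question):
|         messages.append({"role": "user", "content": question})
|         resp = openai.ChatCompletion.create(
|             model="gpt-4", messages=messages)
|         reply = resp["choices"][0]["message"]["content"]
|         # record the reply so the next call sees the full history
|         messages.append({"role": "assistant", "content": reply})
|         return reply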
| delusional wrote:
| I think he's talking about computational efficiency. If
| you're loading in 29k tokens and you're expecting to use
| those again, you wouldn't need to do the whole matrix
| multiplication song and dance again if you just kept the old
| buffers around for the next prompt.
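|
| For what it's worth, open models expose exactly those
| buffers. A sketch with HuggingFace transformers (GPT-2 here
| purely as a stand-in; OpenAI's hosted API exposes nothing
| like this):
|
|     import torch
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     tok = AutoTokenizer.from_pretrained("gpt2")
|     model = AutoModelForCausalLM.from_pretrained("gpt2")
|
|     # encode the long shared prefix once...
|     ids = tok("the long document...", return_tensors="pt").input_ids
|     with torch.no_grad():
|         out = model(ids, use_cache=True)
|     cache = out.past_key_values  # "the old buffers"
|
|     # ...and reuse it for a follow-up without re-encoding
|     q = tok(" Now summarize.", return_tensors="pt").input_ids
|     with torch.no_grad():
|         out2 = model(q, past_key_values=cache, use_cache=True)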
| weird-eye-issue wrote:
| I don't think this can necessarily be optimized at least
| with how the models work right now
| jtbayly wrote:
| Yeah, I can see this being useful for one-off queries, but
| don't they want to offer some sort of final training ("last-
| mile" I called it in another comment. I can't remember what the
| proper term is.) to companies to customize the model so it
| already has all the context they need baked in to every query?
| notpachet wrote:
| This is available through Azure:
| https://azure.microsoft.com/en-us/products/cognitive-
| service...
| BoorishBears wrote:
| As far as I know it's not.
| sashank_1509 wrote:
| They used to offer exactly this: fine-tuning of models. They
| never offered it after ChatGPT; I think the difficulty comes
| with fine-tuning RLHF models, as it's not obvious how to do
| this correctly.
| mlyle wrote:
| You can ask multiple/multipart questions.
| heliophobicdude wrote:
| It's unfortunate. There are some online tutorials that instruct
| you to embed all your code and perform top-k cosine similarity
| searches, populating the responses accordingly.
|
| It's quite interesting if you can tweak your search just right.
| You can even get away with fewer than 8K tokens!
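|
| A rough sketch of the pattern, assuming the 2023-era
| `openai` package (the chunking strategy is up to you):
|
|     import numpy as np
|     import openai
|
|     def embed(texts):
|         resp = openai.Embedding.create(
|             model="text-embedding-ada-002", input=texts)
|         return np.array([d["embedding"] for d in resp["data"]])
|
|     chunks = ["def foo(): ...", "def bar(): ...", "README..."]
|     chunk_vecs = embed(chunks)   # embed the codebase once
|
|     query_vec = embed(["where is foo defined?"])[0]
|     # ada-002 vectors are unit length, so dot product = cosine
|     scores = chunk_vecs @ query_vec
|     top_k = [chunks[i] for i in np.argsort(scores)[::-1][:2]]
|     # stuff only top_k into the prompt instead of everything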
| toxicFork wrote:
| The usage needs to be for high value queries.
|
| Using it on a simple conversation is not its intended purpose,
| that's like using a supercomputer to play pong.
| ta988 wrote:
| If you presented me, a human, with 32k tokens and wanted an
| answer, you would probably have to pay me more than $2.
| hanoz wrote:
| If I wanted to have a _conversation_ about it, and you wanted
| to charge me a flat fee per utterance on the basis that you
| had to reread the text anew every time, I wouldn't be paying
| you at all.
| fieididuxue wrote:
| Will this be better than LoRA? 32k seems like a lot.
| heliophobicdude wrote:
| LoRA probably does not affect the model's biggest bottleneck:
| the attention mechanism. The original transformer's attention
| is O(n^2 d), where n is the sequence length and d is the
| representation dimension.
| [deleted]
| rolisz wrote:
| This is completely separate from LoRA. This is how much stuff
| you can give it in the prompt. You can give it now whole
| chapters of books to summarize for example.
|
| LoRA is for adapting the model to a certain domain or task.
| It usually means you can get away with shorter prompts, but
| for book summarization, it wouldn't help.
| sp332 wrote:
| LoRA doesn't change the context length. Also you can't run
| LoRAs on GPT-4. So those are not relevant to each other.
| jurdjir wrote:
| The distinction becomes less meaningful if you can hold large
| numbers of tokens in attention, doesn't it?
| sp332 wrote:
| I don't... think so. One of us is very confused.
| paulmendoza wrote:
| I feel like this just killed a few small startups who were trying
| to offer more context.
|
| Also, I pay for ChatGPT but I have none of the new features
| except for GPT4. Very frustrating.
| ShamelessC wrote:
| > I feel like this just killed a few small startups who were
| trying to offer more context.
|
| Those startups killed themselves. A 32K context was advertised
| as a feature to be rolled out the same day GPT-4 came out.
|
| Also - what startups are getting even remotely close to 32K
| context at GPT-4's parameter count? All I've seen is attempts
| to use KNN over a database to artificially improve long term
| recall.
| TeMPOraL wrote:
| Depends on the use case. Performance quickly tanks when you get
| to high token count; it's a slowdown I believe the various
| summarizers/context extenders mostly avoid.
|
| (Also the UI probably tanks too. I dread what the OpenAI
| Playground will do when you start actually using the 32k
| model for real, like throwing a 15k-token prompt at it.
| ChatGPT UI has no
| chance.)
| arbuge wrote:
| Same here:
| https://twitter.com/arbuge/status/1654288169397805057
|
| The really odd thing is that I was given GPT-4 with browsing
| alpha enabled - for a single session last week.
|
| As soon as I reloaded the page, it was gone. Since then the
| picture has reverted back to the above.
|
| Twitter has become a bit painful to read these days, with all
| the AI influencers posting about what GPT-4 and plugins, code
| interpreter etc. can do.
| MichaelZuo wrote:
| Considering that increasing context length is O(n^2), and that
| current 8k GPT-4 is already restricted to 25 prompts/3 hours, I
| think they will launch it at substantially higher pricing.
| totoglazer wrote:
| It's been available on Azure in preview. Pricing is double
| the 8K model.
| tempaccount420 wrote:
| > current 8k GPT-4 is already restricted to 25 prompts/3
| hours
|
| I'm pretty sure they're using a 4k GPT-4 model for ChatGPT
| Plus, even though they only announced 8k and 32k... It can't
| handle more than 4k of tokens (actually a little below that,
| starts ignoring your last few sentences if you get close). If
| you check developer tools, the request to an API /models
| endpoint says the limit for GPT-4 is 4096. It's very
| unfortunate.
| reaperman wrote:
| Ah this explains a lot. I couldn't understand why I
| couldn't get close to the ~12 pages that everyone was
| saying 8,000 tokens implied.
| tempaccount420 wrote:
| As far as I know it's not documented anywhere and there
| is no way to ask the team at ChatGPT questions. I sent
| them an email about it a few days after GPT-4 release and
| still haven't received a reply.
|
| Another thing that annoys me is how most updates don't
| get a changelog entry. For whatever reason, they keep
| little secrets like that.
| int_19h wrote:
| The raw chat log has the system message on top, plus
| "user:" and "assistant:" for each message, and
| im_start/im_end tokens to separate messages, hence why the
| visible chat context is slightly under 4k.
| cubefox wrote:
| O(n^2) seems unlikely:
|
| https://cognitiverevolution.substack.com/p/openais-
| foundry-l....
|
| https://news.ycombinator.com/item?id=34977194#:~:text=Sparse.
| ..
| MichaelZuo wrote:
| Your second link has the immediate comment "Gpt3 includes
| dense attention layers that are n^2". So it's not at all
| unlikely.
| space_fountain wrote:
| GPT3 was released 3 years ago now. There have been major
| advancements in scaling attention so it would be strange
| if they didn't use some of them
| Keyframe wrote:
| Some of the context length will be wasted on truncated
| posts, or are replies not considered part of the context on
| ChatGPT? Either way, it might be worth designing a prompt,
| every so often, to get a reply with which to re-establish
| the context, thus compressing it.
| choeger wrote:
| It will be interesting to see how far this quadratic
| algorithm carries in practice. Even the longest documents can
| only have hundreds of thousands of tokens, right?
| sebzim4500 wrote:
| Ideally you'd be able to put your entire codebase +
| documentation + jira tickets + etc. into the context. I
| think there is no practical limit to how many tokens would
| be useful for users, so the limits imposed by the model
| (either hard limits or just pricing) will always be a
| bottleneck.
| jtbayly wrote:
| I'm confused by this. Would you want to just include your
| codebase, documentation, etc. in some last-mile training?
| That way you don't need the expense of including huge
| amounts of context in every query. It's baked in.
| sdenton4 wrote:
| Yeah, there are really three options here... throw
| everything in context, fine-tune, or add external search
| a la RETRO.
|
| The latter is definitely the cheapest option; updates are
| trivial.
| mlyle wrote:
| Yah... we really need some kind of architecture that
| juggles concept vectors around to external storage and
| does similarity search, etc, instead of forcing us to
| encode everything into giant tangles of coefficients.
|
| GPT-4 seems to show that linear algebra definitely can do
| the job, but training is so expensive and the model gets
| so huge and inflexible.
|
| It seems like having fixed format vectors of knowledge
| that the model can use-- denser and more precise than
| just incorporating tool results as tokens like OpenAI's
| plugin approach-- is a path forward towards extensibility
| and online learning.
| sebzim4500 wrote:
| I haven't tried this myself, but it is my understanding
| that finetuning does not work well in practice as a way
| of acquiring new knowledge.
|
| There may be a middle ground between these two approaches
| though. If every query used the same prompt prefix
| (because you only update the codebase + docs
| occasionally) then you could put it into the model once
| and cache the keys and values from the attention heads. I
| wonder if OpenAI does this with whatever prefix they use
| for ChatGPT?
| chillfox wrote:
| Same. No plugins or GPT-4 API for me despite signing up for the
| waiting lists on the day they were announced.
| saulpw wrote:
| Have you been using the API with GPT-3.5? I wonder if they're
| prioritizing access to 'active' users who appear to be trying
| to make something with it, over casual looky-loos.
| YetAnotherNick wrote:
| 32k context is $1.92 for each request.
| achandlerwhite wrote:
| Is it prorated for the actual context used for each request?
| [deleted]
| ZiiS wrote:
| Yes
| ZephyrBlu wrote:
| Yes, it's not a fixed price: https://openai.com/pricing.
| danjc wrote:
| It's exceedingly expensive and must surely come down over
| time.
| nico wrote:
| For reference, a dev making USD 100k/year and working about
| 240 days a year, 8 hours/day = total of 1920 hours, or about
| USD 52/hour, USD 416/day
|
| 52/1.92 = 27 416/1.92 = 217
|
| So using GPT-4 with 32k tokens, 27 times per hour, or 217
| times per day, in terms of cost, is approximately the
| equivalent of another dev
| KeplerBoy wrote:
| That's a lot of requests.
|
| Not that it matters for the calculation, but i wonder how
| long such a request (ingesting 32k tokens and responding
| with a similar amount) would take.
|
| At the speed of regular ChatGPT that would take a good
| while.
| atq2119 wrote:
| Batch processing scales quadratically with the context
| size (assuming OpenAI is still using standard transformer
| architecture) but the batch processing of the prompt is
| also fast compared to generating tokens because it's
| batched (parallel). So I wouldn't expect effective
| response times to go up quadratically. At most linearly,
| depending on the details of how they implement inference.
| ukuina wrote:
| FYI, 27 times per hour is basically nothing. With GPT4 over
| the API, I make 2-3 completion requests a minute, for 30-60
| minutes at a time, when building an LLM app. This happens
| for 3-4 hours per day.
|
| At the upper bound, this would be $2 * 3 * 60 * 4 = $1440 a
| day.
|
| Thankfully, I am using retriever-augmentation and context
| stuffing into the base 4k model, so costs are manageable.
|
| The 32k context model cannot be deployed into a production
| app at this pricing as a more capable drop-in replacement
| for shorter-context models.
| ZephyrBlu wrote:
| Depends heavily on your product. I can imagine there are
| quite a lot of use cases that have relatively infrequent
| API usage or highly cacheable responses.
| jerrygenser wrote:
| Paying for chatgpt I believe is separate from API access
| VeninVidiaVicii wrote:
| Again, frustrating. I'm an antibiotics researcher with oodles
| of data and I need ChatGPT plugins/API to make any real
| progress. (I'm kind of in this intellectual space on my own,
| so other people can't really help that much) I'm not sure why
| I've been on the waiting list for so long now.
| sashank_1509 wrote:
| I got access to ChatGPT plugins and they're really bad,
| completely deserving of "alpha". I'd be pissed if I paid
| $25 just for this, FYI.
|
| It's very slow, almost 10X slower than ChatGPT
|
| Its integration is bad. For most plugins it doesn't do
| anything smart with its API call. For example if I ask
| "Nearest cheap International flight", it literally goes to
| Kayak and searches Nearest Cheap International Flight, if
| Kayak can't handle that query, GPT can't either.
|
| The only plug-in with good integration is Wolfram and it
| makes so many syntax errors calling Wolfram that it's
| trash. Often it just errors out for half my queries.
|
| I wouldn't have minded if they spent a few more months
| internally testing plug-ins before rolling it out to me,
| given its current state. The annoying thing is that the chat
| website automatically starts at plugins mode which is
| borderline unusable. So every time I have to click on the
| drop-down and then choose ChatGPT or GPT4.
| VeninVidiaVicii wrote:
| Thanks for assuaging my FOMO a bit. I think one of the
| most frustrating parts is that everyone in my lab looks
| to me when they see this stuff on Twitter and all I can
| really do is shrug.
| JieJie wrote:
| I use the API for anything I can't do with Bing Chat, but
| I've found Bing Chat to be quite useful.
|
| For code, I use phind.com.
|
| https://www.phind.com/tutorial
| ZephyrBlu wrote:
| Dude, chill. Plugins are insanely new. Barely anyone has
| access to them. It just seems like they are widespread
| because they've been going viral.
|
| The initial blog post was only just over a month ago, and
| it was announcing _alpha_ access for a few users and
| developers:
|
| > _Today, we will begin extending plugin alpha access to
| users and developers from our waitlist. While we will
| initially prioritize a small number of developers and
| ChatGPT Plus users, we plan to roll out larger-scale access
| over time._
|
| https://openai.com/blog/chatgpt-plugins
|
| We are literally 1 month into the _alpha_ of plugins.
| mptest wrote:
| I think part of the anxiety, at least for me, is how fast
| progress is being made too. Can begin to feel like the
| "LET ME IN" meme, when you're watching all day the cool
| things those inside the magic shop can do lol. Layman btw
| just looking to use it to automate some volunteer work I
| do. Thanks for this perspective on how new this stuff is.
| ZephyrBlu wrote:
| I completely agree, I feel the same way as a dev. GPT-4
| is _not even 2 months old_.
|
| The developer livestream was on March 14th:
| https://www.youtube.com/live/outcGtbnMuQ?feature=share.
|
| The time since GPT-4 already feels something like 6
| months. So far I'm perpetually feeling behind.
| mptest wrote:
| Can't imagine trying to keep up as a dev. Any of these
| tools useful for you in practice yet?
|
| I struggle to keep up and all I need to do is understand
| developments well enough to simplify them in to palatable
| morsels for my tech skeptic colleagues in politics and
| non profits.
|
| Challenging, because they have a form of technology PTSD:
| when they hear "new technology", NFTs of monkeys with 6-
| digit prices and Peter Thiel's yacht flash before their
| eyes and they see red.
|
| And I can't _really_ blame them; the rhetoric around
| crypto was enough to sour most non-techies (in my little
| corner of lefty politics anyway) against the idea that
| any tech advancement is noteworthy. One of the first more
| serious individuals in politics to hear me out did so
| because "I sounded like one of the early Linux
| proselytizers" lol.
|
| Completely agree how time has slowed. I rotate between
| absolute giddy anticipation at our future thanks to the
| tech and nihilistic doomerism. Even as a hobbyist though
| I knew to take this seriously since I saw Robert Miles
| talk about GPT-2 in 2017(?) and note there's zero sign of
| these things plateauing in ability simply by ramping up
| parameter count.
|
| I've gone on long enough but that live stream felt like
| the intro to a sci fi movie at points. Can't wait to have
| multi modal and plugins rolled out.
| VeninVidiaVicii wrote:
| I can't believe it's only been 1 month. It feels like 3-4
| somehow.
| nickthegreek wrote:
| Finally got gpt4 api access. Now I can cancel my ChatGPT plus
| sub and save a bunch of cash by just using a local client.
| maxdaten wrote:
| It is. For API access you have to create an account at
| https://platform.openai.com. You pay per 1k token. For API
| access to GPT-4 put your organization (org id) on the
| waitlist.
| danjc wrote:
| Try OpenAI services in Azure. We were added to a waitlist but
| got approved a week later. Had 32k for a few weeks now but
| still on the waitlist for plugins.
| HarHarVeryFunny wrote:
| MosaicML StoryWriter 65K model just released a day or two ago.
| chrisMyzel wrote:
| https://www.mosaicml.com/blog/mpt-7b 65k+ context window,
| open source, open weights
| toxicFork wrote:
| It's hella expensive, so I think they are OK for now.
|
| Once the cost comes down, then they should worry, yeah.
| fakedang wrote:
| Honestly for the firms that would use it, for example finance
| or legal, it's very reasonable.
| fbrncci wrote:
| Those startups will move on to open source models because
| OpenAI API calls with 32k token contexts are way too expensive.
| raincole wrote:
| I don't think it's expensive at all. For things that don't
| need to be so correct (like, unfortunately, marketing blog
| posts) it's a <$1 per post generator, which is very cheap to
| me.
|
| For things where correctness matters, the majority of cost
| will still come from humans who are in charge of ensuring
| correctness.
| danjc wrote:
| Use cases for individual people are ok but it's far too
| expensive to deploy into your SaaS where a large number of
| users will use it.
| fbrncci wrote:
| This does not scale. With an open source model the cost per
| post would be 0$ if we leave out the hardware.
| cced wrote:
| What is the latest in conversational models that allow GPT3
| like (or close) performance w.r.t running things locally?
| modernpink wrote:
| GPT-3 is dated, so many open source models are competitive
| with it, but Vicuna 13B is supposed to be competitive with
| GPT-4
| speedgoose wrote:
| Against GPT3.5 perhaps the gaps aren't too big for your
| use cases, but I wouldn't say it's in the GPT4 league. It
| looks close in the benchmarks but the difference in
| quality feels (to me) huge in practice. The other models
| are simply a lot worse.
| modernpink wrote:
| Interesting. Have you tried StableVicuna?
| speedgoose wrote:
| No, is it worth a try? I didn't see a lot of hype about
| it so I didn't try it.
| noman-land wrote:
| Apparently Vicuna 13B is quite good according to Google's
| own leaked docs.
|
| https://twitter.com/jelleprins/status/1654197282311491592
| space_fountain wrote:
| That's according to this
| (https://lmsys.org/blog/2023-03-30-vicuna/) promotional
| blog post and just cited by the google memo right? Which
| isn't really even a doc, just a memo that was circulating
| inside google.
|
| I also find it strange they don't contrast gpt4 and
| gpt3.5
| amelius wrote:
| Is there a way to always stay up to date with the latest
| and best performing models? Perhaps it's me but I find it
| difficult to navigate HuggingFace and find models sorted
| by benchmark.
| nickthegreek wrote:
| I check r/LocalLlama
| noman-land wrote:
| Honestly, I just read hackernews :).
| amelius wrote:
| HN posts are not always in chronological order.
| int_19h wrote:
| This assessment is based largely on GPT-4 evaluation of
| the output. In actual use, Vicuna-13B isn't even as good
| as GPT-3.5, although I do have high hopes for 30B if and
| when they decide to make that available (or someone else
| trains it, since the dataset is out).
|
| And don't forget that all the LLaMA-based models only
| have 2K context size. It's good enough for random chat,
| but you quickly bump into it for any sort of complicated
| task solving or writing code. Increasing this to 4K -
| like GPT-3.5 has - would require significantly more RAM
| for the same model size.
| decompiled_dev wrote:
| I was waiting for a while, but then I found there was a page
| where, if you had selected "I want to build plugins", you
| would never have seen the option to request them.
|
| Once I filled that in I got access within a few days.
|
| https://openai.com/waitlist/plugins
|
| If you are the person who said "I am a developer and want to
| build a plugin", then it is likely you missed the option to
| request which plugins you want access to.
| tough wrote:
| OK, thanks for the pointer. I submitted there for every
| plugin now as a non-developer to see if that helps.
|
| I stopped paying for Plus because without plugins it didn't
| do that much, tbh.
| kinlan wrote:
| Yeah. It's weird, I signed up when it came out
| and... nothing. :(
| virgildotcodes wrote:
| Does this purely affect the amount of tokens that can be fed in
| and retained in context during a session?
|
| The output from that prompt seems spectacular, so I'm wondering
| if there are any other differences.
|
| I just tried the same prompt with GPT-4 and the style was much
| more GPT-like, what I'm used to, not near the same quality as in
| the OP, although maybe it's just luck?
| arbuge wrote:
| The prompts in the example on that page are pretty short...
| hardly taking advantage of the longer context window.
|
| I'm actually not sure if longer responses can be expected with
| the 32k vs 8k models. Anyone from OpenAI care to comment on
| that?
| vorticalbox wrote:
| The token limit is for the whole conversation: system,
| assistant, and user messages all count.
|
| Once the limit is hit it will stop, sometimes mid sentence.
|
| I have a slack bot at work with a system document which is a
| little over 1k tokens, meaning there is around 3k tokens left
| for questions and replies.
|
| Trick I am currently doing is to prune older messages to keep
| it under the limit.
| heliophobicdude wrote:
| Can you please elaborate on how you prune?
| vorticalbox wrote:
| You can just use the API: if you set completions to 0 it
| will return the token count. Then you can just remove the
| oldest message until it's under any number. I picked 3k
| to allow 1k for the reply.
| akiselev wrote:
| You can also use the `tiktoken` python library:
|     import tiktoken
|     len(tiktoken.encoding_for_model("gpt-4").encode(contents))
| dmix wrote:
| > Once the limit is hit it will stop, sometimes mid
| sentence.
|
| Oh that's why GPT does that in a long thread.
|
| It makes sense in retrospect it's the whole conversation
| not the individual messages.
|
| ChatGPT's UX leaves much to be desired; there should be
| error messages that communicate this stuff better.
| mlyle wrote:
| There's the token limit-- the maximum number of tokens to
| respond.
|
| There's also the token context: how many words into the
| "past" it considers when formulating the next word.
|
| They're different things. You can generate very, very long
| responses with a model with a short context window; it just
| will have amnesia about what it said earlier-- though
| OpenAI often seems to restrict you / prevent you from
| having context scroll off in this way.
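|
| In the 2023-era `openai` package the two knobs show up
| roughly like this (a sketch, not official terminology):
|
|     import openai
|
|     resp = openai.ChatCompletion.create(
|         model="gpt-4-32k",  # fixes the total context window:
|                             # prompt + response <= 32k tokens
|         messages=[{"role": "user",
|                    "content": "Summarize this chapter: ..."}],
|         max_tokens=1000,    # caps only the generated response
|     )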
| JonAtkinson wrote:
| Does anyone have any examples of prompting to feed in such a
| large number of tokens? For example, would you use something
| like "I am
| going to send you an entire codebase, with the filename and path,
| followed by the file content. Here is the first of 239 files:
| ..."
| noonething wrote:
| You could see the legal impact of your actions before you take
| them. You could template out an operating system and have it
| fill in the blanks. You could rewrite entire literary works,
| in the author's style, to cater to your reading style or
| story preferences.
| dested wrote:
| Here's someone who passed in a 23-page congressional document:
|
| https://twitter.com/SullyOmarr/status/1654576774976770048
| mysterydip wrote:
| Sounds like a return to the days of download managers or file
| splitters :)
| fragsworth wrote:
| I think what's more important is the end part of your prompt,
| which explains what was previously described and includes a
| request/question.
| danielbln wrote:
| I've had access to the 32k model for a bit and I've been using
| this to collect and stuff codebases into the context:
| https://github.com/mpoon/gpt-repository-loader
|
| It works really well: you can tell it to implement new
| features or mutate parts of the code, and having the entire
| codebase (or a lot of it) in its context really improves the
| output.
|
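| The gist of such a loader, as a hypothetical sketch (not the
| linked tool's exact output format):
|
|     import os
|
|     def repo_to_prompt(root, exts=(".py", ".md")):
|         parts = []
|         for dirpath, _, filenames in os.walk(root):
|             for name in filenames:
|                 if name.endswith(exts):
|                     path = os.path.join(dirpath, name)
|                     with open(path, errors="ignore") as f:
|                         parts.append(f"==== {path} ====\n" + f.read())
|         return "\n\n".join(parts)
|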
| The biggest caveat: shit is expensive! A full 32k token request
| will run you like $2; if you do dialog back and forth you can
| rack up quite a bill quickly. If it were 10x cheaper, I would
| use nothing else, having a large context window is that much of
| a game changer. As it stands, I _very_ carefully construct the
| prompt and move the conversation out of the 32k into the 8k
| model as fast as I can to save cost.
| bamboozled wrote:
| Do you use it for proprietary code and if so, you don't feel
| weird about it ?
| RivieraKid wrote:
| I wouldn't feel weird about it. The risks - someone
| stealing know-how / someone finding a security hole - are
| negligible.
| danielbln wrote:
| Not weirder than using Github, or Outlook or Slack or
| whatever.
| guzik wrote:
| How does it calculate the price? I thought that once you
| load the content (a 32k-token request, ~$2) it would
| remember the context so you could ask questions much more
| cheaply.
| mediaman wrote:
| It does not have memory outside the context window. If you
| want to have a back-and-forth with it about a document,
| that document must be provided in the context (along with
| your other relevant chat history) with every request.
|
| This is why it's so easy to burn up lots of tokens very
| fast.
| r3trohack3r wrote:
| I already do this with the current context limits. I include a
| few of my relevant source files before my prompt in ChatGPT. It
| works unreasonably well.
|
| Something like the following
|
| Here is the Template class:
|
| ...
|
| Here is an example component:
|
| ...
|
| Here is an example Input element:
|
| ...
|
| I need to create another input element that allows me to select
| a number from a drop down between 1/32 and 24/32 in 1/32
| increments
| ChildOfChaos wrote:
| I'd just like to see GPT-4 more available, even on the free
| ChatGPT, although I wonder if that will ever fully come with
| ChatGPT getting so much use and GPT-3.5 being cheaper to run.
|
| Plus seems expensive to me and it is still rate limited quite a
| lot.
|
| I guess it's going to take further optimisation to make it
| worthwhile for OpenAI.
| Mistletoe wrote:
| I love the prompt and yes it does very well at mimicking DFW.
| It's kind of weirding me out in a Year of the Trial-Size Dove Bar
| kind of way.
| isoprophlex wrote:
| That example is really shockingly good, indeed. I'm not always
| convinced that GPTs can be properly artistic, most things lack
| soul and read like some rambling Dan Brown on amphetamine...
| this DFW works very well.
|
| It gave me the same vague feeling of annoyance and disgust at
| the "look how smart I am" linguistic obstreperousness I get
| when reading the real deal.
| hackernewds wrote:
| The output is rather fantastic; it's mind-numbing. Would DFW
| think this is a solid example of prose?
| cubefox wrote:
| I thought it was already available for a while on Microsoft
| Azure?
| zb3 wrote:
| Of course I still don't even have the basic GPT-4 after nearly
| two months of waiting.
| phillipcarter wrote:
| What's your usage and stated use case? I got access for my
| company account, but I'm pretty sure that's because we've built
| and shipped product using their API.
| zb3 wrote:
| I applied for personal use, I stated that I'd like to
| experiment with its coding abilities. Yeah it seems that they
| prioritized companies making GPT-4 products first.
| Stagnant wrote:
| I joined the GPT4 waitlist 2 or 3 days after it was
| released (around mid-march) and finally got access last
| week. I also applied for personal use and wrote one or two
| sentences about wanting to experiment / compare it to other
| models. So they definitely do give the API access to
| regular folks as well, no idea how they prioritize it
| though. I've been a paying customer of ChatGPT plus for
| three months now which might have helped.
| halfjoking wrote:
| I applied the first day and just got API access a few days ago.
|
| It is strange they can roll out 32k for some, while not even
| having 8k for everyone yet.
| throwaway50606 wrote:
| You pay for Plus and don't have it? Maybe try canceling and
| subscribing on another account. I got it immediately after I
| subscribed.
| zb3 wrote:
| No, I'm talking about the API waitlist. Still, I'm using
| https://nat.dev/ to access GPT-4 so I don't care that much
| anymore.
| itsgrimetime wrote:
| I signed up for API use as soon as it was released and just
| got access yesterday, so they're still rolling it out it I
| guess.
| gwd wrote:
| Bummer -- I got access after 2-3 weeks.
|
| I haven't actually even hit the 8k limit yet, and even
| experimenting with 32k is pretty expensive, so I'm not sure
| what I'd do with it.
| GordonS wrote:
| I tried to checkout nat.dev, but it wants me to create an
| account to see anything at all. What is nat.dev please?
| zb3 wrote:
| This is a paid LLM playground site with many models
| including GPT-4 and Claude-v1.
| ren_engineer wrote:
| I've now got access to GPT-4-0314, anybody know the difference
| between that and the 32k model in this post?
| mirekrusin wrote:
| 8k vs 32k context.
| ec109685 wrote:
| Yours is a snapshot of the 8k token context taken on March
| 14th.
| pjot wrote:
| Does the May 3rd release use the 32k token model?
| ec109685 wrote:
| No, they would call that out specifically in the model
| name. It's just a further snapshot so you don't have to
| jump straight to the next finetuned version without testing
| your app.
| eh9 wrote:
| Do the output limits change? If I give it an entire codebase,
| could it potentially spit out an entire codebase?
|
| I'm wondering if this is a quick way to get to $5 each round trip
| (send and receive)
| capableweb wrote:
| > A helpful rule of thumb is that one token generally corresponds
| to ~4 characters of text for common English text. This translates
| to roughly 3/4 of a word (so 100 tokens ~= 75 words) -
| https://platform.openai.com/tokenizer
|
| 32,000 (tokens) * 4 = 128,000 (characters)
|
| > While a general guideline is one page is 500 words (single
| spaced) or 250 words (double spaced), this is a ballpark figure -
| https://wordcounter.net/words-per-page
|
| Assuming (on average) one word = 5 letters, context ends up being
| ~50 pages (128000 / (500 * 5)).
|
| Just to put the number of "32k tokens" into somewhat estimated
| context.
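|
| Or, as a quick back-of-envelope in Python:
|
|     tokens = 32_000
|     chars = tokens * 4     # ~4 chars per token
|     words = chars / 5      # ~5 chars per word
|     pages = words / 500    # ~500 words per page
|     print(pages)           # ~51 pages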
| s-macke wrote:
| This is a better calculator, because it also supports GPT-4:
|
| https://tiktokenizer.vercel.app/
|
| Not sure why they don't support GPT-4 on their own website.
| modernpink wrote:
| At $0.60 for 20k prompt tokens, it's not going to be cheap so
| they will need to bring the price down to get broader
| adoption.
|
| As far as I can tell, the initial reading in of the document
| (let's say a 20k token one), will be a repeated cost for each
| subsequent query over the document. If I have a 20k token
| document, and ask 10 follow-up prompts consisting of 100
| tokens, that would take me to a total spend of 20k * 10 + (10
| * 11)/2 * 100 = 205,500 prompt tokens, or over $6. This does
| not include completion tokens or the response history which
| would edge us closer to $8 for the chat session.
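|
| As a sketch of that arithmetic in Python (8k-model prompt
| rate of $0.03/1k assumed):
|
|     doc, q = 20_000, 100
|     tokens = sum(doc + turn * q for turn in range(1, 11))
|     print(tokens, tokens / 1000 * 0.03)
|     # 205500 prompt tokens, a bit over $6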
| 1024core wrote:
| > At $0.60 for 20k prompt tokens
|
| Is it $0.60 even for the input tokens?
| dragonwriter wrote:
| Yes, it's $0.03/1K for prompt and $0.06/1K for response,
| which is $0.60/20K for prompt, $1.20/20K for response.
| smodo wrote:
| What I've read is that people let a kind of meta-chat run
| alongside the client interaction. The meta channel decides
| what parts of the history to retain so the primary channel
| doesn't use as many resources. You could let GPT decide
| when it needs to see the whole context again, etc. There are
| a lot of interesting ways to let GPT handle its own scope, I
| think.
| modernpink wrote:
| Is this anything like what Langchain supports which is
| various strategies for chat history compression
| (summarisation)?
|
| https://www.pinecone.io/learn/langchain-conversational-
| memor...
|
| But aside from this, if I have a large document that I
| want to "chat" with, it looks like I am either chunking
| it and then selectively retrieving relevant subsections
| at question time, or I am naively dumping the whole
| document (that now fits in 32k) and then doing the chat,
| at a high cost. So 32k (and increased context size in
| general) does not look to be a huge gamechanger in
| patterns of use until cost comes down by an order of
| magnitude or two.
| smodo wrote:
| Yes, that's one example. Relatively simple to implement
| yourself. Any frontend for chatgpt already does the basic
| thing, which is to pass the previous messages along with
| the prompt.
|
| I think we may end up with first and second stage
| completions, where the first stage prepares the context
| for the second stage. The first stage can be a (tailored)
| gpt3.5 and the second stage can do the brainy work. That
| way you can actively control costs by making the first
| stage forward a context of a given maximum size.
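|
| A sketch of that two-stage idea against the 2023-era API
| (nothing native; purely application-side):
|
|     import openai
|
|     def compress(history):
|         # cheap first stage: boil the history down
|         r = openai.ChatCompletion.create(
|             model="gpt-3.5-turbo",
|             messages=[{"role": "user",
|                        "content": "Summarize this chat in 200 "
|                                   "words:\n" + history}])
|         return r["choices"][0]["message"]["content"]
|
|     def answer(history, question):
|         # brainy second stage sees only the compressed context
|         r = openai.ChatCompletion.create(
|             model="gpt-4",
|             messages=[{"role": "system",
|                        "content": "Earlier chat, summarized: "
|                                   + compress(history)},
|                       {"role": "user", "content": question}])
|         return r["choices"][0]["message"]["content"]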
| bravura wrote:
| Is this a native feature? Or something home-brewed? If
| the latter, how do you use it?
| happycube wrote:
| I forget if Langchain can do that, but something along
| those lines _will_ exist if it doesn't already, it's too
| obvious not to, and too important for the free ChatGPT
| equivalents which will be popping up over a matter of
| days/weeks now that truly free versions of llama are
| coming out.
|
| TL;DR the puck should get there _very soon_, not just
| Really Soon Now.
| smodo wrote:
| Right now all of this is people tinkering with the API.
| If you look at the docs you will note that it doesn't
| even provide chat history or any kind of session. You
| have to pass all context yourself. So you're already
| homebrewing that, why not add some spice.
| skybrian wrote:
| If it's not any faster, I'm thinking that how long you're
| willing to wait for an answer will be the practical bottleneck?
|
| 38 seconds for that example.
| sdo72 wrote:
| 32k tokens: 3/4 of 32k is 24k words, and each page averages
| 500 (0.5k) words, so that's basically 24k / 0.5k = ~48
| pages.
| capableweb wrote:
| That's great, imagine if more researchers were as excited to
| reproduce results as you are? The world would be much better.
| elashri wrote:
| Probably off-topic, but part of this is "Good luck writing
| a grant proposal that says I want to reproduce the work of
| this group and get it accepted". Unless of course you are
| claiming ground breaking paradigm shift like evidence of
| unified theory or super-conductivity at room temperature.
| sdwr wrote:
| This comes across as patronizing to me. "Who's a good
| little verifier. You are!"
| tysam_and wrote:
| Forgive my ignorance, but what is the relevance of this to
| the above comment?
| capableweb wrote:
| We both arrived at mostly the same result from the same
| question of "how many pages of text would 32k tokens
| be?", they basically did the calculation again albeit in
| a slightly different way. Just like when researchers try
| to reproduce the results of other's studies.
| cmonnow wrote:
| [dead]
| dmix wrote:
| I wonder how many LoC that is on average. Is there an average
| for LoC? It's probably based on the language...
| MacsHeadroom wrote:
| A LoC is ~7 tokens thanks to the new tiktoken (cl100k)
| tokenization in GPT-4.
|
| 32k is ~4.5k LoC
| ethbr0 wrote:
| Finally, APL's day is here!
| https://en.m.wikipedia.org/wiki/APL_(programming_language)
| wongarsu wrote:
| Testing some random Rust code, it's about 15-20 tokens per
| LoC, so about 1500-2000 LoC in a 32k context.
|
| Interestingly, using 2-space indentation, as soon as you are
| about 3-4 indentation levels deep you spend as many tokens on
| indentation as on the actual code. For example,
| "log::LevelFilter::Info" is 6 tokens, same as 6 consecutive
| spaces. There are probably a lot of easy gains here
| reformatting your code to use longer lines or maybe no
| indentation at all.
| JimDabell wrote:
| > There are probably a lot of easy gains here reformatting
| your code to use longer lines or maybe no indentation at
| all.
|
| Using tabs for indentation needs fewer tokens than using
| multiple spaces for indentation.
| mafuy wrote:
| Curious how this point in the tabs vs. spaces debate re-
| emerges
| ec109685 wrote:
| Make sure you are testing with the tiktokenizer:
| https://news.ycombinator.com/item?id=35460648
| wongarsu wrote:
| Ah, good catch, it's actually closer to 8 tokens per LoC
| with GPT-4's tiktoken encoding, so about twice as good. Some
| testing suggests that's mostly down to better whitespace
| handling.
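|
| For anyone who wants to reproduce this, a sketch (point it
| at any source file):
|
|     import tiktoken
|
|     enc = tiktoken.encoding_for_model("gpt-4")  # cl100k
|     with open("src/main.rs") as f:
|         lines = f.read().splitlines()
|     counts = [len(enc.encode(l)) for l in lines if l.strip()]
|     print(sum(counts) / len(counts), "tokens per LoC")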
| omega3 wrote:
| How does this even work if the code also uses external
| libraries?
| BaculumMeumEst wrote:
| if you wanted to send 32k tokens of code, are you able to do
| that using a model with a 4k context limit by spreading those
| tokens out across multiple messages? or does it not work that
| way?
| dragonwriter wrote:
| The context limit is for request + response, and there is no
| storage in between requests (ongoing chat interactions are
| done by _adding_ prior interactions to the prompt, so the
| whole chat - before things start falling out of history - is
| limited to the context window.)
| simonbw wrote:
| Not really. The API is stateless, you pass it in a whole
| conversation and it responds with the next message. The
| entire conversation _including its response_ is limited to
| 32k tokens.
| BaculumMeumEst wrote:
| I'm just confused because I thought I remembered sending
| long chunks of code using the API: the request would fail,
| but then I would split it up and then it would work okay.
|
| I guess I'm running into a different limit (not context
| length), or maybe I'm misremembering.
| user_named wrote:
| 0.75 * 32,000 = 24,000 words is faster and more direct
| capableweb wrote:
| Thanks, math was never my strong suit and I was writing the
| comment as I was calculating towards the results, unrefined
| and raw, as it should be :)
| sva_ wrote:
| Almost $2 if you want to use the full context length (32 * $.06).
| Yikes.
| teaearlgraycold wrote:
| :D
___________________________________________________________________
(page generated 2023-05-06 23:01 UTC)