[HN Gopher] It looks like GPT-4-32k is rolling out
       ___________________________________________________________________
        
       It looks like GPT-4-32k is rolling out
        
       Author : freediver
       Score  : 228 points
       Date   : 2023-05-06 13:59 UTC (9 hours ago)
        
 (HTM) web link (community.openai.com)
 (TXT) w3m dump (community.openai.com)
        
       | m3kw9 wrote:
       | Waiting for gpt4 turbo
        
         | vl wrote:
         | What is turbo in this context?
        
           | m3kw9 wrote:
            | GPT-3.5 turbo was like 10x cheaper than the comparable non-
            | turbo model. GPT-4 is like 100x more expensive than 3.5
            | turbo.
        
           | moffkalast wrote:
           | A model optimized for inference speed instead of raw
           | accuracy.
        
         | richardanaya wrote:
          | Same... GPT-4 is a very harsh trade-off of time for products.
        
       | hanoz wrote:
        | The more token capacity that's added, the more wasteful it seems
        | to have to use this statelessly. Is there any way of avoiding
        | this?
        | 
        | Wondrous as this new tech is, it seems a bit much to be paying
       | $2 a question in a conversation about a 32k token text.
        
         | weird-eye-issue wrote:
         | Handle the state on the application side...
         | 
         | It is like complaining that HTTP is limiting because it is
         | stateless. Build state on top of it.
        
           | delusional wrote:
           | I think he's talking about computational efficiency. If
           | you're loading in 29k tokens and you're expecting to use
           | those again, you wouldn't need to do the whole matrix
           | multiplication song and dance again if you just kept the old
           | buffers around for the next prompt.
        
             | weird-eye-issue wrote:
              | I don't think this can necessarily be optimized, at least
              | not with how the models work right now.
        
         | jtbayly wrote:
         | Yeah, I can see this being useful for one-off queries, but
         | don't they want to offer some sort of final training ("last-
         | mile" I called it in another comment. I can't remember what the
         | proper term is.) to companies to customize the model so it
         | already has all the context they need baked in to every query?
        
           | notpachet wrote:
           | This is available through Azure:
           | https://azure.microsoft.com/en-us/products/cognitive-
           | service...
        
             | BoorishBears wrote:
             | As far as I know it's not.
        
           | sashank_1509 wrote:
            | They used to offer exactly this: fine-tuning models. They
            | never offered it after ChatGPT; I think the difficulty comes
            | with fine-tuning RLHF models, where it's not obvious how to
            | do this correctly.
        
         | mlyle wrote:
         | You can ask multiple/multipart questions.
        
         | heliophobicdude wrote:
          | It's unfortunate. There are some online tutorials that instruct
          | you to embed all your code, perform top-k cosine similarity
          | searches, and populate the prompt with the results accordingly.
          | 
          | It's quite interesting if you can tweak your search just right.
          | You can even use fewer than 8K tokens!
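          | 
          | A bare-bones sketch of that retrieval pattern, assuming
          | OpenAI's embeddings endpoint and numpy (the chunks and helper
          | names are just illustrative):
          | 
          |     import numpy as np
          |     import openai
          | 
          |     def embed(texts):
          |         resp = openai.Embedding.create(
          |             model="text-embedding-ada-002", input=texts)
          |         return np.array(
          |             [d["embedding"] for d in resp["data"]])
          | 
          |     # Embed your code chunks once, up front.
          |     chunks = ["def foo(): ...", "class Bar: ...", "# notes"]
          |     chunk_vecs = embed(chunks)
          | 
          |     def top_k(query, k=2):
          |         # Cosine similarity between the query and every
          |         # chunk; the best-scoring chunks go into the prompt.
          |         q = embed([query])[0]
          |         sims = chunk_vecs @ q / (
          |             np.linalg.norm(chunk_vecs, axis=1)
          |             * np.linalg.norm(q))
          |         return [chunks[i] for i in np.argsort(sims)[::-1][:k]]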
        
         | toxicFork wrote:
         | The usage needs to be for high value queries.
         | 
         | Using it on a simple conversation is not its intended purpose,
         | that's like using a supercomputer to play pong.
        
         | ta988 wrote:
          | As a human, if you presented me with 32k tokens and I had to
          | give you an answer, you would probably have to pay me more
          | than $2.
        
           | hanoz wrote:
           | If I wanted to have a _conversation_ about it, and you wanted
           | to charge me a flat fee per utterance on the basis that you
            | had to reread the text anew every time, I wouldn't be paying
           | you at all.
        
       | fieididuxue wrote:
       | Will this be better than LoRA? 32k seems like a lot.
        
         | heliophobicdude wrote:
          | LoRA probably does not affect the model's biggest bottleneck:
          | the attention mechanism. The original transformer's attention
          | is O(n^2 * d), where n is the sequence length and d is the
          | representation dimension.
        
         | [deleted]
        
         | rolisz wrote:
          | This is completely separate from LoRA. This is how much stuff
          | you can give it in the prompt. You can now give it whole
          | chapters of books to summarize, for example.
          | 
          | LoRA is for adapting the model to a certain task. It usually
          | means you can get away with shorter prompts, but for book
          | summarization, it wouldn't help.
        
         | sp332 wrote:
         | LoRA doesn't change the context length. Also you can't run
         | LoRAs on GPT-4. So those are not relevant to each other.
        
           | jurdjir wrote:
           | The distinction becomes less meaningful if you can hold large
           | numbers of tokens in attention, doesn't it?
        
             | sp332 wrote:
             | I don't... think so. One of us is very confused.
        
       | paulmendoza wrote:
       | I feel like this just killed a few small startups who were trying
       | to offer more context.
       | 
       | Also, I pay for ChatGPT but I have none of the new features
       | except for GPT4. Very frustrating.
        
         | ShamelessC wrote:
         | > I feel like this just killed a few small startups who were
         | trying to offer more context.
         | 
         | Those startups killed themselves. A 32K context was advertised
         | as a feature to be rolled out the same day GPT-4 came out.
         | 
         | Also - what startups are getting even remotely close to 32K
         | context at GPT-4's parameter count? All I've seen is attempts
         | to use KNN over a database to artificially improve long term
         | recall.
        
         | TeMPOraL wrote:
          | Depends on the use case. Performance quickly tanks when you
          | get to high token counts; it's a slowdown I believe the various
          | summarizers/context extenders mostly avoid.
         | 
          | (Also, the UI probably tanks too. I dread what the OpenAI
          | Playground will do when you start actually using the 32k model
          | for real, like
         | throwing a 15k token long prompt at it. ChatGPT UI has no
         | chance.)
        
         | arbuge wrote:
         | Same here:
         | https://twitter.com/arbuge/status/1654288169397805057
         | 
         | The really odd thing is that I was given GPT-4 with browsing
         | alpha enabled - for a single session last week.
         | 
         | As soon as I reloaded the page, it was gone. Since then the
         | picture has reverted back to the above.
         | 
         | Twitter has become a bit painful to read these days, with all
         | the AI influencers posting about what GPT-4 and plugins, code
         | interpreter etc. can do.
        
         | MichaelZuo wrote:
          | Considering that attention cost scales as O(n^2) with context
          | length, and that the current 8k GPT-4 is already restricted to
          | 25 prompts/3 hours, I think they will launch it at
          | substantially higher pricing.
        
           | totoglazer wrote:
           | It's been available on Azure in preview. Pricing is double
           | the 8K model.
        
           | tempaccount420 wrote:
           | > current 8k GPT-4 is already restricted to 25 prompts/3
           | hours
           | 
           | I'm pretty sure they're using a 4k GPT-4 model for ChatGPT
            | Plus, even though they only announced 8k and 32k... It can't
            | handle more than 4k tokens (actually a little below that; it
            | starts ignoring your last few sentences if you get close). If
           | you check developer tools, the request to an API /models
           | endpoint says the limit for GPT-4 is 4096. It's very
           | unfortunate.
        
             | reaperman wrote:
             | Ah this explains a lot. I couldn't understand why I
             | couldn't get close to the ~12 pages that everyone was
             | saying 8,000 tokens implied.
        
               | tempaccount420 wrote:
               | As far as I know it's not documented anywhere and there
               | is no way to ask the team at ChatGPT questions. I sent
               | them an email about it a few days after GPT-4 release and
               | still haven't received a reply.
               | 
               | Another thing that annoys me is how most updates don't
               | get a changelog entry. For whatever reason, they keep
               | little secrets like that.
        
             | int_19h wrote:
              | The raw chat log has the system message on top, plus
              | "user:" and "assistant:" for each message, and
              | im_start/im_end tokens to separate messages, which is why
              | the visible chat context is slightly under 4k.
        
           | cubefox wrote:
           | O(n^2) seems unlikely:
           | 
           | https://cognitiverevolution.substack.com/p/openais-
           | foundry-l....
           | 
           | https://news.ycombinator.com/item?id=34977194#:~:text=Sparse.
           | ..
        
             | MichaelZuo wrote:
             | Your second link has the immediate comment "Gpt3 includes
             | dense attention layers that are n^2". So it's not at all
             | unlikely.
        
               | space_fountain wrote:
                | GPT-3 was released 3 years ago now. There have been major
                | advancements in scaling attention since then, so it would
                | be strange if they didn't use some of them.
        
           | Keyframe wrote:
            | Some of the context length will be wasted on truncated
            | posts. Or are replies not considered part of the context on
            | ChatGPT? In either case, it might be worth designing a
            | prompt, every so often, to get a reply with which to re-
            | establish the context, thus compressing it.
        
           | choeger wrote:
           | It will be interesting to see how far this quadratic
           | algorithm carries in practice. Even the longest documents can
           | only have hundreds of thousands of tokens, right?
        
             | sebzim4500 wrote:
             | Ideally you'd be able to put your entire codebase +
             | documentation + jira tickets + etc. into the context. I
             | think there is no practical limit to how many tokens would
             | be useful for users, so the limits imposed by the model
             | (either hard limits or just pricing) will always be a
             | bottleneck.
        
               | jtbayly wrote:
               | I'm confused by this. Would you want to just include your
               | codebase, documentation, etc. in some last-mile training?
               | That way you don't need the expense of including huge
               | amounts of context in every query. It's baked in.
        
               | sdenton4 wrote:
               | Yeah there's really three options here... Throw
               | everything in context, fine tune, or add external search
               | a la RETRO.
               | 
               | The latter is definitely the cheapest option; updates are
               | trivial.
        
               | mlyle wrote:
               | Yah... we really need some kind of architecture that
               | juggles concept vectors around to external storage and
               | does similarity search, etc, instead of forcing us to
               | encode everything into giant tangles of coefficients.
               | 
               | GPT-4 seems to show that linear algebra definitely can do
               | the job, but training is so expensive and the model gets
               | so huge and inflexible.
               | 
               | It seems like having fixed format vectors of knowledge
               | that the model can use-- denser and more precise than
               | just incorporating tool results as tokens like OpenAI's
               | plugin approach-- is a path forward towards extensibility
               | and online learning.
        
               | sebzim4500 wrote:
               | I haven't tried this myself, but it is my understanding
               | that finetuning does not work well in practice as a way
               | of acquiring new knowledge.
               | 
               | There may be a middle ground between these two approaches
               | though. If every query used the same prompt prefix
               | (because you only update the codebase + docs
               | occasionally) then you could put it into the model once
               | and cache the keys and values from the attention heads. I
               | wonder if OpenAI does this with whatever prefix they use
               | for ChatGPT?
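                | 
                | A minimal sketch of that prefix-caching idea, using GPT-2
                | via the Hugging Face transformers library as a stand-in
                | (OpenAI's API doesn't expose the attention cache):
                | 
                |     import torch
                |     from transformers import (
                |         AutoModelForCausalLM, AutoTokenizer)
                | 
                |     tok = AutoTokenizer.from_pretrained("gpt2")
                |     model = AutoModelForCausalLM.from_pretrained("gpt2")
                | 
                |     # Run the fixed prefix once; keep the per-layer
                |     # keys/values from the attention heads.
                |     prefix = tok("codebase + docs ...",
                |                  return_tensors="pt")
                |     with torch.no_grad():
                |         out = model(**prefix, use_cache=True)
                |     cached_kv = out.past_key_values
                | 
                |     # Each new query reuses the cache, so only the
                |     # query tokens pay the attention cost.
                |     query = tok(" How does auth work?",
                |                 return_tensors="pt")
                |     with torch.no_grad():
                |         out = model(query.input_ids,
                |                     past_key_values=cached_kv,
                |                     use_cache=True)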
        
         | chillfox wrote:
         | Same. No plugins or GPT-4 API for me despite signing up for the
         | waiting lists on the day they were announced.
        
           | saulpw wrote:
           | Have you been using the API with GPT-3.5? I wonder if they're
           | prioritizing access to 'active' users who appear to be trying
           | to make something with it, over casual looky-loos.
        
         | YetAnotherNick wrote:
          | A full 32k context is $1.92 for each request.
        
           | achandlerwhite wrote:
           | Is it prorated for the actual context used for each request?
        
             | [deleted]
        
             | ZiiS wrote:
             | Yes
        
             | ZephyrBlu wrote:
             | Yes, it's not a fixed price: https://openai.com/pricing.
        
           | danjc wrote:
           | It's exceedingly expensive and must surely come down over
           | time.
        
           | nico wrote:
           | For reference, a dev making USD 100k/year and working about
           | 240 days a year, 8 hours/day = total of 1920 hours, or about
           | USD 52/hour, USD 416/day
           | 
           | 52/1.92 = 27 416/1.92 = 217
           | 
           | So using GPT-4 with 32k tokens, 27 times per hour, or 217
           | times per day, in terms of cost, is approximately the
           | equivalent of another dev
        
             | KeplerBoy wrote:
             | That's a lot of requests.
             | 
             | Not that it matters for the calculation, but i wonder how
             | long such a request (ingesting 32k tokens and responding
             | with a similar amount) would take.
             | 
                | At the speed of regular ChatGPT that would take a good
                | while.
        
               | atq2119 wrote:
                | Prompt processing scales quadratically with the context
                | size (assuming OpenAI is still using a standard
                | transformer architecture), but it is also fast compared
                | to generating tokens because it's batched (parallel). So
                | I wouldn't expect effective response times to go up
                | quadratically; at most linearly, depending on the details
                | of how they implement inference.
        
             | ukuina wrote:
             | FYI, 27 times per hour is basically nothing. With GPT4 over
             | the API, I make 2-3 completion requests a minute, for 30-60
             | minutes at a time, when building an LLM app. This happens
             | for 3-4 hours per day.
             | 
             | At the upper bound, this would be $2 * 3 * 60 * 4 = $1440 a
             | day.
             | 
             | Thankfully, I am using retriever-augmentation and context
             | stuffing into the base 4k model, so costs are manageable.
             | 
             | The 32k context model cannot be deployed into a production
             | app at this pricing as a more capable drop-in replacement
             | for shorter-context models.
        
               | ZephyrBlu wrote:
               | Depends heavily on your product. I can imagine there are
               | quite a lot of use cases that have relatively infrequent
               | API usage or highly cacheable responses.
        
         | jerrygenser wrote:
          | Paying for ChatGPT is, I believe, separate from API access.
        
           | VeninVidiaVicii wrote:
           | Again, frustrating. I'm an antibiotics researcher with oodles
           | of data and I need ChatGPT plugins/API to make any real
           | progress. (I'm kind of in this intellectual space on my own,
           | so other people can't really help that much) I'm not sure why
           | I've been on the waiting list for so long now.
        
             | sashank_1509 wrote:
              | I got access to ChatGPT plugins and they're really bad,
              | completely deserving of "alpha". I'd be pissed if I paid
              | $25 for this, FYI.
              | 
              | It's very slow, almost 10x slower than ChatGPT.
              | 
              | Its integration is bad. For most plugins it doesn't do
              | anything smart with its API call. For example, if I ask
              | for the "nearest cheap international flight", it literally
              | goes to Kayak and searches "nearest cheap international
              | flight"; if Kayak can't handle that query, GPT can't
              | either.
              | 
              | The only plug-in with good integration is Wolfram, and it
              | makes so many syntax errors calling Wolfram that it's
              | trash. It just errors out for half my queries.
              | 
              | I wouldn't have minded if they had spent a few more months
              | internally testing plug-ins before rolling them out to me,
              | given their current state. The annoying thing is the chat
              | website automatically starts in plugins mode, which is
              | borderline unusable. So every time, I have to click on the
              | drop-down and then choose ChatGPT or GPT-4.
        
               | VeninVidiaVicii wrote:
               | Thanks for assuaging my FOMO a bit. I think one of the
               | most frustrating parts is that everyone in my lab looks
               | to me when they see this stuff on Twitter and all I can
               | really do is shrug.
        
               | JieJie wrote:
               | I use the API for anything I can't do with Bing Chat, but
               | I've found Bing Chat to be quite useful.
               | 
               | For code, I use phind.com.
               | 
               | https://www.phind.com/tutorial
        
             | ZephyrBlu wrote:
             | Dude, chill. Plugins are insanely new. Barely anyone has
             | access to them. It just seems like they are widespread
             | because they've been going viral.
             | 
             | The initial blog post was only just over a month ago, and
             | it was announcing _alpha_ access for a few users and
             | developers:
             | 
             | > _Today, we will begin extending plugin alpha access to
             | users and developers from our waitlist. While we will
             | initially prioritize a small number of developers and
             | ChatGPT Plus users, we plan to roll out larger-scale access
             | over time._
             | 
             | https://openai.com/blog/chatgpt-plugins
             | 
             | We are literally 1 month into the _alpha_ of plugins.
        
               | mptest wrote:
               | I think part of the anxiety, at least for me, is how fast
                | progress is being made too. It can begin to feel like the
                | "LET ME IN" meme when you're watching, all day, the cool
                | things those inside the magic shop can do, lol. Layman btw
               | just looking to use it to automate some volunteer work I
               | do. Thanks for this perspective on how new this stuff is.
        
               | ZephyrBlu wrote:
               | I completely agree, I feel the same way as a dev. GPT-4
               | is _not even 2 months old_.
               | 
               | The developer livestream was on March 14th:
               | https://www.youtube.com/live/outcGtbnMuQ?feature=share.
               | 
                | The time since GPT-4 already feels like 6 months. So far
                | I'm perpetually feeling behind.
        
               | mptest wrote:
               | Can't imagine trying to keep up as a dev. Any of these
               | tools useful for you in practice yet?
               | 
                | I struggle to keep up, and all I need to do is understand
                | developments well enough to simplify them into palatable
                | morsels for my tech-skeptic colleagues in politics and
                | non-profits.
                | 
                | It's challenging because they have a form of technology
                | PTSD: when they hear "new technology", NFTs of monkeys
                | with 6-digit prices and Peter Thiel's yacht flash before
                | their eyes and they see red.
               | 
                | And I can't _really_ blame them; the rhetoric around
                | crypto was enough to sour most non-techies (in my little
                | corner of lefty politics anyway) against the idea that
               | any tech advancement is noteworthy. One of the first more
               | serious individuals in politics to hear me out did so
               | because  "i sounded like one of the early linux
               | proselytizers" lol.
               | 
               | Completely agree how time has slowed. I rotate between
               | absolute giddy anticipation at our future thanks to the
               | tech and nihilistic doomerism. Even as a hobbyist though
               | I knew to take this seriously since I saw robert miles
               | talk about gpt 2 in 2017(?) and note there's zero sign of
               | these things plateauing in ability simply by ramping up
               | parameter count.
               | 
               | I've gone on long enough but that live stream felt like
               | the intro to a sci fi movie at points. Can't wait to have
               | multi modal and plugins rolled out.
        
               | VeninVidiaVicii wrote:
               | I can't believe it's only been 1 month. It feels like 3-4
               | somehow.
        
           | nickthegreek wrote:
           | Finally got gpt4 api access. Now I can cancel my ChatGPT plus
           | sub and save a bunch of cash by just using a local client.
        
           | maxdaten wrote:
          | It is. For API access you have to create an account at
          | https://platform.openai.com. You pay per 1k tokens. For API
          | access to GPT-4, put your organization (org id) on the
          | waitlist.
        
         | danjc wrote:
         | Try OpenAI services in Azure. We were added to a waitlist but
         | got approved a week later. Had 32k for a few weeks now but
         | still on the waitlist for plugins.
        
         | HarHarVeryFunny wrote:
          | MosaicML's StoryWriter 65K model was just released a day or
          | two ago.
        
           | chrisMyzel wrote:
           | https://www.mosaicml.com/blog/mpt-7b 65k+ context window,
           | open source, open weights
        
         | toxicFork wrote:
          | It's hella expensive, so I think they are OK for now.
          | 
          | Once they cut down the cost, then they should worry, yeah.
        
           | fakedang wrote:
           | Honestly for the firms that would use it, for example finance
           | or legal, it's very reasonable.
        
         | fbrncci wrote:
         | Those startups will move on to open source models because
         | OpenAI api calls with 32k token contexts are way too expensive.
        
           | raincole wrote:
           | I don't think it's expensive at all. For things that don't
           | need to be so correct (like, unfortunately, marketing blog
           | posts) it's a <$1 per post generator, which is very cheap to
           | me.
           | 
           | For things where correctness matters, the majority of cost
           | will still come from humans who are in charge of ensuring
           | correctness.
        
             | danjc wrote:
             | Use cases for individual people are ok but it's far too
             | expensive to deploy into your SaaS where a large number of
             | users will use it.
        
             | fbrncci wrote:
              | This does not scale. With an open source model the cost
              | per post would be $0 if we leave out the hardware.
        
           | cced wrote:
            | What is the latest in conversational models that offers
            | GPT-3-like (or close) performance when running things
            | locally?
        
             | modernpink wrote:
              | GPT-3 is dated, so many open source models are competitive
              | with it, but Vicuna 13B is supposed to be competitive with
              | GPT-4.
        
               | speedgoose wrote:
                | Against GPT-3.5, perhaps the gaps aren't too big for your
                | use cases, but I wouldn't say it's in the GPT-4 league.
                | It looks close in the benchmarks, but the difference in
                | quality feels (to me) huge in practice. The other models
                | are simply a lot worse.
        
               | modernpink wrote:
               | Interesting. Have you tried StableVicuna?
        
               | speedgoose wrote:
               | No, is it worth a try? I didn't see a lot of hype about
               | it so I didn't try it.
        
             | noman-land wrote:
             | Apparently Vicuna 13B is quite good according to Google's
             | own leaked docs.
             | 
             | https://twitter.com/jelleprins/status/1654197282311491592
        
               | space_fountain wrote:
                | That's according to this
                | (https://lmsys.org/blog/2023-03-30-vicuna/) promotional
                | blog post, which is just cited by the Google memo, right?
                | And that isn't really even a doc, just a memo that was
                | circulating inside Google.
                | 
                | I also find it strange that they don't contrast GPT-4 and
                | GPT-3.5.
        
               | amelius wrote:
               | Is there a way to always stay up to date with the latest
               | and best performing models? Perhaps it's me but I find it
               | difficult to navigate HuggingFace and find models sorted
               | by benchmark.
        
               | nickthegreek wrote:
               | I check r/LocalLlama
        
               | noman-land wrote:
               | Honestly, I just read hackernews :).
        
               | amelius wrote:
               | HN posts are not always in chronological order.
        
               | int_19h wrote:
               | This assessment is based largely on GPT-4 evaluation of
               | the output. In actual use, Vicuna-13B isn't even as good
               | as GPT-3.5, although I do have high hopes for 30B if and
               | when they decide to make that available (or someone else
               | trains it, since the dataset is out).
               | 
               | And don't forget that all the LLaMA-based models only
               | have 2K context size. It's good enough for random chat,
               | but you quickly bump into it for any sort of complicated
               | task solving or writing code. Increasing this to 4K -
               | like GPT-3.5 has - would require significantly more RAM
               | for the same model size.
        
         | decompiled_dev wrote:
          | I was waiting for a while, but then I found that on the
          | waitlist page, if you had selected "I want to build plugins",
          | you would never have seen the option to request access to
          | specific plugins.
         | 
         | Once I filled that in I got access within a few days.
         | 
         | https://openai.com/waitlist/plugins
         | 
         | If you are the person to say: "I am a developer and want to
         | build a plugin"
         | 
         | Then it is likely you missed the option to request which
         | plugins you want access to.
        
           | tough wrote:
            | OK, thanks for the pointer. I submitted there for every
            | plugin now as a not-developer to see if that helps.
            | 
            | I stopped paying for the Plus sub because without plugins it
            | didn't do that much, tbh.
        
           | kinlan wrote:
           | Yeah. It's weird, I've signed up when it came up
           | and....nothing. :(
        
       | virgildotcodes wrote:
       | Does this purely affect the amount of tokens that can be fed in
       | and retained in context during a session?
       | 
       | The output from that prompt seems spectacular, so I'm wondering
       | if there are any other differences.
       | 
       | I just tried the same prompt with GPT-4 and the style was much
       | more GPT-like, what I'm used to, not near the same quality as in
       | the OP, although maybe it's just luck?
        
         | arbuge wrote:
         | The prompts in the example on that page are pretty short...
         | hardly taking advantage of the longer context window.
         | 
         | I'm actually not sure if longer responses can be expected with
         | the 32k vs 8k models. Anyone from OpenAI care to comment on
         | that?
        
           | vorticalbox wrote:
            | The token limit is for the whole conversation; system,
            | assistant and user messages all count.
            | 
            | Once the limit is hit it will stop, sometimes mid-sentence.
            | 
            | I have a slack bot at work with a system document which is a
            | little over 1k tokens, meaning there are around 3k tokens
            | left for questions and replies.
            | 
            | The trick I am currently using is to prune older messages to
            | keep it under the limit.
        
             | heliophobicdude wrote:
             | Can you please elaborate on how you prune?
        
               | vorticalbox wrote:
                | You can just use the API: if you set completions to 0, it
                | will return the token count. Then you can just remove the
                | oldest message until it's under your chosen number. I
                | picked 3k to allow 1k for the reply.
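                | 
                | A rough sketch of that pruning loop, assuming the
                | tiktoken library (it counts message contents only and
                | ignores the small per-message overhead the API adds):
                | 
                |     import tiktoken
                | 
                |     enc = tiktoken.encoding_for_model("gpt-4")
                | 
                |     def n_tokens(messages):
                |         return sum(len(enc.encode(m["content"]))
                |                    for m in messages)
                | 
                |     def prune(messages, budget=3000):
                |         # Drop the oldest non-system message until
                |         # the conversation fits in the budget.
                |         while (n_tokens(messages) > budget
                |                and len(messages) > 1):
                |             messages.pop(1)  # [0] is the system doc
                |         return messages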
        
               | akiselev wrote:
                | You can also use the `tiktoken` python library:
                | 
                |     import tiktoken
                |     len(tiktoken.encoding_for_model("gpt-4")
                |         .encode(contents))
        
             | dmix wrote:
              | > Once the limit is hit it will stop, sometimes mid-
              | sentence.
              | 
              | Oh, that's why GPT does that in a long thread.
              | 
              | It makes sense in retrospect that it's the whole
              | conversation, not the individual messages.
              | 
              | ChatGPT's UX leaves much to be desired; there should be
              | error messages that communicate this stuff better.
        
             | mlyle wrote:
              | There's the token limit-- the maximum number of tokens in
              | the response.
             | 
             | There's also the token context: how many words into the
             | "past" it considers when formulating the next word.
             | 
             | They're different things. You can generate very, very long
             | responses with a model with a short context window; it just
             | will have amnesia about what it said earlier-- though
             | OpenAI often seems to restrict you / prevent you from
             | having context scroll off in this way.
        
       | JonAtkinson wrote:
        | Does anyone have any examples of prompting to feed in such a
        | large amount of tokens? For example, would you use something
        | like "I am going to send you an entire codebase, with the
        | filename and path, followed by the file content. Here is the
        | first of 239 files: ..."
        
         | noonething wrote:
          | You could see the legal impact of your actions before you take
          | them. You could template out an operating system and have it
          | fill in the blanks. You could rewrite entire literary works,
          | in the author's style, to cater to your reading style or story
          | preferences.
        
         | dested wrote:
          | Here's someone who passed in a 23-page congressional document:
         | 
         | https://twitter.com/SullyOmarr/status/1654576774976770048
        
         | mysterydip wrote:
         | Sounds like a return to the days of download managers or file
         | splitters :)
        
         | fragsworth wrote:
         | I think what's more important is the end part of your prompt,
         | which explains what was previously described and includes a
         | request/question.
        
         | danielbln wrote:
         | I've had access to the 32k model for a bit and I've been using
         | this to collect and stuff codebases into the context:
         | https://github.com/mpoon/gpt-repository-loader
         | 
          | It works really well. You can tell it to implement new features
          | or mutate parts of the code, and having the entire codebase (or
          | a lot of it) in its context really improves the output.
         | 
         | The biggest caveat: shit is expensive! A full 32k token request
         | will run you like $2, if you do dialog back and forth you can
         | rack up quite the bill quickly. If it was 10x cheaper, I would
         | use nothing else, having a large context window is that much of
         | a game changer. As it stands, I _very_ carefully construct the
         | prompt and move the conversation out of the 32k into the 8k
         | model as fast as I can to save cost.
        
           | bamboozled wrote:
            | Do you use it for proprietary code, and if so, don't you
            | feel weird about it?
        
             | RivieraKid wrote:
             | I wouldn't feel weird about it. The risks - someone
             | stealing know-how / someone finding a security hole - are
             | negligible.
        
             | danielbln wrote:
             | Not weirder than using Github, or Outlook or Slack or
             | whatever.
        
           | guzik wrote:
            | How does it calculate the price? I thought that once you
            | load the content (a 32k token request / $2) it would
            | remember the context so you could ask questions much more
            | cheaply.
        
             | mediaman wrote:
             | It does not have memory outside the context window. If you
             | want to have a back-and-forth with it about a document,
             | that document must be provided in the context (along with
             | your other relevant chat history) with every request.
             | 
             | This is why it's so easy to burn up lots of tokens very
             | fast.
        
         | r3trohack3r wrote:
         | I already do this with the current context limits. I include a
          | few of my relevant source files before my prompt in ChatGPT. It
          | works unreasonably well.
         | 
         | Something like the following
         | 
         | Here is the Template class:
         | 
         | ...
         | 
         | Here is an example component:
         | 
         | ...
         | 
         | Here is an example Input element:
         | 
         | ...
         | 
         | I need to create another input element that allows me to select
         | a number from a drop down between 1/32 and 24/32 in 1/32
         | increments
        
       | ChildOfChaos wrote:
        | I'd just like to see GPT-4 more available, even on the free
        | ChatGPT, although I wonder if that will ever fully happen, with
        | ChatGPT getting so much use and GPT-3.5 being cheaper to run.
       | 
       | Plus seems expensive to me and it is still rate limited quite a
       | lot.
       | 
       | I guess it's going to take further optimisation to make it
       | worthwhile for OpenAI.
        
       | Mistletoe wrote:
       | I love the prompt and yes it does very well at mimicking DFW.
       | It's kind of weirding me out in a Year of the Trial-Size Dove Bar
       | kind of way.
        
         | isoprophlex wrote:
         | That example is really shockingly good, indeed. I'm not always
         | convinced that GPTs can be properly artistic, most things lack
         | soul and read like some rambling Dan Brown on amphetamine...
         | this DFW works very well.
         | 
         | It gave me the same vague feeling of annoyance and disgust at
         | the "look how smart I am" linguistic obstreperousness I get
         | when reading the real deal.
        
         | hackernewds wrote:
          | The output is rather fantastic; it's mind-numbing. Would DFW
          | think this is a solid example of prose?
        
       | cubefox wrote:
       | I thought it was already available for a while on Microsoft
       | Azure?
        
       | zb3 wrote:
       | Of course I still don't even have the basic GPT-4 after nearly
       | two months of waiting.
        
         | phillipcarter wrote:
         | What's your usage and stated use case? I got access for my
         | company account, but I'm pretty sure that's because we've built
         | and shipped product using their API.
        
           | zb3 wrote:
           | I applied for personal use, I stated that I'd like to
           | experiment with its coding abilities. Yeah it seems that they
           | prioritized companies making GPT-4 products first.
        
             | Stagnant wrote:
             | I joined the GPT4 waitlist 2 or 3 days after it was
             | released (around mid-march) and finally got access last
             | week. I also applied for personal use and wrote one or two
             | sentences about wanting to experiment / compare it to other
             | models. So they definitely do give the API access to
             | regular folks as well, no idea how they prioritize it
             | though. I've been a paying customer of ChatGPT plus for
             | three months now which might have helped.
        
         | halfjoking wrote:
         | I applied the first day and just got API access a few days ago.
         | 
         | It is strange they can roll out 32k for some, while not even
         | having 8k for everyone yet.
        
         | throwaway50606 wrote:
         | You pay for Plus and don't have it? Maybe try canceling and
         | subscribing on another account. I got it immediately after I
         | subscribed.
        
           | zb3 wrote:
           | No, I'm talking about the API waitlist. Still, I'm using
           | https://nat.dev/ to access GPT-4 so I don't care that much
           | anymore.
        
             | itsgrimetime wrote:
              | I signed up for API use as soon as it was released and just
              | got access yesterday, so they're still rolling it out, I
              | guess.
        
             | gwd wrote:
             | Bummer -- I got access after 2-3 weeks.
             | 
             | I haven't actually even hit the 8k limit yet, and even
             | experimenting with 32k is pretty expensive, so I'm not sure
             | what I'd do with it.
        
             | GordonS wrote:
              | I tried to check out nat.dev, but it wants me to create an
              | account to see anything at all. What is nat.dev, please?
        
               | zb3 wrote:
               | This is a paid LLM playground site with many models
               | including GPT-4 and Claude-v1.
        
       | ren_engineer wrote:
       | I've now got access to GPT-4-0314, anybody know the difference
       | between that and the 32k model in this post?
        
         | mirekrusin wrote:
         | 8k vs 32k context.
        
         | ec109685 wrote:
         | Yours is a snapshot of the 8k token context taken on March
         | 14th.
        
           | pjot wrote:
           | Does the May 3rd release use the 32k token model?
        
             | ec109685 wrote:
             | No, they would call that out specifically in the model
             | name. It's just a further snapshot so you don't have to
             | jump straight to the next finetuned version without testing
             | your app.
        
       | eh9 wrote:
       | Do the output limits change? If I give it an entire codebase,
       | could it potentially spit out an entire codebase?
       | 
       | I'm wondering if this is a quick way to get to $5 each round trip
       | (send and receive)
        
       | capableweb wrote:
       | > A helpful rule of thumb is that one token generally corresponds
       | to ~4 characters of text for common English text. This translates
       | to roughly 3/4 of a word (so 100 tokens ~= 75 words) -
       | https://platform.openai.com/tokenizer
       | 
       | 32,000 (tokens) * 4 = 128,000 (characters)
       | 
       | > While a general guideline is one page is 500 words (single
       | spaced) or 250 words (double spaced), this is a ballpark figure -
       | https://wordcounter.net/words-per-page
       | 
       | Assuming (on average) one word = 5 letters, context ends up being
       | ~50 pages (128000 / (500 * 5)).
       | 
        | Just to put the number "32k tokens" into some rough, estimated
        | context.
        
         | s-macke wrote:
         | This is a better calculator, because it also supports GPT-4:
         | 
         | https://tiktokenizer.vercel.app/
         | 
         | Not sure, why they don't support GPT-4 on their own website.
        
           | modernpink wrote:
           | At $0.60 for 20k prompt tokens, it's not going to be cheap so
           | they will need to bring the price down to get broader
           | adoption.
           | 
           | As far as I can tell, the initial reading in of the document
           | (let's say a 20k token one), will be a repeated cost for each
           | subsequent query over the document. If I have a 20k token
           | document, and ask 10 follow-up prompts consisting of 100
           | tokens, that would take me to a total spend of 20k * 10 + (10
           | * 11)/2 * 100 = 205,500 prompt tokens, or over $6. This does
           | not include completion tokens or the response history which
           | would edge us closer to $8 for the chat session.
        
             | 1024core wrote:
             | > At $0.60 for 20k prompt tokens
             | 
             | Is it $0.60 even for the input tokens?
        
               | dragonwriter wrote:
                | Yes, it's $0.03/1K for prompt and $0.06/1K for response,
                | which is $0.60/20K for prompt, $1.20/20K for response.
        
             | smodo wrote:
              | What I've read is that people let a kind of meta-chat run
              | alongside the client interaction. The meta channel decides
              | what parts of the history to retain so the primary channel
              | doesn't use as many resources. You could let GPT decide
              | when it needs to see the whole context again, etc. There
              | are a lot of interesting ways to let GPT handle its own
              | scope, I think.
        
               | modernpink wrote:
               | Is this anything like what Langchain supports which is
               | various strategies for chat history compression
               | (summarisation)?
               | 
               | https://www.pinecone.io/learn/langchain-conversational-
               | memor...
               | 
               | But aside from this, if I have a large document that I
               | want to "chat" with, it looks like I am either chunking
               | it and then selectively retrieving relevant subsections
               | at question time, or I am naively dumping the whole
               | document (that now fits in 32k) and then doing the chat,
               | at a high cost. So 32k (and increased context size in
               | general) does not look to be a huge gamechanger in
               | patterns of use until cost comes down by an order of
               | magnitude or two.
        
               | smodo wrote:
               | Yes, that's one example. Relatively simple to implement
               | yourself. Any frontend for chatgpt already does the basic
               | thing, which is to pass the previous messages along with
               | the prompt.
               | 
               | I think we may end up with first and second stage
               | completions, where the first stage prepares the context
               | for the second stage. The first stage can be a (tailored)
               | gpt3.5 and the second stage can do the brainy work. That
               | way you can actively control costs by making the first
               | stage forward a context of a given maximum size.
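                | 
                | A sketch of what such a two-stage flow could look like
                | against OpenAI's chat API (the prompt wording and
                | function names are just illustrative):
                | 
                |     import openai
                | 
                |     def compress(history_text, max_words=300):
                |         # Stage one: a cheap model squeezes the
                |         # history down.
                |         r = openai.ChatCompletion.create(
                |             model="gpt-3.5-turbo",
                |             messages=[{
                |                 "role": "user",
                |                 "content": "Summarize this chat in "
                |                 f"under {max_words} words:\n"
                |                 + history_text}])
                |         return r["choices"][0]["message"]["content"]
                | 
                |     def answer(history_text, question):
                |         # Stage two: the stronger model only sees
                |         # the compressed context.
                |         r = openai.ChatCompletion.create(
                |             model="gpt-4",
                |             messages=[
                |                 {"role": "system",
                |                  "content": "Context: "
                |                  + compress(history_text)},
                |                 {"role": "user",
                |                  "content": question}])
                |         return r["choices"][0]["message"]["content"]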
        
               | bravura wrote:
               | Is this a native feature? Or something home brewed? If
               | the second, how to use it?
        
               | happycube wrote:
               | I forget if Langchain can do that, but something along
                | those lines _will_ exist if it doesn't already; it's too
               | obvious not to, and too important for the free ChatGPT
               | equivalents which will be popping up over a matter of
               | days/weeks now that truly free versions of llama are
               | coming out.
               | 
                | TL;DR the puck should get there _very soon_, not just
               | Really Soon Now.
        
               | smodo wrote:
               | Right now all of this is people tinkering with the API.
               | If you look at the docs you will note that it doesn't
               | even provide chat history or any kind of session. You
               | have to pass all context yourself. So you're already
               | homebrewing that, why not add some spice.
        
         | skybrian wrote:
         | If it's not any faster, I'm thinking that how long you're
         | willing to wait for an answer will be the practical bottleneck?
         | 
         | 38 seconds for that example.
        
         | sdo72 wrote:
          | 32k tokens: 3/4 of 32k is 24k words. Each page averages 500
          | (0.5k) words, so that's basically 24k / 0.5k = 48 pages.
        
           | capableweb wrote:
           | That's great, imagine if more researchers were as excited to
           | reproduce results as you are? The world would be much better.
        
             | elashri wrote:
              | Probably off topic, but part of this is "good luck writing
              | a grant proposal that says I want to reproduce the work of
              | this group" and getting it accepted. Unless of course you
              | are claiming a ground-breaking paradigm shift, like
              | evidence of a unified theory or superconductivity at room
              | temperature.
        
             | sdwr wrote:
             | This comes across as patronizing to me. "Who's a good
             | little verifier. You are!"
        
             | tysam_and wrote:
             | Forgive my ignorance, but what is the relevance of this to
             | the above comment?
        
               | capableweb wrote:
               | We both arrived at mostly the same result from the same
               | question of "how many pages of text would 32k tokens
               | be?", they basically did the calculation again albeit in
               | a slightly different way. Just like when researchers try
               | to reproduce the results of other's studies.
        
             | cmonnow wrote:
             | [dead]
        
         | dmix wrote:
         | I wonder how many LoC that is on average. Is there an average
          | for LoC? It probably depends on the language...
        
           | MacsHeadroom wrote:
            | A LoC is ~7 tokens thanks to the new tokenizer in GPT-4.
           | 
           | 32k is ~4.5k LoC
        
           | ethbr0 wrote:
           | Finally, APL's day is here!
           | https://en.m.wikipedia.org/wiki/APL_(programming_language)
        
           | wongarsu wrote:
           | Testing some random rust code, it's about 15-20 tokens per
           | LoC, so about 1500-2000 LoC in a 32k context.
           | 
           | Interestingly, using 2-space indentation, as soon as you are
           | about 3-4 indentation levels deep you spend as many tokens on
           | indentation as on the actual code. For example,
           | "log::LevelFilter::Info" is 6 tokens, same as 6 consecutive
           | spaces. There are probably a lot of easy gains here
           | reformatting your code to use longer lines or maybe no
           | indentation at all.
        
             | JimDabell wrote:
             | > There are probably a lot of easy gains here reformatting
             | your code to use longer lines or maybe no indentation at
             | all.
             | 
             | Using tabs for indentation needs fewer tokens than using
             | multiple spaces for indentation.
        
               | mafuy wrote:
               | Curious how this point in the tabs vs. spaces debate re-
               | emerges
        
             | ec109685 wrote:
             | Make sure you are testing with the tiktokenizer:
             | https://news.ycombinator.com/item?id=35460648
        
               | wongarsu wrote:
                | Ah, good catch, it's actually closer to 8 tokens per LoC
                | with GPT-4's tokenizer, so about twice as good. Some quick
                | testing suggests that's mostly down to better whitespace
                | handling.
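                | 
                | For anyone who wants to check this themselves, a quick
                | sketch with the tiktoken library (it prints whatever the
                | counts actually are rather than asserting them):
                | 
                |     import tiktoken
                | 
                |     enc = tiktoken.encoding_for_model("gpt-4")
                |     for s in ["      ", "\t",
                |               "log::LevelFilter::Info"]:
                |         print(repr(s), "->",
                |               len(enc.encode(s)), "tokens")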
        
           | omega3 wrote:
            | How does this even work if the code also uses external
            | libraries?
        
         | BaculumMeumEst wrote:
         | if you wanted to send 32k tokens of code, are you able to do
         | that using a model with a 4k context limit by spreading those
         | tokens out across multiple messages? or does it not work that
         | way?
        
           | dragonwriter wrote:
           | The context limit is for request + response, and there is no
           | storage in between requests (ongoing chat interactions are
           | done by _adding_ prior interactions to the prompt, so the
           | whole chat - before things start falling out of history - is
           | limited to the context window.)
        
           | simonbw wrote:
           | Not really. The API is stateless, you pass it in a whole
           | conversation and it responds with the next message. The
           | entire conversation _including its response_ is limited to
           | 32k tokens.
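            | 
            | A minimal sketch of what that looks like with the openai
            | Python library, where the caller carries all the state:
            | 
            |     import openai
            | 
            |     history = [{"role": "system",
            |                 "content": "You are a helpful assistant."}]
            | 
            |     def ask(user_message):
            |         # The whole conversation is resent on every call;
            |         # the API keeps no state between requests.
            |         history.append({"role": "user",
            |                         "content": user_message})
            |         resp = openai.ChatCompletion.create(
            |             model="gpt-4", messages=history)
            |         reply = resp["choices"][0]["message"]["content"]
            |         history.append({"role": "assistant",
            |                         "content": reply})
            |         return reply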
        
             | BaculumMeumEst wrote:
              | i'm just confused because i thought i remembered sending
              | long chunks of code using the api; the request would fail,
              | but then i would split it up and then it would work okay.
              | 
              | i guess i'm running into a different limit (not context
              | length), or maybe i'm misremembering.
        
         | user_named wrote:
         | 0.75 * 32,000 = 24,000 words is faster and more direct
        
           | capableweb wrote:
            | Thanks, math was never my strong suit and I was writing the
            | comment as I was calculating towards the results; unrefined
            | and raw, as it should be :)
        
       | sva_ wrote:
       | Almost $2 if you want to use the full context length (32 * $.06).
       | Yikes.
        
         | teaearlgraycold wrote:
         | :D
        
       ___________________________________________________________________
       (page generated 2023-05-06 23:01 UTC)