[HN Gopher] 100K Context Windows
___________________________________________________________________
100K Context Windows
Author : samwillis
Score : 578 points
Date : 2023-05-11 16:46 UTC (6 hours ago)
(HTM) web link (www.anthropic.com)
(TXT) w3m dump (www.anthropic.com)
| fire wrote:
| god I'd love to work there
| tikkun wrote:
| This is the first time I've felt like Anthropic may be a true
| competitor to OpenAI.
|
| I see 6 ways to improve foundation LLMs other than cost. If your
| product is best at one of the below, and has parity at the other
| 5 items, then customers will switch. I'm currently using
| GPT-4-8k. I regularly run into the context limit. If Claude-100K
| is close enough on "intelligence" then I will switch.
|
| Six Dimensions to Compare Foundation LLMs:
|
| 1. Smarter models
|
| 2. Larger context windows
|
| 3. More input and output modes
|
| 4. Lower time to first response token and to full response
|
| 5. Easier prompting
|
| 6. Integrations
| rizky05 wrote:
| [dead]
| RobotToaster wrote:
| >Six Dimensions to Compare Foundation LLMs
|
| I'd add open source to the list, which neither "open"AI or this
| is.
| ugh123 wrote:
| I don't think most of the large customers will care about OSS
| AI. Over the last decade they've learned (trained
| themselves?) what to put their money towards (cloud vs. in-
| house infra for all manner of things, for better or worse),
| and I think AI tools will follow similar trends.
|
| Businesses will certainly care about cost, but just as
| important will be:
|
| - Customization and fine-tuning capabilities (also 'white
| labeling' where appropriate)
|
| - Integrations (with 3rd party and in-house services & data
| stores)
|
| - SLA & performance concerns
|
| - Safety features
|
| Open Source AI will have a place, but may be more towards
| personal-use and academic work. And it will certainly drive
| competition with the major players (OpenAI, Google, etc.) and
| push them to innovate more, which is starting to play out now.
| ibains wrote:
| A lot of B2B startups can technically use the cloud API to
| provide value-added applications to enterprises, but often
| the banks and healthcare companies will not want their data
| running through a startup's pipes into OpenAI's pipes.
|
| We provide a low-code data transformation product
| (prophecy.io), and we'll never close sales at any volume if
| we have to get an MSA that approves this. Might get easier
| if we become large :)
| lannisterstark wrote:
| >I don't think most of the large customers will care about
| OSS AI
|
| The problem, again, is centralization of LLMs by either
| governments (and they always act in your best interest,
| amirite?) or corporations, which only FOSS LLMs can prevent.
|
| Democratization of the models is the only way to actually
| prevent bad actors from doing bad things.
|
| "But they'll then have access to it too," you say. Yes, they
| will, but given how many more people will also have access
| to open LLMs, we'd have tools to prevent actually malicious
| acts.
| dragonwriter wrote:
| > I don't think most of the large customers will care about
| OSS AI.
|
| OSS AI will open up more diverse and useful services than
| the first-party offerings from relatively risk-averse major
| vendors, which customers _will_ care about.
| simonw wrote:
| Here's a really important reason to care about open source
| models: prompt engineering is fiddly enough without the
| risk of your model provider "upgrading" the model you are
| using in a way that breaks your existing prompts.
|
| OpenAI already upset a lot of (admittedly non-paying
| academic) users when they shut off access to the old Ada
| code model with only a few weeks' notice.
| danysdragons wrote:
| The OpenAI API has model checkpoints; right now the chat
| options are:
|
| gpt-4, gpt-4-0314, gpt-3.5-turbo, gpt-3.5-turbo-0301
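|
| (A minimal sketch with the 2023-era openai Python client;
| the exact client interface and checkpoint names are
| assumptions that may change over time. Pinning a dated
| checkpoint such as "gpt-4-0314" instead of the floating
| "gpt-4" alias keeps behaviour stable when the alias is
| upgraded.)
|
|     import openai
|
|     openai.api_key = "sk-..."  # your API key
|
|     # Dated checkpoint, not the floating alias, so behaviour
|     # doesn't shift under you when the alias is upgraded.
|     resp = openai.ChatCompletion.create(
|         model="gpt-4-0314",
|         messages=[{"role": "user", "content": "Hello"}],
|     )
|     print(resp["choices"][0]["message"]["content"])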
| spacebanana7 wrote:
| I'm curious about how enterprises will manage model
| upgrades.
|
| On one hand, as you mention, upgrades could break or
| degrade prompts in ways that are hard to fix. However,
| these models will need constant streams of updates for
| bugs and security fixes just like any other piece of
| software. Plus the temptation to get better performance.
|
| The decisions around how and whether to upgrade LLMs will
| be much more complicated than upgrading Postgres
| versions.
| Vecr wrote:
| Why would the models themselves need security fixes? The
| software running the models, sure, but you should be able
| to upgrade that without changing anything observable
| about the actual model.
| ebiester wrote:
| Yes, but I think for most companies this has more to do
| with cost. They're not going to pay for the OSS model, and
| if they can use an OSS model + fine tuning, they'll choose
| to save the money.
| hdjjhhvvhga wrote:
| > I don't think most of the large customers will care about
| OSS AI.
|
| One would have thought the same in the 90s, and yet, for some
| reason, Open Source prevailed and took over the world. I
| don't believe it was about cost, at least not only. In my
| career I had to evaluate many technical solutions and
| products, and OSS was often objectively superior at several
| levels without even taking cost into account.
|
| The first really successful alternative to "Open"AI will:
|
| * gather many talented developers
|
| * quickly become a de facto standard solution
|
| * see people rapidly develop a wide range of
| integrations for it
|
| * be used by everybody, including large orgs,
| because, well, it's open source
| ugh123 wrote:
| True, but the difference here is that running a
| performant and capable AI solution will be
| infrastructure-dependent, which has real costs.
| [deleted]
| nullc wrote:
| Companies that aren't mindful of vendor lock in aren't long
| for the world.
|
| Though those cloud platforms all have their own proprietary
| components, most users are savvy enough to constrain and
| compartmentalize their use of them, lest they find
| themselves having all their profits taken by a platform
| that knows it can set its prices arbitrarily. The cloud vs.
| in-house adoption is what it is in large part because the
| cloud offerings are a commodity, and a big part of them
| being a commodity is that much of the underlying software
| is free software.
| deltree7 wrote:
| History is littered with companies that died because
| they focused on things that don't matter (open source,
| anti-Microsoft, pro-Linux).
|
| There will be a time when those things matter when it
| hurts the bottom-line (Dropbox), but to prematurely
| optimize for that while you are finding product-market-
| fit is crazy and _all_ companies are finding product-
| market-fit in the new AI era
| throwawayadvsec wrote:
| now that I think about it
|
| is it that important to open-source models that can only run
| on hardware worth tens of thousands of dollars?
|
| who does that benefit besides their competitors and nefarious
| actors?
|
| I've been trying to run one of the largest models for a
| while; unless $30,000 falls into my hands I'll probably never
| be able to run the current SOTA
| chrisco255 wrote:
| > is it that important to open source models that can only
| run on hardware worth tens of thousand of dollars?
|
| Yes, because as we've seen with other open source AI
| models, it's often possible for people to fork code and
| modify it in such a way that it runs on consumer grade
| hardware.
| YetAnotherNick wrote:
| I agree the utility of open source for personal use cases is
| overblown.
|
| But for commercial use cases, open source is very relevant
| for privacy reasons, as many enterprises have strict
| policies not to share data with third parties. Also it could
| be a lot cheaper for bulk inference or to have a small model
| for a particular task.
| turtles3 wrote:
| However, the same thing could be achieved with closed
| source models. There's nothing to stop an LLM being made
| available to run on prem under a restrictive license. It
| would really be no different to ye olde desktop software
| - keeping ownership over bits shipped to a customer is
| solved with the law rather than technical means.
|
| That said, I really hope open source models can succeed,
| it would be far better for the industry if we had a Linux
| of LLMs.
| sanxiyn wrote:
| > Keeping ownership over bits shipped to a customer is
| solved with the law rather than technical means.
|
| Yes in theory... In practice, what happened with LLaMA
| showed people will copy and distribute weights while
| ignoring the license.
| chaxor wrote:
| They don't only run on high end systems. Good models can
| run on a desktop you have at home. If you don't have a
| desktop... I'm not sure what you're doing on HN.
| circuit10 wrote:
| It will create price competition for different providers of
| the model though, which should drive down prices
| iknowstuff wrote:
| Even a small startup, a researcher or a tinkerer can get a
| cloud instance with a beefy GPU. Also of note, Apple's M1
| Max/Ultra should be able to run it on their GPUs given
| their 64/128GB of memory, right? That's an order of
| magnitude cheaper.
| mejutoco wrote:
| I am confused. Those amounts are RAM, not GPU RAM, aren't
| they? Mac CPUs are impressive, but not for ML. The most
| realistic option for a consumer is an RTX 4090 with 24 GB. A
| lot of models do not fit in that, so you need an A6000 48GB
| or above for some professional cards. That might be around
| 9000EUR already.
| codedokode wrote:
| > Macs cpus are impressive, but not for ml
|
| On Mac GPU has access to all memory.
| piperswe wrote:
| Apple Silicon has unified memory - all memory is
| accessible to both the CPU and GPU parts of the SoC.
| karmasimida wrote:
| But don't they max out at a 32GB model?
| [deleted]
| mkl wrote:
| Mac Studio (desktop) is up to 128GB, and Macbook Pro is
| up to 96GB.
| himlion wrote:
| I overlooked the unified memory on those machines. Can it
| really run this performantly?
| lannisterstark wrote:
| "It only benefits bad people" is a pretty shitty argument
| at this point tbf. You can apply this logic to any
| expensive thing at this point.
|
| I _can_, for example, afford the hardware worth tens of
| thousands of dollars. I don't want to, but I could if I
| needed to. Does that automagically make me their competitor
| or a bad actor?
| fnordpiglet wrote:
| Yes, because it can always be ported down by people with
| more constraints than the original authors. We've seen a lot
| of this in the LLM space, and a lot of other OSS efforts.
| RobotToaster wrote:
| When linux was first released in 1991 a 386 to run it would
| cost about $2000.
|
| We've already seen big advancements in tools to run them on
| lesser hardware. It wouldn't surprise me if we see some big
| advancements in the hardware to run them over the next few
| years; currently they are mostly being run on graphics
| processors that aren't optimised for the task.
| dfadsadsf wrote:
| $30,000 is less than the price of the average car that
| Americans buy (and most families have two of them) - that's
| definitely in the realm of something an affluent family can
| buy if it provides enough value. I also expect the price to
| go down, and at $10k it's less than a mid-range bathroom
| update. The only question is whether it provides enough
| value, or whether using it in the cloud is a better option
| for almost all families.
| overgard wrote:
| Considering the very smart people asking for a moratorium on
| AI development, and its potential to disrupt a lot of jobs,
| this may be a good thing.
| nr2x wrote:
| For me I'd say speed trumps all else. It's impossible to truly
| reach scale with the glacial response times you get from the
| current APIs.
| sebzim4500 wrote:
| >speed trumps all else
|
| Then use GPT-2
| nr2x wrote:
| I actually do prefer 3.5-turbo over 4 for many tasks.
| IshKebab wrote:
| Reliability surely? They still haven't managed to make a model
| that says "I don't know" rather than bullshitting. That's by
| far the biggest unsolved problem.
| srowaway2 wrote:
| 7. Price!
|
| GPT4-32K costs ~$2 if you end up using the full 32K tokens, so
| if you're doing any chaining or back-and-forth it can get
| expensive very quickly.
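|
| Back-of-the-envelope (assuming the then-listed prices of
| $0.06 per 1K prompt tokens and $0.12 per 1K completion
| tokens for gpt-4-32k):
|
|     prompt_cost = 32_000 / 1000 * 0.06      # = $1.92
|     completion_cost = 1_000 / 1000 * 0.12   # $0.12 per 1K reply
|
| so a single fully loaded call is already ~$2 before any
| chaining.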
| hesdeadjim wrote:
| Oof, got access to the 8k model recently and was wondering
| what costs would be on the 32k one. That's brutal.
| zomglings wrote:
| Also: allowing users to receive vector representations of
| context, and to provide such representations as side
| information when querying LLMs.
| danenania wrote:
| One question is how much other factors really matter compared
| to the raw "intelligence" of the model--how good its
| completions are. You're not going to care very much about
| context window, prompting, or integrations if the output isn't
| good. It would be sort of like a car that has the best steering
| and brakes on the market, but can't go above 5 mph.
| majormajor wrote:
| Big question on that for me is that there's a variety of
| "completion styles" and I'm curious how "universal"
| performance on them is. Probably more than this, but a quick
| list that comes to mind:
|
| * Text summary/compression
|
| * Creative writing (fiction/lyrics/stylization)
|
| * Text comparison
|
| * Question-answering
|
| * Logical reasoning/sequencing ("given these tools and this
| scenario, how would you perform this task")
|
| IMO, for stuff like text comparison and question-answering,
| some combo of speed/cost/context-size could make up for a
| lot, even if they do "worse" versions of stuff that's just
| too slow or expensive or context-limited in a different
| model.
| solarkraft wrote:
| I don't know. While using Phind I regularly get annoyed by
| long prose that doesn't answer anything (yes, "concise" is
| always on). Claude seems to be directly geared towards
| solving stuff over nice writing.
| Tostino wrote:
| I generally add to my initial prompts to GPT4 to: From now
| on, please use the fewest tokens possible in all replies to
| save tokens and provide brief and accurate answers.
| modernpink wrote:
| Or rather, more analogously, a self-driving car that has a
| range of 10,000 miles but sometimes makes mistakes when
| driving vs. a self-driving car with a range of 800 miles that
| never makes mistakes. Once you've had a taste of
| intelligence it's hard to give up.
|
| However, in many applications there is a limit on how
| intelligent you need the LLM to be. I have found I am able to
| fall back to the cheaper and faster GPT-3.5 to do the grunt
| work of forming text blobs into structured json within a
| chain involving GPT-4 for higher-level functions.
| tikkun wrote:
| Strongly agree. They are ordered by how much I think they
| generally will lead to users choosing one model over the
| other.
|
| Intelligence is the most important dimension by far, perhaps
| an order of magnitude or more above the second item on the
| list.
| danenania wrote:
| On that note, can anyone speak to how Anthropic (or other
| models) are doing on catching up to OpenAI for pure model
| intelligence/quality of completions? Are any others
| approaching GPT-4? I've only used GPT-based tools so I have
| no idea.
| og_kalu wrote:
| The best Claude model is closer to GPT-4 than to 3.5
| jll29 wrote:
| More languages?
| nico wrote:
| Faster, cheaper fine-tuning and training
|
| If I could train a useful model, on my own data, in a
| reasonable time
|
| I would want to have a CI-training pipeline to always have my
| models up to date
| makestuff wrote:
| Yeah I remember in undergrad I was working on using
| transfer learning to train an object detector.
| Basically you only needed 100-ish images to get the model to
| detect that new object really well.
|
| I'm not sure what the analogous term is for a similar process
| on LLMs, but that will be huge when there is a service for
| it.
| visarga wrote:
| LLMs can do that without any examples (zero shot) or with
| one or a few demonstrations in the prompt, if you can
| describe the task in the limited context window.
|
| If you want for example to train the model to learn to use
| a very large API, or access the knowledge in a whole book,
| it might need fine-tuning.
| nico wrote:
| Could I just train a very small LLM with an English
| dictionary + Python + large API documentation + large
| Python code base?
|
| Then do some chat fine tuning (like what HF did with
| StarCoder to get ChatCoder)
|
| And get a lightweight LLM that knows the docs and code
| for the thing I need it for
|
| After that, maybe incrementally fine tune the model as
| part of your CI/CD process
| toss1 wrote:
| How similar were the objects to other objects?
|
| E.g., were you trying to distinguish an object vs nothing,
| a bicycle vs a fish, a bird vs a squirrel, or two different
| species of songbird at a feeder?
|
| How much would the training requirements increase or
| decrease moving up or down that scale?
| ilaksh wrote:
| The PaLM 2 stuff released yesterday has fine tuning for their
| newest large models as a core feature.
| moffkalast wrote:
| Until they actually make any of it available in anything but an
| obscure expensive API you have to request access to, they might
| as well not even exist.
| r_thambapillai wrote:
| there are many services that integrate with them that would
| allow you to self-serve signup
| williamstein wrote:
| The landing page says "Easy integration via standard APIs
| Claude can be incorporated into any product or toolchain
| you're building with minimal effort." Then there is a big
| button "Request Access", which for me right now just does
| nothing. OpenAI has really faced the pain to make their
| product available via an API to the general public at scale,
| but Anthropic/Google/etc. don't quite seem to be there yet.
| It's frustrating.
| chaxor wrote:
| I don't think the person you're responding to wants a
| network based or cloud based solution.
|
| When someone says they want it available they mean running
| on their own device.
|
| This is Hacker News; nearly everyone on this site should
| have their own self-hosted LLM running on a computer/server
| or device they have at their house.
|
| Relying on 'the cloud' for everything makes us worse
| developers in just about every imaginable way, creates a
| ton of completely unnecessary and complicated source code,
| and creates far too many calls to the internet which are
| unnecessary. Using local hard drives for example is
| thousands of times faster than using cloud data storage,
| and we should take advantage of that in the software we
| write. So instead of making billions of calls to download a
| terabyte database query-by-query (seen this 'industry-
| standard' far too many times), maybe make _one_ call and
| build it locally. This is effectively the same problem in
| LLMs/ML in general, and the same incredible stupidity is
| being followed. Download the model once, run your queries
| locally. That's the solution we should be using.
| akiselev wrote:
| Try a browser or a clean profile without any ad blocking
| turned on. It took me a couple of tries to figure out how
| to get it working but you should see a modal with a form
| when it works.
|
| FYI the waitlist form submits a regular POST request so
| it'll reload the main page instead of closing the modal
| dialog. I opened network monitor with preserved logs to
| double check that I made it on the list :facepalm:
| dkarras wrote:
| I've been using it through poe and I prefer it to ChatGPT but
| can't pinpoint why. It just "gets" me better I guess?
| winstonprivacy wrote:
| Don't forget the ability to fine tune based on one's own data
| sources. For me, this is more important than any of the six
| reasons you mentioned.
| ianhawes wrote:
| We use Claude Instant in production and it has been much
| faster than Davinci/GPT-4 for a while. In terms of quality,
| Instant is at least as good as GPT-3.5.
| [deleted]
| timsuchanek wrote:
| Curious what this will mean for the vector db vendors. Imagine
| fine-tuning were quick and cheap. Could there be a world where
| vector dbs aren't needed anymore?
| shri_krishna wrote:
| 100k context limit is still a limit (we have no idea how
| Anthropic is achieving this - whether it is an extension of the
| base model's context limit itself, or some vector db trickery
| in the backend, or possibly even RAG). Even in this example,
| though it could fit the entire text of The Great Gatsby, it is
| still one book/text/document. Typical business use cases require
| searching through hundreds if not thousands of documents/books
| and finding similar vector embeddings through all of them and
| fetching top-K results (this is how Google search works when it
| has to scan through embeddings for billions of websites). These
| top-K results can be stuffed into the 100k context limit and
| produce an even more holistic picture rather than just stuff
| one book/pdf/file into the context. Depends on the requirements
| though. I don't see how it might affect vector db vendors who
| can process billions of vectors per query and provide top-K
| results.
|
| Also, having a massive context length is not necessarily a good
| thing from a cost perspective. It also doesn't work great with
| a chatbot, as you will have to feed the same 100k worth of
| context back into the chatbot for every question, which will
| turn out to be very expensive. At some point you will have to
| discard some parts of the context to be specific to the question
| being asked, and that is where vector embeddings come into play.
| For one-off research/Q&A the 100k limit works great!
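|
| A rough sketch of that pattern (embed() and complete() are
| hypothetical stand-ins for whatever embedding model and LLM
| you use; the point is that the vector store narrows millions
| of chunks down to a top-K set that then fits comfortably in
| a large context):
|
|     import numpy as np
|
|     def embed(text: str) -> np.ndarray: ...   # embedding model (stub)
|     def complete(prompt: str) -> str: ...     # LLM call (stub)
|
|     chunks = ["..."]                           # document chunks
|     index = np.stack([embed(c) for c in chunks])  # built offline
|
|     def answer(question: str, k: int = 20) -> str:
|         q = embed(question)
|         sims = index @ q / (
|             np.linalg.norm(index, axis=1) * np.linalg.norm(q))
|         top = np.argsort(-sims)[:k]            # top-K most similar
|         context = "\n\n".join(chunks[i] for i in top)
|         return complete(
|             f"Context:\n{context}\n\nQuestion: {question}")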
| PeterisP wrote:
| All I see in the link is empty PR claims - is there any
| information about _how_ they're doing that? There are all kinds
| of known techniques that "expand" the context window without
| really doing so, with different tradeoffs, and unless they
| provide actual information, any claims should be taken with a
| pile of salt; we shouldn't just assume that they actually have
| "true" 100k context windows.
| justanotheratom wrote:
| This is nice, but it can get quite expensive.
|
| Let's say I have a book and I want to ask multiple questions
| about it. Every query will pay the price of the book's text. It
| would be awesome if I could "index" the book once, i.e. pay for
| the context once, and then ask multiple questions.
| mikrl wrote:
| The analogy I can think of here is a pointer, but AFAIK the
| context would always need to go along with the prompt unless
| you could tweak internal state to bias towards the context.
|
| Otherwise, it might make sense to have a separate routine which
| compresses the context as efficiently as possible. Auto
| encoder?
| wahnfrieden wrote:
| Not sure about this one but you can usually ask multiple
| questions in one shot at least
| minimaxir wrote:
| Generation is more expensive than the prompt input (for
| Claude v1, generation is 3x the cost; for GPT-4 it's 2x the
| cost)
|
| It makes the economics slightly trickier.
| newhouseb wrote:
| I wonder why this is? Naively there's no difference between
| the two from a transformer standpoint.
|
| Perhaps it's because under the hood there's additional
| safety analysis/candidate generation that is resource
| intensive?
| pyth0 wrote:
| Normally the inputs are padded out to the context length
| [1] and so the cost to embed 1 token or N tokens is the
| same. The output is produced token-by-token and so the
| amount of GPU time increases with the number of output
| tokens.
|
| [1] I'm not sure if these huge context lengths are
| achieved the same way (i.e. a single input vector of
| length N) but given the cost is constant for input I
| would assume the resource usage is too.
| newhouseb wrote:
| This doesn't match my mental model (or implemented model
| in the case of GPT2) of how self-attention works (you
| need to calculate the residual stream for each individual
| token, attending to all prior tokens before it). Have a
| link?
| pyth0 wrote:
| I work on infrastructure for serving large language
| models but I don't have any background in ML, so my
| perspective is looking at these models as a black box
| (and also conversations with the people that do the ML
| stuff). It is the case in practice at least from a
| latency side that with a fixed context length N,
| embedding any number of tokens from 0 to N takes the same
| amount of time. Perhaps it's a difference between the
| conceptual and actual implementation on GPU?
|
| _edit_ - This occurred to me after the fact but I wonder
| if the difference is that the use case I work with is
| processing batches of many different embedding requests
| (but computed in one batch), therefore it has to process
| `min(longest embedding, N)` tokens so any individual
| request in theory has no difference. This would also be
| the case for Anthropic however.
| newhouseb wrote:
| Ah, you're thinking about embeddings which are basically
| the encoder stack on a traditional transformer
| architecture. Modern GPT-like models (including Claude),
| however, drop the encoder and use decoder-only
| architectures.
|
| I could imagine something where encoders pad up to the
| context length because causal masking doesn't apply and
| the self attention has learned to look across the whole
| context-window.
| sebzim4500 wrote:
| Everyone serious batches together short prompts so the
| cost is roughly proportional to the tokens.
| space_fountain wrote:
| Well, each additional token generated requires rerunning
| the model, right? To find the next likely token given the
| previous one.
| newhouseb wrote:
| Naively, yes, but you can cache the bulk of that
| "rerunning" [1]. That said the (non-flash) attention
| costs go up with the length of the sequence so perhaps
| this is just a simpler way to approximate these costs.
|
| [1] https://kipp.ly/blog/transformer-inference-
| arithmetic/
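|
| A toy illustration of that caching (single-head attention
| in numpy, no real model; the point is that each new token
| only needs its own key/value computed, while earlier
| tokens' keys/values are reused from the cache):
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     d = 8
|     Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
|
|     def attend(q, K, V):
|         w = np.exp(K @ q / np.sqrt(d))
|         return (w / w.sum()) @ V
|
|     K_cache, V_cache = [], []
|     for x in rng.normal(size=(5, d)):   # tokens arrive one at a time
|         K_cache.append(Wk @ x)          # only the new token's K/V
|         V_cache.append(Wv @ x)          # are computed each step
|         out = attend(Wq @ x, np.array(K_cache), np.array(V_cache))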
| tikkun wrote:
| With embeddings, you essentially can. Group the book into
| sections, embed each section, then when you do a prompt, add in
| the N most similar embedded sections to your prompt.
| adamgordonbell wrote:
| What if the question is "What are the main themes of this
| work?"
|
| Or anything where the question answer isn't 'close' to the
| words used in the question?
|
| How well does this work vs giving it the whole thing as a
| prompt?
|
| I assume worse but I'm not sure how this approach compares to
| giving it the full thing in the prompt or splitting it into N
| sections and running on each and then summarizing.
| summarity wrote:
| That is solved by hypothetical document embeddings (HyDE).
|
| Background: https://summarity.com/hyde
|
| Demo: https://youtu.be/elNrRU12xRc?t=1550 (or try it on
| findsight.ai and compare results of the "answer" vs the
| "state" filter)
|
| For even deeper retrieval consider late interaction models
| such as ColBERT
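|
| Roughly, HyDE swaps the question for a hypothetical answer
| before retrieval (sketch only; generate(), embed() and
| search() are stand-ins for the LLM, the embedding model and
| the vector store):
|
|     def generate(prompt: str) -> str: ...   # LLM call (stub)
|     def embed(text: str): ...               # embedding model (stub)
|     def search(vector, k: int): ...         # vector store (stub)
|
|     def hyde_retrieve(question: str, k: int = 10):
|         # 1. Ask the LLM for a plausible (possibly wrong) answer.
|         fake = generate(f"Write a passage answering: {question}")
|         # 2. Embed that passage instead of the question; it tends
|         #    to land closer to real answer passages.
|         return search(embed(fake), k)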
| akiselev wrote:
| Any material comparing the different embedding models?
| I'm working on information retrieval from government
| documents and without any ML experience it's daunting
| jtlicardo wrote:
| You pretty much summed up the drawbacks of the embeddings
| approach. In my experience it's pretty hard to extract the
| relevant parts of text, especially when the text is
| uniform.
| abraxas wrote:
| You could do multi level summaries etc but yeah this is all
| just band aids around token limits.
| Spivak wrote:
| I don't think it's as much of a band-aid as it first
| appears since this roughly mimics how a human would do
| it.
|
| The problem is that humans have continuous information
| retrieval and storage where the current crop of embedding
| systems are static and mostly one shot.
| crucialfelix wrote:
| Humans have limited working memory, they quickly forget
| short term memory (unless it's super significant) and our
| long term memory fades selectively if not reactivated or
| significant (intense).
|
| This weird leaky memory has advantages and disadvantages.
| Forgetting is useful, it removes garbage.
|
| Machine models could vary the balance of temporal types,
| dropout, etc. We may get some weird behavior.
|
| I would guess we will see many innovations in how memory
| is stored in systems like these.
| make3 wrote:
| Yes, caching the states of the sequence would make sense. An
| issue is that it's still more expensive to compute the new
| tokens even if you cache the states viewed so far
| fdgsdfogijq wrote:
| The price on this will plummet over the next few years, the
| economic benefits are too large
| moffkalast wrote:
| The economic benefits of mining asteroids are also too large
| to ignore yet here we are, levelling villages to dig for
| coal.
|
| Just a few manufacturers hold the effective cartel monopoly
| on LLM acceleration and you best bet they will charge out the
| ass for it.
| modernpink wrote:
| Market competition and innovation in both ML and hardware
| have consistently driven down the price of AI in the past
| decade. You only have to look at where we are with
| capabilities today compared to ten years ago when CIFAR100
| classifiers were the state of the art.
|
| Barring a Chinese invasion of Taiwan, these APIs will halve
| in price over the next year.
| [deleted]
| moffkalast wrote:
| Well here's to hoping I guess.
| skybrian wrote:
| I'm wondering what level you're thinking. Cloud vendors?
| GPU vendors? Fabs?
| moffkalast wrote:
| Given what's used right now to my knowledge, the main
| ones would be Nvidia's tensor cores, Apple's M chips and
| Google's cloud TPUs. All of that's TSMC I think?
| nr2x wrote:
| Yes, but physics trumps economics.
| pyth0 wrote:
| This more or less is already a thing, and it's called RAG
| [1][2]. It essentially allows you to have a database of
| embeddings (in this case your book) from which a model can
| pull knowledge while producing answers. As for the standard
| operation of these generative models, the context window is the
| only working memory it has, and so it must see the entire text
| each time.
|
| [1] https://arxiv.org/abs/2005.11401
|
| [2] https://huggingface.co/docs/transformers/model_doc/rag
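|
| Roughly, following the example in the linked Hugging Face
| docs [2] (the class names should be right, but exact
| arguments may have shifted between versions):
|
|     from transformers import (RagTokenizer, RagRetriever,
|                               RagTokenForGeneration)
|
|     tok = RagTokenizer.from_pretrained("facebook/rag-token-nq")
|     # Small dummy index for demo purposes; real use points this
|     # at your own document index.
|     retriever = RagRetriever.from_pretrained(
|         "facebook/rag-token-nq", index_name="exact",
|         use_dummy_dataset=True)
|     model = RagTokenForGeneration.from_pretrained(
|         "facebook/rag-token-nq", retriever=retriever)
|
|     inputs = tok("who wrote the great gatsby",
|                  return_tensors="pt")
|     out = model.generate(input_ids=inputs["input_ids"])
|     print(tok.batch_decode(out, skip_special_tokens=True))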
| m1sta_ wrote:
| Can you help me understand this? The research appears to be
| from a few years ago. Can this be used with Claude (for
| example)? How is it different from the approach many people are
| taking with vector stores and embeddings?
| make3 wrote:
| it's not different. RAG is a way to train embedding stores
| end to end
| pyth0 wrote:
| Other people seem to be suggesting that the user would do
| the retrieval of the relevant parts of the book from a
| vectordb first, and then feed those sections along with the
| question as the prompt. Conceptually it is very similar
| (and it too uses a vector database), but with RAG it would
| happen as part of the inferencing pipeline and therefore
| achieve better performance than the end user emulating it.
| [deleted]
| helen___keller wrote:
| This seems like it could be a game changer. Modern LLM-based
| applications face a balancing act of context limitations, which
| often results in some kind of mapreduce-type behavior when the
| input can't fit in the context.
|
| If contexts keep growing, the landscape of LLM application
| engineering will as well.
| whimsicalism wrote:
| The problem is there are usually no public benchmarks, so it is
| hard to really compare on long context lengths to see if they
| are still performing tasks equally intelligently.
| terabytest wrote:
| How does Claude stack up to GPT-4?
| tempusalaria wrote:
| Would be great to see some benchmarks on how loss changes across
| this very large context. It's been technically possible to do
| 1mln+ token context for some time with performance deterioration
| so it would be interesting to see how this compares to those
| efforts
| Imnimo wrote:
| >For example, we loaded the entire text of The Great Gatsby into
| Claude-Instant (72K tokens) and modified one line to say Mr.
| Carraway was "a software engineer that works on machine learning
| tooling at Anthropic." When we asked the model to spot what was
| different, it responded with the correct answer in 22 seconds.
|
| This sort of needle-in-a-haystack retrieval is definitely
| impressive, and it makes a lot more sense to achieve this in-
| context rather than trying to use a vector database if you can
| afford it.
|
| I'm curious, though, whether there are diminishing returns in
| terms of how much _analysis_ the model can do over those 100k
| tokens in a single forward pass. A human reading modified-Gatsby
| might eventually spot the altered line, but they'd also be able
| to answer questions about the overarching plot and themes of the
| novel, including ones that cannot be deduced from just a small
| number of salient snippets.
|
| I'd be curious to see whether huge-context models are also able
| to do this, or if they start to have trouble when the bottleneck
| becomes reasoning capacity rather than input length. I feel like
| it's hard to predict one way or the other without trying it, just
| because LLMs have already demonstrated a lot of surprising
| powers.
| fzliu wrote:
| I'm also not entirely convinced by "huge" context models just
| yet, especially as it relates to fuzzy knowledge such as
| overarching themes or writing style.
|
| In particular, there are 0 mentions of the phrase "machine
| learning" in The Great Gatsby, so adding one sentence that
| introduces the phrase should be easy for self-attention to pick
| out.
| EGreg wrote:
| This sounds like all the other skepticism about what AI can
| do. And then it can spot 200x more than any human and
| correlate it into common themes, and you'll say what?
| devmor wrote:
| Doing more than a human can isn't impressive. Most computer
| programs, for any purpose, can do more of something, or do
| something faster than a human can.
|
| A better comparison would be if it can pick out any
| differences that can't be picked out by more traditional
| and simple algorithms.
| EGreg wrote:
| Of course it can very soon, since those were also written
| by humans. Like AlphaZero vs Rybka.
| chaxor wrote:
| It does, using this method.
|
| My immediate thought as well was '... Yeah, well vimdiff
| can do that in milliseconds rather than 22 seconds' - but
| that's obviously missing the point entirely. Of course,
| we need to tell people to use the right tool for the job,
| and that will be more and more important to remind people
| of now.
|
| However, it's pretty clear that the reason they used this
| task is to give something simple to understand what was
| done in a very simple example. Of course it can do more
| semantic understanding related tasks, because that's what
| the model does.
|
| So, without looking at the details we all know that it
| can summarize full books, give thematic differences
| between two books, write what a book may be like if a
| character switch from one book to another is done, etc.
|
| If it _doesn't_ do these things (not just badly, but
| can't at all) I would be surprised. If it does them, but
| badly, I wouldn't be surprised, but it also wouldn't be
| mind bending to see it do better than any human at the
| task as well.
| lumost wrote:
| I'd be more impressed if it could rewrite Mr. Carraway as an
| ML engineer in the entire novel. However it's not
| intrinsically clear that it cannot do this...
|
| It'll be tough to find good benchmarks on long context
| windows. A human cannot label using 100k tokens of context.
| zooch wrote:
| My thoughts exactly - rewrite the novel with Mr. Carraway
| as an ML engineer while maintaining themes/motifs (possibly
| adding new ones too). I'm guessing what's impressive is
| that these are the first steps towards something like this?
| Or is it already possible? Someone please correct me here.
| SkyPuncher wrote:
| Further, the problem with this example is it relies on a
| comparison against public data.
|
| Most of these AIs start failing pretty hard when you ask them
| to do the same task on something completely novel to them (like
| a company document). Sometimes they'll get it right. Other
| times, they'll spit out gibberish that's clearly some generic
| answer.
| dmix wrote:
| I'd imagine working with an entire company document would
| require a lot more hand holding and investment in prompt
| engineering. You can definitely get better results if you add
| much more context of what you're expecting and how the LLM
| should do it. Treating these LLMs as just simple Q&A machines
| is usually not enough unless you're doing simple stuff.
| nomel wrote:
| > Most of these AI
|
| This is as meaningful as saying most of the hominids can't
| count. You can't usefully generalize AI models with the rate
| of change that exists right now. Any statements/comparisons
| about AI have to contain specific models and versions,
| otherwise it's increasingly irrelevant noise.
| robotresearcher wrote:
| Asking to spot the difference between a given document and an
| unseen document is impossible.
| lkbm wrote:
| A couple years ago, I read Superfudge by Judy Blume, a book
| originally published in 1980. In it, the protagonist writes
| a letter to Santa: "Please bring me one or more of the
| following items. A clock-radio, a remote-controlled model
| airplane, a laptop computer, an MP3 player and six CD's."
|
| I didn't need to have seen this book before to know this
| wasn't in the original 1980s text.
|
| Similarly, if I were reading the Great Gatsby for the first
| time, and it identified a character as a software engineer,
| I would notice.
| drusepth wrote:
| I think there are plenty of humans who wouldn't notice,
| though.
|
| And probably plenty of AI implementations that would
| notice.
| tunesmith wrote:
| I've been curious about this for a while, I have a hobby use-
| case of wanting to input in-progress novellas and then asking
| it questions about plot holes, open plot threads, and if new
| chapter "x" presents any serious plot contradiction problems. I
| haven't tried exploring that with a vectordb-embeddings
| approach yet.
| make3 wrote:
| This is an exact example of something a vector db would be
| terrible at.
|
| Vector dbs work by fetching segments that are similar in
| topic to the question, so something like "Where did <Character>
| go after <thing>" will retrieve segments with locations & the
| character & maybe talking about <thing> as a recent event.
|
| Your question has no similarity with the required segments in
| any way; & it's not the segments that are wrong, it's the way
| they relate to the rest of the story.
| HarHarVeryFunny wrote:
| Do the OpenAI APIs support converting prompts to vectors,
| or are people running their own models locally to do this?
| Can you recommend any good resources to read up on vector
| DB approaches to working around context length limits?
| toss1 wrote:
| Good points - LLMs are ok at finding things that exist, but
| they have zero ability to abstract and find what is missing
| (actually, probably negative; they'd likely hallucinate and
| fill in the gaps).
|
| Which makes me wonder if the opposite, but more laborious
| approach might work - request it identify all characters
| and plot themes, then request summaries of each. You'd have
| to review the summaries for holes. Lotsa work, but still
| maybe quicker than re-reading everything yourself?
| TeMPOraL wrote:
| > _LLMs are ok at finding things that exist, but they
| have zero ability to abstract and find what is missing
| (actually, probably negative; they'd likely hallucinate
| and fill in the gaps)._
|
| I feel this is mostly a prompting issue. Specifically
| GPT-4 shows surprising ability to abstract to some degree
| and work with high-level concepts, but it seems that,
| quite often, you need to guide it towards the right
| "mode" of thinking.
|
| It's like dealing with a 4 year old kid. They may be
| perfectly able to do something you ask them, but will
| keep doing something else, until you give them specific
| hints, several times, in different ways.
| vidarh wrote:
| Firstly, I don't at all agree that they have zero ability
| to abstract. That doesn't fit my experience at all. A lot of
| the tasks I use ChatGPT for are exactly about analysing gaps
| in specifications etc. and having it tell me what is missing,
| suggest additions or ask for clarifications. It does that
| just fine.
|
| But I've started experimenting with the second part, of
| sorts, not to find plot holes but to have it create
| character sheets for my series of novels for my own
| reference.
|
| Basically have it maintain a sheet, feed it chunks of
| one or more chapters, and ask it to output a new
| sheet augmented with the new details.
|
| With a 100K context window I might just test doing it
| over whole novels or much larger chunks of one.
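|
| The loop, as a sketch (complete() is a hypothetical LLM
| call; chapters is whatever chunking you use, and with a
| 100K window the chunks can simply be whole novels):
|
|     def complete(prompt: str) -> str: ...   # LLM call (stub)
|
|     sheet = "CHARACTERS: (none yet)"
|     for chunk in chapters:   # one or more chapters at a time
|         sheet = complete(
|             "Character sheet so far:\n" + sheet +
|             "\n\nNew text:\n" + chunk +
|             "\n\nOutput the sheet again, augmented with any "
|             "new details.")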
| sashank_1509 wrote:
| How are LLMs increasing their context size? I guess you just
| increase the input size if it's for the self-supervised GPT-3
| style training, but what about RLHF? Are they creating datasets
| of books to input to the LLM and then making human labelers
| label the response? There might be a smart way that does not
| involve new datasets.
| sp332 wrote:
| Mosaic wrote about their new model here.
| https://www.mosaicml.com/blog/mpt-7b It was trained on 65k
| inputs and has decent performance working with 80k+ tokens.
| potatoman22 wrote:
| I don't think RLHF datasets need to take full advantage of the
| context window. There are also many ways to programmatically
| generate NLP datasets.
| mark_l_watson wrote:
| With quadratic time complexity for context size, that gets
| expensive.
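|
| Back-of-the-envelope: going from an 8K window to a 100K
| window multiplies the attention term alone by roughly
| (100_000 / 8_000) ** 2, i.e. about 156x, all else equal.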
| ginger2016 wrote:
| How do I sign-up? What is the cost?
| karmasimida wrote:
| Going to be absolutely expensive.
| gigel82 wrote:
| Nice, that's roughly a 250-page book based on average word
| counts.
| maxutility wrote:
| I don't see this in the article. Has Anthropic explained the
| mechanism by which they were able to cost-effectively expand the
| context window, and whether there was additional training or a
| design decision (e.g. alternative positional embedding approach)
| that helped the model optimize for a larger window?
| cheeselip420 wrote:
| Maybe this model can finish Winds of Winter and the rest of GoT
| for us...
| babuloseo wrote:
| Add Berserk to that list.
| azakai wrote:
| 75,000 words is a drop in the bucket for A Song of Ice and
| Fire:
|
| https://blog.fostergrant.co.uk/2017/08/03/word-counts-popula...
| akiselev wrote:
| You'd want to generate it in multiple steps to make it
| feasible to control the text generation anyway. First call
| generates the broad outline, several parallel calls flesh out
| character development and some other details so that they're
| consistent, then generate the story piece by piece by feeding
| in bits of the outline.
| nottorp wrote:
| And then you end up with what the movie did which is not
| exactly a GRRM novel.
| camel-cdr wrote:
| Meanwhile web serial authors: [0] [1]
|
| [0] https://wanderinginn.neocities.org/statistics
|
| [1] https://www.reddit.com/r/Parahumans/comments/rz8ogt/wildb
| ows...
| thepasswordis wrote:
| That's actually a really interesting use case!
| pclmulqdq wrote:
| That may need a million tokens just for one book, though!
| f6v wrote:
| I'd be excited for Dexter ending that doesn't suck.
| [deleted]
| gumballindie wrote:
| I am noticing a different tone coming from Anthropic. Unlike
| OpenAI, they don't appear to be focused on FUD and replacement.
| Gives the impression it's run by adults instead of crypto bros
| turned AI experts. Curious how their models will work.
| lubesGordi wrote:
| Um Ilya Sutskever isn't a crypto bro.
| gumballindie wrote:
| No, but Sam Altman is. That company can go whistling.
| Workaccount2 wrote:
| Is there any path towards folding tokens into the actual model?
| That is, continual training rather than the current "training
| first then just tokens after"
| ilaksh wrote:
| PaLM 2 on Vertex AI, which Google just released yesterday, has
| fine-tuning of the large models as a core part of their
| offering.
| whimsicalism wrote:
| We need public benchmarks.
|
| This is incredibly fast progress on large contexts, and I would
| like to see whether they are actually attending equally well to
| all of the information or whether there is some sparse
| approximation leading to intelligence/reasoning degradation.
| monlockandkey wrote:
| https://lmsys.org/blog/2023-05-10-leaderboard/
|
| https://chat.lmsys.org/?arena
|
| Claude by Anthropic has more favourable responses than ChatGPT
| Workaccount2 wrote:
| ChatGPT3.5*
|
| It's still below GPT4, but it is closer to 4 than 3.5
| polishdude20 wrote:
| So I tried this prompt in their chatbot arena multiple times.
| Each time getting the wrong answer:
|
| "Given that Beth is Sue's sister and Arnold is Sue's father
| and Beth Junior is Beth's Daughter and Jacob is Arnold's
| Great Grandfather, who is Jacob to Beth Junior?"
| jefftk wrote:
| Is the right answer pointing out that Arnold might not be
| Beth's father, and so Beth Junior might be unrelated to
| Jacob?
| svachalek wrote:
| I just tried it and gpt-3.5-turbo got it right.
| nynx wrote:
| There has got to be a number of fascinating tricks that they're
| using to support context lengths that long. Shame it's all
| closed-source.
| sweezyjeezy wrote:
| Can LLMs take advantage of this bigger window to solve meaningful
| tasks though? I can't imagine in the training data, knowing what
| happened 100k tokens ago would be _that_ relevant to predicting
| the current token very often, so unless this is something that
| the model learns to leverage more implicitly, I'd be a bit
| pessimistic.
| ttul wrote:
| Yes. For instance, a large context window allows you to have a
| chat for months where the model can remember and make use of
| everything you've ever talked about. That enables creating a
| much more effective "assistant" that can remember key details
| months later that may be valuable.
|
| A second example is the analysis of long documents. Today,
| hacks like chunking and HyDE enable us to ask questions about a
| long document or a corpus of documents. But it is far superior
| if the model can ingest the whole document and apply attention
| to everything, rather than just one chunk at a time. Chunking
| effectively means that the model is limited to drawing
| conclusions from one chunk at a time and cannot synthesize
| useful responses relating to the entire document.
| m3kw9 wrote:
| Gets pricier as you chat for longer; imagine having to send a
| chat line along with a history of 20k tokens.
| sweezyjeezy wrote:
| I'm not questioning whether it would be useful, just whether
| it's actually something that token masking in training is
| going to work to make the model learn this.
| woeirua wrote:
| It remains to be seen just how effective longer contexts are
| because if the attention vectors don't ever learn to pick up
| specific items from further back in the text then having more
| tokens doesn't really matter.
|
| Given that the conventional cost of training attention layers
| grows quadratically with the number of tokens I think
| Anthropic is doing some kind of approximation here. Not clear
| at all that you would get the same results as vanilla
| attention.
| ttul wrote:
| They did mention that the inference time to answer a
| question about the book was something like 22 seconds, so
| perhaps they are indeed still using self-attention.
| SomewhatLikely wrote:
| I would guess that semantic similarity would be the stronger
| training signal than distance once you go beyond a sentence or
| two away.
| sweezyjeezy wrote:
| I'm pretty dubious - how would the model not get absolutely
| swamped by the vast amount of potential context if it's not
| learning to ignore long range signals for the most part?
| [deleted]
| dr_dshiv wrote:
| I often prefer Claude over GPT4 (partially due to speed), but it
| degrades more quickly. Like I can get a better response early,
| but usually the quality drops faster. But, sometimes if it can
| really vibe with it, it gets better over time.
| ilaksh wrote:
| Did anyone else get on the waitlist, get in, and now their
| console link doesn't work? I remember deciding the code
| generation wasn't good enough to bother. Not sure if I actually
| ever activated it but I guess not.
|
| Now I tried to request access again on their form and it just
| redirected. Can't even tell if that worked.
|
| Does anyone know if this can program as well as GPT-4? Because if
| so then the larger context window is a big improvement.
| M4v3R wrote:
| I do have access to it and from my very limited testing it
| looks like it can program at least on par with GPT-3.5. I
| didn't have time yet to test it more comprehensively against
| GPT-4.
| ilaksh wrote:
| OK great thanks that's what I heard. Very interested to hear
| about comparisons with GPT-4.
| ablyveiled wrote:
| What's the catch? Using GPT-4 relative to its own marketing copy
| was a letdown.
| SeanAnderson wrote:
| big if true? :)
|
| Exciting to see competition across LLMs for increasing context
| window size.
|
| I can't find updated pricing anywhere. Previous prices are here:
| https://cdn2.assets-servd.host/anthropic-website/production/...
| but don't seem to be embedded directly on the Anthropic website.
| I tried messing with the URL (apr -> may/jun) but 404'ed.
| kordlessagain wrote:
| > Exciting to see competition across LLMs for increasing
| context window size.
|
| Maybe. I think the debate is going to continue about prompt
| optimization vs. context window size.
|
| A while ago, I had a rather interesting conversation with
| GPT-3.5 about forgetting things. Knowing what to forget, or
| delete from the prompt, may be just as important as what to put
| in it.
|
| Putting the kitchen sink into the prompt probably isn't going
| to help much, past a certain point and it may be putting
| certain things in there based on time and context is a better
| strategy.
| SeanAnderson wrote:
| Yeah, there's definitely diminishing returns. I just wanted
| to talk to ChatGPT about a game I'm developing. I have pages
| upon pages of product design notes and I'm not able to just
| copy/paste the whole thing in and start talking to it at 8k
| context length. There's not really duplicate information as
| far as I can tell since each section covers new topics. I'm
| sure there's a way to express the same ideas more succinctly,
| but I kind of want ChatGPT to do that for me rather than me
| figuring out how to do that just to interface the ideas into
| it.
| seydor wrote:
| so I'm going to just paste a few physics books and ask it "make
| fusion"
|
| What is the approach to increase the sequence length here?
| [deleted]
| [deleted]
| swiftcoder wrote:
| > When we asked the model to spot what was different, it
| responded with the correct answer in 22 seconds.
|
| Now we've gone from using ML to implement slow, unreliable
| databases, to using ML to implement slow, unreliable string
| comparison, I guess
| we_never_see_it wrote:
| Google is really trying to catch up to OpenAI & MS. The truth is
| they have never been in the race to begin with. All they had and
| still have is PR stunts. Let's see if their copying of MS's
| model will produce anything useful.
| oars wrote:
| Google has multiple horses in this race.
|
| They invested $300m in Anthropic in late 2022:
| https://www.ft.com/content/583ead66-467c-4bd5-84d0-ed5df7b5b...
|
| (Non-paywall: https://archive.is/Y5A9B)
| thewataccount wrote:
| > The truth is they have never been in the race to begin with.
|
| Product race? My understanding is they've been so concerned
| with safety/harm that they've been slow to implement a lot of
| tools - then OpenAI made an attempt at it anyway.
|
| Google has generally been ahead from a research perspective
| though. And honestly it's going to be really sad if they just
| stop releasing papers outright - hopefully they release their
| previous-gen stuff as they go :/
| andreyk wrote:
| Curious why you think this? PaLM2 looks great, and Google has
| been productizing cutting edge AI pretty fast for years.
| sebzim4500 wrote:
| I guess PaLM2 is competitive with GPT-3.5 so for people not
| willing to pay it will be an attractive offering.
|
| I'm not sure that counts as 'great' though.
| rsstack wrote:
| Based on what do you think it's comparable to GPT-3.5 and
| not to 4? Did we see a lot of public performance?
| sebzim4500 wrote:
| They claim it is already being used in Bard, also if you
| read the paper it does much worse at the important
| benchmarks.
| MacsHeadroom wrote:
| PaLM 2 can't even solve "Write three sentences ending in the
| word Apple."
|
| It's worse than GPT-3.5. Go see for yourself at
| bard.google.com, which is running on PaLM 2 everywhere but
| the EU as of yesterday.
| Garrrrrr wrote:
| Ah yes, the famous benchmark for all LLMs. I just tried
| your novel example with GPT-3.5 and it couldn't solve it
| either:
|
| > After lunch, I like to snack on a juicy and crisp apple
| to satisfy my sweet tooth.
|
| > In the fall, many families enjoy going to apple orchards
| to pick their own apples and make homemade apple pies.
|
| > The new MacBook Pro features a powerful M1 chip and a
| stunning Retina display, making it the perfect tool for
| creative professionals who work with Apple software.
| mustacheemperor wrote:
| Eh, I think as "human evaluated" metrics go, it's a
| decent test of how well it can parse a reasonably complex
| sentence and reply accurately.
|
| For me:
|
| GPT4 3/3: I couldn't resist the temptation to take a bite
| of the juicy, red apple. Her favorite fruit was not a
| pear, nor an orange, but an apple. When asked what type
| of tree to plant in our garden, we unanimously agreed on
| an apple.
|
| GPT3.5 2/3: "After a long day of hiking, I sat under the
| shade of an apple tree, relishing the sweet crunch of a
| freshly picked apple." "As autumn approached, the air
| filled with the irresistible aroma of warm apple pie
| baking in the oven, teasing my taste buds." "The teacher
| asked the students to name a fruit that starts with the
| letter 'A,' and the eager student proudly exclaimed,
| 'Apple!'"
|
| Bard 0/3: Sure, here are three sentences ending in the
| word "apple": I ate an apple for breakfast.The apple tree
| is in bloom. The apple pie was delicious. Is there
| anything else I can help you with?
|
| Bard definitely seems to fumble the hardest, it's pretty
| funny how it brackets the response too. "Here's three
| sentences ending with the word apple!" nope.
|
| Edit: Interestingly enough, Bard seems to outperform GPT3.5
| and at least match 4 on my pet test prompt, asking it
| "What's that Dante quote that goes something like "before
| me there were no something, and only something
| something." 3.5 struggled to find it, 4 finds it
| relatively quickly. Bard initially told me that quote
| isn't in the poem, but when I reiterated I couldn't
| remember the whole thing it found it immediately and
| sourced the right translation. It answered as if it were
| reading out of a specific translation too - "The source I
| used was..." Is there agent behavior under the hood of
| Bard, or is it just how the model is trained to communicate?
| kernal wrote:
| OpenAI is the Microsoft Explorer of AI.
| endisneigh wrote:
| I don't know how anyone can say this with a straight face when
| Google is the one who invented LLMs as used today to begin
| with.
|
| Google has a product issue, not an AI research one.
| cubefox wrote:
| DeepMind and Google invented many other things, but I think
| the first GPT style token predictor was actually ... GPT, a
| model by OpenAI. RLHF was also invented at OpenAI. They also
| had the first text-to-image model.
| onlyrealcuzzo wrote:
| It's usually the least informed with the most self-assured
| sweeping opinions.
| darig wrote:
| [dead]
| meghan_rain wrote:
| The most interesting bit is that for the first time since the
| release of ChatGPT in November 2022, OpenAI does not have the
| lead on LLMs anymore.
|
| At least, for people who need large context windows, they would
| not be the first choice anymore.
| sebzim4500 wrote:
| GPT-4 still leads in the chatbot arena[1] but at least it is a
| two horse race now.
|
| [1] https://lmsys.org/blog/2023-05-10-leaderboard/
| refulgentis wrote:
| Claude's very quietly been better on everything but pricing for
| a while; it just got buried because they announced on "AI
| Tuesday" (iirc GPT-4 and Bing announcement day).
|
| The ChatGPT equivalent is 3x speed and was somewhere between
| ChatGPT and GPT4 on my TriviaQA benchmark replication I did
|
| Couple tweets with data and examples. Note they're from 8 weeks
| ago, I know Claude got a version bump, GPT3.5/4 accessible via
| API seem the same.
|
| [1] brief and graphical summary of speed and TriviaQA
| https://twitter.com/jpohhhh/status/1638362982131351552?s=46&...
|
| [2] ad hoc side by sides
| https://twitter.com/jpohhhh/status/1637316127314305024?s=46&...
| com2kid wrote:
| > I know Claude got a version bump, GPT3.5/4 accessible via
| API seem the same.
|
| GPT3.5 just got an update a few days ago that resulted in
| a pretty good improvement in its creativity. I saved some
| sample outputs from the previous March model, and for the
| same prompt the difference is quite dramatic. Prose is
| much less formulaic overall.
| ndr_ wrote:
| Is this update made visible somewhere? The language models
| offered on my Playground are still the ones from March,
| same with ChatGPT.
| refulgentis wrote:
| Thank you - every little comment I get from fellow boots
| on the ground is so valuable, lotta noise these days.
|
| Random Q: I haven't used the ChatGPT front end much in the
| past month or two, but used it a week back and it seemed
| blazingly faster than my integration. Do you have a sense
| of whether it got faster too?
| ilaksh wrote:
| How is the code generation of Claude?
| esafak wrote:
| And is code generation ability equivalent to code
| understanding and search ability?
| technics256 wrote:
| I have access to claude. It's not bad, but decently behind
| gpt4 for code
| refulgentis wrote:
| Note, all impressions based on Claude 1.2, got an email
| from Poe in the last week saying it was version bumped to
| 1.3 with a focus on coding improvements.
|
| Impressions:
|
| Bad enough compared to GPT-4 that I default to GPT-4. I
| think if I had API access I'd use it instead; right now it
| requires more coaxing, and I have to go through Poe.
|
| I did find "long-term" chats went better - I was really
| impressed with how it held up when I asked it a nasty
| problem that was hard to even communicate verbally. It was
| wrong at first, but as I conversed with it, it was a real
| conversation.
|
| GPT4 seems to circle a lower optimum. My academic guess is
| that it's what Anthropic calls "sycophancy" in its papers;
| tldr, GPT really really wants to do more of what's already
| in the context, so the longer a conversation with initial
| errors goes on, the harder it is to talk it out of those
| errors.
| flerovium wrote:
| It means nothing as long as they don't actually let us test the
| API.
|
| Good luck waiting for it.
| jackson1372 wrote:
| See the pricing PDF[^1] and API docs[^2], but TL;DR:
|
| - Price per token doesn't change compared to regular models
|
| - Existing API users have access now by setting the `model`
| param to "claude-v1-100k" or "claude-instant-v1-100k"
|
| - New customers can join waitlist at anthropic.com/product
|
| [1]: https://cdn2.assets-servd.host/anthropic-website/production/...
| [2]: https://console.anthropic.com/docs/api/reference#parameters
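|
| For anyone curious, a minimal sketch of what the call looks
| like (going off the API docs above; the prompt text and
| max_tokens value here are just placeholders):
|
|     import os, requests
|
|     resp = requests.post(
|         "https://api.anthropic.com/v1/complete",
|         headers={
|             "x-api-key": os.environ["ANTHROPIC_API_KEY"],
|             "content-type": "application/json",
|         },
|         json={
|             # or "claude-instant-v1-100k"
|             "model": "claude-v1-100k",
|             "prompt": "\n\nHuman: Summarize the pasted"
|                       " report.\n\nAssistant:",
|             "max_tokens_to_sample": 1000,
|         },
|     )
|     # the response JSON carries the generated completion
|     print(resp.json())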
| nr2x wrote:
| "POC or GTFO" as the security people say. :-)
| qwertox wrote:
| The day a quantum computer is able to host a huge LLM, things
| will get really interesting for humanity.
|
| I say this because I'm not sure how all of this is really
| going to scale on GPUs. It feels like LLMs are just as
| magical as quantum computing.
| gdiamos wrote:
| Nice. Will we be able to get to 1M tokens?
| programmarchy wrote:
| Seems like a good target. Even 100K seems too small. As a
| reference point, the Bible is ~750,000 words.
| smallerfish wrote:
| "You are a hebrew god and below the dashes is The Word. Who
| will you smite today?"
| vrglvrglvrgl wrote:
| [dead]
| jacooper wrote:
| Anthropic is basically Google's OpenAI.
| cubefox wrote:
| It's not a Google company; Google's stake amounts to ~10%.
| m3kw9 wrote:
| Is this real input context or is it some vectordb in the
| background type trickery?
| HarHarVeryFunny wrote:
| Pretty sure it's not "real" (model) context width.
|
| Another wide-context model is MosaicML's
| MPT-7B-StoryWriter-65k+, which they describe as having a
| context width of 65k, but then give a bit more detail to
| say they are using ALiBi - a type of positional encoding
| that allows longer contexts at inference time than at
| training time (i.e. beyond the real context width of the
| model).
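|
| (For anyone unfamiliar, ALiBi just adds a per-head linear
| penalty to the attention logits based on query-key
| distance instead of position embeddings - roughly
| something like this sketch in numpy, which is my gloss on
| the paper, not MosaicML's code:)
|
|     import numpy as np
|
|     def alibi_bias(num_heads, seq_len):
|         # per-head slopes form a geometric sequence
|         slopes = np.array([2.0 ** (-8.0 * (h + 1) / num_heads)
|                            for h in range(num_heads)])
|         # distance[i, j] = j - i, clipped to <= 0 (causal/past side)
|         pos = np.arange(seq_len)
|         distance = np.minimum(pos[None, :] - pos[:, None], 0)
|         # added to attention logits before softmax: attending
|         # further back is penalized linearly, with no learned
|         # position embeddings, which is why it extrapolates to
|         # longer contexts at inference time
|         return slopes[:, None, None] * distance[None, :, :]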
|
| For these types of "extended context" models to actually
| reason over inputs longer than the native context width of
| the model, I _assume_ that there is indeed some sort of
| vector DB trickery - maybe paging through the input to
| generate vector DB content, then using some type of
| Retrieval Augmented Generation (RAG) to process that using
| the extended contexts?
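|
| Something like this, maybe (purely my speculation - the
| embed and llm callables here are stand-ins for whatever
| embedding model and completion endpoint they'd actually
| use):
|
|     import math
|
|     def cosine(a, b):
|         dot = sum(x * y for x, y in zip(a, b))
|         na = math.sqrt(sum(x * x for x in a))
|         nb = math.sqrt(sum(y * y for y in b))
|         return dot / (na * nb) if na and nb else 0.0
|
|     def answer_long_doc(doc, question, embed, llm, size=2000, k=8):
|         # page through the input, embedding each chunk ("vector DB")
|         chunks = [doc[i:i + size] for i in range(0, len(doc), size)]
|         index = [(c, embed(c)) for c in chunks]
|         # retrieve only the chunks most relevant to the question
|         # and stuff those into the model's real, smaller context
|         q = embed(question)
|         best = sorted(index, key=lambda cv: cosine(q, cv[1]),
|                       reverse=True)[:k]
|         context = "\n\n".join(c for c, _ in best)
|         return llm("Context:\n" + context +
|                    "\n\nQuestion: " + question + "\nAnswer:")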
|
| Maybe someone from Anthropic or MosaicML could throw us a
| bone and give a bit more detail on how these are working!
|
| https://www.mosaicml.com/blog/mpt-7b
|
| https://arxiv.org/abs/2005.11401
| [deleted]
| minimaxir wrote:
| No pricing, but given that OpenAI's GPT-4 doubles the cost-
| per-token if you go from an 8k to a 32k context window, I
| suspect the pricing here will be 2-4x that of the base
| Claude model, which has a 9k context window:
| https://cdn2.assets-servd.host/anthropic-website/production/...
|
| Although with flash attention, who knows if marginal cost scales
| that consistently.
| adamkochanowicz wrote:
| https://cdn2.assets-servd.host/anthropic-website/production/...
| minimaxir wrote:
| Those are the same SKUs I linked.
|
| The new models use a different model identifier that's not
| listed in the pricing doc, although it sounds like the
| intent may be to replace the base model, from looking at
| the API docs:
| https://console.anthropic.com/docs/api/reference#-v1-complet...
| f_devd wrote:
| <4x would be quite optimistic: at ~11x the tokens, the
| attention compute/memory required grows roughly with the
| square of the context length, so on the order of 120x
| (even with the lower starting point of flash attention).
| So unless they already have excessive margins it wouldn't
| make much sense to go that low.
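|
| Back-of-envelope (attention term only; the MLP layers
| scale linearly, so the real multiplier sits somewhere
| below this):
|
|     base_ctx, new_ctx = 9_000, 100_000
|     ratio = new_ctx / base_ctx   # ~11x the tokens
|     print(ratio ** 2)            # ~123x the attention FLOPs/memory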
| sp332 wrote:
| I was assuming they used a different architecture to get the
| increase instead of just letting it eat hardware that way.
| Especially with the speed numbers in the post.
| l1n wrote:
| Pricing is the same as the base model.
| jimsimmons wrote:
| Confirmation here:
|
| https://twitter.com/AnthropicAI/status/1656743460769259521?s...
| minimaxir wrote:
| Huh. Well that changes things.
| rat9988 wrote:
| Only for the duration of the beta
| jimsimmons wrote:
| Source?
| felixgallo wrote:
| the actual tweet you linked.
| jimsimmons wrote:
| It doesn't say exclusively for the beta period
| scoopertrooper wrote:
| With an extremely literal reading you are correct, but
| there was clearly an implication.
| alpark3 wrote:
| I use GPT-4 through the API, but I can't help but hate the
| token/character-based pricing of these LLM APIs we've seen
| so far. Because the entire context needs to be fed back
| into the model, the longer my conversation gets, the more
| expensive it gets. Yeah, it's fractions of a cent and
| works out cheaper, but something about it is so
| psychologically taxing that I'd rather pay a flat
| sum/month and get unlimited access, even if it costs more
| considering my usage.
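|
| A quick illustration of why it feels that way (the ~300
| tokens per exchange here is just a made-up number):
|
|     turn_tokens = 300  # assumed average tokens added per exchange
|     for turns in (5, 20, 50):
|         # every request re-sends the whole history, so billed
|         # tokens grow roughly quadratically with chat length
|         billed = sum(turn_tokens * t for t in range(1, turns + 1))
|         print(turns, billed)  # 5 -> 4500, 20 -> 63000, 50 -> 382500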
| WA wrote:
| Have you tried starting a new chat after your first
| question, but refining your new prompt to include some
| info you gathered from the first response? This way, you
| know exactly how many tokens you're going to send.
| RoddaWallPro wrote:
| I requested & have been waiting for access to Claude for nearly 3
| months now. Guess the waitlist must be really long...
| jazzyjackson wrote:
| API access or just access to the chatbot?
|
| You can go through Poe.com
| technics256 wrote:
| You likely got rejected. It was the same for me; I
| reapplied with a good use case and was let in.
| melvinmelih wrote:
| > You can drop multiple documents or even a book into the prompt
| and then ask Claude questions that require synthesis of knowledge
| across many parts of the text.
|
| This is cool but does it also work the other way around? Generate
| a book's worth of content based on a single prompt?
| cubefox wrote:
| That's a good question. Can Claude write a coherent book?
| Chabsff wrote:
| Kinda. But it's going to be a lot like how data compression
| works. There will always be a somewhat fundamental limit to how
| much "creativity" you can get out of a small prompt generating
| large texts when using an isolated model.
| worik wrote:
| Their sign-up form does not let me sign up for early
| access.
|
| A bit disappointing.
| skilled wrote:
| My wallet is hardly capable of handling 8k GPT-4.
| ibitto wrote:
| Anyone using Claude? How long did it take you to get access?
| harisec wrote:
| Claude is available for free in the Poe app (poe.com). I think
| it's good and underappreciated.
| danysdragons wrote:
| It is good, but the free subscription to Poe only provides
| access to Claude Instant. It's impressively fast but not
| their smartest model (claude-v1.3).
| dkarras wrote:
| yeah, been using it instead of ChatGPT and it performs better
| IMO. My conversational LLM of choice for sure.
| Mizza wrote:
| I've got access; it's _blazing_ fast and seems very good.
| It solved some of my little puzzles that other models
| couldn't. I haven't tried ChatGPT-4 yet, but it's the best
| one that I have used.
| thewataccount wrote:
| You need to try GPT4, if only because GPT3.5 really
| doesn't compare to it in a lot of ways.
| iEchoic wrote:
| GPT-4 is a major leap ahead of everything else I've used
| (including GPT-3.5), so definitely worth trying for
| comparison.
| pk-protect-ai wrote:
| Ok. It has some level of spatial comprehension. Unlike
| GPT-4, it lacks proper time comprehension because it is
| bad at calculus. Unlike GPT-4, it can't properly solve the
| traveling salesman problem.
| com2kid wrote:
| I am curious how consistent Claude is at obeying detailed
| instructions. One issue ChatGPT 3.5 and 4 have, even with
| just a few hundred words of instructions, is that they
| forget instructions given to them earlier on.[1]
|
| This huge context window is awesome though. I'm trying to
| use LLMs to do small-town social-interaction simulations,
| with output in a structured format. Finding ways to
| compress existing state and pass it around, so the LLM
| knows what the people in the town did on a given day, is
| hard with a tiny token limit!
|
| [1] For my use cases, early instructions tend to describe
| a DSL syntax for responses; if I add too much info after
| the instructions, the response syntax starts getting
| wonky!
| rescripting wrote:
| A simple example I ran into: I asked ChatGPT to generate a
| story in madlibs format for my 4 year old daughter.
| They're in the format "The young _____ went to the ______,
| ...", and she fills in the blanks with silly
| nouns/adjectives.
|
| As she kept asking for more, I prompted "great, do another one"
| and eventually my original instruction fell out of the context
| window. It continued to generate a children's story, but with
| no more blanks.
| com2kid wrote:
| This is actually a different issue, largely a UI one,
| although one I wish ChatGPT would fix.
|
| There is no good way to tell it "this isn't a conversation,
| just repeat the answer to the initial prompt again".
|
| The solution is to just re-paste the initial prompt each
| time, but it still isn't ideal. There isn't a good way to
| tell ChatGPT "you can throw away all the context after the
| initial prompt and up until now".
|
| Of course the entire point of ChatGPT is that it maintains a
| conversation thread, so I get why they don't fix up this edge
| case.
|
| My problem is more that I give ChatGPT some complicated
| instructions, and it'll start forgetting the early
| instructions long before any token limit is reached.
|
| So for example, if early on I ask for certain tokens to be
| returned in parens and my initial prompt is too long,
| it'll forget the parens thing and start returning tokens
| without the surrounding (), which then breaks my parser!
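|
| Via the API, a minimal sketch of that re-pasting approach
| is to pin the instructions as the system message and only
| replay the last few turns (the SYSTEM text here is just a
| stand-in for real DSL instructions):
|
|     import openai  # pre-1.0 style client, as of May 2023
|
|     SYSTEM = "Respond only in MY-DSL; wrap every token in parens."
|
|     def ask(history, user_msg, max_turns=4):
|         history.append({"role": "user", "content": user_msg})
|         # re-send the instructions on every call and keep only
|         # the most recent turns, so the syntax rules never
|         # scroll out of the context window
|         messages = ([{"role": "system", "content": SYSTEM}]
|                     + history[-2 * max_turns:])
|         resp = openai.ChatCompletion.create(model="gpt-3.5-turbo",
|                                             messages=messages)
|         reply = resp["choices"][0]["message"]["content"]
|         history.append({"role": "assistant", "content": reply})
|         return reply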
| orost wrote:
| Almost every UI for LLMs I've seen has a way to specify an
| initial prompt that never goes out of context; it's
| strange that it's not a feature in ChatGPT.
| throwaway012919 wrote:
| Sounds expensive. I guess we know where the $580M 'investment'
| from SBF is going now.
___________________________________________________________________
(page generated 2023-05-11 23:00 UTC)