[HN Gopher] Anthropic's 100k context is now available in the web UI
___________________________________________________________________
Anthropic's 100k context is now available in the web UI
Author : jlowin
Score : 207 points
Date : 2023-05-15 14:29 UTC (8 hours ago)
(HTM) web link (twitter.com)
(TXT) w3m dump (twitter.com)
| emptysongglass wrote:
| Any magic tricks to gaining access apart from waiting for months?
| I've been using GPT-4 and love it but would really love to test
| that 100k context window with long running chatbots.
| og_kalu wrote:
| Claude-Instant-100k is available on Poe.com (but only usable as
| a paying subscriber). Claude-plus-100k isn't up yet but I'm
| guessing that's a matter of time.
| dmix wrote:
| Nice to see Poe is an actual iOS app for AI chat. Using
| ChatGPT via the Home Screen "app" is extremely frustrating
| because it logs you out constantly (maybe due to using Google
| to auth).
| systemsignal wrote:
| If you're using Google login, use a Chrome shortcut.
|
| It should keep you logged in longer and make it easier to log
| back in.
| hackernewds wrote:
| what is a chrome shortcut?
| costco wrote:
| I don't have any evidence but I think it's probably done on
| purpose to make amateur automated free ChatGPT use more
| annoying.
| dmix wrote:
| But I have plus :(
| visarga wrote:
| Every other time I switch back to the ChatGPT tab it requires
| re-login. That's bad UX.
|
| Also, there is no way to search the history. The sidebar
| only shows titles, not contents. I have to click each one
| to see what's inside. I can't scroll much because it
| loads more only when I click. I ended up exporting the
| conversations and converting JSON to txt.
|
| Another issue: editing a long past message makes it
| scroll up and hide the cursor if the message is longer
| than one screen. I have to type in another editor and
| then copy&paste the whole text. The typing experience is
| poor.
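The export-and-convert workflow described in the comment above can be sketched in a few lines of Python. The input structure here is a simplified stand-in for the real ChatGPT export (which nests messages in a "mapping" tree), so the field names are assumptions:

```python
import json

def export_to_txt(conversations):
    """Flatten exported conversations into searchable plain text.

    Assumes each conversation is a dict with a "title" and a flat list
    of "messages" carrying "role" and "content"; the real export format
    differs, so adapt the field names accordingly.
    """
    lines = []
    for conv in conversations:
        lines.append(f"== {conv['title']} ==")
        for msg in conv["messages"]:
            lines.append(f"{msg['role']}: {msg['content']}")
        lines.append("")  # blank line between conversations
    return "\n".join(lines)

# Tiny sample standing in for a real conversations.json export.
sample_json = """[
  {"title": "Pencils",
   "messages": [
     {"role": "user", "content": "Why are pencils bad?"},
     {"role": "assistant", "content": "They need sharpening."}
   ]}
]"""
text = export_to_txt(json.loads(sample_json))
print(text)
```

With a real export you would read conversations.json from disk and grep the resulting text file.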
| pmarreck wrote:
| I use Google to auth on mobile Firefox and I don't get
| logged out constantly.
| arcastroe wrote:
| This is the reason I primarily use
| https://labs.kagi.com/fastgpt . I have it bookmarked as a
| home screen icon on my phone
| hackernewds wrote:
| I typed >Hello and it is still blinking 2 minutes later
| freediver wrote:
| Note it is a search engine, not a chat bot.
| jumpCastle wrote:
| It does not seem conversational though
| heliophobicdude wrote:
| Perhaps. I don't have those issues from the direct account
| I have with them.
| bulbosaur123 wrote:
| Where can I actually physically use it? Or is it again only
| limited to chosen ones?
| celestialcheese wrote:
| Claude 100k 1.3 blew me away.
|
| I gave it the task of extracting a specific column from a table
| inside a PDF, identified only by the table's header text, with
| the text extracted using tesseract and no extra layers on top.
| (For those that haven't tried extracting tables with OCR, it's a
| non-trivial problem, and the output is a mess.)
|
| With >40k tokens in context, it extracted the data at 100%
| accuracy.
|
| Changing the prompt to target a different column from the same
| table worked perfectly as well. After changing a character in
| the table in the OCR context to test whether it was somehow
| hallucinating, it also accurately extracted the new data.
|
| One of those "Jaw to the floor" moments for me.
|
| I did the same task in GPT-4 (limiting the context window to 8k
| tokens), and it worked, but at ~4x the cost and without being
| able to feed it the whole document.
| arnaudsm wrote:
| Using LLMs with 100GB VRAM to convert PDFs to CSVs is truly
| depressing, but I am sure many companies will love it.
|
| 2023 office software already uses 1000x more resources than
| 1990s software did. I bet we are ready to do that again.
| martythemaniak wrote:
| You're missing the developer time. You no longer have to
| spend hours (or days, perhaps weeks depending on the sources)
| stringing together random libs, munging and cleaning data,
| testing, etc etc.
| arnaudsm wrote:
| I agree, computers are cheaper than engineers.
|
| But I wonder how much more productive our economies could
| be if everyone was taught programming the same way we teach
| reading & writing, and open standards were ubiquitous.
| JumpCrisscross wrote:
| Prompt engineering is basically turning coding problems
| into language problems. It's conceivable that humans
| writing code becomes artisanal in a century.
| vermilingua wrote:
| Coding problems have always been language problems
| visarga wrote:
| Not just PDFs with tables. It works on any semi-structured
| document with key-value pairs like invoices, purchase orders,
| receipts, tickets, forms, error messages, logs, etc.
|
| The "Information Extraction from semi-structured and
| unstructured documents" task is seeing a huge leap; just 3
| years ago it was very tedious to train a model to solve a
| single use case. Now they all work.
|
| But if you do make the effort to train a specialised model
| for a single document type, the narrow model surpasses GPT3.5
| and 4.
| anonymouse008 wrote:
| > text extracted using tesseract
|
| You're saying 'the text' without normalizing the rows and
| columns (basically the tab, space or newline delimited text
| with sporadic lines per row) was all you needed to send? I
| still have to normalize my tables even for GPT-4, I guess
| because I have weird merged rows and columns that attempt to do
| grouping info on top of the table data itself.
| swyx wrote:
| better - you can do it copy pasting from pdf to gpt on your
| phone! https://twitter.com/swyx/status/1610247438958481408
| anonymouse008 wrote:
| Definitely tried that way too, it didn't work - my tables
| are pretty dang dumb. Merged cells, confidence intervals,
| weird characters in the cell field that change based on the
| row values - messing up a simple regex test, it's really a
| billion dollar company solution but I'm about to punt it to
| the moon because it's never fully done.
| celestialcheese wrote:
| exactly. Just sent raw tesseract output, no formatting or
| "fix the OCR text" step. So the data looked like:
|
|     col1col2col3\nrow label\tdatapoint1\tdatapoint2...
|
| Very messy.
|
| I don't think this is generalizable with the same 100%
| accuracy across any OCR output (they can be _really_ bad).
| I'm still planning on doing a first pass with a better Table
| OCR system like Textract, DocumentAI, PaddlePaddle Table, etc.,
| which should improve accuracy.
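For illustration, the kind of prompt assembly described here, sending raw OCR output plus a target header to the model, might look like the following. The helper name and prompt wording are assumptions, not the commenter's actual code:

```python
def build_extraction_prompt(ocr_text, target_header):
    """Ask the model to pull one column out of messy OCR output,
    identified only by its header text."""
    return (
        "Below is raw OCR output of a table. Columns may be fused "
        "and rows may span lines.\n\n"
        f"OCR OUTPUT:\n{ocr_text}\n\n"
        f"Extract every value in the column whose header is "
        f"'{target_header}'. Return one value per line, nothing else."
    )

messy = "col1col2col3\nrow label\tdatapoint1\tdatapoint2"
prompt = build_extraction_prompt(messy, "col2")
# The prompt would then be sent to the model via the provider's API.
```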
| anonymouse008 wrote:
| That's still super cool!
|
| Yeah my use cases are in the really bad category - I've
| been building parsers for a while, and I've basically given up
| and resorted to manually specifying rows-of-interest logic.
| Camelot got so close but I ended up building my own control
| layer to pdfminer.six to accommodate (I'd recommend Camelot
| if you're still exploring). It absolutely sucks needing to
| be so specific out the gate, but at least the context
| rarely changes.
| pplante wrote:
| What is the source of these nasty docs? I am also working
| on a layer above pdfminer.six to parse tables. It seems
| like this task is never done. LLMs have had mixed results
| for me too. I am focused on documents containing
| invoices, income statements, etc from the real estate
| industry.
|
| My email is in my profile if you want to reach out and
| compare notes!
| modernpink wrote:
| What was the dollar cost to do this work? Iterating over a 40k
| context must be expensive.
| pr337h4m wrote:
| Also available on poe.com
| rgbrgb wrote:
| great domain. what is pricing?
| s3p wrote:
| $20/month for 1000 queries if I remember correctly
| nico wrote:
| It's also available here on Google Colab:
| https://twitter.com/gpt_index/status/1657757847965380610?s=4...
| anotheryou wrote:
| No, you still need to bring your own API key for that.
| marcopicentini wrote:
| Any timeframe when it will be released to the public?
|
| We are in the middle of developing an app and we are not able
| to do it with the limited context window of OpenAI. We already
| submitted the request for access.
| pmarreck wrote:
| There are tricks you can do to better utilize the smaller
| context window, such as sub-summaries and attention tricks.
| That's how there are already products on the market that
| consume entire big PDFs and let you query them. Granted, a
| larger context window would still work better, but it's
| possible to do.
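The sub-summary trick can be sketched as a map-reduce over chunks. Here a trivial stub stands in for the LLM call, and the chunking is by characters rather than tokens; both are simplifying assumptions:

```python
def chunk(text, max_chars):
    """Split text into fixed-size pieces (a real version would count
    tokens and respect paragraph boundaries)."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def summarize(text):
    """Stand-in for an LLM summarization call: keep the first sentence."""
    return text.split(".")[0] + "."

def map_reduce_summary(document, max_chars=2000):
    """Summarize each chunk, then summarize the concatenated summaries,
    recursing until the result fits in the context window."""
    partials = [summarize(c) for c in chunk(document, max_chars)]
    combined = " ".join(partials)
    if len(combined) > max_chars:
        return map_reduce_summary(combined, max_chars)
    return summarize(combined)

doc = "First point. Detail one. " * 200  # far larger than the "context"
result = map_reduce_summary(doc, max_chars=100)
```

In a real pipeline, `summarize` would be an API call, and the per-chunk summaries could also be embedded for retrieval rather than re-summarized.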
| modernpink wrote:
| What are the commercial applications of mega context window
| LLMs at current prices? I would guess mainly legal. And what
| strategies would you rely on to reduce the accumulating costs
| over the course of a session?
| [deleted]
| tikkun wrote:
| I requested access when it was released.
|
| Other HN readers, how many days did it take you from requesting
| access to Claude to having API access? I didn't use it prior to
| 100K so I don't have an existing API account.
| lachlan_gray wrote:
| Randomly gained access long after I had forgotten I signed up,
| maybe 3 or 4 months
| og_kalu wrote:
| Requested access way before 100k and still haven't gotten in.
| malux85 wrote:
| Yeah me too, waiting patiently as context windows are our
| biggest blocker on more complex chemistry simulations
| peytoncasper wrote:
| Interesting use case, would you be open to sharing more
| information on how you're using LLMs for chemistry
| simulations?
| og_kalu wrote:
| Not the person you responded to but these two interesting
| papers kind of tackle that.
|
| https://arxiv.org/abs/2304.05376
|
| https://arxiv.org/abs/2304.05332
| npsomaratna wrote:
| Same here. Been waiting for a couple of months now.
| Mockapapella wrote:
| been a couple months for me as well. Actually forgot about
| `claude` and have just been using OpenAI's API instead.
| tikkun wrote:
| Could you send me an email? I've liked a few of your
| comments, want to say hi over email. Email in profile.
| weird-eye-issue wrote:
| Creepy
| tikkun wrote:
| Can someone else chime in and let me know whether they
| agree? Seems like the equivalent of a twitter DM to me,
| but maybe I'm out of touch.
| barry-cotter wrote:
| Some people, the kind of people who use the word cringe
| unironically, live in a world where other people look at
| them and judge them all the time, and they care about what
| these strangers think and will mold their personality and
| behaviour to avoid this. They stand as a warning to
| others not to be like that.
| qumpis wrote:
| I tried to google the person you replied to, and they
| seem to have many social/online media profiles that
| allow direct contact. In that case I think publicly
| reaching out isn't the best way to go and seems out of
| place, imo.
| tikkun wrote:
| Good call, I didn't think to do that - thanks
| og_kalu wrote:
| Don't think it's particularly creepy and I did send one
| like you asked, but my email is in my GitHub anyway and
| not particularly hard to find.
|
| Generally, some might not feel comfortable letting
| strangers know their email, especially considering this
| is a site that encourages anonymity. Some might not
| appreciate doing so publicly either.
| rpastuszak wrote:
| Not creepy at all, although I'd spend 5 minutes checking
| if I can find the person on Google and then message them
| via different channels.
|
| If not, I'd leave a way for contacting me first to make
| it easier for them.
|
| The way I handle these situations:
| https://sonnet.io/posts/hi
| stormfather wrote:
| I disagree that it's creepy. It's more just unusual. But
| people on HN are quick to judge the slightest thing. I
| think being a programmer does that to one's brain,
| unfortunately.
| ryanklee wrote:
| I think it's pretty inappropriate. If you have a legit
| reason to reach out, then you can find a way to do it
| privately. Letting your private intentions leak into
| public forums is a bad look and a red flag. If I were the
| person you are replying to, I'd do my best to not
| interact with you on the basis of your comment.
| barry-cotter wrote:
| If I were a human being reading your comment I would
| infer that you were highly judgmental and thought other
| people were mostly like you, looking for an excuse to be
| hostile and dismissive. Thankfully I know that most
| people are at worst indifferent and there's a large very
| friendly, helpful minority and even more who will do
| small favours out of kindness. The more we make it clear
| that most people are not like you the more we make the
| world a better place.
| ryanklee wrote:
| The Internet is full of individuals with weird
| intentions. I don't at all see how being conservative in
| the kind of interactions one allows for is a bad idea.
| alanfranz wrote:
| How? There's no PM feature on HN. This is the only way if
| the username is unique enough.
| ryanklee wrote:
| Tough luck then I guess? I suppose I don't see the need
| to have access to every individual on a private basis
| merely because they comment somewhere on the internet. If
| they welcomed private interactions, then they would
| indicate a means of contact in their profile.
| mrtranscendence wrote:
| I mean, if the person being contacted doesn't want to be
| contacted privately, they're free to ignore the request.
| No one's saying they "need" access or that someone else
| is fully obligated to talk to them privately.
| ryanklee wrote:
| Just reminding you that the commenter asked for others to
| offer their take on whether or not the request was
| perceived to be creepy. I didn't go out of my way to
| offer unsolicited commentary on this.
|
| If you don't want to hear that you are wearing an ugly
| shirt, don't ask an entire room full of people if your
| shirt is ugly.
| detaro wrote:
| it's fine. I second trying to find a clearly publicized
| contact channel first, but it's fine and leaves it to the
| person to reach out or not. (If they don't leave it at
| that though)
| s3p wrote:
| This is not creepy at all. Sometimes people can reach out
| because they genuinely want to have a nice conversation.
| anotheryou wrote:
| did any of you get a confirmation mail or something?
| ntonozzi wrote:
| I requested access on March 14th or 15th and got it on March
| 20th.
| tomatbebo wrote:
| Did you fill in the form with a super compelling use case or
| something?
| arpowers wrote:
| Is it useful?
| arpowers wrote:
| The vast majority of AI tools are vaporware mock-ups ...
|
| Adobe Firefly is the best example of "just ship a mock-up of
| the feature" AI marketing.
| viggity wrote:
| Firefly has some genuinely cool shit in it (their text
| treatments are pretty neat), but overall quality is
| dramatically lacking because they only train on images they
| have explicit rights to.
| adamsmith143 wrote:
| Of course Adobe put out crap but Claude is a real product,
| not vaporware...
| s3p wrote:
| Neither of them put out "crap"
| adamsmith143 wrote:
| Adobe isn't an AI company so it stands to reason that the
| AI product they put out is crap. Photoshop and their
| other products while not "Crap" are certainly overpriced
| relative to opensource competitors.
| weird-eye-issue wrote:
| Bad take
| greyman wrote:
| You mean the Claude bot in general? For me, yes, I use it
| daily, and compared to GPT it answers more quickly, is
| friendlier, and in general is less woke. I use GPT-4 as a
| fallback when I need more reasoning capability; there GPT-4 is
| better. To sum it up, if you find GPT-3.5 and 4 useful, then
| yes, Claude is useful as well.
| s3p wrote:
| Another person addicted to using the word "woke".... sigh
| [deleted]
| 13415 wrote:
| Out of curiosity, what do you mean by "less woke"? Does it
| frequently insult minorities or make racist remarks?
|
| _Edit: To clarify, I was mostly interested in examples and
| side by side comparisons to better understand what OP meant,
| not political discussions._
| nomel wrote:
| To respond to your edit, here are some examples, showing
| bias:
|
| https://www.brookings.edu/blog/techtank/2023/05/08/the-polit....
|
| https://the-decoder.com/chatgpt-is-politically-left-wing-stu...
|
| Found here: https://news.ycombinator.com/item?id=35946060
| nomel wrote:
| I'm not them, and I don't think "woke" is the right term,
| but I've noticed certain "themes" inappropriately appearing
| in answers. Right after release of ChatGPT 3, the
| marginalization of certain groups would show up in answers to
| questions that weren't related. I saw many examples on
| twitter, but my personal one was in the answer to "Why are
| pencils bad?". This one has been "corrected" since release,
| as far as I can tell, but I also don't ask it questions
| where this theme _could_ show up.
|
| Now, I only notice green energy/environmental issues that
| show up in odd places (mostly in GPT 3), and the "moral of
| the story" always being the same "everyone works together".
| I see this happen when "creativity" is attempted, where
| it's free to make up the context (story, wishes, etc).
|
| Outside of possible definitions of the elusive "woke", the
| "As a language model, I" type responses are the most
| limiting, and usually absolute nonsense, with an ever
| increasing number of disclaimers found in answers. For
| example, "Write some hypothetical python 4 code that sends
| a message over the network". Some pretty heavy
| "jailbreaking" is needed to make it work.
|
| ChatGPT4 used to handle this much better, but I think the
| "corrections" are stacking deeply enough that it no longer has
| the "resolution" left to see where answers can be given
| without them.
|
| It would be nice if there were a "standard" theme of
| questions where we could measure progression, and compare,
| to know. Most times these observations or questions come up,
| someone is very quick to say "racism" or the like.
| com2kid wrote:
| > I see this happen when "creativity" is attempted, where
| it's free to make up the context (story, wishes, etc).
|
| Meanwhile GPT just gave me a story involving a royal
| family where the oldest Prince killed his father (the
| king), married his younger sister, got her pregnant, she
| had a baby, then he killed his younger sister, then he
| was killed by another member of the royal court, who
| decided to act as regent until the baby came of age.
|
| GPT is perfectly capable of writing dark scary horrible
| things if you ask it to.
| ryan93 wrote:
| https://imgur.com/a/3YWEIAJ I mean this is clearly a lie.
| The gap is about 1 standard deviation. There is a strong
| debate over whether it is possible to close the gap (and it
| has shrunk over the last few decades). But there is no
| debate that there is a gap. They clearly trained it to lie.
| sanxiyn wrote:
| I agree. It is clearly a lie and it is unfortunate Bard
| is spreading misinformation.
| krastanov wrote:
| Be careful with conflating the different meanings of
| "IQ". There is (1) IQ test taken after adolescence, which
| plenty of folks consider new-age nonsense (it has useful
| correlations with some mental tasks, but it is not clear
| whether it deserves a name as fundamental as "IQ") and
| there is (2) various tests given at young pre-adolescent
| ages which is quite a bit more interesting when trying to
| distinguish nature from nurture.
|
| The gap you are referring to, is it about (1) or about
| (2)? The OpenAI model might be talking about 2.
| whimsicalism wrote:
| Just not an accurate recounting of the science around
| this at all.
| [deleted]
| jkukul wrote:
| > The gap is about 1 standard deviation
|
| Do you have any studies to link?
| ryan93 wrote:
| https://cremieux.medium.com/resolute-ignorance-on-race-and-i...
| dS0rrow wrote:
| Cremieux is the pen name of reddit user u/TrannyPornO
| just read some of his comments.
| og_kalu wrote:
| This is not a study. It's a poorly backed/argued opinion
| piece.
| ryan93 wrote:
| How did you read it one minute after I posted it?
| og_kalu wrote:
| Because I've seen the post before lol. It's been on the
| internet for a couple of years.
| ryan93 wrote:
| Lol. If you read it that makes it worse. Clearly not a
| poorly backed opinion piece. You are trying to dissuade
| others from reading it. Makes me believe it more.
| og_kalu wrote:
| How does that make sense? You read something and then
| you see it's poorly argued. I'm not a magician.
|
| I don't care if people read that lol. I don't even really
| care if they believe the nonsense he's spouting. I reckon
| people like that will always exist.
|
| I'm just telling you that that's not a study. You say you
| had a study and then you link an opinion piece.
| ryan93 wrote:
| https://osf.io/4an93/
| https://www.sciencedirect.com/science/article/abs/pii/088303...
| https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2907168/
| https://www.cambridge.org/core/journals/behavioral-and-brain...
| https://www.sciencedirect.com/science/article/abs/pii/S01918...
| https://www.researchgate.net/publication/301303123_Genetic_a...
| https://www.mdpi.com/2624-8611/1/1/5
| https://www.mdpi.com/26...
|
| Should I keep going? I have dozens more.
| og_kalu wrote:
| If you're asserting that intelligence has a genetic
| component tied to race, the burden is on you to demonstrate
| that connection.
|
| You would need to demonstrate that:
|
| "Race" can be defined in a way that has consistent
| significance (Our current social indicators of race make
| so sense biologically)
|
| that intelligence is consistently heritable within those
| racial categories
|
| that genetics are the source of that heritability to the
| exclusion of other factors
|
| It's not enough to simply wave your hand to say "they do
| roughly classify people with similar ancestry together."
|
| What we do know is that IQ differences correlate strongly
| to factors totally unrelated to genetics. Look just at
| the results of IQ studies within Europe -
| https://i.imgur.com/IcHt0tu.jpg That data is actually
| pulled from a book that argues in favor of a genetic
| element to intelligence affecting national wealth, but at
| a national level instead of a racial one -
| https://www.researchgate.net/profile/Richard_Lynn3/publicati...
|
| The differences the authors find between nations are
| wildly large. Do you really think that East Germans were
| nearly 10 IQ points dumber by genetics than the West
| Germans in 1968-70, or that the Israelis got dumber
| between 1975 and 1989?
|
| Europeans cluster with Middle Easterners and Central Asians -
| https://science.sciencemag.org/content/sci/324/5930/1035/F4....
| but the latter groups have universally low IQ, mostly under 90.
| Palestinians only average 85 -
| https://www.sciencedirect.com/science/article/abs/pii/S01602...
| even though they're genetically the same as Mediterraneans,
| who average as much as 102 (Italy). Why
| define "white" as "European only" when Arabs, Central
| Asians, South Asians and North Africans have the same
| shared mutual ancestry? How is IQ primarily inherited and
| not environmental when non-European caucasians have
| uniformly low IQ relative to Euros?
|
| I'd also love for you to explain how IQ is consistently
| going up over the last 100 years across the west? That's
| like 4 generations, not anywhere near enough time for natural
| selection to kick in.
|
| Those types of results show up time and time again in IQ
| studies. Whatever genetic component there is to IQ is
| less important than the environmental component, AND that
| the genetic element varies so wildly within even
| homogenous populations that talking about larger
| constructed population categories like "race" doesn't
| actually say anything useful.
| ryan93 wrote:
| The current social indicators of race make sense. As you'd
| expect, since African Americans are about 20% European
| admixture, their IQs are in between whites and Africans.
| Also, the Flynn effect is most likely not a real gain in
| intelligence:
| http://iapsych.com/articles/pietschnig2015.pdf
| sanxiyn wrote:
| Yes, a ton. I recommend
| https://www.amazon.com/Intelligence-That-Matters-Stuart-Ritc...
| dataangel wrote:
| I think it's failing to articulate a correct position;
| you shouldn't assume wokeness is the only reason people
| argue against racial IQ studies. There are studies
| reporting a standard deviation, but there are a lot of
| problems with existing studies even if you agree with the
| idea of IQ generally (which is also highly contested).
| One of the biggest IQ studies for African countries
| relied on IQ measurements from people who didn't even
| live there. There's also a big reliance on twin studies
| to prove IQ heritability, but it turns out a lot of these
| "raised apart" twins lived extremely close together, in
| some cases literally next door. And a lot of the
| researchers refuse to disclose their actual data so
| people can verify the statistics, while at the same time
| getting their funding from known supremacist sources.
| It's very very very dubious, and the people proclaiming
| that it's "uncontested" or "very well accepted in
| psychology" use half truths to prop up their position,
| e.g. it's well accepted for its _original_ purpose of
| distinguishing people with brain damage from those without;
| in other words it's accurate for making distinctions at
| the very bottom of the distribution, but at the upper end
| all the correlations people use to argue IQ is a
| legitimate measure break down, e.g. higher IQ starts to
| correlate with _less_ income. If you genuinely want to
| learn more about this you can find lots of sources and
| analysis here: https://twitter.com/DialecticBio
| whimsicalism wrote:
| The critique of the 'raised apart' twin study as 'they
| were not as far apart as you think' is not actually that
| strong given that the results replicate: they still exist
| when you eliminate these populations, and the effect size is way
| too large to be explained by some 'raised apart' twins
| living close together.
|
| The better critique is that a lot of what you are
| actually measuring is maternal womb conditions, ie.
| placental sharing, which can have a massive impact. The
| jump from within-family twin study to interracial genetic
| IQ difference is also not a well-justified one.
| sanxiyn wrote:
| I mean, evolution is also "highly contested". Controversy
| surrounding "the idea of IQ" is as interesting as those
| around evolution, in other words, not at all.
| Scientifically, it is a closed case. Being highly
| contested is no excuse for Bard to spread misinformation.
| whimsicalism wrote:
| Yeah, I can see how this bot's inability to speculate
| about how black people are less intelligent than white
| people could really impact GP's daily work
| typon wrote:
| Is that in the US or worldwide? What is the definition of
| black and white?
| boredumb wrote:
| Coy, but obviously he means not permeated with American
| pop-culture progressive politics and censor-happy
| authoritarianism with an aura of smug do-goodery.
| whimsicalism wrote:
| I'm not sure if I'm supposed to be gleaning information
| from your comment, but personally I didn't gain any new
| knowledge about 'woke AI.'
| boredumb wrote:
| He asked what he meant by less woke in regards to AI and
| GPT has an insane bias towards progressive american
| politics and actively censors/denies answering things
| that would cause it to divorce from that political
| persona. My previous comment was calling him coy because
| in 2023 pretending like 'woke' just means 'anyone that
| doesn't hate minorities' is an absolute joke.
| [deleted]
| wangg wrote:
| Sharing that this is available on Poe.com from Quora.
| thomasahle wrote:
| This is the world we are entering of "commercial AI" rather than
| public, peer reviewed AI. No benchmarks. No discussion of pros
| and cons. No careful comparison with state of the art. Just big
| numbers and big announcements.
| seydor wrote:
| It moved to hyper-scale engineering a few years ago. The
| science behind the engineering is still progressing (e.g. LoRA
| is open science), and it seems like whatever these companies
| are adding is not something fundamentally new (considering the
| success of LLaMa and the recent google memo that admits they
| have no moat).
|
| And the various "Model cards" are not really in depth research
| but rather cursory looks at model outputs. Even the benchmarks
| are mostly based on standard tests designed for humans, which
| is not a valid way to evaluate an AI. In any case, these
| companies care more about the public perception of their
| models, so they tend to release evaluations of political
| sensitivity. But that's not necessarily the most interesting
| thing about those models, nor particularly valuable science.
| whimsicalism wrote:
| Your comment reads to me (someone in the field) like it is
| informed just by reading popular articles on the topic since
| 2022. The "Google memo" should basically have no impact on
| how you are thinking about these things, imo.
|
| The field has taken massive steps backward in just the last
| year when it comes to open science.
|
| > And the various "Model cards" are not really in depth
| research but rather cursory looks at model output
|
| Because they are no longer releasing any details! Not because
| there hasn't been any progress in the last year.
| sebzim4500 wrote:
| I'm sure they'd love to have good benchmarks, but there aren't
| any and realistically if Anthropic invented their own no one
| would trust it.
| whimsicalism wrote:
| https://lmsys.org/blog/2023-05-10-leaderboard/
| dmix wrote:
| They released the product to the public... we might not have
| formal academic studies but millions of people trying it and
| determining its utility vs the competition is as good of a
| test as any.
|
| If pushing the context window turns out to not be the right
| approach it's not like there won't be 10 other companies
| chomping at the bit to prove them wrong with their own
| hypotheses. And it's entirely possible there are multiple
| correct answers for different usecases.
| aatd86 wrote:
| What public? I've been waiting for weeks to try...
| dandellion wrote:
| It could also end up like with the transition to digital
| cameras and megapixels. With companies adding more and more
| context just because consumers' minds are already
| imprinted with the idea that more is better. So in a few
| years we might have models with a window of 30 megatokens and
| it'll mean absolutely nothing.
| idopmstuff wrote:
| Yeah, it's a weird comment to call it not "public, peer
| reviewed" when this article is about how it went public,
| giving people the opportunity to review it.
| [deleted]
| whimsicalism wrote:
| If I started selling a previously unknown cancer treatment
| over-the-counter in CVS, people would be justified in
| calling it not peer-reviewed, untested, etc. even if it is
| available to the public (giving people the opportunity to
| try it).
| whimsicalism wrote:
| > millions of people trying it and determining its utility
| vs the competition is as good of a test as any.
|
| Disagree. We aren't polling these people. How do I even get a
| distilled view of what their thoughts are?
|
| It's a far cry from the level of evaluation that existed
| before. The lack of benchmarks (until the last week or so -
| thank you huggingface and lm-sys!) has been _very
| noticeable_.
|
| You will get people claiming that LLaMa outperforms ChatGPT,
| etc. We have no sense of how performance degrades over longer
| sequence lengths... or even what sort of sparse attention
| technique they are using for longer sequences (most of which
| have known problems). It's absurd.
| vasco wrote:
| The existence of commercial products doesn't eliminate
| researchers' ability to publish work. Also, users are smart. ML-
| powered search has existed for many years with users voting
| with their feet based on black boxes and "big numbers and big
| announcements".
| whimsicalism wrote:
| Did you work in this field before?
|
| I keep seeing comments like this, but the impact in the last
| year on open research has been absolutely massive and
| negative.
|
| The fact that these big industrial research labs have all
| collectively decided to take a step back from publishing
| anything with technical details or evaluation is _bad_.
| sanxiyn wrote:
| I agree it is bad for researchers, but I think you should
| consider "comments like this" are coming from users.
|
| AI was a highly unusual field in terms of sharing latest
| research. Car companies don't share their latest engine
| research with each other. Car users are happy with Consumer
| Reports, and researchers shouting that the degradation of the
| Journal of Engine Research is massive and negative will fall
| on deaf ears.
| whimsicalism wrote:
| It's hard to engage in motte & bailey style conversations
| with different commentators.
|
| The original GP was saying there was little impact on
| research. Your comment is a retreat to a more defensible
| position that I don't have an opinion on.
| behnamoh wrote:
| Nice try, OpenAI.
| jondwillis wrote:
| He works for Meta.
| syntaxing wrote:
| Is there a trick to getting access? I've been on the waitlist for
| GPT-4 and Claude for a while. Been building some proof of
| concepts with GPT-3.5 but having better models would be a huge
| help.
| pmoriarty wrote:
| Try going through poe.com. I got access right away.
| gee_m_cee wrote:
| If you're referring to a paid account, I never received a
| notification about my GPT-4 waitlist spot. I waited a while for
| one, and then, at the prompting of a colleague, I just found a
| spot in the web UI to sign up. After one false start, it just
| worked.
| atemerev wrote:
| I don't understand this "slow rollout" thing about OpenAI
| competition. The chat / instruction models are continuously fine-
| tuned on real dialogues. To get these dialogues en masse, you
| need to deploy models to the wide public. Otherwise you will
| forever be on the losing side if you can't quickly grab the streams of
| real time human-generated content.
|
| People at OpenAI are smart; they understood that quickly. GPT-4
| is available nearly everywhere, and lesser models are even free
| for anyone to use. This required hiring huge teams of
| moderators, but we are at the land-grab stage: everyone in the
| business needs to move fast and break a lot of things. However,
| GPT-4 and open-source models are the only things I can use.
| Bard "is not available in my country" (Switzerland), and the
| first thing the Claude access form asks is whether I am based
| in the US.
|
| Well, their loss.
| dataangel wrote:
| It's probably the GPUs, they don't have enough capacity to
| handle more users. My guess is that GPT4 set off a buying
| spree. Even for CPUs, I've recently heard lead times for
| Sapphire Rapids servers are 2-3 months, high end switches 6
| months, and those probably have way less demand.
| williamcotton wrote:
| If they are resource constrained and then opened the
| floodgates, resulting in poor performance and timeouts for
| every user, it seems like it would sour more milk than
| otherwise.
| nl wrote:
| Is Bard still unavailable?
|
| It was unavailable to Australia until last week but was made
| more widely available at Google I/O.
|
| It's pretty good, too!
| s3p wrote:
| I think it's cloud limitations. Anthropic probably doesn't have
| the ability to scale up extremely fast, and accommodating
| hundreds of millions of users probably isn't as easy for them
| as it is for OpenAI.
| okdood64 wrote:
| New to ML here, what's the difference between parameters and
| context?
| capableweb wrote:
| Other answers are already good, just offering yet another
| difference.
|
| Parameters are something that gets set indirectly via
| training; they're kept within the weights of the model itself.
|
| Context is what you as a user pass to the model when you're
| using it; it decides how much text you can actually pass in.
|
| Being able to pass more context means you can (hopefully) make
| it understand more things that weren't part of the initial
| training.
| sghiassy wrote:
| Parameters are like the number of neurons in your brain
|
| Context is how much short term memory you can retain at any one
| time (think how many cards you can remember the order of in a
| deck of cards)
| Closi wrote:
| Parameters - number of internal variables/weights in the model
|
| Context - length of the input/output buffer (number of
| input/output tokens possible).
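To make the parameters-vs-context distinction above concrete, here is a rough Python sketch. The numbers are made up for illustration (Anthropic has not published Claude's parameter count); only the idea matters: parameters are fixed at training time, while context is a per-request budget shared by input and output tokens.

```python
# Parameters: fixed at training time, stored in the model weights.
# A hypothetical 52B-parameter model at 16-bit precision:
n_params = 52_000_000_000
weight_bytes = n_params * 2  # ~104 GB of weights on disk

# Context: a per-request limit on input + output tokens.
context_window = 100_000  # tokens, as in the Claude-100k models

def fits_in_context(prompt_tokens: int, max_output_tokens: int) -> bool:
    """Input and output share the same window."""
    return prompt_tokens + max_output_tokens <= context_window

print(fits_in_context(95_000, 4_000))  # True
print(fits_in_context(99_000, 4_000))  # False
```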
| nightski wrote:
| The discourse has made it seem that with context length larger is
| always better. I'm wondering if there is any degradation in
| quality of results when the context is scaled this large. Does it
| scale without loss of performance? Or is there a point where even
| though you can fit in a lot more information it causes the
| performance to degrade?
| rpcope1 wrote:
| Well, a larger context makes it easier to integrate other
| tools, like a vector database for information retrieval to jam
| into the context, and the more context, the more potentially
| relevant information can be added. For models like llama, where
| context is (usually) max 2K tokens, you're sort of limited as
| to how much potentially relevant information you can add when
| doing complex tasks.
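The retrieval pattern described above can be sketched in a few lines of Python: rank stored snippets by similarity to the query and pack the best ones into the prompt up to a token budget. Real systems use learned embeddings and a vector database; a bag-of-words cosine similarity and a crude word-count token estimate stand in here so the example is self-contained, and the example snippets are invented.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # stand-in for a learned embedding model
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_context(query: str, snippets: list[str], budget_tokens: int) -> str:
    """Greedily fill the context window with the most relevant snippets."""
    q = embed(query)
    ranked = sorted(snippets, key=lambda s: cosine(q, embed(s)), reverse=True)
    chosen, used = [], 0
    for s in ranked:
        cost = len(s.split())  # crude token estimate
        if used + cost > budget_tokens:
            break
        chosen.append(s)
        used += cost
    return "\n".join(chosen)

docs = ["the order shipping policy allows returns within 30 days",
        "our cafeteria menu changes every monday",
        "returns require the original shipping label"]
print(build_context("what is the returns policy", docs, budget_tokens=20))
```

With a 100k window the budget gets large enough that many more candidate snippets survive the cutoff, which is the point the comment makes about 2k-token models being limiting.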
| phillipcarter wrote:
| In a brief test, I found that the bigger context window only
| meant that I could stuff a whole schema into the input. It
| still hallucinated a value. When I plugged in a call to a
| vector embedding to only use the top k most "relevant" fields
| it did exactly what I wanted:
| https://twitter.com/_cartermp/status/1657037648400117760
|
| YMMV.
| koboll wrote:
| The fundamental problem seems to be that it's still slightly
| sub-GPT-3.5-quality, and even a long context window can't fix
| that. It will remember things from many many tokens ago, but
| it still doesn't reliably produce passable work.
|
| The combination of a GPT-4-quality model and a long context
| window will unlock a lot of applications that now rely on
| somewhat lossy window-prying hacks (e.g. summarizing chunks).
| But any model quality below that won't move the needle much
| in terms of what useful work is possible, with the exception
| of fairly simple summarization and text analysis tasks.
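The chunk-summarization workaround mentioned above is essentially map-reduce over the context window: split the document into pieces that fit, summarize each piece, then summarize the summaries. A sketch follows, with `summarize` as a stand-in for the actual model call (the real call and the window size of 50 words are assumptions for illustration); the lossiness the comment describes comes from each reduce step discarding detail.

```python
def chunk(words: list[str], size: int) -> list[list[str]]:
    """Split a word list into consecutive pieces of at most `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]

def summarize(words: list[str], keep: int = 5) -> list[str]:
    # placeholder: a real implementation would call an LLM here
    return words[:keep]

def map_reduce_summary(text: str, window: int = 50) -> str:
    words = text.split()
    # map: summarize each chunk that fits in the window
    partials = [w for c in chunk(words, window) for w in summarize(c)]
    # reduce: repeat until the combined summaries fit in one window
    while len(partials) > window:
        partials = [w for c in chunk(partials, window) for w in summarize(c)]
    return " ".join(summarize(partials))

print(map_reduce_summary("lorem " * 500))  # → "lorem lorem lorem lorem lorem"
```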
| pmoriarty wrote:
| _> The fundamental problem seems to be that it's still
| slightly sub-GPT-3.5-quality_
|
| It really depends on what you use it for.
|
| I've found Claude better than GPT4 and even Claude+ at
| creative writing.
|
| It also tends to give more comprehensive explanations
| without additional prompting. So I prefer to have it,
| rather than GPT3.5 or 4, explain things to me.
|
| It's also free, which is another big win over GPT4.
| phillipcarter wrote:
| Maybe! I certainly look forward to that. Although in my
| testing GPT-4 also hallucinates a bit (less than gpt-3.5),
| and the latency is so poor that it's unworkable for our
| product.
| koboll wrote:
| Agreed. My heuristic is that GPT-4 is good for compile
| time tasks but bad for runtime tasks for both cost and
| speed reasons.
| dr_dshiv wrote:
| I find Claude significantly better than 3.5. I'd love to be
| able to make the case for that with data...
| og_kalu wrote:
| There are two main Claude models. I'm guessing it's
| claude-v1.3, aka Claude+, that you find much better than 3.5?
| That tracks if so.
| phillipcarter wrote:
| I've found for my use case that both claude-instant-* and
| claude-* are roughly on par with each other and gpt-3.5.
| claude-* seems to be the least inaccurate, but we also
| haven't put it into production like gpt-3.5, so it's hard
| to say for sure.
|
| In either case, the claude models are very good. I think
| they'd do fine in a real product. But there are definitely
| issues that they all have (or that my prompt engineering
| has).
| sanxiyn wrote:
| Since Chatbot Arena Leaderboard
| https://lmsys.org/blog/2023-05-10-leaderboard/ agrees
| with you, it's not just you.
| jlowin wrote:
| The 100k context was originally released only via API, but I just
| noticed that it's now available in the Claude web UI.
| greyman wrote:
| What is the URL of Claude web UI? I somehow cannot find it.
| Veen wrote:
| console.anthropic.com
| pmoriarty wrote:
| Also https://poe.com/Claude-instant-100k
| ChikkaChiChi wrote:
| Is there a place I can track all releases, announcements, and
| invite links?
___________________________________________________________________
(page generated 2023-05-15 23:01 UTC)