[HN Gopher] Our next-generation model: Gemini 1.5
       ___________________________________________________________________
        
       Our next-generation model: Gemini 1.5
        
       Author : todsacerdoti
       Score  : 746 points
       Date   : 2024-02-15 15:02 UTC (7 hours ago)
        
 (HTM) web link (blog.google)
 (TXT) w3m dump (blog.google)
        
       | crakenzak wrote:
       | Technical report: https://storage.googleapis.com/deepmind-
       | media/gemini/gemini_...
       | 
       | The 1 million token context window + Gemini 1.0 Ultra level
       | performance seems like it'll unlock a wide range of incredible
       | use cases!
       | 
       | HN, what are you going to use/build with this?
        
         | volkk wrote:
          | was this posted by an AI bot?
        
           | crakenzak wrote:
           | Lol nope I'm a normal person. Gimme a captcha and I'll
           | (hopefully) solve it ;)
        
             | scarmig wrote:
             | Just gotta make sure the captcha requires a >1M token
             | context length to solve...
        
             | throwaway918274 wrote:
             | How do we know you're not an AI bot that figured out how to
             | hire someone from fiverr to solve captchas for you?
        
           | mrkstu wrote:
           | No, they're just applying their Twitter style engagement
           | strategy to HN for some reason...
        
       | code51 wrote:
       | Dear Google, please fix your names and versioning.
       | 
        | Gemini Pro, Gemini Ultra... but that was 1.0?
        | 
        | Now upgraded, but again Gemini Pro? Jumping from 1.0 to 1.5?
        | 
        | Wait, but not Gemini Pro 1.5... Gemini "1.5" Pro.
       | 
       | What actually happened between 1.0 and 1.5?
        
         | lairv wrote:
          | This naming is terrible. If I understand correctly, this is
          | the release of Gemini 1.5 Pro, but not Gemini 1.5 Ultra, right?
        
           | goalonetwo wrote:
           | Looks like the former PM of chat at google found a new job.
        
           | cchance wrote:
            | How is that hard to understand? Yes, it's Gemini 1.5 Pro;
            | they haven't released Ultra or Nano. This isn't rocket
            | science. They didn't introduce Gemini 1.5 ProLight or
            | something, lol, it's the Pro-size model's 1.5 version.
        
             | lairv wrote:
              | The name of the blog post is "Our next-generation model:
              | Gemini 1.5", how am I supposed to infer from this that it
              | is only 1.5 Pro and not Ultra?
        
         | jjbinx007 wrote:
         | They can't decide on a single name for a chat application so I
         | think expecting them to come up with a sensible naming
         | suggestion is optimistic at best.
        
         | aqme28 wrote:
         | Furthermore, is a minor version upgrade two months later really
         | "next generation"?
        
           | philote wrote:
           | Well if it's from 1 to 1.5 then it's really 5 minor version
           | upgrades at once. And since 1.5 is halfway to 2 and you round
           | up, it's next generation!
        
           | AndroTux wrote:
           | Maybe it's not a "next generation" model, but rather their
           | next model for text generation ;)
        
             | cchance wrote:
              | I mean, I don't see any other models watching and answering
              | questions about a 44-minute video lol
        
         | nkozyra wrote:
          | Their inability to name things sensibly has been called out
          | for years, and it doesn't look like they care.
         | 
          | I'm not sure what the deal is; it has to be a marketing
          | hindrance as every major tech company is trying to claw their
          | way up the AI service mountain. Seems like the first step would
          | be cogent naming.
        
           | data-ottawa wrote:
           | It would have been better as Gemini Lite, Gemini, Gemini Pro,
           | and then v1, v1.5 for model bumps.
           | 
           | Ultra vs pro vs nano with Ultra unlocked by buying Gemini
           | Advanced is confusing.
           | 
           | I'm also not sure why they make base Gemini available after
           | you have Advanced, because presumably there's no reason to
           | use a worse model.
        
         | Alifatisk wrote:
          | I understood the transition as follows.
         | 
         | Google Bard to Google Gemini is what they call Gemini 1.0.
         | 
         | Gemini consists of Gemini Nano, Gemini Pro, & Gemini Ultra.
         | 
         | Gemini Nano is for embedded and portable devices I guess? The
         | free version of Gemini (gemini.google.com) is Gemini Pro. The
         | paid version, called Gemini Advanced is using Gemini Ultra.
         | 
         | What we're reading now is about Gemini Pro version 1.0
         | switching to version 1.5 as of today.
        
           | meowface wrote:
           | That just made my head spin even more. (Like, I get it, but
           | it's just a very tortuous naming system.) The free version is
           | called Pro, Gemini Advanced is actually Gemini Ultra, the
           | less powerful version upgraded to the more powerful model but
           | the more powerful version is on the less powerful model.
           | 
           | People make fun of OpenAI for not using product names and
           | just calling it "GPT" but at least it's straightforward: 2,
           | 3, 3.5, 4. (On the API side it's a little more complicated
           | since there's "turbo" and "instruct" but that isn't exposed
           | to users, and turbo is basically the default.)
        
             | kweingar wrote:
             | But you don't pay for GPT-4, you pay for a product called
             | ChatGPT Plus, which allows you to write 40 messages to
             | GPT-4 within a three-hour time window, after which you need
             | to switch to 3.5 in the menu.
        
           | code51 wrote:
            | But if Vertex AI is using Gemini Ultra, then why is
            | makersuite (aisuite now? hmmm) showing only "Gemini 1.0 Pro
            | 001" (001: a version inside a version)?
            | 
            | And why have makersuite/aisuite in the first place, if Vertex
            | AI is the center for all things AI? And why aitestkitchen?
           | 
            | I'm seeing only Gemini 1.0 Pro on Vertex AI. So even if I
            | enabled Google Gemini Advanced (Ultra?) and enabled Vertex AI
            | API access, I'd still have to first be blessed by Google to
            | access advanced APIs.
           | 
           | It seems paying for their service doesn't mean anything to
           | Google at this point. As a developer, you have to jump
           | through hoops first.
        
             | Alifatisk wrote:
             | I think this answers why you can't see Ultra.
             | 
             | "Gemini 1.0 Ultra, our most sophisticated and capable model
             | for complex tasks, is now generally available on Vertex AI
             | for customers via allowlist."
             | 
             | https://cloud.google.com/blog/products/ai-machine-
             | learning/g...
        
           | growt wrote:
            | It was probably not a wise choice to give the model itself
            | and the product the same name: "Gemini Advanced is using
            | Gemini Ultra". Also, "The free version ... is Gemini Pro" is
            | not what you usually see out there.
        
         | sho_hn wrote:
         | It's not that difficult.
         | 
         | Their LLM brand is now Gemini. Gemini comes in three different
         | sizes, Nano/Pro/Ultra.
         | 
         | They recently released 1.0 versions of each, most recently (a
         | few months after Nano and Pro) Ultra.
         | 
         | Today they are introducing version 1.5, starting with the Pro
         | size. They say 1.5 Pro offers comparable performance to 1.0
         | Ultra, along with new abilities (token window size).
         | 
         | (I agree Small/Medium/Large would be better.)
        
           | apwell23 wrote:
           | What you described is difficult.
        
             | mcmcmc wrote:
             | It's really not. Substitute Gemini for iPhone. Apple
             | releases an iPhone model in mini, standard, and pro lines.
             | They announce iPhone model+1 but are releasing the pro
             | version first. Still difficult?
        
               | apwell23 wrote:
               | > Apple releases an iPhone model in mini, standard, and
               | pro lines.
               | 
                | not an iPhone user, but I just looked at the iPhone 15.
                | I don't see any mini version. I'm guessing 'standard' is
                | called just 'iPhone'? Is Pro the same thing as Plus?
               | 
               | https://www.apple.com/shop/buy-iphone/iphone-15
               | 
               | > Still difficult?
               | 
               | yes your example made it even more confusing.
        
               | mcmcmc wrote:
               | Now you're being intentionally difficult. Do you want it
               | to be cars? Last year $Automaker released $Sedan 2023 in
               | basic, standard, and luxury trims. This year $Automaker
               | announced $Sedan 2024 but so far have only announced the
               | standard trim. If I had meant the iPhone 15 specifically
               | I would've said iPhone 15. I think the 12 was the last
               | mini? The point is product families are often released in
               | generations (versions in the case of Gemini) and with
               | different available specs (ultra/pro/nano etc) that may
               | not all be released at the same time.
        
               | dpkirchner wrote:
               | Apple discontinued mini phones two generations back,
               | unfortunately.
        
               | sho_hn wrote:
               | I think it's the "iPhone +1 Mini is as fast as the old
               | Standard" that confuses people here. This is obvious and
               | expected but not how it's usually marketed I guess ...
        
               | chatmasta wrote:
               | So Google will be upgrading the version number of each
               | model at the same time? Based on other comments here,
               | that's not the case - some are 1.5 and some are 1?
               | 
               | Apple doesn't announce the iPhone 12 Mini and compare it
               | to the iPhone 11 Pro.
        
               | iamdelirium wrote:
               | Uhh, yes they do?
               | 
               | Did you watch the announcements for the M2 and M3 pros?
               | They compared it to the previous generations all the
               | time.
        
             | huytersd wrote:
             | How? Three models Nano/Pro/Ultra currently at 1.0. New
             | upgrades just increment the version number.
        
           | Alifatisk wrote:
           | They should remove the name Gemini Advanced and just stick to
           | one name
        
             | sho_hn wrote:
             | Agreed.
             | 
              | Gemini Advanced seems to be the brand name for the higher
              | price tier for the end-user frontend that gets you Ultra
              | access, similar to how ChatGPT Plus gets you GPT-4.
             | 
              | I get it, but it does raise the question of whether you
              | will need Advanced now to get 1.5 Pro. Or does everyone get
              | Pro, making it useless to pay for 1.0 Ultra?
             | 
              | I still don't think it's _confusing_, but that part is
              | definitely messy.
        
           | OJFord wrote:
           | > , starting with the Pro size
           | 
           | This is where it gets confusing IMO.
           | 
           | It's like if Apple announced macOS Blabahee, starting with
           | Mini, not long after releasing Pro and Air touting benefits
           | of Sonoma.
           | 
            | Also, just... this is how TFA _begins_:
           | 
           | > Last week, we rolled out our most capable model, Gemini 1.0
           | Ultra, [...] Our teams continue pushing the frontiers of our
           | latest models with safety at the core. They are making rapid
           | progress. [...] 1.5 Pro achieves comparable quality to 1.0
           | Ultra
           | 
            | Last week! And now we have next generation. And the wow is
            | that it's comparable to the best of the previous generation.
            | Ok fine at a smaller size, but also that's all we get anyway.
            | Oh and the _most_ capable remains the last generation one. As
            | long as it's the biggest one.
        
             | crazygringo wrote:
             | It's almost exactly like Apple, actually, with their M1 and
             | M2 chips available in different sizes, launching at
             | different times in different products.
             | 
             | It's really not that confusing. There are different sizes
             | and different generations, coming out at different times.
             | This pattern is practically as old as computing itself.
             | 
             | I can't even imagine what alternative naming scheme would
             | be an improvement.
        
               | OJFord wrote:
               | Don't go thinking I'm an Apple 'fanboy', I don't have any
               | Apple devices at the moment, but I really can't imagine
               | them launching a next gen product that isn't better than
               | the best of the last gen.
               | 
                | I doubt they launched M2 MBAs while the MBP was running
                | M1, for example. Or more directly, a low-mid spec M3 MBP
                | while the top-spec M2 MBP (I assume that would
                | out-benchmark it?) was still for sale and no comparable
                | M3 chip existed yet.
               | 
               | It's not having the matrix of size/power & generation
               | that's confusing, it's the 'next generation' one
               | initially launched not being the best. I think that's
               | mainly it for me anyway.
        
               | crazygringo wrote:
                | > _but I really can't imagine them launching a next gen
                | product that isn't better than the best of the last gen._
               | 
               | But they have. The baseline M2 is significantly less
               | powerful than the M1 Max.
               | 
               | What Google's doing is basically exactly like that. It
               | happens all the time that the mid tier of the next
               | generation isn't as good as the top tier of the previous
               | generation. It might even be the norm.
        
             | matwood wrote:
             | > Last week! And now we have next generation.
             | 
             | Google got caught completely flat footed by OpenAI. I'm
             | going to cut them some slack that they want to show the
             | world a bit of flex with their AI chops as soon as they
             | have results.
        
           | Keyframe wrote:
            | What's Advanced then, the chat? Also, by that logic, 1.5
            | Ultra is still to come, and it'll show even bigger guns.
        
             | sho_hn wrote:
             | Yes, my understanding is also there will be a 1.5 Ultra.
             | 
              | However, it's not explicitly stated anywhere that I could
              | find. The Technical Report PDF also avoids even hinting at
              | it.
             | 
             | Advanced is a price/service tier for the end-user frontend.
             | At the moment it gets you 1.0 Ultra access vs. 1.0 Pro for
             | the free version. Similar to how ChatGPT Plus gives you 4
             | instead of 3.5.
             | 
             | I agree this part is messy. Does everyone who had Pro
             | already get 1.5 Pro? If 1.5 Pro is better than 1.0 Ultra,
             | why pay for Advanced? Is 1.5 Pro behind the Advanced
             | paywall? etc.
        
               | Keyframe wrote:
                | Ok, so from what I've gathered from all of the comments
                | so far, the primary confusion is that both the chat
                | service and the LLM models are named the same.
               | 
               | There are three models: nano/pro/ultra and all are at
               | v1.0
               | 
                | There are two tiers of chat service: basic and advanced.
               | 
               | There is AIStudio from google through which you can
               | interact with / use directly gemini llms.
               | 
               | Chat service Gemini basic (free) uses Gemini Pro 1.0 llm.
               | 
               | Chat service Gemini advanced uses Gemini Ultra 1.0 llm.
               | 
               | What was shown is ~~Ultra~~ Pro 1.5 LLM which is / will
               | be available to select few for preview to be used via
               | AIStudio.
               | 
               | That leaves a question, what's nano for, and is it only
               | used via AIStudio/API?
               | 
               | Jesus, Google..
        
               | sho_hn wrote:
               | No, what they showed is Pro 1.5. Only via API and on a
               | waitlist.
               | 
               | How this relates to the end-user chat service/price tiers
               | is still unknown.
               | 
               | The best scenario would be that they just move Gemini
               | free and Advanced tiers to Pro 1.5 and Ultra 1.5, I
               | guess.
        
               | Keyframe wrote:
               | Yes, you are right. I meant Pro. Let's see then.
        
               | j16sdiz wrote:
               | Nano is the on-device (Pixel phone) model.
        
           | mvkel wrote:
           | So there's Nano 1.0, Pro 1.5, Ultra 1.0, but Pro 1.5 can only
           | be accessed if you're a Vertex AI user (wtf is Vertex)?
           | 
           | That's very difficult.
        
             | sho_hn wrote:
              | It's a bit similar to how new OpenAI stuff is usually
              | partner-only or waitlisted at first.
             | 
             | Vertex AI is their developer API platform.
             | 
             | I agree OpenAI is a bit better at launching for customers
             | on ChatGPT alongside API.
        
           | pentagrama wrote:
            | Thank you, it's clearer to me now. But I also read in some
            | Google announcement about "Gemini Advanced"; do you know what
            | that is and how it relates to the Nano/Pro/Ultra levels?
        
             | sho_hn wrote:
             | Gemini is also the brand name for the end-user web and
             | phone chatbot apps, think ChatGPT (app) vs. GPT-# (model).
             | 
             | Gemini Advanced is the paid subscription service tier that
             | at the moment gets you access to the Ultra model, similar
             | to how a ChatGPT Plus subscription gets you access to
             | GPT-4.
             | 
             | Honestly, they should have called this part Gemini Chat and
             | Gemini Chat Plus, but of course ego won't let them follow
             | the competitor's naming scheme.
        
           | screye wrote:
              | Gemini Ultra 1.0 never went GA. So it is weird that they'd
              | release 1.5 when most can't even get their hands on 1.0
              | Ultra.
        
             | gkbrk wrote:
             | Isn't the paid version on https://gemini.google.com Gemini
             | 1.0 Ultra?
        
         | gmuslera wrote:
         | Maybe they should take a hint on Windows versions name scheme
         | and call the next version Gemini Meh.
        
           | apapapa wrote:
           | Are you talking about Xbox one?
        
             | mring33621 wrote:
             | No. Gemini Purple Plus Platinum Advanced Home Version
             | 11.P17
        
         | kccqzy wrote:
         | Dear OpenAI please fix your names and versioning. Why do you
         | have GPT-3 and GPT-3.5? What happened between 3 and 3.5? And
          | why isn't GPT-3 a single model? Why are there variations like
          | GPT-3-6.7B and GPT-3-175B? And why is there now a turbo
          | version? How does turbo compare to 4? And what's the
          | relationship between the end-user product ChatGPT and a
          | specific GPT model?
         | 
         | You see this problem isn't unique to Google.
        
         | lordswork wrote:
         | See https://news.ycombinator.com/item?id=39385230
        
         | cchance wrote:
         | This just means we'll be getting a Nano 1.5 and Ultra 1.5
         | 
         | and if Pro 1.5 is this good holy shit what will Ultra be...
         | 
         | Nano/Pro/Ultra are the model sizes, 1.0 or 1.5 is the version
        
       | hamburga wrote:
       | "One of the key differentiators of this model is its incredibly
       | long context capabilities, supporting millions of tokens of
       | multimodal input. The multimodal capabilities of the model means
       | you can interact in sophisticated ways with entire books, very
       | long document collections, codebases of hundreds of thousands of
       | lines across hundreds of files, full movies, entire podcast
       | series, and more."
        
         | skywhopper wrote:
         | This is nice, but it's hard to judge how nice without knowing
         | more about how much compute and memory is involved in that
         | level of processing. Obviously Google isn't going to tell us,
         | but without having some idea it's impossible to judge whether
         | this is an economically sustainable technology on which to
         | start building dependencies in my own business.
        
           | criddell wrote:
           | Sustainable? The countdown to cancellation on this project is
           | already underway.
           | 
           | "Does it make sense today?" is really the only question you
           | can ask and then build dependencies with the understanding
           | that the entire thing will go away in 3-7 years.
        
       | freediver wrote:
        | It would do Google a lot of good if every such announcement were
        | not met with 'join the waitlist' and 'talk to your Vertex AI
        | team'.
        
         | baq wrote:
         | Yeah compared to e.g. Apple's 'here's our new iWidget 42 pro,
         | you can buy it now' it's at best disappointing.
        
           | apozem wrote:
           | Apple is good about only announcing real products you can
           | buy. They don't do tech demos. It's always, "here's a
           | problem. the new apple watch solves it. here're five other
           | things the watch does. $399."
        
             | erkt wrote:
             | The verdict is not yet out on the Vision Pro but otherwise
             | your point stands.
        
             | amarant wrote:
             | Apple is indeed masterful at advertising. Google, somewhat
             | ironically, is really bad at it.
        
               | matwood wrote:
                | Apple is masterful at product, not just the advertising
                | part. Google builds cool technology, then fails at the
                | product side.
        
             | xnx wrote:
              | I agree that Apple does a better job, but wasn't Apple
              | Vision Pro announced 240 days before you could get it? I
              | think it's a pretty safe bet that Gemini 1.5 (or something
              | better) will be available to anyone who wants to use it in
              | the next 240 days.
        
               | nacs wrote:
               | AI software release cycles are incredibly short right
               | now. Every month, there is some major development
               | released in a _usable right now_ form.
               | 
                | The first-of-its-type AR/VR hardware has,
                | understandably, a longer release cycle. Also, Apple
                | announced early to drive up developer interest.
        
               | manquer wrote:
                | AVP was the exception rather than the norm.
                | 
                | Apple aggressively keeps products under wraps before
                | launch and fires employees and vendors for leaking any
                | sort of news to the press.
                | 
                | Also, a hardware product that is miles ahead of the
                | competition in terms of components and needs a complex
                | setup workflow (for head and eyes), something Apple has
                | not done before, shipping 7-8 months after its
                | announcement is not really comparable with a SaaS API in
                | terms of delays.
        
         | brianjking wrote:
         | 100%, I can't even use Imagen despite being an early tester of
         | Vertex.
        
         | belval wrote:
         | They can't do that because only they are the incorruptible
         | stewards empowered with the ability to develop these models,
         | making them accessible to the unwashed masses would be
         | irresponsible!
        
           | ethanbond wrote:
           | The victim complex on this topic is getting really old.
           | 
           | They're an enterprise software company doing an enterprise
           | sales motion.
        
             | belval wrote:
              | If that were true, they wouldn't have named it Gemini 1.5
              | to follow the half-point increment of ChatGPT; they
              | desperately want "people" to care about their product to
              | gain back their mindshare.
             | 
             | Anthropic's Claude targets mostly business use cases and
             | you don't see them write self-congratulating articles about
             | Claude v2.1, they just pushed the product.
        
               | eropple wrote:
               | Mindshare is part of enterprise sales, yes.
               | 
               | I work at a very large company and everyone knows about
               | ChatGPT and Gemini (in part because we for our sins have
               | a good chunk of GCP stuff), but I doubt anyone here not
               | doing some LLM-flavored development has ever even heard
               | of Anthropic, let alone Claude.
        
               | KirinDave wrote:
               | And look at how well it's going for Claude. Their primary
               | claim to fame is being called "an annoying coworker" and
               | that's it.
               | 
               | Why would anyone look to form a contract with Anthropic
               | right now? I'd say they're in danger here, because their
               | models and offerings don't have clear value propositions
               | to customers.
        
               | ac29 wrote:
               | Claude 2.1 certainly got a news post when it was
               | released: https://www.anthropic.com/news/claude-2-1
               | 
               | Seems reasonably similar in tone to the Google post.
        
             | dkjaudyeqooe wrote:
             | > They're an enterprise software company
             | 
             | Really? Someone ought to tell them.
        
         | stavros wrote:
         | I'm generally an excited early adopter, but this kills my
         | excitement immediately. I don't know if Gemini is out (or which
         | Gemini is out) because I've associated Google with "you can't
         | try their stuff", so I've learned to just ignore everything
         | about Gemini.
        
           | hbn wrote:
           | Google is really good at diluting any possible anticipation
           | hardcore users might have for new stuff they do. 10 years ago
           | I loved when there was a big update to one of their Android
           | apps and I could sideload the apk from the internet to try it
           | out early. Then they made all those changes A/B tests
           | controlled by server side flags that would randomly turn
           | themselves on and off, and there was no way to opt in or out.
           | That was one of the (many) moves that contributed to my
           | becoming disenchanted with Android.
        
           | petre wrote:
           | There is a Gemini service that you can use with your Google
            | account, but it's kind of meh, as it repeats your input and
            | makes all sorts of assumptions. I am confused about the
            | version as well. There's a link to another premium version
            | (1.5?) on its page, to which I don't have access without
            | completing a quest which likely ends with a credit card
            | input. That kills it for me.
        
             | yborg wrote:
             | Or can't use ... I have a newish work account and
             | downloaded Gemini on a Pixel 8 Pro and get "Gemini isn't
             | available" and "Try again later" with no explanation of why
             | not and when.
        
               | petre wrote:
               | This is it. Not a phone app, did not install anything.
               | Maybe your account is not old enough? You're not missing
               | anything anyway.
               | 
               | https://gemini.google.com/
               | 
               | Look, it now has totally useless suggestions like it was
               | trained on burned out woke IT workers. I asked it about
               | the weather, sea temperature and wave height and period
               | in Malaga, which is much less boring than the choices it
               | came up with. First it tried to talk me out of it waving
               | away responsibility, then it provided useful climate
               | data, which I would have wasted too much time doing a
               | Google search on. I guess it's good for checking on the
               | weather if you can put up with the waivers. Also it knows
               | fishing for garfish in Denmark in May is not a total
               | waste of your time, a great way to experience local
               | culture and a sustainable activity.
               | 
               | I also asked it about the version: "I am currently
               | running on the Gemini Pro 1.01.5 model".
        
           | skybrian wrote:
           | I think the way to understand this is to realize that this
           | isn't targeted at a Hacker News audience and they don't care
           | what we think. The world doesn't revolve around us.
           | 
           | What's the goal? Maybe, being able to work with partners
           | without it being a secret project that will inevitably leak,
           | resulting in inaccurate stories in the press. What are non-
           | goals? Driving sales or creating anticipation with a mass
           | audience, like a movie trailer or an Apple product launch.
           | 
           | So they have to announce something, but most people don't
           | read Hacker News and won't even hear about it until later,
           | and that's fine with them.
        
         | jpeter wrote:
         | and not having to wait months if you live in EU
        
           | lxgr wrote:
           | What's worse is that I can't seem to find a way to let Google
           | know where I actually live (as opposed to where I am
           | temporarily traveling, what country my currently inserted SIM
           | card is from etc). And apparently there is no way to do this
           | at all without owning an Android device!
           | 
           | Apple at least lets me change this by moving my iTunes/App
           | Store account, which is its own ordeal and far from ideal,
           | but at least there's a defined process: Tell us where you
           | think you live, provide a form of payment from that place,
           | maybe we'll believe you.
        
             | TillE wrote:
             | Yeah Google aggressively uses geolocation throughout their
             | services, regardless of your language settings. The
             | flipside of that is that it's really easy to access the
             | latest Gemini or whatever by just using a VPN.
        
               | lxgr wrote:
               | Wait, does that mean if I subscribe to Gemini Pro in
               | country A where it's available (e.g. the US) but travel
               | to Europe, I can't use it?
               | 
               | I'm really frustrated by Google's attitude of "we know
               | better where you are than you do". People travel
               | sometimes and that's not the same thing as moving!
        
               | FergusArgyll wrote:
               | I signed up for all of their AI products when I was in
               | the US, some of them work while I'm out of country some
               | don't. I can't tell what the rule is...
        
               | lxgr wrote:
               | I really, really hate all of these geo heuristics. Sure,
               | don't advertise services to people outside of your
               | market, I get that. Do ask for a payment method from that
               | country too to provide your market-specific pricing if
               | you must.
               | 
               | But once I'm a paying customer, I want to use the thing
               | I'm paying for from where I am without jumping through
               | ridiculous hoops!
               | 
               | The worst variant of this I've seen is when you can
               | neither use _nor cancel the subscription_ from outside a
               | supported market.
        
               | FergusArgyll wrote:
               | To be clear, I didn't pay for any of them. I just signed
               | up for early access to every product that uses some form
               | of ML that can remotely be called "AI"...
               | 
               | Once I got accepted, some of them work outside of the US
               | and some don't
        
         | hobofan wrote:
         | Eh, I think it's about as bad as the OpenAI method of
         | officially announcing something and then "continuously rolling
          | it out to all subscribers," which may take anywhere from a
          | few days to several months.
        
         | addandsubtract wrote:
         | Remember when Gmail was new and you needed an invite to join? I
         | guess Google is stuck in 2004.
        
           | bobchadwick wrote:
           | I'm embarrassed to admit that I bought a Gmail invite on eBay
           | for $6 when it was still invite-only.
        
             | agumonkey wrote:
             | Yielding a priceless anecdote
        
             | blagie wrote:
             | _shrug_ It probably gave you months of fun.
        
             | jprete wrote:
             | That's not entirely a waste, it would have given you a
             | better chance for an email address you wanted.
        
               | CydeWeys wrote:
               | Yeah. I ended up with an eight letter @gmail.com because
               | I dithered, but if I'd signed up by any means necessary
               | when I'd first heard of it, I would've gotten a four
               | letter one.
        
             | rocketbop wrote:
             | Nothing to be ashamed of. I think I might have bought a
             | Google Wave invite a couple of years later :/
        
             | spiffytech wrote:
             | I bartered on gmailswap.com, sending someone a bicentennial
              | 50¢ US coin in exchange for an invite.
             | 
             | The envelope made it to the recipient, but the coin fell
             | out in transit because I was young and had no idea how to
             | mail coinage. They graciously gave me the invite anyway.
        
               | ssteeper wrote:
               | Ah, to be young and clueless about coinage mailing.
        
             | LouisSayers wrote:
             | Well they did promise unlimited space - remember how it
             | kept growing? I guess until it didn't...
             | 
             | But still, compared to Hotmail etc the free storage space
             | (something like 1GB vs 10MB) was well worth $6
        
           | moffkalast wrote:
            | They don't seem to remember when that literally sank Google+
           | because people had no use for a social network without their
           | friends on it.
        
         | bachmeier wrote:
         | This is bad practice across the board IMO. There seems to be an
         | idea that this builds anticipation for new products. Sounds
         | good in a PowerPoint presentation by an MBA but doesn't work in
         | practice. Six months (or more!) after joining a waitlist, I'm
         | not seeing it for the first time, so I don't really care when
         | yet another email selling me something hits my inbox. I may not
         | even open the email. This could be mitigated somewhat by at
         | least offering a demo, but that's rare.
        
           | bushbaba wrote:
            | Likely they have limited capacity and are allotting access
            | to the highest-paying and most strategic customers
        
             | eitally wrote:
              | As someone who worked on Google Cloud's partnerships team,
              | I can say the Early Access Program, not to mention the
              | Alpha --> Beta --> GA launch process for AI products, is
              | really dysfunctional. Inevitably what happens is that a few
             | strategic customers or partners get exceptionally early
             | (Alpha) access and work directly with the product team to
             | refine things, fix bugs and iron out kinks. This is great
             | and the way market driven product development _should_
             | work.
             | 
             | The issues arise with the subsequent stagegate graduation
             | processes, requirements and launches to less restricted
             | markets. It's inconsistent, the QoS pre-GA customers
             | receive is often spotty and the products come with no SLAs,
             | and -- just like Gmail on the consumer side -- things
             | frequently stay in EAP/Beta phase for years with no
             | reliable timeline for launch. ... and then often they're
             | killed before they get to GA, even though they may have
              | been in use by EAP customers for upwards of 1-2 years.
             | 
             | I drafted a new EAP model a few years ago when Google's
             | Cloud AI & Industry Solutions org was in the process of
             | productizing things like the retail recommendation engine
             | and Manufacturing Data Engine, and had all the buy-ins from
             | stakeholders on the GTM side ... but the CAIIS GM never
             | signed off. Subsequently, both the GM & VP Product of that
             | org have been forced out.
             | 
             | In my opinion, this is something Microsoft does very well
             | and Google desperately needs to learn. If they pick up
             | anything from their hyperscaler competitors it should be 1)
             | how to successfully become a market driven engineering
             | company from MSFT and 2) how to never kill products (and
             | not punish employees for only doing KTLO work) from AMZN.
        
             | moralestapia wrote:
             | So tactical, wow. Meanwhile OpenAI and others will eat
             | their lunch _again_.
        
               | bushbaba wrote:
                | Agreed. OpenAI also doesn't need to contend with
                | shareholders fearing a GDPR-like fine. Sadly, the
                | larger you are, the bigger the pain from small mistakes.
        
           | justrealist wrote:
           | One PM in 2005 knocked it out of the park with Gmail and
           | every Google PM since then has cargo-culted it.
        
         | kkzz99 wrote:
          | It's because they don't want you to actually use it and see
          | how far behind they are compared to other companies. These
          | announcements are meant to placate investors: "See, we are
          | doing a lot of SotA AI too."
        
           | Keyframe wrote:
           | You might be right, but other things from Google tell the
            | same story. For example, I recently tried to get hold of a
            | Pixel 8 Pro. I had to import one from the UK, and when I
            | did, it turned out the new feature of using the thermometer
            | on humans isn't available outside the US. It doesn't even
            | seem that the process to certify it outside the US is
            | underway. Sales and support just aren't a thing at Google
            | the way they are at Apple, which is a total shame. I know
            | Google is strong, if not the strongest, in the game of
            | tech; they just need to get their act together, and I
            | believe they can succeed in that, but sales and support
            | were never in their DNA. Not sure if that can be changed.
           | 
           | I'm more than happy to transfer my monthly $20 to google from
           | OpenAI, on top of my youtube and google one subscription.
           | It's up to Google to take it.
        
         | quatrefoil wrote:
         | It lets the company control the narrative, without the
         | distraction of fifty tech bloggers test-driving it and posting
         | divergent opinions or findings. Instead, the conversation is
         | anchored to what the company claims about the product.
         | 
         | It's interesting that it's the opposite of the gaming industry.
         | There, because the reviewers dictate the narrative, the
         | industry is better at ferreting out bogus claims. On the flip
         | side, loud voices sometimes steamroll over decent products
         | because of some ideological vendetta.
        
         | anonzzzies wrote:
         | And region based. Yawn.
        
         | mil22 wrote:
         | Totally agree with this. I can see the desire to show off, but
         | I don't understand how anyone can believe this is good
         | marketing strategy. Any initial excitement I get from reading
         | such announcements will be immediately extinguished when I
         | discover I can't use the product yet. The primary impression I
         | receive of the product is "vaporware." By the time it does get
         | released I'll already have forgotten the details of the
         | announcement, lost enthusiasm, and invested my time in a
         | different product. When I'm choosing between AI services, I'll
         | be thinking "no, I can't choose Gemini Pro 1.5 because it's not
         | available yet, and who knows when it will be available or how
         | good it'll be." Then when they make their next announcement,
         | I'll be even less likely to give it any attention.
        
         | bobvanluijt wrote:
         | I have access and will share some learnings soon
        
         | whywhywhywhy wrote:
          | After the complete farce that was their last, 90%-faked demo
          | video, maybe next time just give us a text box where we can
          | talk to the thing and see it working ourselves.
         | 
          | It's shocking to me: is management really so clueless that
          | they don't realize how far behind they are? This isn't 2010
          | Google; you're not the company that made your success
          | anymore, and in a decade the only two surefire things that
          | will still exist are Android and Chrome. Search, Maps, and
          | YouTube are all in precarious positions that the right team
          | could dethrone.
        
         | summerlight wrote:
         | I believe this is a standard practice in Google whenever they
         | need to launch a change expected to consume huge resources and
         | they cannot reasonably predict the demand. Though I agree that
          | this is a bad PR practice; a waitlist should be treated as a
          | compromise, not a PR technique.
        
         | crazygringo wrote:
         | These announcements are mainly for investors and other people
         | interested in planning purposes. It's important to know the
         | roadmap. More information is better.
         | 
         | I get that it's frustrating not to be able to play with it
         | immediately, but that's just life. Announcing things in advance
         | is still a valuable service for a lot of people.
         | 
         | Plus tons of people have been claiming that Google has somehow
         | fallen behind in the AI race, so it's important for them to
         | counteract that narrative. Making their roadmap more visible is
         | a legitimate strategy for that.
        
         | dpkirchner wrote:
         | I wrote off the PS5 because of waitlists. I was surprised to
         | learn just yesterday that they are now actually, honestly
         | purchasable (what I would consider "released").
         | 
         | I guess I let my original impression anchor my long-term
         | feelings about the product. Oh well.
        
         | TheFragenTaken wrote:
         | It's probably going to be dead/deprecated in a year, so maybe
         | there's a silver lining to how hard it is to get to use the
         | service. I, for one, wouldn't "build with Gemini".
        
         | animex wrote:
         | I don't think I've ever engaged with a product after "joining
         | their waitlist". By the time they end up utilizing that funnel,
         | competitors have already released feature upgrades or new
         | products cannibalizing their offering.
        
       | alphabetting wrote:
       | Massive whoa if true from technical report
       | 
       | "Studying the limits of Gemini 1.5 Pro's long-context ability, we
       | find continued improvement in next-token prediction and near-
       | perfect retrieval (>99%) up to at least 10M tokens"
       | 
       | https://storage.googleapis.com/deepmind-media/gemini/gemini_...
        
         | stavros wrote:
         | Until I can talk to it, I care exactly zero.
        
           | peterisza wrote:
           | you can buy their stock if you think they'll make a lot of
           | money with their tech
        
             | HarHarVeryFunny wrote:
             | Well that's really the right question .. what can, and
             | will, Google do with this that can move their corporate
             | earnings needle in a meaningful way? Obviously they can
             | sell API access and integrate it into their Google docs
             | suite, as well as their new Project IDX IDE, but do any of
             | these have potential to make a meaningful impact ?
             | 
             | It's also not obvious how these huge models will fare
             | against increasingly capable open source ones like Mixtral,
             | perhaps especially since Google are confirming here that
             | MoE is the path forward, which perhaps helps limit how big
             | these models need to be.
        
               | plaidfuji wrote:
               | In the long run it could move the needle in enterprise
               | market share of Workspace and GCP. They have a lot of
                | room to grow and IMO have a far superior product to
                | O365/Azure, an advantage that strong AI products could
                | amplify. The only problem is this sales cycle can take a
               | decade or more, and Google hasn't historically been
               | patient or strategic about things like this.
        
         | megaman821 wrote:
         | So, will this outperform any RAG approach as long as the data
         | fits inside the context window?
        
           | ArcaneMoose wrote:
           | Cost would still be a big concern
        
           | saliagato wrote:
            | basically, yes. Pinecone? Dead. Azure AI Search? Dead.
            | Qdrant? Dead.
        
             | _boffin_ wrote:
             | Prompt token cost still a variable.
        
           | TheGeminon wrote:
            | Whether it outperforms depends on the RAG approach (and
            | this would be a RAG approach anyway; you can already do
            | this with smaller context sizes). A simplistic one,
            | probably, but dumping in data that you don't need dilutes
            | the useful information, so I would imagine there would be
            | at least _some_ degradation.
           | 
            | But there is also a downside to "tuning" the RAG to return
            | fewer tokens: you will miss extra context that could be
            | useful to the model.
        
             | megaman821 wrote:
             | Doesn't their needle/haystack benchmark seem to suggest
             | there is almost no dilution? They pushed that demo out to
             | 10M tokens.
        
           | CuriouslyC wrote:
           | A perfect RAG system would probably outperform everything in
           | a larger context due to prompt dilution, but in the real
           | world putting everything in context will win a lot of the
           | time. The large context system will also almost certainly be
           | more usable due to elimination of retrieval latency. The
           | large context system might lose on price/performance though.
        
           | chasd00 wrote:
           | are you going to upload 10M tokens to Gemini on every
           | request? That's a lot of data moving around when the user is
           | expecting a near realtime response. Seems like it would still
           | be better to only set the context with information relevant
           | to the user's prompt which is what plain rag does.
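For contrast, here is a minimal sketch of the "plain RAG" retrieval step described above. The chunks and vectors are toy stand-ins for real embeddings; no actual embedding model or vector database is used.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k_chunks(query_vec, chunk_vecs, chunks, k=2):
    """Plain RAG retrieval: rank chunks by similarity to the query and
    send only the winners as context, instead of the whole corpus."""
    scored = sorted(zip(chunks, chunk_vecs),
                    key=lambda cv: cosine(query_vec, cv[1]),
                    reverse=True)
    return [chunk for chunk, _ in scored[:k]]

# Toy 2-d vectors standing in for real document embeddings:
chunks = ["refund policy", "office hours", "api limits"]
vecs = [[1.0, 0.1], [0.0, 1.0], [0.9, 0.3]]
print(top_k_chunks([1.0, 0.0], vecs, chunks))
```

Only the top-k chunks (a few hundred tokens) get sent with the prompt, rather than re-uploading millions of tokens on every request.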
        
         | Workaccount2 wrote:
         | 10M tokens is absolutely jaw dropping. For reference, this is
         | approximately thirty books of 500 pages each.
         | 
         | Having 99% retrieval is nuts too. Models tend to unwind pretty
         | badly as the context (tokens) grows.
         | 
         | Put these together and you are getting into the territory of
          | dumping all your company's documents, or all your
          | department's documents, into a single GPT (or whatever Google
          | will call it)
         | and everyone working with that. Wild.
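The "thirty books" figure checks out with a quick back-of-the-envelope calculation, assuming the common rule of thumb of roughly 0.75 words per token and about 500 words per printed page:

```python
# Rough conversion: tokens -> words -> 500-page books.
tokens = 10_000_000
words = tokens * 0.75               # ~7.5M words
words_per_book = 500 * 500          # 500 pages * ~500 words/page
books = words / words_per_book
print(round(books))                 # prints 30
```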
        
           | kranke155 wrote:
           | Seems like Google caught up. Demis is again showing an
           | incredible ability to lead a team to make groundbreaking
           | work.
        
             | huytersd wrote:
             | If any of this is remotely true, not only did it catch up,
             | it's wiping the floor with how useful it can be compared to
             | GPT4. Not going to make a judgement until I can actually
             | try it out though.
        
               | singularity2001 wrote:
               | In the demo videos gemini needs about a minute to answer
               | long context questions. Which is better than reading
               | thousands of pages yourself. But if it has to compete
               | with classical search and skimming it might need some
               | optimization.
        
               | huytersd wrote:
               | That's a compute problem, something that involves just
               | throwing money at the problem.
        
               | a_wild_dandan wrote:
               | Replacing grep or `ctrl+F` with Gemini would be the
                | user's fault, not Gemini's. If classical search is
                | already a performant solution for a job, _use classical
                | search_. Save your tokens for jobs worthy of a general
                | intelligence!
        
         | matsemann wrote:
         | Could you (or someone) explain what this means?
        
           | FergusArgyll wrote:
           | The input you give it can be very long. This can
           | qualitatively change the experience. Imagine, for example,
           | copy pasting the entire lord of the rings plus another 100
           | books you like and asking it to write a similar book...
        
             | teaearlgraycold wrote:
             | I doubt it's smart enough to write another (coherent, good)
             | book based on 103 books. But you could ask it questions
             | about the books and it would search and synthesize good
             | answers.
        
             | HarHarVeryFunny wrote:
             | I just googled it, and the LOTR trilogy apparently has a
             | total of 480,000 words, which brings home how huge 1M is!
             | It'd be fascinating to see how well Gemini could summarize
             | the plot or reason about it.
             | 
             | One point I'm unclear on is how these huge context sizes
             | are implemented by the various models. Are any of them the
             | actual raw "width of the model" that is propagated through
             | it, or are these all hierarchical summarization and chunk
             | embedding index lookup type tricks?
        
               | mburns wrote:
               | For another reference, Shakespeare's complete works are
               | ~885k words.
               | 
               | The Encyclopedia Britannica is ~44M words.
        
             | staticman2 wrote:
             | Reading Lord of the Rings, and writing a quality book in
             | the same style, are almost wholly unrelated tasks. Over 150
             | million copies of Lord of the Rings have been sold, but few
             | readers are capable of "writing a similar book" in terms of
             | quality. There's no reason to think this would work well.
        
           | ehsankia wrote:
           | It's how much text it can consider at a time when generating
           | a response. Basically the size of the prompt. A token is not
           | quite a word but you can think of it as roughly that.
            | Previously, the best most LLMs could do was around 32K.
            | This new model does 1M, and in testing they pushed it to
            | 10M with near-perfect retrieval.
           | 
           | As the other comment mentions, you can paste the content of
           | entire books or documents and ask very pointed question about
           | it. Last year, Anthropic was showing off their 100K context
           | window, and that's exactly what they did, they gave it the
           | content of The Great Gatsby and asked it questions about
           | specific lines of the book.
           | 
           | Similarly, imagine giving it hundreds of documents and asking
           | it to spot some specific detail in there.
        
         | og_kalu wrote:
         | Another whoa for me
         | 
         | >Finally, we highlight surprising new capabilities of large
         | language models at the frontier; when given a grammar manual
         | for Kalamang, a language with fewer than 200 speakers
         | worldwide, the model learns to translate English to Kalamang at
         | a similar level to a person learning from the same content.
         | 
         | Results - https://imgur.com/a/qXcVNOM
        
           | usaar333 wrote:
            | I think this is mostly due to the ability to handle long
            | contexts better. Note how Claude 2.1 already substantially
            | outperforms GPT-4 on this task.
        
             | a_wild_dandan wrote:
             | GPT-4V turbo outperforms Claude on long contexts, IIRC.
             | Unless that's mistaken, I'd suspect a different explanation
             | for that task.
        
         | cchance wrote:
         | Did you watch the video of the Gemini 1.5 video recall after it
         | processed the 44 minute video... holy shit
        
       | ranulo wrote:
       | > This new generation also delivers a breakthrough in long-
       | context understanding. We've been able to significantly increase
       | the amount of information our models can process -- running up to
       | 1 million tokens consistently, achieving the longest context
       | window of any large-scale foundation model yet.
       | 
       | Sweet, this opens up so many possibilities.
        
       | tsunamifury wrote:
       | Google is like a nervous and insecure engineer -- blowing their
       | value by rushing the narrative and releasing too much too
       | confusingly fast.
        
         | sho_hn wrote:
         | When OpenAI raced through 3/3.5/4 it was "this team ships" and
         | excitement.
         | 
         | This cargo-cult hate train is getting tiresome. Half the
         | comments on anything Google-related are like this now, and it
         | doesn't add anything to the conversation.
        
           | epiccoleman wrote:
           | The difference, though, as someone who really doesn't have a
           | particular dog in this fight, is that I can go _use_ GPT-4
           | right now, and see for myself whether it 's as exciting as
           | the marketing materials say.
        
             | sho_hn wrote:
              | When OpenAI launched GPT-4, API access was initially
              | behind a waitlist. And the LMM capabilities they demoed
              | on launch day sat in a limited partner program for
              | months, becoming generally available only 7 months later.
             | 
             | I also want the shiny immediately when I read about it, but
             | I also know when I am acting entitled and don't go spam
             | comment threads about it.
             | 
             | But really, mostly I mean this: It's fine to criticize
             | things, but when half a dozen people have already raised a
             | point in a thread, we don't need more dupes. It really
             | changes signal-to-noise.
        
           | mynameisvlad wrote:
           | Gemini Ultra was announced two months ago. It just launched
           | in the last week. It literally is still the featured post on
           | the AI section of their blog, above this announcement.
           | https://blog.google/technology/ai/
           | 
           | There's "this team ships" and there's "ok maybe wait until at
           | least a few people have used your product before you change
           | it all".
        
             | sho_hn wrote:
             | OpenAI announced GPT-4 image input in mid-March 2023 and
             | made it generally available on the API in November 2023.
             | 
             | Google announced a fancy model two months early and
             | released it in the promised timeframe.
             | 
             | Seems par for the course.
        
               | mynameisvlad wrote:
               | Did OpenAI then announce GPT-5 two weeks after launching
               | GPT-4?
               | 
               | No, of course they didn't. And you're comparing one
               | specific feature (image input) and equating it to a whole
               | model's release date.
               | 
               | Maybe compare apples to apples next time.
               | 
               | People pointing out release/announcement burnout is a
               | reasonable thing; people in general can only deal with
               | the "next new thing" with some breaks to process
               | everything.
        
               | sho_hn wrote:
                | I made the comparison because both companies
                | demonstrated advanced/extended capabilities (model
                | size, image input) and shipped them late.
        
           | moralestapia wrote:
           | >"this team ships"
           | 
           | Because they actually shipped ... (!)
        
             | moralestapia wrote:
             | ... and they literally just did it again.
             | 
             | https://openai.com/sora
        
       | SushiHippie wrote:
       | Does this mean gemini ultra 1.0 -> gemini ultra 1.5 is the same
       | as gpt-4 -> gpt-4-turbo?
        
         | hackerlight wrote:
         | There's no Gemini Ultra 1.5 yet. Gemini Pro 1.5 is a smaller
         | model than Gemini Ultra 1.0.
        
       | prakhar897 wrote:
       | Can anyone explain how context length is tested? Do they prompt
       | something like:
       | 
       | "Remember val="XXXX" .........10M tokens later....... Print val"
        
         | NhanH wrote:
         | Yep, that's actually a common one
        
         | blovescoffee wrote:
          | Very simplified: there are arrays (matrices) of length 10M
          | inside the model.
          | 
          | It's difficult to make those arrays longer because training
          | time explodes.
        
         | halflings wrote:
         | Yep that's pretty much it! That's what they call needle in a
         | haystack. See:
         | https://github.com/gkamradt/LLMTest_NeedleInAHaystack
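A minimal sketch of such a "needle in a haystack" harness. All names here are illustrative, and the actual model call is left out:

```python
def build_haystack_prompt(needle: str, filler: str,
                          n_filler: int, depth: float) -> str:
    """Bury a 'needle' fact at a relative depth (0.0 = start,
    1.0 = end) inside repeated filler text, then ask the model
    to retrieve it."""
    sentences = [filler] * n_filler
    sentences.insert(int(depth * n_filler), needle)
    context = " ".join(sentences)
    return context + "\n\nWhat is the magic number mentioned above?"

needle = "The magic number is 42917."
prompt = build_haystack_prompt(
    needle, "The sky was a pale shade of grey.", 1000, 0.5)

# Scoring: a run counts as a success if the model's reply contains the
# needle's payload, e.g. "42917" in model_reply. Sweeping the depth and
# the total context length is what produces retrieval heatmaps like the
# ones in the technical report.
```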
        
         | cchance wrote:
          | Yep, they hide things throughout the prompt and then ask it
          | about that specific thing. Imagine hiding passwords in a
          | giant block of text and then asking "what was Bob's
          | password?" 10 million tokens later.
          | 
          | According to this it's remembering with 99% accuracy, which
          | if you think about it is NUTS. Can you imagine reading 22
          | 1,000-page books and recalling almost any detail on demand?
          | lol
        
           | foota wrote:
           | Interestingly, there's a decent chance I'd remember if there
           | was an out of context passage saying "the password is
           | FooBar". I wonder if it would be better to test with minor
           | edits? E.g., "what color shirt was X wearing when..."
        
       | phoe18 wrote:
       | The branding is very confusing, shouldn't this be Gemini Pro 1.5
       | since the most capable model is called Ultra 1.0?
        
         | macawfish wrote:
         | Extremely confusing!
        
           | butler14 wrote:
           | Maybe they use their own generative AI to do their branding
        
         | dkjaudyeqooe wrote:
         | Can anyone lay out the various models and their features or
         | point to a resource?
         | 
          | I asked the free model (whatever that is) and it wasn't very
          | helpful, alternating between acting as a sales bot for Ultra
          | and being somewhat confused itself.
         | 
         | Edit: apparently it goes 1.0 Pro, 1.0 Ultra, 1.5 Pro, 1.5 Ultra
         | and so on.
        
           | Alifatisk wrote:
           | Here's the models,
           | https://news.ycombinator.com/item?id=39304270 This is about
           | Gemini Pro going from version 1.0 to 1.5, nothing else.
           | 
           | Gemini ultra 1.0 is still on version 1.0
        
             | dkjaudyeqooe wrote:
             | That isn't right. Pro/Ultra exist within each version.
             | 
             | The Gemini report refers to "Gemini 1.5" as well as to
             | "Gemini 1.5 Pro" and "Gemini 1.0 Pro".
        
               | Alifatisk wrote:
               | Okay, so if I understand this correctly:
               | 
               | - Gemini 1.5 is the new version of the Gemini model.
               | 
               | - They are currently testing it on Gemini Pro and
               | calling it Gemini 1.5 Pro.
               | 
               | - Testing has shown that Gemini 1.5 Pro delivers the
               | same quality as Gemini 1.0 Ultra while using less
               | computing power.
               | 
               | - Gemini Ultra is still on Gemini 1.0 at the moment.
        
             | lordswork wrote:
             | Here's an updated table, with version numbers included and
             | their status:
             | 
             |     Gemini Models       gemini.google.com
             |     --------------------------------------------------
             |     Gemini 1.0 Nano
             |     Gemini 1.0 Pro      -> Gemini (free)
             |     Gemini 1.0 Ultra    -> Gemini Advanced ($20/month)
             |     Gemini 1.5 Pro      -> announced on 2024-02-15 [1]
             |     Gemini 1.5 Ultra    -> no public announcement
             |                            (assuming it's coming)
             | 
             | [1]: https://storage.googleapis.com/deepmind-
             | media/gemini/gemini_...
             | 
             | For the history of pre-Gemini models at Google, see:
             | https://news.ycombinator.com/item?id=39304441
        
               | Alifatisk wrote:
               | Oh, it's you again! Thanks for the update
        
         | UncleMeat wrote:
         | Google is somehow truly awful at this. I thought it was funny
         | when branding messes happened in 2017. I cried when they
         | announced "Google Meet (original)." Now I don't even know what
         | to do.
         | 
         | I'm stunned that Google hasn't appointed some "name veto
         | person" that can just say "no, you aren't allowed to have three
         | different things called 'Gemini Advanced', 'Gemini Pro', and
         | 'Gemini Ultra.'" Like surely it just takes Sundar saying "this
         | is the stupidest fucking thing I've ever seen" to some SVP to
         | fix this.
        
           | meowface wrote:
           | And somehow the more advanced one is still on 1.0 (for now)
           | and the less advanced one is on 1.5.
        
             | kccqzy wrote:
             | That's like saying it doesn't make sense for Apple to
             | release M3 Pro without simultaneously releasing M3 Ultra.
        
               | meowface wrote:
               | That's very different.
        
               | kccqzy wrote:
               | The only thing that's different is the standard people
               | apply to different companies due to their biases. There
               | are more Apple fanboys on HN than Google fans (Of course,
               | since Google's reputation has been going down for quite a
               | while). Therefore Apple gets a pass. Classic double
               | standard.
        
         | seydor wrote:
         | We will ask what its real name is as soon as it becomes
         | sentient
        
         | iamdelirium wrote:
         | No? Do you call it the iPhone Pro 15 or the iPhone 15 Pro?
         | Their naming makes sense if you follow most consumer
         | technology.
        
         | summerlight wrote:
         | This is close to CPU versioning. You have two axes:
         | performance branding and generation. Nano, Pro and Ultra are
         | something like i3, i5 and i7. The numbered versions 1.0,
         | 1.5, ... map to 13th gen, 14th gen, ... and so on. And
         | people usually don't need to understand the generation part
         | unless they're enthusiasts.
        
       | arange wrote:
       | The signup form on mobile is too big; the submit button
       | doesn't fit :\
        
       | guybedo wrote:
       | Looks interesting enough that I wanted to give Gemini a try
       | and join the waitlist.
       | 
       | And I thought it would be easy. What a rookie mistake.
       | 
       | Looks like "France" isn't on the list of available regions for
       | AI Studio?
       | 
       | Now I'm trying Vertex AI instead (not even sure what the
       | difference from AI Studio is), but it seems to be available.
       | 
       | So far I've been struggling for 15 minutes through a maze of
       | Google Cloud pages: console, docs, signups. No end in sight;
       | looks like I won't be able to try it out.
        
         | IanCal wrote:
         | It's not available outside of a private preview yet. The page
         | says you can use 1.0 ultra in vertex but it's not available to
         | me in the UK.
         | 
         | I can't get on the waitlist, because the waitlist link
         | redirects to aistudio and I can't use that.
         | 
         | I should stop expecting that I can use literally anything
         | google announces.
        
       | simonw wrote:
       | I'd love to know how much a 1 million token prompt is likely to
       | cost - both in terms of cash and in terms of raw energy usage.
        
         | bearjaws wrote:
         | I cannot emphasize this enough: even with the improvements
         | in context handling, I imagine 128k tokens costs as much as
         | 16k did previously.
         | 
         | So 1M tokens is going to be astronomical.
        
         | empath-nirvana wrote:
         | When you account for the cost, you also have to consider how
         | much it would cost to have a human perform the same task.
        
       | foliveira wrote:
       | >"Gemini 1.5 Pro (...) matches or surpasses Gemini 1.0 Ultra's
       | state-of-the-art performance across a broad set of benchmarks."
       | 
       | So Pro is better than Ultra, but only if the version numbers are
       | higher?
        
         | denysvitali wrote:
         | Yes, but you'd have to wait for Gemini Pro Max next year to see
         | the real improvements
        
         | renewiltord wrote:
         | Isn't that usually the case with many products? Like the M3 Pro
         | CPU in the new Macs is more powerful than the M1 Max in the old
         | Macs.
         | 
         | The Nano < Pro < Ultra is an in-revision thing. For their LLMs
         | it's a size thing. Then there's newer releases of Nano, Pro,
         | and Ultra. Some Pro might be better than some older Ultra.
         | 
         | A lot of people seem confused about this but it feels so easy
         | to understand that it's confusing to me that anyone could have
         | trouble.
        
           | devindotcom wrote:
           | Apple didn't release the M3 Pro a week after the M1 Max
        
             | renewiltord wrote:
             | Adam Osborne's wife was one of my dad's patients so I'm not
             | unacquainted with the risk of early announcements. But
             | surely they do not prevent comprehension.
        
       | thiago_fm wrote:
       | I like that they are rushing with this, not caring enough to
       | call it Gemini 2 or even really release it; to me it looks
       | like they are eager to share progress.
       | 
       | I hope they do a good job, and that once OpenAI releases GPT-5
       | their offerings are competitive with it. That will be better
       | for everyone.
        
       | kaspermarstal wrote:
       | Incredible. RAG will be obsolete in a year or two.
        
         | hackernoteng wrote:
         | It's already obsolete. It doesn't work except for trivial cases
         | which have no real value.
        
         | jeanloolz wrote:
         | Obsolete only if you don't take cost into consideration.
         | Having 10 million tokens go through each layer of the LLM is
         | going to cost a lot of money each time. At GPT-4 rates that
         | could mean 200 dollars for each inference.
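As a back-of-envelope check of that figure (the per-token price below is an assumption chosen for illustration, roughly in GPT-4-class territory, not an actual published rate):

```python
# Assumed input price in USD per million tokens (illustrative only).
PRICE_PER_MILLION_INPUT_TOKENS = 20.00

def inference_cost(num_input_tokens: int,
                   price_per_million: float = PRICE_PER_MILLION_INPUT_TOKENS) -> float:
    """USD cost to feed `num_input_tokens` of prompt through the model."""
    return num_input_tokens / 1_000_000 * price_per_million

cost_10m = inference_cost(10_000_000)  # a single 10M-token prompt
```

At $20 per million input tokens, one 10M-token prompt works out to $200 before any output tokens are counted, which is where estimates like the one above come from.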
        
       | scarmig wrote:
       | The technical report: https://storage.googleapis.com/deepmind-
       | media/gemini/gemini_...
        
       | jpeter wrote:
       | OpenAI has no Moat
        
         | jklinger410 wrote:
         | They only have a head start, and that lead is shrinking
        
         | seydor wrote:
         | hence why it's Open
        
         | rvz wrote:
         | This. He's right you know.
         | 
         | OpenAI is extremely overvalued and Google is closing their lead
         | rapidly.
        
           | fnordpiglet wrote:
           | Is there any meaningful valuation on OpenAI? It's not for
           | sale; there is no market.
           | 
           | Google ... has no ability to commercialize anything. Their
           | only commercial successes are ads and YouTube. Doing
           | deceptive launches and flailing around with Gemini isn't
           | helping their product prospects. I wouldn't take a bet
           | between OpenAI and anyone, but I also wouldn't take a bet
           | on Google succeeding commercially at anything other than
           | pervasive surveillance and adware.
        
         | wrsh07 wrote:
         | A reference to the good doc:
         | https://www.semianalysis.com/p/google-we-have-no-moat-and-ne...
         | 
         | While I'm linking semianalysis, though, it's probably worth
         | talking about how everyone except Google is GPU poor:
         | https://www.semianalysis.com/p/google-gemini-eats-the-world-...
         | (paid)
         | 
         | > Whether Google has the stomach to put these models out
         | publicly without neutering their creativity or their existing
         | business model is a different discussion.
         | 
         | Google has a serious GPU (well, TPU) build out, and the fact
         | that they're able to train moe models on it means there aren't
         | any technical barriers preventing them from competing at the
         | highest levels
        
           | Keyframe wrote:
           | they also have internet.zip and all of its repo history,
           | as well as Usenet and mail archives etc., which others
           | don't.
        
         | anonyfox wrote:
         | but GPT-4 is nearly a year old now; I'd wait for OpenAI's
         | next release before passing judgement. Probably coming
         | rather soon now, I'd expect.
        
       | gpjanik wrote:
       | Zero trust in what they put out until I see it live. After
       | the last "launch" video, which was fundamentally a marketing
       | edit not showing the real product, I don't trust anything
       | coming out of Google that isn't an instantly testable input
       | form.
        
       | losvedir wrote:
       | If I understand correctly, they're releasing this for Pro but not
       | Ultra, which I think is akin to GPT 3.5 vs 4? Sigh, the naming is
       | confusing...
       | 
       | But my main takeaway is the huge context window! Up to a million,
       | with more than 100k tokens right now? Even just GPT 3.5 level
       | prediction with such a huge context window opens up a lot of
       | interesting capabilities. RAG can be super powerful with that
       | much to work with.
        
         | danpalmer wrote:
         | The announcement suggests that 1.5 Pro is similar to 1.0 Ultra.
        
           | benopal64 wrote:
           | I am reaching a bit, but I think it's a marketing
           | technique. Comparing Pro 1.5 to Ultra 1.0 seems to imply
           | that they will release an Ultra 1.5 model with presumably
           | similar characteristics to the new Pro 1.5 model (MoE
           | architecture with a huge context window).
        
             | danpalmer wrote:
             | Apparently the technical report implies that Ultra 1.5
             | is a step up again. I'm not sure it's just context
             | length; that seems to be orthogonal in everything I've
             | read so far.
        
         | ygouzerh wrote:
         | So Pro and Ultra are, from my understanding, linked to the
         | number of parameters. More parameters means more reasoning
         | capability, but more compute needed.
         | 
         | So Pro is the light and fast version, and Ultra the advanced
         | and expensive one.
        
         | cchance wrote:
         | It's sizes
         | 
         | Nano/Pro/Ultra are model SIZES. 1.0/1.5 is generations of the
         | architecture.
        
         | amf12 wrote:
         | Maybe this analogy would help: iPhone 15, iPhone Pro 15, iPhone
         | Pro Max 15 and then iPhone Pro 15.5
        
       | golergka wrote:
       | In one of the demos, it successfully navigates a threejs demo and
       | finds the place to change in response to a request.
       | 
       | How long until it shows similar results on middle-sized and large
       | codebases? And do the job adequately?
        
         | kypro wrote:
         | 1-2 years probably. There will still be a question around who
         | determines what "adequately" is for a while though. Presumably
         | even if an LLM can do something in theory you wouldn't actually
         | want it doing anything without human oversight.
         | 
         | And we should keep in mind that understanding a code change
         | in depth is often just as much work as making the change.
         | When reviewing PRs I don't really know exactly what every
         | change is doing. I certainly haven't tested it enough to be
         | 100% certain I understand it fully. I'm just checking that
         | the logic looks mostly right and that I don't see anything
         | clearly wrong, and even then I'll often need to ask for
         | clarification on why something was done.
         | 
         | I can't imagine LLMs being used in most large code bases for a
         | while yet. They'd probably need to be 99.9% reliable before we
         | can start trusting them to make changes without verifying every
         | line.
        
         | simon_kun wrote:
         | Today.
        
       | pryelluw wrote:
       | Gemini (or whatever google ai) will be all about ads. I'm not
       | adopting this shit. Their whole business model is ads. Why would
       | I adopt a product from a company that only cares about selling
       | more ads?
        
         | Alifatisk wrote:
         | Google One's business model is not ads?
         | 
         | I mention Google One because you can access Gemini Ultra
         | through it.
        
           | imp0cat wrote:
           | All their services are just a way to get more information
           | about their users so they can serve them ads.
           | 
           | Those Gemini queries will be no exception.
        
             | sodality2 wrote:
             | Not true - Gemini looks to be marketed towards
             | companies, where it's far more profitable to just charge
             | thousands of dollars. Ads wouldn't fund AI usage anyway;
             | GPUs are extremely expensive (even Google's fancy TPUs).
        
         | snapcaster wrote:
         | Agreed, people continually forget that Google has fundamentally
         | failed at everything besides selling ads despite decades of
         | moonshots and other attempts to shift the business. Very
         | skeptical that any company getting 80% revenue from ads will be
         | able to resist the pressure to advertise
        
       | royletron wrote:
       | Is there a reason this isn't available in the
       | UK/France/Germany/Spain but is in available in Jersey... and
       | Tuvalu?
        
         | onlyrealcuzzo wrote:
         | EU regulations and fines.
        
         | vibrolax wrote:
         | Probably because EU/national governments have regulations with
         | respect to the safety and privacy of the users, and the
         | purveyors must evaluate the performance of their products
         | against the regulatory standards.
        
       | seydor wrote:
       | Onwards to a billion tokens
        
       | fernandotakai wrote:
       | i saw this announcement on twitter and i was excited to check it
       | out, only to see that "we're offering a limited preview of 1.5
       | Pro to developers and enterprise customers via AI Studio and
       | Vertex AI".
       | 
       | please google, only announce things when people can actually use
       | it.
        
       | xyzzy_plugh wrote:
       | I miss when I didn't have to scroll to read a single tweet.
        
         | ComputerGuru wrote:
         | Twitter has that functionality natively now, but I don't
         | know if you have to be a pro user to access it. It's the
         | book icon in the upper-right corner of the first tweet in a
         | series. It links to this, though it looks different in
         | incognito vs logged in:
         | https://twitter.com/JeffDean/thread/1758146022726041615
        
       | og_kalu wrote:
       | >Finally, we highlight surprising new capabilities of large
       | language models at the frontier; when given a grammar manual for
       | Kalamang, a language with fewer than 200 speakers worldwide, the
       | model learns to translate English to Kalamang at a similar level
       | to a person learning from the same content.
       | 
       | Results - https://imgur.com/a/qXcVNOM
       | 
       | From the technical report
       | https://storage.googleapis.com/deepmind-media/gemini/gemini_...
        
         | poulpy123 wrote:
         | > at a similar level to a person learning from the same
         | content.
         | 
         | That's an incredibly low bar
        
           | ithkuil wrote:
           | It's incredible how fast goalposts are moving.
           | 
           | The same feat one year ago would have been almost
           | unbelievable.
        
           | KeplerBoy wrote:
           | Since when are we expecting super-human capabilities?
        
             | andsoitis wrote:
             | And in fact it already is super human. Show me a single
             | human who can translate amongst 10+ languages across
             | specialized domains in the blink of an eye.
        
               | empath-nirvana wrote:
               | Chat GPT has been super human in a lot of tasks even
               | since 3.5.
               | 
               | People point out mistakes it makes that no human would
               | make, but that doesn't negate the super-human performance
               | it has at other tasks -- and the _breadth_ of what it can
               | do is far beyond any single person.
        
               | KeplerBoy wrote:
               | Where exactly does it have super-human performance? Above
               | average and expert-level? Sure, I'd agree, but I haven't
               | experienced anything above that.
        
           | elevatedastalt wrote:
           | :muffled sounds of goalposts being shifted in the distance:
           | 
           | Just a few years ago we used to clap if an NLP model could
           | handle negation reliably or could generate even a paragraph
           | of text in English that was natural sounding.
           | 
           | Now we are at a stage where it is basically producing reams
           | of natural sounding text, performing surprisingly well on
           | reasoning problems and translation of languages with barely
           | any data despite being a markov chain on steroids, and what
           | does it hear? "That's an incredibly low bar".
        
             | glenstein wrote:
             | I'm going to keep beating this dead horse, but if you were
             | a philosophy nerd in the 80s, 90s, 00s etc you may know
             | that debates RAGED over whether computers could ever, even
             | in principle do things that are now being accomplished on a
             | weekly basis.
             | 
             | And as you say, the goalposts keep getting moved. It
             | used to be claimed that computers could never play chess
             | at the highest levels because that required "insight".
             | And whatever a computer could do, it could never do that
             | extra special thing, the one that could only be
             | described in magical, undefined terms.
             | 
             | I just hope there's a moment of reckoning for decades upon
             | decades of arguments, deemed academically respectable, that
             | insisted that days like these would never come.
        
               | elevatedastalt wrote:
               | Honestly. I am ok with having greater and greater goals
               | to accomplish but this sort of dismissive attitude really
               | puts me off.
        
               | empath-nirvana wrote:
               | Forget goalpost shifting, people frequently refuse to
               | admit that it can do things that it obviously does,
               | because they've never used it themselves.
        
               | mewpmewp2 wrote:
               | Listen, you little ...
        
           | zacmps wrote:
           | > The author (the human learner) has some formal experience
           | in linguistics and has studied a variety of languages both
           | formally and informally, though no Austronesian or Papuan
           | languages
           | 
           | From the language benchmark (parentheses mine).
        
           | JyB wrote:
           | Jarring you're not adding more context to your comment.
        
         | seydor wrote:
         | what if we ask it to translate an undeciphered language
        
           | dougmwne wrote:
           | It produces basically random translations. This is covered in
           | the 0-shot case where no translation manual was included in
           | the context. Due to how rare this language is, it's
           | essentially untranslated in the training corpus.
        
           | og_kalu wrote:
            | If you mean dumping random passages of text with no
            | parallel corpora or grammar instructions, then it won't
            | do better than random.
            | 
            | That said, I believe that if an LLM saw a language's text
            | during training, even with no parallel corpora, it could
            | still translate that language into some other language it
            | also trained on.
        
             | seydor wrote:
             | What if we added a bunch of linguistic analysis books or
             | something
        
       | uptownfunk wrote:
       | Google is a public company. Anything and everything will be
       | scrutinized very heavily by shareholders. Of course, how Zuck
       | operates is very different from how Sundar does.
       | 
       | What are they doing with their free cash is my question. Are they
       | waiting for the LLM bubble to pop to buy some of these companies
       | at a discount?
        
       | ComputerGuru wrote:
       | The context window size - if it really works as advertised -
       | is pretty ground-breaking. It would remove the need for RAG or
       | fine-tuning for one-off (or few-off) analys{is,es} of input
       | streams, cheaper and faster. I wonder how they got past the
       | input token stuffing problems everyone else runs into.
        
         | jcuenod wrote:
         | It won't remove the use of RAG at all. That's like saying,
         | "wow, now that I've upgraded my 128GB HDD to 1TB, I'll never
         | run out of space again."
        
           | madisonmay wrote:
           | It's more like saying "I've upgraded to 128GB of RAM, I'll
           | never use my disk again".
        
           | sebzim4500 wrote:
           | 10 TB for an accurate proportion.
           | 
           | And I think people who buy a laptop with a 1TB SSD generally
           | don't run out of space, at least I don't.
        
         | lumost wrote:
         | They are almost certainly using some form of sparse
         | attention. If you linearize the attention operation, you can
         | scale up to around 1-10M tokens depending on hardware before
         | hitting memory constraints. Linearization works off the
         | assumption that for a subsequence of X tokens out of M
         | tokens, where M is much greater than X, there are likely
         | only K tokens which are useful for the attention operation.
         | 
         | There are a bunch of techniques to do this, but it's unclear
         | how well any of them scale.
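The "only K of M tokens matter" assumption can be illustrated with a toy top-k sparse attention step for a single query. This NumPy sketch shows the general idea only; it is not a claim about the mechanism Gemini actually uses:

```python
import numpy as np

def topk_sparse_attention(q, K, V, k=8):
    """Single-query attention restricted to the k highest-scoring keys.
    Scoring all M keys is O(M*d); the softmax and value mixing then
    touch only k rows instead of M."""
    scores = K @ q                            # (M,) dot-product scores
    idx = np.argpartition(scores, -k)[-k:]    # indices of the top-k keys
    s = scores[idx] / np.sqrt(q.shape[0])     # scale the selected scores
    w = np.exp(s - s.max())
    w /= w.sum()                              # softmax over k entries only
    return w @ V[idx]                         # mix k values, not all M

rng = np.random.default_rng(0)
M, d = 10_000, 64
q = rng.normal(size=d)
K = rng.normal(size=(M, d))
V = rng.normal(size=(M, d))
out = topk_sparse_attention(q, K, V, k=16)
```

Real long-context schemes (sliding windows, blockwise attention, retrieval-style routing) are more involved, but they all exploit some version of this sparsity assumption.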
        
           | ein0p wrote:
           | Not "almost", but certainly. Dense attention is quadratic,
           | not even Google would be able to run it at an acceptable
           | speed. Their model is not recurrent - they did not have the
           | time yet (or resources - believe it or not, Google of 2023-24
           | is very compute constrained) to train newer SSM or recurrent
           | based models at practical parameter counts. Then there's the
           | fact that those models are far harder to train due to
           | instabilities, which is one of the reasons why you don't yet
           | see FOSS recurrent/SSM models that are SOTA at their size or
           | tokens/sec. With sparse attention, however, long context
           | recall will be far from perfect, and the longer the context
           | the worse the recall. That's better than no recall at all (as
           | in a fully dense attention model which will simply lop off
           | the preceding parts of the conversation), but not by a hell
           | of a lot.
        
         | popinman322 wrote:
         | vs RAG: RAG is good for searching across >billions of tokens
         | and providing up-to-date information to a static model. Even
         | with huge context lengths it's a good idea to submit high
         | quality inputs to prevent the model from going off on tangents,
         | getting stuck on contradictory information, etc..
         | 
         | vs fine tuning: smaller, fine-tuned models can perform better
         | than huge models in a decent number of tasks. Not strictly
         | fine-tuning, but for throughput limited tasks it'll likely
         | still be better to prune a 70B model down to 2B, keeping only
         | the components you need for accurate inference.
         | 
         | I can see this model being good for taking huge inputs and
         | compressing them down for smaller models to use.
        
         | nbardy wrote:
         | RAG will stick around, at some point you want to retrieve
         | grounded information samples to inject in the context window.
         | RAG+long context just gives you more room for grounded context.
         | 
         | Think building huge relevant context on topics before
         | answering.
        
         | torginus wrote:
          | Tbh, I haven't read the paper, but I think it's pretty
          | self-evident that large contexts aren't cheap: the model
          | has to attend to every token of the context for each
          | successive generated token at least once, so the cost is
          | going to be at least linear in context length.
        
       | Alifatisk wrote:
       | I remember one of the biggest advantages with Google Bard was the
       | heavily limited context window. I am glad Google is now actually
       | delivering some exciting news with Gemini and this gigantic
       | context size.
       | 
       | Sure it's a bummer that they slap the "Join the waiting list",
       | but it's still interesting to read about their progress and
       | competing with ClosedAi (OpenAi).
       | 
       | One last thing I hope they fix is the heavy moral and ethical
       | guardrails; sometimes I can barely ask proper questions
       | without triggering Gemini to educate me about what's right
       | and wrong. When I try the same prompt with ChatGPT and Bing
       | AI, they happily answer.
        
         | elevatedastalt wrote:
         | "biggest advantages with Google Bard"
         | 
         | Did you mean disadvantages?
        
       | CrypticShift wrote:
       | Most data accumulates gradually (e.g., one email at a time,
       | one line of text at a time across various documents). Is this
       | huge 10M-token context window relevant to a gradual yet
       | constant influx of data (like a prompt over a whole Google
       | Workspace)?
        
       | Imnimo wrote:
       | This is the first time I've been legitimately impressed by one of
       | Google's LLMs (with the obvious caveat that I'm taking the
       | results reported in their tech report at face value).
        
       | sremani wrote:
       | I have Gemini Advanced; do I get access to this? Google is
       | giving Microsoft a run for its money on branding confusion.
        
         | Alifatisk wrote:
         | Not yet; Gemini Advanced uses Gemini Ultra, not Gemini Pro.
        
           | Ecstatify wrote:
           | Gemini advanced is terrible.
           | 
           | I asked it to rephrase "Are the original stated objectives
           | still relevant?"
           | 
            | It starts going on about Ukraine and Russia.
           | 
           | https://g.co/gemini/share/ddb3887f79e2
        
             | Alifatisk wrote:
              | I think it took the whole context of the conversation
              | into consideration. You should create a new
              | conversation instead and see if it responds
              | differently.
              | 
              | Or you could be more specific, like "Rephrase the
              | following sentence: 'Are the original stated objectives
              | still relevant?' in a formal way; respond with one
              | option only."
        
               | Ecstatify wrote:
               | It was a new conversation. I've never mentioned Russia or
               | Ukraine in any conversation ever.
        
               | Alifatisk wrote:
               | That's so weird, yet interesting. What happens if you
               | open a new convo again and enter the same prompt?
        
           | piva00 wrote:
            | I thought I wouldn't, but I'm getting really, really
            | confused by the naming and branding: which Gemini is a
            | model and which is a product? Advanced, Pro, Ultra;
            | seemingly Pro is getting better than Ultra? And Advanced
            | is the product using the Ultra underlying model?
            | 
            | Ugh, my brain.
        
           | tapoxi wrote:
           | I've read this sentence three times, wow what horrible
           | branding.
        
       | vessenes wrote:
       | The white paper is worth a read. The things that stand out to me
       | are:
       | 
       | 1. They don't talk about how they get to 10M token context
       | 
       | 2. They don't talk about how they get to 10M token context
       | 
       | 3. The 10M context ability wipes out most RAG stack complexity
       | immediately. (I imagine creating caching abilities is going to be
       | important for a lot of long token chatting features now, though).
       | This is going to make things much, much simpler for a lot of use
       | cases.
       | 
       | 4. They are pretty clear that 1.5 Pro is better than GPT-4 in
       | general, and therefore we have a new LLM-as-judge leader, which
       | is pretty interesting.
       | 
       | 5. It seems like 1.5 Ultra is going to be highly capable. 1.5 Pro
       | is already very very capable. They are running up against very
       | high scores on many tests, and took a minute to call out some
       | tests where they scored badly as mostly returning false
       | negatives.
       | 
       | Upshot, 1.5 Pro looks like it _should_ set the bar for a bunch of
       | workflow tasks, if we can ever get our hands on it. I've found
       | 1.0 Ultra to be very capable, if a bit slow. Open models
       | downstream should see a significant uptick in quality using it,
       | which is great.
       | 
       | Time to dust off my coding test again, I think, which is: "here
       | is a tarball of a repository. Write a new module that does X".
       | 
       | I really want to know how they're getting to 10M context, though.
       | There are some intriguing clues in their results that this isn't
       | just a single ultra-long vector; for instance, their audio and
       | video "needle" tests, which just involve inserting an image that
       | says "the magic word is: xxx", or an audio clip that says the
       | same thing, have perfect recall across up to 10M tokens. The text
       | insertion occasionally fails. I'd speculate that this means there
       | is some sort of compression going on; a full video frame with
       | text on it is going to use a lot more tokens than the text
       | needle.
        
         | CharlieDigital wrote:
         | > The 10M context ability wipes out most RAG stack complexity
         | immediately.
         | 
         | Remains to be seen.
         | 
         | Large contexts are not always better. For starters, it takes
         | longer to process. But secondly, even with RAG and the large
         | context of GPT4 Turbo, providing it a more relevant and
         | accurate context always yields better output.
         | 
         | What you get with RAG is faster response times and more
         | accurate answers by pre-filtering out the noise.
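The pre-filtering step described above can be sketched in a few lines; the bag-of-words `embed()` here is a toy stand-in for a real embedding model:

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a real RAG stack would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, chunks, k=2):
    """Pre-filter: keep only the k chunks most similar to the query,
    instead of stuffing every chunk into the prompt."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Invoices are due within 30 days of receipt.",
    "The cafeteria menu changes every Monday.",
    "Late invoices accrue 2% interest per month.",
]
context = top_k("When are invoices due?", chunks)
```

The relevant invoice chunks rank ahead of the cafeteria distractor, so only the former reach the prompt.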
        
           | behnamoh wrote:
           | Don't forget that Gemini also has access to the internet, so
           | a lot of RAGging becomes pointless anyway.
        
             | beppo wrote:
             | Internet search _is_ a form of RAG, though. 10M tokens is
              | very impressive, but you're not fitting a database, let
             | alone the entire internet into a prompt anytime soon.
        
               | behnamoh wrote:
               | You shouldn't fit an entire database in the context
               | anyway.
               | 
               | btw, 10M tokens is 78 times more context window than the
               | newest GPT-4-turbo (128K). In a way, you don't need 78
               | GPT-4 API calls, only one batch call to Gemini 1.5.
        
               | rvnx wrote:
               | Well it's nice, just sad nobody can use it
        
               | cchance wrote:
                | I don't get this. Why do people think you need to put
                | an entire database in the short-term memory of the AI
                | for it to be useful? When you work with a DB, are you
                | memorizing the entire f*cking database? No, you know the
                | summaries of it and how to access and use it.
                | 
                | People also seem to forget that the average person reads
                | about 1B words in their entire LIFETIME, and at 10M with
                | nearly 100% recall, that's pretty damn amazing. I'm
                | pretty sure I don't have perfect recall of 10M words
                | myself lol
        
               | Qwero wrote:
               | It increases the use cases.
               | 
               | It can also be a good alternative for fine-tuning.
               | 
               | And the use case of a code base is a good example: if the
               | ai understands the whole context, it can do basically
               | everything.
               | 
                | Let me pay 5EUR for an Android app rewritten for iOS.
        
             | CharlieDigital wrote:
             | This may be useful in a generalized use case, but a problem
             | is that many of those results again will add noise.
             | 
             | For any use case where you want contextual results, you
             | need to be able to either filter the search scope or use
             | RAG to pre-define the acceptable corpus.
        
               | panarky wrote:
               | _> you need to be able to either filter the search scope
               | or use RAG ..._
               | 
               | Unless you can get nearly perfect recall with millions of
               | tokens, which is the claim made here.
        
           | killerstorm wrote:
            | Hopefully we can get a better RAG out of it. Currently
            | people do incredibly primitive stuff like splitting text
            | into fixed-size chunks and adding them to a vector DB.
           | 
           | An actually useful RAG would be to convert text to Q&A and
           | use Q's embeddings as an index. Large context can make use of
           | in-context learning to make better Q&A.
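The "primitive stuff" in question is typically just this (sizes are illustrative):

```python
def chunk_fixed(text, size=200, overlap=50):
    """Naive fixed-size chunking: slice on raw character count with some
    overlap, ignoring sentence and section boundaries entirely."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_fixed("a" * 500)
```

Because the boundaries are arbitrary, a fact can be split mid-sentence across two chunks, which is exactly what a Q&A-style index tries to avoid.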
        
             | mediaman wrote:
             | A lot of people in RAG already do this. I do this with my
             | product: we process each page and create lists of potential
             | questions that the page would answer, and then embed that.
             | 
             | We also embed the actual text, though, because I found that
             | only doing the questions resulted in inferior performance.
        
               | CharlieDigital wrote:
                | So in this case, what your workflow might look like is:
                | 
                | 1. Get text from page/section/chunk
                | 
                | 2. Generate possible questions related to the
                | page/section/chunk
                | 
                | 3. Generate an embedding using { each possible question
                | + page/section/chunk }
                | 
                | 4. Incoming question targets the embedding and matches
                | against { question + source }
                | 
                | Is this roughly it? How many questions do you generate?
                | Do you save a separate embedding for each question? Or
                | just stuff all of the questions back with the
                | page/section/chunk?
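One plausible reading of that workflow, with `embed()` and `generate_questions()` as stand-ins for real model calls:

```python
# Sketch of a question-augmented index: each chunk is stored once, but is
# findable via embeddings of both its raw text and LLM-generated questions.

def build_index(chunks, embed, generate_questions):
    index = []  # list of (embedding, chunk) pairs
    for chunk in chunks:
        index.append((embed(chunk), chunk))      # embed the text itself
        for q in generate_questions(chunk):      # plus each synthetic question
            index.append((embed(q), chunk))
    return index

def retrieve(query, index, embed, similarity, k=3):
    q = embed(query)
    ranked = sorted(index, key=lambda e: similarity(q, e[0]), reverse=True)
    seen, out = set(), []
    for _, chunk in ranked:                      # dedupe: many entries share a chunk
        if chunk not in seen:
            seen.add(chunk)
            out.append(chunk)
        if len(out) == k:
            break
    return out
```

Storing one embedding per question (rather than stuffing all questions into one string) keeps each vector focused, at the cost of a bigger index.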
        
         | cs702 wrote:
         | > 1. They don't talk about how they get to 10M token context
         | 
         | > 2. They don't talk about how they get to 10M token context
         | 
         | Yes. I wonder if they're using a "linear RNN" type of model
         | like Linear Attention, Mamba, RWKV, etc.
         | 
          | Like Transformers with standard attention, these models train
          | efficiently in parallel, but their compute is O(N) instead of
          | O(N^2), so _in theory_ they can be extended to much longer
          | sequences much more efficiently. They have shown a lot of
          | promise recently at smaller model sizes.
         | 
         | Does anyone here have any insight or knowledge about the
         | internals of Gemini 1.5?
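The associativity trick behind those O(N) methods fits in a few lines; `phi` here is an illustrative feature map, not whatever Gemini actually uses:

```python
import numpy as np

def phi(x):
    """Simple positive feature map (ReLU + epsilon); linear-attention
    papers use e.g. elu(x) + 1."""
    return np.maximum(x, 0) + 1e-6

def linear_attention(Q, K, V):
    """O(N) attention: by associativity, phi(Q) @ (phi(K).T @ V) equals
    (phi(Q) @ phi(K).T) @ V but never materializes the N x N matrix."""
    S = phi(K).T @ V            # (d, d_v): cost linear in N
    Z = phi(K).sum(axis=0)      # (d,): running normalizer
    return (phi(Q) @ S) / (phi(Q) @ Z)[:, None]

rng = np.random.default_rng(0)
N, d = 8, 4
Q, K, V = (rng.normal(size=(N, d)) for _ in range(3))

# The O(N^2) reference computation, to show the refactoring is exact:
A = phi(Q) @ phi(K).T
quadratic = (A @ V) / A.sum(axis=1, keepdims=True)
linear = linear_attention(Q, K, V)
```

The two paths agree exactly; the catch, as noted downthread, is that the kernelized form is not softmax attention, and recall tends to suffer.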
        
           | candiodari wrote:
           | They do give a hint:
           | 
           | "This includes making Gemini 1.5 more efficient to train and
           | serve, with a new Mixture-of-Experts (MoE) architecture."
           | 
           | One thing you could do with MoE is giving each expert
           | different subsets of the input tokens. And that would
           | definitely do what they claim here: it would allow search.
           | You want to find where someone said "the password is X" in a
           | 50 hour audio file, this would be perfect.
           | 
           | If your question is "what is the first AND last thing person
           | X said" ... it's going to suck badly. Anything that requires
           | taking 2 things into account that aren't right next to
            | each other is just not going to work.
        
             | declaredapple wrote:
             | > One thing you could do with MoE is giving each expert
             | different subsets of the input tokens.
             | 
             | Don't MoE's route tokens to experts after the attention
             | step? That wouldn't solve the n^2 issue the attention step
             | has.
             | 
             | If you split the tokens _before_ the attention step, that
             | would mean those tokens would have no relationship to each
             | other - it would be like inferring two prompts in parallel.
             | That would defeat the point of a 10M context
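A minimal sketch of the standard routing described above (the router sits in the FFN, after attention has already mixed tokens); names and shapes are illustrative:

```python
import numpy as np

def moe_ffn(tokens, experts, gate_W, k=2):
    """Sketch of a standard MoE feed-forward layer: a learned router
    picks top-k experts per token. This runs *after* attention, so every
    token has already exchanged information with every other token."""
    logits = tokens @ gate_W                     # (N, E) router scores
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        top = np.argsort(logits[i])[-k:]         # indices of the k best experts
        w = np.exp(logits[i][top])
        w /= w.sum()                             # softmax over the chosen experts
        for weight, e in zip(w, top):
            out[i] += weight * experts[e](tok)
    return out

rng = np.random.default_rng(1)
d, E, N = 4, 4, 3
experts = [(lambda W: (lambda x: W @ x))(rng.normal(size=(d, d))) for _ in range(E)]
tokens = rng.normal(size=(N, d))
y = moe_ffn(tokens, experts, rng.normal(size=(d, E)), k=2)
```

Note the routing splits *computation* per token, not the attention pattern, which is why MoE alone doesn't explain the long context.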
        
             | deskamess wrote:
             | Is MOE then basically divide and conquer? I have no deep
             | knowledge of this so I assumed MOE was where each expert
             | analyzed the problem in a different way and then there was
             | some map-reduce like operation on the generated expert
             | results. Kinda like random forest but for inference.
        
             | spott wrote:
             | > Anything that requires taking 2 things into account that
              | aren't right next to each other is just not going to work.
             | 
             | They kinda address that in the technical report[0]. On page
             | 12 they show results from a "multiple needle in a haystack"
             | evaluation.
             | 
             | https://storage.googleapis.com/deepmind-
             | media/gemini/gemini_...
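A needle-in-a-haystack harness like the one in the report is easy to mock up (the substring scoring here is a simplification, not the report's exact methodology):

```python
import random

def make_haystack(filler_words, needles, n_words, seed=0):
    """Plant 'the magic word is X' needles at distinct random positions
    in a long run of distractor words; the needles list doubles as the
    answer key for scoring."""
    rng = random.Random(seed)
    words = [rng.choice(filler_words) for _ in range(n_words)]
    for pos, magic in zip(rng.sample(range(n_words), len(needles)), needles):
        words[pos] = "(the magic word is %s)" % magic
    return " ".join(words)

def score_recall(model_answer, needles):
    """Fraction of planted needles the model's answer mentions."""
    return sum(m in model_answer for m in needles) / len(needles)

haystack = make_haystack(["lorem", "ipsum", "dolor"], ["kumquat", "zephyr"], 1000)
```

The multi-needle variant on page 12 is just this with several needles and recall averaged across context lengths and insertion depths.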
        
           | sebzim4500 wrote:
           | The fact they are getting perfect recall with millions of
           | tokens rules out any of the existing linear attention
           | methods.
        
         | usaar333 wrote:
         | > They are pretty clear that 1.5 Pro is better than GPT-4 in
         | general, and therefore we have a new LLM-as-judge leader, which
         | is pretty interesting.
         | 
         | They try to push that, but it's not the most convincing. Look
         | at Table 8 for text evaluations (math, etc.) - they don't even
         | attempt a comparison with GPT-4.
         | 
         | GPT-4 is higher than any Gemini model on both MMLU and GSM8K.
         | Gemini Pro seems slightly better than GPT-4 original in Human
         | Eval (67->71). Gemini Pro does crush naive GPT-4 on math
         | (though not with code interpreter and this is the original
         | model).
         | 
         | All in 1.5 Pro seems maybe a bit better than 1.0 Ultra. Given
         | that in the wild people seem to find GPT-4 better for say
         | coding than Gemini Ultra, my current update is Pro 1.5 is about
         | equal to GPT-4.
         | 
         | But we'll see once released.
        
           | cchance wrote:
           | I mean i don't see GPT4 watching a 44 minute movie and being
           | able to exactly pinpoint a guy taking a paper out of his
           | pocket..
        
           | panarky wrote:
           | _> people seem to find GPT-4 better for say coding than
           | Gemini Ultra_
           | 
           | For my use cases, Gemini Ultra performs significantly better
           | than GPT-4.
           | 
           | My prompts are long and complex, with a paragraph or two
           | about the general objective followed by 15 to 20 numbered
           | requirements. Often I'll include existing functions the new
           | code needs to work with, or functions that must be refactored
           | to handle the new requirements.
           | 
           | I took 20 prompts that I'd run with GPT-4 and fed them to
           | Gemini Ultra. Gemini gave a clearly better result in 16 out
           | of 20 cases.
           | 
           | Where GPT-4 might miss one or two requirements, Gemini
           | usually got them all. Where GPT-4 might require multiple chat
           | turns to point out its errors and omissions and tell it to
           | fix them, Gemini often returned the result I wanted in one
           | shot. Where GPT-4 hallucinated a method that doesn't exist,
           | or had been deprecated years ago, Gemini used correct
           | methods. Where GPT-4 called methods of third-party packages
           | it assumed were installed, Gemini either used native code or
           | explicitly called out the dependency.
           | 
           | For the 4 out of 20 prompts where Gemini did worse, one was a
           | weird rejection where I'd included an image in the prompt and
           | Gemini refused to work with it because it had unrecognizable
           | human forms in the distance. Another was a simple bash script
           | to split a text file, and it came up with a technically
           | correct but complex one-liner, while GPT-4 just used split
           | with simple options to get the same result.
           | 
           | For now I subscribe to both. But I'm using Gemini for almost
           | all coding work, only checking in with GPT-4 when Gemini
           | stumbles, which isn't often. If I continue to get solid
           | results I'll drop the GPT-4 subscription.
        
             | sho_hn wrote:
             | I have a very similar prompting style to yours and share
             | this experience.
             | 
             | I am an experienced programmer and usually have a fairly
             | exact idea of what I want, so I write detailed requirements
             | and use the models more as typing accelerators.
             | 
             | GPT-4 is useful in this regard, but I also tried about a
             | dozen older prompts on Gemini Advanced/Ultra recently and
             | in every case preferred the Ultra output. The code was
             | usually more complete and prod-ready, with higher
             | sophistication in its construction and somewhat higher
             | density. It was just closer to what I would have hand-
             | written.
             | 
             | It's increasingly clear though LLM use has a couple of
             | different major modes among end-user behavior. Knowledge
             | base vs. reasoning, exploratory vs. completion, instruction
             | following vs. getting suggestions, etc.
             | 
             | For programming I want an obedient instruction-following
             | completer with great reasoning. Gemini Ultra seems to do
             | this better than GPT-4 for me.
        
               | sjwhevvvvvsj wrote:
               | I'm going to have to try Gemini for code again. It just
               | occurred to me as a Xoogler that if they used Google's
               | code base as the training data it's going to be
               | unbeatable. Now did they do that? No idea, but quality
               | wins over quantity, even with LLM.
        
               | barrkel wrote:
               | There is no way NTK data is in the training set, and
               | google3 is NTK.
        
             | Dayshine wrote:
             | Is there any chance you could share an example of the kind
             | of prompt you're writing?
             | 
             | I'm always reluctant to write long prompts because I often
             | find GPT4 just doesn't get it, and then I've wasted ten
             | minutes writing a prompt
        
           | spott wrote:
           | > Gemini Pro seems slightly better than GPT-4 original in
           | Human Eval (67->71).
           | 
           | Though they talk a bunch about how hard it was to filter out
           | Human Eval, so this probably doesn't matter much.
        
         | swalsh wrote:
         | "The 10M context ability wipes out most RAG stack complexity
         | immediately."
         | 
          | I'm skeptical. In my experience, just because the context has
          | room to stuff in whatever you want doesn't mean you should:
          | the more you stuff into the context, the less accurate your
          | results get. There seems to be a balance between providing
          | enough that you'll get high quality answers, but not so much
          | that the model is overwhelmed.
         | 
          | I think a large part of developing better models is not just
          | better architectures that support larger and larger context
         | sizes, but also capable models that can properly leverage that
         | context. That's the test for me.
        
           | HereBePandas wrote:
           | They explicitly address this in page 11 of the report.
           | Basically perfect recall for up to 1M tokens; way better than
           | GPT-4.
        
             | westoncb wrote:
             | I don't think recall really addresses it sufficiently: the
             | main issue I see is answers getting "muddy". Like it's
             | getting pulled in too many directions and averaging.
        
               | a_wild_dandan wrote:
               | I'd urge caution in extending generalizations about
               | "muddiness" to a new context architecture. Let's use the
               | thing first.
        
               | westoncb wrote:
               | I'm not saying it applies to the new architecture, I'm
               | saying that's a big issue I've observed in existing
               | models and that so far we have no info on whether it's
               | solved in the new one (i.e. accurate recall doesn't imply
               | much in that regard).
        
               | westoncb wrote:
               | Would be awesome if it is solved but seems like a much
               | deeper problem tbh.
        
               | a_wild_dandan wrote:
               | Ah, apologies for the misunderstanding. What tests would
               | you suggest to evaluate "muddiness"?
               | 
               | What comes to my mind: run the usual gamut of tests, but
               | with the excess context window saturated with
               | irrelevant(?) data. Measure test answer
               | accuracy/verbosity as a function of context saturation
               | percentage. If there's little correlation between these
               | two variables (e.g. 9% saturation is just as
               | accurate/succinct as 99% saturation), then "muddiness"
               | isn't an issue.
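That saturation sweep could be harnessed roughly like this (`ask_model` is a hypothetical stand-in for a real API call):

```python
def saturation_sweep(ask_model, question, answer, filler, window,
                     levels=(0.1, 0.5, 0.9)):
    """Run the same question at several context-saturation levels and
    record whether the answer survives. ask_model(prompt) -> str is a
    stand-in for a real model call; scoring is a plain substring check."""
    results = {}
    for level in levels:
        n_pad = int(window * level)
        pad = (filler * (n_pad // len(filler) + 1))[:n_pad]   # irrelevant padding
        results[level] = answer in ask_model(pad + "\n\n" + question)
    return results

# With a fake "model" that always answers correctly, every level passes:
fake_model = lambda prompt: "The answer is 42."
report = saturation_sweep(fake_model, "What is the answer?", "42",
                          "lorem ipsum ", 1000)
```

Little correlation between saturation level and accuracy would suggest "muddiness" isn't an issue for a given model; a real run would also track answer verbosity.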
        
               | danielmarkbruce wrote:
               | Manual testing on complex documents. A big legal contract
               | for example. An issue can be referred to in 7 different
               | places in a 100 page document. Does it give a coherent
               | answer?
               | 
               | A handful of examples show whether it can do it. For
               | example, GPT-4 turbo is downright awful at something like
               | that.
        
               | smeagull wrote:
               | I believe that's a limitation of using vectors of high
               | dimensions. It'll be muddy.
        
           | swyx wrote:
            | Also, costs are always based on context tokens; you don't
            | want to put in 10M of context for every request (it's just
            | nice to have that option when you want to do big things
            | that don't scale).
        
             | 1024core wrote:
             | How much would a lawyer charge to review your 10M-token
             | legal document?
        
           | chuckcode wrote:
            | Would like to see the latency and cost of parsing the
            | entire 10M context before throwing out the RAG stack, which
            | is relatively cheap and fast.
        
           | tkellogg wrote:
           | costs rise on a per-token basis. So you _CAN_ use 10M tokens,
            | but it's probably not usually a good idea. A database lookup
           | is still better than a few billion math operations.
        
             | sjwhevvvvvsj wrote:
             | I think the unspoken goal is to just lay off your employees
             | and dump every doc and email they've ever written as one
             | big context.
             | 
             | Now that Google has tasted the previously forbidden fruit
             | of layoffs themselves, I think their primary goal in ML is
             | now headcount reduction.
        
           | theolivenbaum wrote:
           | Also unless they significantly change their pricing model,
           | we're talking about 0.5$ per API call at current prices
        
           | aik wrote:
            | Have to consider cost for all of this. A big value of RAG
            | already, even given the size of GPT-4's largest context
            | window, is that it decreases cost very significantly.
        
         | freedomben wrote:
         | Is 10M token context correct? The blog post I see 1M but I'm
         | not sure if these are different things
         | 
         | Edit: Ah, I see, it's 1M reliably in production, up to 10M in
         | research:
         | 
         | > _Through a series of machine learning innovations, we've
         | increased 1.5 Pro's context window capacity far beyond the
         | original 32,000 tokens for Gemini 1.0. We can now run up to 1
         | million tokens in production._
         | 
         | > _This means 1.5 Pro can process vast amounts of information
         | in one go -- including 1 hour of video, 11 hours of audio,
         | codebases with over 30,000 lines of code or over 700,000 words.
         | In our research, we've also successfully tested up to 10
         | million tokens._
        
           | huytersd wrote:
           | I know how I'm going to evaluate this model. Upload my
           | codebase and ask it to "find all the bugs".
        
         | tbruckner wrote:
         | How do you know it isn't RAG?
        
         | tveita wrote:
         | > The 10M context ability wipes out most RAG stack complexity
         | immediately.
         | 
         | The video queries they show take around 1 minute each, this
         | probably burns a ton of GPU. I appreciate how clearly they
         | highlight that the video is sped up though, they're clearly
         | trying to avoid repeating the "fake demo" fiasco from the
         | original Gemini videos.
        
         | theGnuMe wrote:
         | For #1 and #2 it is some version of mixture of experts. This is
         | mentioned in the blog post. So each expert only sees a subset
         | of the tokens.
         | 
         | I imagine they have some new way to route tokens to the experts
         | that probably computes a global context. One scalable way to
         | compute a global context is by a state space model. This would
         | act as a controller and route the input tokens to the MoEs.
         | This can be computed by convolution if you make some
         | simplifying assumptions. They may also still use transformers
         | as well.
         | 
          | I could be wrong, but there are some Mamba-MoE papers that
          | explore this idea.
        
         | resouer wrote:
         | > The 10M context ability wipes out most RAG stack complexity
         | immediately.
         | 
          | This may not be true. In my experience, the complexity of RAG
          | lies in how to properly connect to various unstructured data
          | sources and run a data transformation pipeline over large-
          | scale data sets (meaning GB, TB or even PB). It's on the
          | critical path rather than a "nice to have", because the
          | quality of the data and the pipeline is a major factor in the
          | final generated result. i.e., in RAG, the importance of
          | R >>> G.
        
         | jorvi wrote:
         | I just hope at some point we get access to mostly uncensored
         | models. Both GPT-4 and Gemini are extremely shackled, and a
         | slightly inferior model that hasn't been hobbled by a very
         | restricting preprompt would handily outperform them.
        
           | ShamelessC wrote:
           | You can customize the system prompt with ChatGPT or via the
           | completions API, just fyi.
        
         | ren_engineer wrote:
         | RAG would still be useful for cost savings assuming they charge
         | per token, plus I'm guessing using the full-context length
         | would be slower than using RAG to get what you need for a
         | smaller prompt
        
           | nostrebored wrote:
           | This is going to be the real differentiator.
           | 
           | HN is very focused on technical feasibility (which remains to
           | be seen!), but in every LLM opportunity, the CIO/CFO/CEO are
           | going to be concerned with the cost modeling.
           | 
           | The way that LLMs are billed now, if you can densely pack the
           | context with relevant information, you will come out ahead
           | commercially. I don't see this changing with the way that LLM
           | inference works.
           | 
           | Maybe this changes with managed vector search offerings that
           | are opaque to the user. The context goes to a preprocessing
           | layer, an efficient cache understands which parts haven't
           | been embedded (new bloom filter use case?), embeds the other
           | chunks, and extracts the intent of the prompt.
        
             | mediaman wrote:
             | Agreed with this.
             | 
             | The leading ability AI (in terms of cognitive power) will,
             | generally, cost more per token than lower cognitive power
             | AI.
             | 
             | That means that at a given budget you can choose more
             | cognitive power with fewer tokens, or less cognitive power
             | with more tokens. For most use cases, there's no real point
             | in giving up cognitive power to include useless tokens that
             | have no hope of helping with a given question.
             | 
             | So then you're back to the question of: how do we reduce
             | the number of tokens, so that we can get higher cognitive
             | power?
             | 
             | And that's the entire field of information retrieval, which
             | is the most important part of RAG.
        
             | golol wrote:
              | _> The way that LLMs are billed now, if you can densely
              | pack the context with relevant information, you will come
              | out ahead commercially. I don't see this changing with
              | the way that LLM inference works._
              | 
              | Really? Because to my understanding the compute necessary
              | to generate a token grows linearly with the context, and
              | doesn't the OpenAI billing reflect that by separating
              | prompt and output tokens?
        
         | cchance wrote:
         | The youtube video of the Multimodal analysis of a video is
         | insane, imagine feeding in movies or tv shows and being able to
         | autosummary or find information about them dynamically, how the
         | hell is all this possible already? AI is moving insanely fast.
        
         | zitterbewegung wrote:
         | RAG doesn't go away at 10 Million tokens if you do esoteric
         | sources like shodan API queries.
        
         | kylerush wrote:
         | I assume using this large of a context window instead of RAG
         | would mean the consumption of many orders of magnitude more
         | GPU.
        
         | karmasimida wrote:
            | Even 1M tokens eliminates the need for RAG, unless it's for
            | cost.
        
           | sroussey wrote:
           | Or accuracy
        
           | 7734128 wrote:
           | 1 million might sound like a lot, but it's only a few
           | megabytes. I would want RAG, somehow, to be able to process
           | gigabytes or terabytes of material in a streaming fashion.
        
             | karmasimida wrote:
             | RAG will not change how many tokens LLM can produce at
             | once.
             | 
              | Longer context, on the other hand, could put some RAG use
              | cases to sleep: if your instructions are literally as
              | long as a manual, then there is no need for RAG.
        
         | localhost wrote:
         | RE: RAG - they haven't released pricing, but if input tokens
         | are priced at GPT-4 levels - $0.01/1K then sending 10M tokens
         | will cost you $100.
        
           | s-macke wrote:
           | If you think the current APIs will stay that way, then you're
           | right. But when they start offering dedicated chat instances
           | or caching options, you could be back in the penny region.
           | 
           | You probably need a couple GB to cache a conversation. That's
           | not so easy at the moment because you have to transfer that
           | data to and from the GPUs and store the data somewhere.
        
         | TweedBeetle wrote:
         | Regarding how they're getting to 10M context, I think it's
         | possible they are using the new SAMBA architecture.
         | 
         | Here's the paper: https://arxiv.org/abs/2312.00752
         | 
         | And here's a great podcast episode on it:
         | https://www.cognitiverevolution.ai/emergency-pod-mamba-memor...
        
           | LightMachine wrote:
           | As a Brazilian, I approve that choice. Vambora amigos!
        
         | renonce wrote:
         | > They don't talk about how they get to 10M token context
         | 
         | I don't know how either but maybe
         | https://news.ycombinator.com/item?id=39367141
         | 
         | Anyway I mean, there is plenty of public research on this so
         | it's probably just a matter of time for everyone else to catch
         | up
        
           | albertzeyer wrote:
           | Why do you think this specific variant (RingAttention)? There
           | are so many different variants for this.
           | 
           | As far as I know, the problem in most cases is that while the
           | context length might be high in theory, the actual ability to
           | use it is still limited. E.g. recurrent networks even have
           | infinite context, but they actually only use 10-20 frames as
           | context (longer only in very specific settings; or maybe if
           | you scale them up).
        
         | AaronFriel wrote:
         | There will always be more data that _could_ be relevant than
         | fits in a context window, and especially for multi-turn
         | conversations, huge contexts incur huge costs.
         | 
         | GPT-4 Turbo, using its full 128k context, costs around $1.28
         | per API call.
         | 
         | At that pricing, 1m tokens is $10, and 10m tokens is an eye-
         | watering $100 per API call.
         | 
         | Of course prices will go down, but the price advantage of
         | working with less will remain.
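A quick check of that arithmetic, using the $0.01/1K input rate cited elsewhere in the thread:

```python
def input_cost(n_tokens, usd_per_1k=0.01):
    """Prompt-token cost at GPT-4 Turbo's $0.01 / 1K input rate."""
    return n_tokens / 1000 * usd_per_1k

# The figures in the comment above: full 128K window, then 1M and 10M.
costs = {n: input_cost(n) for n in (128_000, 1_000_000, 10_000_000)}
```

Output-token pricing is separate (and higher), so a multi-turn conversation that re-sends a full window each turn multiplies these numbers further.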
        
           | 7734128 wrote:
           | Would the price really increase linearly? Isn't the demands
           | on compute and memory increasing steeper than that as a
           | function of context length?
        
           | elorant wrote:
           | I don't see a problem with this pricing. At 1m tokens you can
           | upload the whole proceedings of a trial and ask it to draw an
           | analysis. Paying $10 for that sounds like a steal.
        
             | AaronFriel wrote:
             | Of course, if you get exactly the answer you want in the
             | first reply.
        
             | staticman2 wrote:
             | While it's hard to say what's possible on the cutting edge,
             | historically models tend to get dumber as the context size
             | gets bigger. So you'd get a much more intelligent analysis
             | of a 10,000 token excerpt of the trial than a million token
             | complete transcript of the trial. I have not spent the
             | money testing big token sizes in GPT 4 turbo, but it would
             | not surprise me if it gets dumber. Think of it this way:
             | if the model is limited to 3,000-token replies and an
             | analysis would require a more detailed response than 3,000
             | tokens, it cannot provide it; it'll just give you
             | insufficient information. What it'll probably do is ignore
             | parts of the trial transcript because it can't analyze all
             | that information in 3,000 tokens. And asking a follow-up
             | question is another million tokens.
        
         | qwerty_clicks wrote:
         | FYI, MM is the standard for million: 10MM, not 10M. I'm
         | reading all these comments confused as heck about why you're
         | excited about 10M tokens.
        
         | a_vanderbilt wrote:
         | After their giant fib with the Gemini video a few weeks back,
         | I'm not believing anything until I see it used by actual
         | people. I hope it's that much better than GPT-4, but I'm not
         | holding my breath that there isn't an asterisk or trick hiding
         | somewhere.
        
         | nborwankar wrote:
         | Re RAG: aren't you ignoring the fact that no one wants to put
         | confidential company data into such LLMs? Private RAG
         | infrastructure remains a need for the same reason that privacy
         | of data of all sorts remains a need. Huge context solves the
         | problem for large open source context material but that's only
         | part of the picture.
        
         | outside1234 wrote:
         | It takes 60 seconds to process all of that context in their
         | three.js demo, which is, I will say, not super interactive. So
         | there is still room for RAG and other faster alternatives to
         | narrow the context.
        
         | aubanel wrote:
         | > They are pretty clear that 1.5 Pro is better than GPT-4 in
         | general, and therefore we have a new LLM-as-judge leader, which
         | is pretty interesting
         | 
         | I fully disagree: they compare Gemini 1.5 Pro to GPT-4 only on
         | context length; on other tasks they compare it only to other
         | Gemini models, which is a strange self-own.
         | 
         | I'm convinced that if they do not show the results against
         | GPT4/Claude, it is because they do not look good.
        
         | kristjansson wrote:
         | For other's reference, the paper:
         | https://storage.googleapis.com/deepmind-media/gemini/gemini_...
        
         | joshsabol46 wrote:
         | > The 10M context ability wipes out most RAG stack complexity
         | immediately.
         | 
         | RAG is needed for the same reason you don't `SELECT *` all of
         | your queries.
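         | 
         | In that analogy, retrieval is the WHERE clause. A toy sketch,
         | with naive word-overlap scoring standing in for the embedding
         | similarity a real RAG stack would use:

```python
# Toy version of the analogy: retrieval plays the role of a WHERE clause,
# pulling only the top-k relevant chunks into the prompt instead of the
# whole corpus. Scoring here is naive word overlap; real RAG stacks use
# embedding similarity.
def score(query: str, chunk: str) -> int:
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

corpus = [
    "The trial opened with jury selection on Monday.",
    "Witness testimony focused on the contract dispute.",
    "Closing arguments summarized the contract evidence.",
]
print(retrieve("what did the witness testimony say about the contract",
               corpus, k=1))
# -> ['Witness testimony focused on the contract dispute.']
```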
        
       | cubefox wrote:
       | I think Anthropic and OpenAI could also have offered a one
       | million context window a while ago. The relevant architecture
       | breakthrough was probably when a linear increase in context
       | length only required a linear increase in inference compute
       | instead of a quadratic one. Anthropic and then OpenAI achieved
       | linear context compute scaling before an architecture for it was
       | published publicly (MAMBA paper).
        
         | bearjaws wrote:
         | The problem is, the 128k window performed terribly and showed
         | that attention was mostly limited to the first and last 20%.
         | 
         | Increasing it to 1M just means even more data is ignored.
        
           | cubefox wrote:
           | Maybe their architecture wasn't as good as MAMBA and Google
           | could use the better architecture thanks to being late to the
           | game...
        
       | zippothrowaway wrote:
       | I've always been suspicious of any announcement from Demis
       | Hassabis since way back in his video game days when he did a
        | monthly article in Edge magazine about the game he was
       | developing. "Infinite Polygons" became a running joke in the
       | industry because of his obvious snake-oil. The game itself,
       | Republic [1], was an uninteresting failure.
       | 
       | He learned how to promote himself from working for Peter "Project
       | Milo" Molyneux and I see similar patterns of hype.
       | 
       | [1]
       | https://en.wikipedia.org/wiki/Republic:_The_Revolution#Marke...
        
         | pradn wrote:
         | The line between delusional and visionary is thin! I know I'm
         | too grounded in "expected value" math to do super outlier stuff
         | like starting a video game company...
        
         | Qwero wrote:
         | Funny read about his game.
         | 
         | Nonetheless, while still underwhelming in comparison to GPT-4
         | (excluding this announcement, as I haven't tried it yet),
         | AlphaGo, AlphaZero, and especially AlphaFold were tremendous!
        
       | obblekk wrote:
       | Very impressive if the benchmarks replicate. Some questions:
       | 
       | * token cost? In multiples of Gemini pro 1
       | 
       | * memory usage? Does already scarce gpu memory become even more
       | of a bottleneck?
       | 
       | * video resolution? Sherlock Jr (1924) is their test video -
       | black and white, 45min, low res
       | 
       | Most curious about the video... I wonder if RAG within video will
       | become the next battlefront
        
       | technics256 wrote:
        | Does anyone actually have access to Ultra yet? It's a lame blog
        | post that says "it's available!" while the fine print says "by
        | whitelist".
       | 
       | Ok, whatever that means.
       | 
       | OpenAI at least releases it all at once, to everyone.
        
         | Szpadel wrote:
         | oh, OpenAI had a lot of waitlists also: GPT-4 API,
         | large-context versions, etc.
        
       | sonium wrote:
       | I just watched the demo with the Apollo 11 transcript. (sidenote:
       | maybe Gemini is named after the space program?).
       | 
       | Wouldn't the transcript or at least a timeline of Apollo 11 be
       | part of the training corpus? So even without the 400 pages in the
       | context window just given the drawing I would assume a prompt
        | like "In the context of Apollo 11, what moment does the drawing
       | refer to?" would yield the same result.
        
         | technics256 wrote:
          | Gemini is named that way because of the collaboration between
          | Google Brain and DeepMind.
        
         | singularity2001 wrote:
         | Correct except that it spits out the timestamp
        
         | torginus wrote:
          | Gemini is named after the NASA program that put two-person
          | crews into orbit - pretty aptly named, but not sure if this
          | was the intention.
        
         | empath-nirvana wrote:
         | i asked chatgpt4 to identify three humorous moments in the
         | apollo 11 transcript and it hallucinated all 3 of them (i think
          | -- i can't find what it's referring to). Presumably it's in
          | its corpus, too.
         | 
         | > The "Snoopy" Moment: During the mission, the crew had a
         | small, black-and-white cartoon Snoopy doll as a semi-official
         | mascot, representing safety and mission success. At one point,
         | Collins joked about "Snoopy" floating into his view in the
         | spacecraft, which was a light moment reflecting the camaraderie
         | and the use of humor to ease the intense focus required for
         | their mission.
         | 
         | The "Biohazard" Joke: After the successful moon landing and
         | upon preparing for re-entry into Earth's atmosphere, the crew
         | humorously discussed among themselves the potential of being
         | quarantined back on Earth due to unknown lunar pathogens. They
         | joked about the extensive debriefing they'd have to go through
         | and the possibility of being a biohazard. This was a light-
         | hearted take on the serious precautions NASA was taking to
         | prevent the hypothetical contamination of Earth with lunar
         | microbes.
         | 
         | The "Mailbox" Comment: In the midst of their groundbreaking
         | mission, there was an exchange where one of the astronauts
         | joked about expecting to find a mailbox on the Moon, or asking
         | where they should leave a package, playing on the surreal
         | experience of being on the lunar surface, far from the ordinary
         | elements of Earthly life. This comment highlighted the
         | astronauts' ability to find humor in the extraordinary
         | circumstances of their journey.
        
       | htrp wrote:
       | > Gemini 1.5 delivers dramatically enhanced performance. It
       | represents a step change in our approach, building upon research
       | and engineering innovations across nearly every part of our
       | foundation model development and infrastructure. This includes
       | making Gemini 1.5 more efficient to train and serve, with a new
       | Mixture-of-Experts (MoE) architecture.
       | 
       | Looks like they fine tuned across use cases and grabbed the
       | mixtral architecture?
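        | 
        | Whatever the specifics, the generic MoE idea is a gate that
        | activates only the top-k experts per token, so per-token
        | compute stays roughly flat as total parameters grow. A minimal
        | sketch (an illustration, not Gemini's or Mixtral's actual
        | routing):

```python
import math

# Generic top-k Mixture-of-Experts routing (an illustration, not Gemini's
# or Mixtral's actual design): a gate scores the experts for each token
# and only the k best are evaluated, so compute per token stays roughly
# constant as the total number of experts (and parameters) grows.
def softmax(xs: list[float]) -> list[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(token: float, gate_weights: list[float], experts,
                k: int = 2) -> float:
    scores = softmax([w * token for w in gate_weights])
    top = sorted(range(len(experts)), key=lambda i: scores[i],
                 reverse=True)[:k]
    norm = sum(scores[i] for i in top)
    # Only the k selected experts run; the rest are skipped entirely.
    return sum(scores[i] / norm * experts[i](token) for i in top)

experts = [lambda x, m=m: m * x for m in (1.0, 2.0, 3.0, 4.0)]  # toy experts
out = moe_forward(1.0, gate_weights=[0.1, 0.9, 0.2, 0.05], experts=experts)
print(out)  # a blend of experts 1 and 2 only (the gate's top two)
```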
        
         | sebzim4500 wrote:
         | There's no way that's all it is, scaling mixtral to a context
         | length of 10M while maintaining any level of reasoning ability
         | would be extremely slow. If the only purpose of the model was
         | to produce this report then maybe that's possible, but if they
         | plan on actually deploying this to end users then there is no
         | way they can run quadratic attention on 10M tokens.
        
       | joak wrote:
       | <<We'll also introduce 1.5 Pro with a standard 128,000 token
       | context window when the model is ready for a wider release>>
       | 
       | So actually they are lagging: their 128k model is yet to be
       | released while OpenAI released theirs some months ago.
        
         | joak wrote:
         | Their 10M tokens demo is impressive though. They "released" a
         | demo. Confusing...
        
         | kyrra wrote:
         | See: https://blog.google/technology/ai/google-gemini-next-
         | generat...
         | 
         | > Gemini 1.5 Pro comes with a standard 128,000 token context
         | window. But starting today, a limited group of developers and
         | enterprise customers can try it with a context window of up to
         | 1 million tokens via AI Studio and Vertex AI in private
         | preview.
        
           | joak wrote:
           | Gemini 1.5 Pro is not yet released: <<Starting today, we're
           | offering a limited preview of 1.5 Pro to developers and
           | enterprise customers via AI Studio and Vertex AI>>
           | 
           | Something like an alpha version.
           | 
           |  _Limited preview_ in their jargon.
        
       | iamgopal wrote:
       | AI race is amazing, Nvidia reaping the benefits now, but soon the
       | world.
        
       | cubefox wrote:
       | The whitepaper says the Buster Keaton film was reduced to 1 FPS
       | before being fed in. Apparently multi-modal language models can
       | only read individual pictures, so videos have to be reduced to a
       | series of frames. I assume animal brains are more efficient than
       | that. E.g. by only feeding the "changes/difference over time"
       | instead of a sequence of time slices.
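        | 
        | The two approaches, crudely sketched below with lists of ints
        | standing in for frames (real multimodal models use learned
        | image encoders):

```python
# The two strategies the comment contrasts, crudely: (a) subsample the
# video to 1 frame per second, as the whitepaper describes, vs. (b)
# encode only the change between successive frames. Frames here are toy
# lists of pixel ints; real models use learned image encoders.
def sample_1fps(frames: list, source_fps: int) -> list:
    return frames[::source_fps]

def frame_deltas(frames: list) -> list:
    return [
        [b - a for a, b in zip(prev, curr)]
        for prev, curr in zip(frames, frames[1:])
    ]

video = [[i, i, i] for i in range(48)]   # 2 seconds of 24 fps "footage"
print(len(sample_1fps(video, 24)))       # 2
print(frame_deltas(video[:3]))           # [[1, 1, 1], [1, 1, 1]]
```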
        
         | riku_iki wrote:
         | it will probably eventually be improved by adding some encoder
         | on top of LLM, which will encode 60 frames into 1 while
         | attempting to preserve information..
        
       | freedomben wrote:
       | > _Our teams continue pushing the frontiers of our latest models
       | with safety at the core._
       | 
       | They're not kidding, Gemini (at least what's currently available)
       | is so safe that it's not all that useful.
       | 
       | The "safety" permeates areas where you wouldn't even expect it,
       | like refusing to answer questions about "unsafe" memory
       | management in C. It interjects lectures about safety in answers
       | when you didn't even ask it to do that in the question.
       | 
       | For example, I clicked on _one of the four example questions that
       | Gemini proposes to help you get started_ and it was something
       | like  "Write an SMS calling in sick. It's a big presentation day
       | and I'm sad to let the team down." Gemini decided to tell me that
       | it can't impersonate positions of trust like medical
        | professionals or employers (which is not at all what I was
        | asking it to do).
       | 
       | The other things I asked it, it gave me wrong and obviously wrong
       | answers. The funniest (though glad it was obviously wrong) was
       | when I asked it "I'm flying from Karachi to Denver. Will I need
       | to pick up my bags in Newark?" and it told me "no, because
       | Karachi to Newark is a domestic flight"
       | 
       | Unless they stop putting "safety at the core," or figure out how
       | to do it in a way that isn't unnecessarily inhibiting, annoying,
       | and frankly insulting (protip: humans don't like to be accused of
       | asking for unethical things, especially when they weren't asking
       | for them. when other humans do that to us, we call that assuming
       | the worst and it's a negative personality trait), any
       | announcements/releases/breakthroughs from Google are going to be
       | a "meh" for me.
        
       | dghlsakjg wrote:
       | This is incredible if it isn't just hype!
       | 
       | I hope the demos aren't fudged/scripted like Google did with
       | Gemini 1.0
        
         | amf12 wrote:
         | These demos seem to be videos from AI studio, and which display
         | the time in seconds. Hopefully not fudged.
        
       | EZ-E wrote:
        | Remember AI Dungeon and how frustrating it was when it would
        | forget what happened previously? With a 10M context window, am I
        | right to assume it would be possible to weave a story spanning
        | multiple books' worth of content? (more or less 1,400 pages)
        
         | dougmwne wrote:
         | Pretty much! Check out this demo of finding a scene in a 1400
         | page book based on a stick figure drawing. Mind blowing, right?
         | 
         | https://twitter.com/JeffDean/status/1758148159942091114
        
         | VikingCoder wrote:
         | Dear Google,
         | 
         | Teach Gemini how to be a Dungeon Master, and run free
         | adventures at Comic Con.
         | 
         | Then offer it up as a subscription.
         | 
         | Sincerely,
         | 
         | Everyone
        
       | eigenvalue wrote:
       | Based on what I've seen so far, I think the probability that this
       | is actually better than GPT4 on the kind of real world coding
       | tasks that I use it for is less than 1%. Literally everything
       | from Google on this has been vaporware or laughably bad in actual
       | practice in my personal experience. Which is totally insane to me
       | given their financial resources, human resources, and multi-year
       | lead in AI/DL research, but that's what seems to have happened. I
       | certainly hope that they can develop and actually release a
       | capable model, but at this point, I think you have to be deeply
       | skeptical of everything they say until such a model is available
       | for real by the public and you can try it on actual, real tasks
       | and not fake benchmark nonsense and waitlists.
        
       | scarmig wrote:
       | One interesting tidbit from the technical report:
       | 
       | >HumanEval is an industry standard open-source evaluation
       | benchmark (Chen et al., 2021), but we found controlling for
       | accidental leakage on webpages and open-source code repositories
       | to be a non-trivial task, even with conservative filtering
       | heuristics. An analysis of the test data leakage of Gemini 1.0
       | Ultra showed that continued pretraining on a dataset containing
       | even a single epoch of the test split for HumanEval boosted
       | scores from 74.4% to 89.0%, highlighting the danger of data
       | contamination. We found that this sharp increase persisted even
       | when examples were embedded in extraneous formats (e.g. JSON,
       | HTML). We invite researchers assessing coding abilities of these
       | models head-to-head to always maintain a small set of truly held-
       | out test functions that are written in-house, thereby minimizing
       | the risk of leakage. The Natural2Code benchmark, which we
       | announced and used in the evaluation of Gemini 1.0 series of
       | models, was created to fill this gap. It follows the exact same
       | format of HumanEval but with a different set of prompts and
       | tests.
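        | 
        | The simplest form of that hygiene is a verbatim scan of the
        | training corpus for benchmark prompts before trusting a score
        | (though, as the report notes, leakage survives reformatting
        | into JSON/HTML, so substring matching alone isn't enough):

```python
# Crude contamination check in the spirit of the quoted passage: scan the
# training corpus for verbatim benchmark prompts before trusting a score.
# (As the report notes, leakage survives reformatting into JSON/HTML, so
# substring matching alone is not sufficient in practice.)
def leaked_examples(corpus: str, benchmark_prompts: list[str]) -> list[str]:
    return [p for p in benchmark_prompts if p in corpus]

corpus = (
    "scraped blog post quoting a benchmark...\n"
    "def add(a, b):\n    return a + b\n"
)
prompts = ["def add(a, b):", "def reverse_string(s):"]
print(leaked_examples(corpus, prompts))  # ['def add(a, b):']
```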
        
       | llm_trw wrote:
       | Yeah. I'll believe that when I can use it.
        
       | DidISayTooMuch wrote:
        | How can I fine-tune these models for my use? Their docs aren't
        | clear on whether the Gemini models are fine-tunable.
        
       | qwertox wrote:
       | As a sidenote, it's worth clicking the play button and then
       | checking how they're highlighting the current paragraph and word
       | in the inspector.
        
       | aubanel wrote:
       | For reference, here is the technical report:
       | https://storage.googleapis.com/deepmind-media/gemini/gemini_...
        
       | dumbmachine wrote:
        | It would probably be cost-prohibitive to use the 10M context to
        | its fullest each time.
        | 
        | I instead hope for an API to access the context as a datastore,
        | so that, like RAG, we can control what to store, but unlike RAG
        | all data stays within context.
        
       | killthebuddha wrote:
       | 10M tokens is an absolute game changer, especially if there's no
       | noticeable decay in quality with prompt size. We're going to see
       | things like entire domain specific languages embedded in prompts.
       | IMO people will start thinking of the prompt itself as a sort of
       | runtime rather than a static input.
       | 
       | Back when OpenAI still supported raw text completion with text-
       | davinci-003 I spent some time experimenting with tiny prompt-
       | embedded DSLs. The results were very, very, interesting IMO. In a
       | lot of ways, text-davinci-003 with embedded functions still feels
       | to me like the "smartest" language model I've ever interacted
       | with.
       | 
       | I'm not sure how close we are to "superintelligence" but for
       | baseline general intelligence we very well could have already
       | made the prerequisite technological breakthroughs.
        
         | empath-nirvana wrote:
            | It's pretty slow, though -- up to 60 seconds for some of
            | the answers -- and uses god knows how much compute, so
            | there are probably going to be some tradeoffs: you're going
            | to want to make sure that much context is actually useful
            | for what you want.
        
           | drusepth wrote:
           | TBF: when talking about the first "superintelligence", I'd
           | expect it to take unreasonable amounts of compute and/or be
           | slow -- that can always be optimized. Bringing it into
           | existence in the first place is the hardest part.
        
             | unshavedyak wrote:
             | Yea. Of course for some tasks we need speed, but i've been
             | kinda surprised that we haven't seen very slow models which
             | perform far better than faster models. We're treading new
             | territory, and everyone seems to make models that are "fast
             | enough".
             | 
             | I wanna see how far this tech can scale, regardless of
             | speed. I don't care if it takes 24h to formulate a
             | response. Are there "easy" variables which drastically
             | improve output?
             | 
             | I suspect not. I imagine people have tried that. Though i'm
             | still curious as to why.
        
       | Yusefmosiah wrote:
       | I see a lot of talk about retrieval over long context. Some even
       | think this replaces RAG.
       | 
       | I don't care if the model can tell me which page in the book or
       | which code file has a particular concept. RAG already does this.
       | I want the model to notice how a concept is distributed
       | throughout a text, and be able to connect, compare, contrast,
       | synthesize, and understand all the ways that a book touches on a
       | theme, or to rewrite multiple code files in one pass, without
       | introducing bugs.
       | 
       | How does Gemini 1.5's reasoning compare to GPT-4? GPT-4 already
       | has superhuman memory; its bottleneck is its relatively weak
       | reasoning.
        
         | sinuhe69 wrote:
          | In my experience (I work mostly and deeply with Bard/Gemini),
          | the reasoning capability of Gemini is quite good. Gemini Pro
          | is already much better than ChatGPT 3.5, but it still makes
          | quite a few mistakes along the way. What is more worrying is
          | that when these models make mistakes, they try really hard to
          | justify their reasoning (errors), practically misleading the
          | user. Because of their high mimicry ability, users really
          | have to pay attention to validate and eventually spot the
          | errors. Of course, this is still far below the human level,
          | so I'm not sure whether they add value or are more of a
          | burden.
        
         | og_kalu wrote:
         | The most impressive demonstration of long context is this in my
         | opinion,
         | 
         | https://imgur.com/a/qXcVNOM
         | 
         | Testing language translation abilities of an extremely obscure
         | language after passing in one grammar book as context.
        
       | petargyurov wrote:
       | Version number suggests they're waiting to announce something
       | bigger already?
        
       | bloopernova wrote:
       | Hooray for competition.
        
       | luke-stanley wrote:
       | Still no Ultra model API available to UK devs? Considering
       | Deepmind's London base, this is kinda strange. Maybe they could
       | ask Ultra how to roll it out faster?
        
       | bobvanluijt wrote:
       | Demo with Google AI Studio:
       | https://twitter.com/bobvanluijt/status/1758185143116730875
        
       | processing wrote:
        | Just wade through documentation to access it?
        | 
        | Clicking on the AI Studio link doesn't show me the app page -
        | it redirects to a document on early access. I do as required,
        | go back, and try clicking on the AI Studio link again, and I'm
        | redirected to the same document on turning on early access.
        | 
        | Frustrating.
        
       | robertlagrant wrote:
       | Slightly surprisingly I can't get to AI Studio from the UK. It is
       | available in quite a few countries, but not here.
        
       | ChildOfChaos wrote:
        | Is this just more nonsense from Google, though? I expect big
        | things from Google, but they need to shut up and actually
        | release stuff instead of saying how amazing their stuff is and
        | then releasing potato AI. Nothing they have done in the AI
        | space recently has lived up to any of the hype. They should
        | stay silent for a bit and then release something that kills
        | GPT-4 if they honestly can, but instead they are just full of
        | hype.
        
         | sinuhe69 wrote:
          | Yeah, their Gemini demo was a disaster. But they have
          | released their Ultra model for the general audience, so you
          | can test it yourself. Talking about killing the competitor is
          | a little funny, considering they are all generative LLMs
          | based on the same principles (and general architecture), with
          | their inherent flaws and shortcomings. None of them can even
          | execute a basic plan like a cheap human assistant can, so
          | their value is very limited.
          | 
          | A breakthrough will only come with a next-generation
          | architecture. LLMs for special domains are currently the most
          | promising approach.
        
           | ChildOfChaos wrote:
           | Yeah but even with ultra they kept saying how it was better
           | than GPT4 and then when it actually got released it was
           | awful.
        
       | jeffbee wrote:
       | A little off-topic I guess, but is anyone else seeing what I am
       | seeing: a total inability to actually upgrade to paid Gemini?
       | Every time I try to sign up it serves me an error page: "We're
       | sorry - Google One storage plans aren't available right now."
        
       | DrNosferatu wrote:
       | Did they say a general availability date?
       | 
       | (a bit confused)
        
       | dang wrote:
       | There's also
       | https://twitter.com/JeffDean/status/1758146022726041615
       | 
       | (via https://news.ycombinator.com/item?id=39383593, but we merged
       | those comments hither)
        
       | topicseed wrote:
       | 1 million tokens?? This is wild and a lot of RAG can be removed.
        
       | topicseed wrote:
       | Is this going to be only for consumer Gemini app or for
       | API/Vertex too? The context window is..... Simply lovely.
        
       | summerlight wrote:
       | One interesting proposal here is a multiple NIAH retrieval
       | benchmark. When they put 100 needles, then the recall rate
       | becomes considerably lower, something around 60~70%. Not sure
       | what's the exact configuration of this benchmark, but intuitively
       | this makes sense and should be a critical metric for the model's
       | reliability.
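        | 
        | The rough shape of such a benchmark: plant N distinctive
        | "needles" in filler text and score the fraction recovered. A
        | toy harness with a stand-in "perfect retriever" (a real run
        | would send the haystack to the model and parse its answer):

```python
import random

# Toy multi-needle NIAH harness: plant distinctive needles in filler
# text, then score the fraction recovered. The "perfect retriever" below
# is a stand-in; a real benchmark would send `haystack` to the model.
def build_haystack(needles: list[str], filler_lines: int = 1000) -> str:
    lines = ["the quick brown fox jumps over the lazy dog"] * filler_lines
    for needle in needles:
        lines.insert(random.randrange(len(lines) + 1), needle)
    return "\n".join(lines)

def recall(retrieved: set[str], needles: list[str]) -> float:
    return len(retrieved & set(needles)) / len(needles)

needles = [f"needle-{i}: the secret word is apple-{i}" for i in range(100)]
haystack = build_haystack(needles)
perfect = {line for line in haystack.split("\n") if line.startswith("needle-")}
print(recall(perfect, needles))  # 1.0 for a perfect retriever
```

The comment's reported 60~70% recall would correspond to a retrieved set covering only that fraction of the planted needles.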
        
       | reissbaker wrote:
       | The long context length is of course incredible, but I'm more
       | shocked that the _Pro_ model is now on par with Ultra (~GPT-4, at
       | least the original release). That implies when they release 1.5
        | Ultra, we'll finally have a GPT-4 killer. And assuming that 1.5
       | Pro is priced similarly to the current Pro, that's a 4x price
       | advantage per-token.
       | 
       | Not surprising that OpenAI shipped a blog post today about their
       | video generation -- I think they're feeling considerable heat
       | right now.
        
         | topicseed wrote:
          | Gemini 1 Ultra was also said to be on par with ChatGPT 4, and
          | it's not really there, so let's see for ourselves when we can
          | get our hands on it.
        
           | reissbaker wrote:
           | Ultra benchmarked around the original release of GPT-4, not
           | the current model. My understanding is that was fairly
           | accurate -- it's close to current GPT-4 but not quite equal.
           | However, close-to-GPT-4 but 4x cheaper and 10x context length
           | would be very impressive and IMO useful.
        
       | m3kw9 wrote:
        | Imagine sending 5-10 MB over the network per request, and the
        | cost per token. You may accidentally go broke after a big lag.
        
       | system2 wrote:
        | Let's hope this lowers the pricing of GPT-4 to GPT-3.5 levels.
        | Because of OpenAI's ridiculous pricing, we can't use it
        | regularly, as it would cost us thousands of dollars per month.
        
       | stolsvik wrote:
       | So, this has native image/video modality. I wonder whether that
       | gives it an edge in physical / world understanding? That is,
       | handling and navigating our 3/4 dimensions? Cause and effect and
       | so on?
        
       | animanoir wrote:
       | Google is so finished, they are so late on this.
        
       | tmaly wrote:
       | Is there a $20 a month option for 1.5 Ultra?
       | 
       | If there is, where do I sign up?
        
       | ancorevard wrote:
        | Is this a blog post or did they actually ship?
        
       | jstummbillig wrote:
        | Imagine a day when your new record-setting 10M-token context
        | model is not enough to make it to HN #1.
        | 
        | Wild times.
        
       | thot_experiment wrote:
       | I gotta say, I've been trying out Gemini recently and it's
       | embarrassingly bad. I can't take anything google puts out
       | seriously when their current offerings are so so much worse than
       | ChatGPT (or even local llama!).
       | 
       | As a particularly egregious example, yesterday night I gave
       | Gemini a list of drinks and other cocktail ingredients I had
       | laying around and asked for some recommendations for cute drinks
        | that I could make. Its response:
       | 
       | > I'm just a language model, so I can't help you with that.
       | 
       | ChatGPT 3.5 came up with several delicious options with clear
       | instructions, but it's not just this instance, I've NEVER gotten
       | a response from Gemini that I even felt was _more useful_ than
        | just a freaking Bing search, much less better than ChatGPT. I'm
        | just going to assume they're using cherry-picked metrics to make
       | themselves feel better until proven otherwise. I have zero
       | confidence in Google's AI plays, and I assume all their competent
       | talent is now at OpenAI or Anthropic.
        
       | mikeweiss wrote:
       | Does anyone know what kinds of GPUs/Chips Google is using for
       | Gemini? They aren't using Nvidia correct?
        
         | sackfield wrote:
         | TPUs: https://cloud.google.com/tpu?hl=en
        
       | jakub_g wrote:
        | Very off-topic, but I can't help it: the pace of change reminds
        | me of the "Bates 4000" sketch from The Onion Movie:
       | 
       | https://m.youtube.com/watch?v=fw7FniaeaSo
        
       ___________________________________________________________________
       (page generated 2024-02-15 23:00 UTC)