[HN Gopher] Our next-generation model: Gemini 1.5
       ___________________________________________________________________
        
       Our next-generation model: Gemini 1.5
        
       Author : todsacerdoti
       Score  : 1210 points
       Date   : 2024-02-15 15:02 UTC (1 day ago)
        
 (HTM) web link (blog.google)
 (TXT) w3m dump (blog.google)
        
       | crakenzak wrote:
       | Technical report: https://storage.googleapis.com/deepmind-
       | media/gemini/gemini_...
       | 
       | The 1 million token context window + Gemini 1.0 Ultra level
       | performance seems like it'll unlock a wide range of incredible
       | use cases!
       | 
       | HN, what are you going to use/build with this?
        
         | volkk wrote:
         | was this posted by an AI bot
        
           | crakenzak wrote:
           | Lol nope I'm a normal person. Gimme a captcha and I'll
           | (hopefully) solve it ;)
        
             | scarmig wrote:
             | Just gotta make sure the captcha requires a >1M token
             | context length to solve...
        
             | throwaway918274 wrote:
             | How do we know you're not an AI bot that figured out how to
             | hire someone from fiverr to solve captchas for you?
        
           | mrkstu wrote:
           | No, they're just applying their Twitter style engagement
           | strategy to HN for some reason...
        
       | code51 wrote:
       | Dear Google, please fix your names and versioning.
       | 
        | Gemini Pro, Gemini Ultra... but that was 1.0?
        | 
        | Now upgraded, but again Gemini Pro? Jumping from 1.0 to 1.5?
        | 
        | Wait, but it's not Gemini Pro 1.5... it's Gemini "1.5" Pro.
       | 
       | What actually happened between 1.0 and 1.5?
        
         | lairv wrote:
          | This naming is terrible. If I understand correctly, this is
          | the release of Gemini 1.5 Pro, but not Gemini 1.5 Ultra,
          | right?
        
           | goalonetwo wrote:
           | Looks like the former PM of chat at google found a new job.
        
           | cchance wrote:
            | How is that hard to understand? Yes, it's Gemini 1.5 Pro;
            | they haven't released Ultra or Nano. This isn't rocket
            | science, they didn't introduce Gemini 1.5 ProLight or
            | something, lol, it's the Pro-size model's 1.5 version.
        
             | lairv wrote:
              | The name of the blog post is "Our next-generation model:
              | Gemini 1.5"; how am I supposed to infer from this that it
              | is only 1.5 Pro and not Ultra?
        
         | jjbinx007 wrote:
         | They can't decide on a single name for a chat application so I
         | think expecting them to come up with a sensible naming
         | suggestion is optimistic at best.
        
         | aqme28 wrote:
         | Furthermore, is a minor version upgrade two months later really
         | "next generation"?
        
           | philote wrote:
           | Well if it's from 1 to 1.5 then it's really 5 minor version
           | upgrades at once. And since 1.5 is halfway to 2 and you round
           | up, it's next generation!
        
           | AndroTux wrote:
           | Maybe it's not a "next generation" model, but rather their
           | next model for text generation ;)
        
             | cchance wrote:
              | I mean, I don't see any other models watching and
              | answering questions about a 44-minute video lol
        
         | nkozyra wrote:
         | Their inability to name things sensibly has been called out for
         | years and it doesn't look like they care?
         | 
          | I'm not sure what the deal is; it has to be a marketing
          | hindrance as every major tech company is trying to claw their
         | way up the AI service mountain. Seems like the first step would
         | be cogent naming.
        
           | data-ottawa wrote:
           | It would have been better as Gemini Lite, Gemini, Gemini Pro,
           | and then v1, v1.5 for model bumps.
           | 
           | Ultra vs pro vs nano with Ultra unlocked by buying Gemini
           | Advanced is confusing.
           | 
           | I'm also not sure why they make base Gemini available after
           | you have Advanced, because presumably there's no reason to
           | use a worse model.
        
         | Alifatisk wrote:
          | I understood the transition as follows.
         | 
         | Google Bard to Google Gemini is what they call Gemini 1.0.
         | 
         | Gemini consists of Gemini Nano, Gemini Pro, & Gemini Ultra.
         | 
         | Gemini Nano is for embedded and portable devices I guess? The
         | free version of Gemini (gemini.google.com) is Gemini Pro. The
         | paid version, called Gemini Advanced is using Gemini Ultra.
         | 
         | What we're reading now is about Gemini Pro version 1.0
         | switching to version 1.5 as of today.
        
           | meowface wrote:
           | That just made my head spin even more. (Like, I get it, but
           | it's just a very tortuous naming system.) The free version is
           | called Pro, Gemini Advanced is actually Gemini Ultra, the
           | less powerful version upgraded to the more powerful model but
           | the more powerful version is on the less powerful model.
           | 
           | People make fun of OpenAI for not using product names and
           | just calling it "GPT" but at least it's straightforward: 2,
           | 3, 3.5, 4. (On the API side it's a little more complicated
           | since there's "turbo" and "instruct" but that isn't exposed
           | to users, and turbo is basically the default.)
        
             | kweingar wrote:
             | But you don't pay for GPT-4, you pay for a product called
             | ChatGPT Plus, which allows you to write 40 messages to
             | GPT-4 within a three-hour time window, after which you need
             | to switch to 3.5 in the menu.
        
           | code51 wrote:
            | But if Vertex AI is using Gemini Ultra, then why is
            | makersuite (aisuite now? hmmm) showing only "Gemini 1.0 Pro
            | 001" (001: a version inside a version)?
           | 
           | and why have makersuite/aisuite in the first place, if Vertex
           | AI is the center for all things AI? and why aitestkitchen?
           | 
            | I'm seeing only Gemini 1.0 Pro on Vertex AI. So even if I've
            | enabled Google Gemini Advanced (Ultra?) and enabled Vertex
            | AI API access, I first have to be blessed by Google to
            | access advanced APIs.
           | 
           | It seems paying for their service doesn't mean anything to
           | Google at this point. As a developer, you have to jump
           | through hoops first.
        
             | Alifatisk wrote:
             | I think this answers why you can't see Ultra.
             | 
             | "Gemini 1.0 Ultra, our most sophisticated and capable model
             | for complex tasks, is now generally available on Vertex AI
             | for customers via allowlist."
             | 
             | https://cloud.google.com/blog/products/ai-machine-
             | learning/g...
        
           | growt wrote:
            | It was probably not a wise choice to give the model itself
            | and the product the same name: "Gemini Advanced is using
            | Gemini Ultra". Also, "The free version ... is Gemini Pro" is
            | not what you usually see out there.
        
         | sho_hn wrote:
         | It's not that difficult.
         | 
         | Their LLM brand is now Gemini. Gemini comes in three different
         | sizes, Nano/Pro/Ultra.
         | 
         | They recently released 1.0 versions of each, most recently (a
         | few months after Nano and Pro) Ultra.
         | 
         | Today they are introducing version 1.5, starting with the Pro
         | size. They say 1.5 Pro offers comparable performance to 1.0
         | Ultra, along with new abilities (token window size).
         | 
         | (I agree Small/Medium/Large would be better.)
        
           | apwell23 wrote:
           | What you described is difficult.
        
             | mcmcmc wrote:
             | It's really not. Substitute Gemini for iPhone. Apple
             | releases an iPhone model in mini, standard, and pro lines.
             | They announce iPhone model+1 but are releasing the pro
             | version first. Still difficult?
        
               | apwell23 wrote:
               | > Apple releases an iPhone model in mini, standard, and
               | pro lines.
               | 
                | Not an iPhone user, but I just looked at the iPhone 15.
                | I don't see any mini version. I'm guessing 'standard' is
                | called just 'iPhone'? Is Pro the same thing as Plus?
               | 
               | https://www.apple.com/shop/buy-iphone/iphone-15
               | 
               | > Still difficult?
               | 
                | Yes, your example made it even more confusing.
        
               | mcmcmc wrote:
               | Now you're being intentionally difficult. Do you want it
               | to be cars? Last year $Automaker released $Sedan 2023 in
               | basic, standard, and luxury trims. This year $Automaker
                | announced $Sedan 2024 but has so far only announced the
               | standard trim. If I had meant the iPhone 15 specifically
               | I would've said iPhone 15. I think the 12 was the last
               | mini? The point is product families are often released in
               | generations (versions in the case of Gemini) and with
               | different available specs (ultra/pro/nano etc) that may
               | not all be released at the same time.
        
               | dpkirchner wrote:
               | Apple discontinued mini phones two generations back,
               | unfortunately.
        
               | sho_hn wrote:
               | I think it's the "iPhone +1 Mini is as fast as the old
               | Standard" that confuses people here. This is obvious and
               | expected but not how it's usually marketed I guess ...
        
               | chatmasta wrote:
               | So Google will be upgrading the version number of each
               | model at the same time? Based on other comments here,
               | that's not the case - some are 1.5 and some are 1?
               | 
               | Apple doesn't announce the iPhone 12 Mini and compare it
               | to the iPhone 11 Pro.
        
               | iamdelirium wrote:
               | Uhh, yes they do?
               | 
               | Did you watch the announcements for the M2 and M3 pros?
               | They compared it to the previous generations all the
               | time.
        
             | huytersd wrote:
             | How? Three models Nano/Pro/Ultra currently at 1.0. New
             | upgrades just increment the version number.
        
           | Alifatisk wrote:
           | They should remove the name Gemini Advanced and just stick to
           | one name
        
             | sho_hn wrote:
             | Agreed.
             | 
             | Gemini Advanced seems to be the brand name for the higher
             | price tier for the end-user frontend that gets you Ultra
              | access, similar to how ChatGPT Plus gets you GPT-4.
             | 
              | I get it, but it raises the question of whether you will
             | need Advanced now to get 1.5 Pro. Or does everyone get Pro,
             | making it useless to pay for 1.0 Ultra?
             | 
              | I still don't think it's _confusing_, but that part is
             | definitely messy.
        
           | OJFord wrote:
           | > , starting with the Pro size
           | 
           | This is where it gets confusing IMO.
           | 
           | It's like if Apple announced macOS Blabahee, starting with
           | Mini, not long after releasing Pro and Air touting benefits
           | of Sonoma.
           | 
            | Also, just... this is how TFA _begins_:
           | 
           | > Last week, we rolled out our most capable model, Gemini 1.0
           | Ultra, [...] Our teams continue pushing the frontiers of our
           | latest models with safety at the core. They are making rapid
           | progress. [...] 1.5 Pro achieves comparable quality to 1.0
           | Ultra
           | 
           | Last week! And now we have next generation. And the wow is
           | that it's comparable to the best of the previous generation.
           | Ok fine at a smaller size, but also that's all we get anyway.
           | Oh and the _most_ capable remains the last generation one. As
            | long as it's the biggest one.
        
             | crazygringo wrote:
             | It's almost exactly like Apple, actually, with their M1 and
             | M2 chips available in different sizes, launching at
             | different times in different products.
             | 
             | It's really not that confusing. There are different sizes
             | and different generations, coming out at different times.
             | This pattern is practically as old as computing itself.
             | 
             | I can't even imagine what alternative naming scheme would
             | be an improvement.
        
               | OJFord wrote:
               | Don't go thinking I'm an Apple 'fanboy', I don't have any
               | Apple devices at the moment, but I really can't imagine
               | them launching a next gen product that isn't better than
               | the best of the last gen.
               | 
               | I doubt they launched M2 MBAs while the MBP was running
               | M1, for example. Or more directly, a low-mid spec M3 MBP
                | while the top-spec M2 MBP (I assume that would
                | out-benchmark it?) was still for sale and no comparable
                | M3 chip yet.
               | 
               | It's not having the matrix of size/power & generation
               | that's confusing, it's the 'next generation' one
               | initially launched not being the best. I think that's
               | mainly it for me anyway.
        
               | crazygringo wrote:
                | > _but I really can't imagine them launching a next gen
               | product that isn't better than the best of the last gen._
               | 
               | But they have. The baseline M2 is significantly less
               | powerful than the M1 Max.
               | 
               | What Google's doing is basically exactly like that. It
               | happens all the time that the mid tier of the next
               | generation isn't as good as the top tier of the previous
               | generation. It might even be the norm.
        
               | OJFord wrote:
               | Sure but did they release the baseline M2 first, before
               | higher end M2s were available?
        
               | crazygringo wrote:
               | I don't understand what that has to do with anything.
               | 
               | There isn't a set order to things. Sometimes companies
               | release a higher powered version first and then the
               | budget version later, sometimes an entry-level version
               | first and a pro version after. Sometimes both
               | simultaneously. All of these are normal, and can even
               | follow different orders generation to generation.
        
               | astrange wrote:
               | More powerful isn't the same thing as better. Among other
               | things, better means performance/battery life tradeoff.
        
             | matwood wrote:
             | > Last week! And now we have next generation.
             | 
             | Google got caught completely flat footed by OpenAI. I'm
             | going to cut them some slack that they want to show the
             | world a bit of flex with their AI chops as soon as they
             | have results.
        
           | Keyframe wrote:
           | What's Advanced then, chat? Also, by that, 1.5 Ultra is then
           | still to come and it'll show even bigger guns.
        
             | sho_hn wrote:
             | Yes, my understanding is also there will be a 1.5 Ultra.
             | 
              | However, it's nowhere explicitly said, as far as I could
              | find. The Technical Report PDF also avoids even hinting at
              | it.
             | 
             | Advanced is a price/service tier for the end-user frontend.
             | At the moment it gets you 1.0 Ultra access vs. 1.0 Pro for
             | the free version. Similar to how ChatGPT Plus gives you 4
             | instead of 3.5.
             | 
             | I agree this part is messy. Does everyone who had Pro
             | already get 1.5 Pro? If 1.5 Pro is better than 1.0 Ultra,
             | why pay for Advanced? Is 1.5 Pro behind the Advanced
             | paywall? etc.
        
               | Keyframe wrote:
                | Ok, so from what I've gathered from all of the comments
                | so far, the primary confusion is that both the chat
                | service and the LLM models are named the same.
               | 
               | There are three models: nano/pro/ultra and all are at
               | v1.0
               | 
                | There are two tiers of chat service: basic (free) and
                | Advanced.
               | 
               | There is AIStudio from google through which you can
               | interact with / use directly gemini llms.
               | 
               | Chat service Gemini basic (free) uses Gemini Pro 1.0 llm.
               | 
               | Chat service Gemini advanced uses Gemini Ultra 1.0 llm.
               | 
                | What was shown is ~~Ultra~~ Pro 1.5 LLM, which is / will
                | be available to a select few for preview to be used via
                | AIStudio.
               | 
               | That leaves a question, what's nano for, and is it only
               | used via AIStudio/API?
               | 
               | Jesus, Google..
        
               | sho_hn wrote:
               | No, what they showed is Pro 1.5. Only via API and on a
               | waitlist.
               | 
               | How this relates to the end-user chat service/price tiers
               | is still unknown.
               | 
               | The best scenario would be that they just move Gemini
               | free and Advanced tiers to Pro 1.5 and Ultra 1.5, I
               | guess.
        
               | Keyframe wrote:
               | Yes, you are right. I meant Pro. Let's see then.
        
               | j16sdiz wrote:
               | Nano is the on-device (Pixel phone) model.
        
           | mvkel wrote:
           | So there's Nano 1.0, Pro 1.5, Ultra 1.0, but Pro 1.5 can only
           | be accessed if you're a Vertex AI user (wtf is Vertex)?
           | 
           | That's very difficult.
        
             | sho_hn wrote:
             | It's a bit similar to how new OpenAI stuff is initially
             | usually partner-only or waitlisted.
             | 
             | Vertex AI is their developer API platform.
             | 
             | I agree OpenAI is a bit better at launching for customers
             | on ChatGPT alongside API.
        
           | pentagrama wrote:
            | Thank you, it's clearer to me now. But I also read in some
            | Google announcement about "Gemini Advanced"; do you know
            | what that is and its relation to the Nano/Pro/Ultra levels?
        
             | sho_hn wrote:
             | Gemini is also the brand name for the end-user web and
             | phone chatbot apps, think ChatGPT (app) vs. GPT-# (model).
             | 
             | Gemini Advanced is the paid subscription service tier that
             | at the moment gets you access to the Ultra model, similar
             | to how a ChatGPT Plus subscription gets you access to
             | GPT-4.
             | 
             | Honestly, they should have called this part Gemini Chat and
             | Gemini Chat Plus, but of course ego won't let them follow
             | the competitor's naming scheme.
        
               | pentagrama wrote:
                | Oh, I understand, thank you. To me, with "Gemini
                | Advanced" they screwed up the naming scheme.
                | 
                | With an already complex naming for regular consumers
                | (Nano/Pro/Ultra, each with a 1.x), adding this Advanced
                | thing makes it spaghetti.
                | 
                | I understand that for most people it may be just a chat
                | input and they don't care, but if people are considering
                | paying, they will research a bit, and it's confusing.
        
           | screye wrote:
            | Gemini Ultra 1.0 never went GA. So it is weird that they'd
            | release 1.5 when most can't even get their hands on 1.0
            | Ultra.
        
             | gkbrk wrote:
             | Isn't the paid version on https://gemini.google.com Gemini
             | 1.0 Ultra?
        
           | mrcwinn wrote:
           | I'm sure they had a discussion that size-based identifiers
           | might imply models are primarily differentiated on the amount
           | of knowledge they have. From that standpoint I don't agree
           | S/M/L would have been better.
        
         | gmuslera wrote:
         | Maybe they should take a hint on Windows versions name scheme
         | and call the next version Gemini Meh.
        
           | apapapa wrote:
           | Are you talking about Xbox one?
        
             | mring33621 wrote:
             | No. Gemini Purple Plus Platinum Advanced Home Version
             | 11.P17
        
             | gmuslera wrote:
             | You didn't know about Windows Meh? Not sure about the
             | spelling.
        
         | kccqzy wrote:
         | Dear OpenAI please fix your names and versioning. Why do you
         | have GPT-3 and GPT-3.5? What happened between 3 and 3.5? And
         | why isn't GPT-3 a single model? Why are there variations like
          | GPT-3-6.7B and GPT-3-175B? And why is there now a turbo
          | version? How does turbo compare to 4? And what's the
         | relationship between the end-user product ChatGPT and a
         | specific GPT model?
         | 
         | You see this problem isn't unique to Google.
        
         | lordswork wrote:
         | See https://news.ycombinator.com/item?id=39385230
        
         | cchance wrote:
         | This just means we'll be getting a Nano 1.5 and Ultra 1.5
         | 
         | and if Pro 1.5 is this good holy shit what will Ultra be...
         | 
         | Nano/Pro/Ultra are the model sizes, 1.0 or 1.5 is the version
        
       | hamburga wrote:
       | "One of the key differentiators of this model is its incredibly
       | long context capabilities, supporting millions of tokens of
       | multimodal input. The multimodal capabilities of the model means
       | you can interact in sophisticated ways with entire books, very
       | long document collections, codebases of hundreds of thousands of
       | lines across hundreds of files, full movies, entire podcast
       | series, and more."
        
         | skywhopper wrote:
         | This is nice, but it's hard to judge how nice without knowing
         | more about how much compute and memory is involved in that
         | level of processing. Obviously Google isn't going to tell us,
         | but without having some idea it's impossible to judge whether
         | this is an economically sustainable technology on which to
         | start building dependencies in my own business.
        
           | criddell wrote:
           | Sustainable? The countdown to cancellation on this project is
           | already underway.
           | 
           | "Does it make sense today?" is really the only question you
           | can ask and then build dependencies with the understanding
           | that the entire thing will go away in 3-7 years.
        
       | freediver wrote:
        | It would do Google a lot of good if every such announcement
        | were not met with 'join the waitlist' and 'talk to your Vertex
        | AI team'.
        
         | baq wrote:
         | Yeah compared to e.g. Apple's 'here's our new iWidget 42 pro,
         | you can buy it now' it's at best disappointing.
        
           | apozem wrote:
           | Apple is good about only announcing real products you can
           | buy. They don't do tech demos. It's always, "here's a
           | problem. the new apple watch solves it. here're five other
           | things the watch does. $399."
        
             | erkt wrote:
              | The jury is still out on the Vision Pro, but otherwise
              | your point stands.
        
             | amarant wrote:
             | Apple is indeed masterful at advertising. Google, somewhat
             | ironically, is really bad at it.
        
               | matwood wrote:
               | Apple is masterful at product, not just the advertising
                | part. Google builds cool technology, then fails on the
                | product side.
        
             | xnx wrote:
             | I agree that Apple does a better job, but wasn't Apple
             | Vision Pro announced 240 days before you could get it? I
             | think it's a pretty safe bet that Gemini 1.5 (or something
              | better) will be available to anyone who wants to use it in
              | the next 240 days.
        
               | nacs wrote:
               | AI software release cycles are incredibly short right
               | now. Every month, there is some major development
               | released in a _usable right now_ form.
               | 
                | The first-of-its-type AR/VR hardware has,
               | understandably, a longer release cycle. Also, Apple
               | announced early to drive up developer interest.
        
               | manquer wrote:
                | AVP was the exception rather than the norm.
                | 
                | Apple aggressively keeps products under wraps before
                | launch, and fires employees and vendors for leaking any
                | sort of news to the press.
                | 
                | Also, a hardware product that is miles ahead of the
                | competition in terms of components, and that needs a
                | complex setup workflow (for head and eyes) that Apple
                | has not done before, shipping 7-8 months after being
                | announced is not really comparable to a SaaS API in
                | terms of delays.
        
         | brianjking wrote:
         | 100%, I can't even use Imagen despite being an early tester of
         | Vertex.
        
         | belval wrote:
         | They can't do that because only they are the incorruptible
         | stewards empowered with the ability to develop these models,
         | making them accessible to the unwashed masses would be
         | irresponsible!
        
           | ethanbond wrote:
           | The victim complex on this topic is getting really old.
           | 
           | They're an enterprise software company doing an enterprise
           | sales motion.
        
             | belval wrote:
                | If that were true, they wouldn't have named it Gemini 1.5 to
             | follow the half-point increment of ChatGPT, they
             | desperately want "people" to care about their product to
             | gain back their mindshare.
             | 
             | Anthropic's Claude targets mostly business use cases and
                | you don't see them write self-congratulatory articles about
             | Claude v2.1, they just pushed the product.
        
               | eropple wrote:
               | Mindshare is part of enterprise sales, yes.
               | 
               | I work at a very large company and everyone knows about
               | ChatGPT and Gemini (in part because we for our sins have
               | a good chunk of GCP stuff), but I doubt anyone here not
               | doing some LLM-flavored development has ever even heard
               | of Anthropic, let alone Claude.
        
               | KirinDave wrote:
               | And look at how well it's going for Claude. Their primary
               | claim to fame is being called "an annoying coworker" and
               | that's it.
               | 
               | Why would anyone look to form a contract with Anthropic
               | right now? I'd say they're in danger here, because their
               | models and offerings don't have clear value propositions
               | to customers.
        
               | ac29 wrote:
               | Claude 2.1 certainly got a news post when it was
               | released: https://www.anthropic.com/news/claude-2-1
               | 
               | Seems reasonably similar in tone to the Google post.
        
             | dkjaudyeqooe wrote:
             | > They're an enterprise software company
             | 
             | Really? Someone ought to tell them.
        
         | stavros wrote:
         | I'm generally an excited early adopter, but this kills my
         | excitement immediately. I don't know if Gemini is out (or which
         | Gemini is out) because I've associated Google with "you can't
         | try their stuff", so I've learned to just ignore everything
         | about Gemini.
        
           | hbn wrote:
           | Google is really good at diluting any possible anticipation
           | hardcore users might have for new stuff they do. 10 years ago
           | I loved when there was a big update to one of their Android
           | apps and I could sideload the apk from the internet to try it
           | out early. Then they made all those changes A/B tests
           | controlled by server side flags that would randomly turn
           | themselves on and off, and there was no way to opt in or out.
           | That was one of the (many) moves that contributed to my
           | becoming disenchanted with Android.
        
           | petre wrote:
           | There is a Gemini service that you can use with your Google
           | account, but it's kind of meh as it repeats your input, makes
           | all sorts of assumptions. I am confused as well about the
           | version. There's a link to another premium version (1.5?) on
            | its page, to which I don't have access without completing
           | a quest which likely ends with a credit card input. That
           | kills it for me.
        
             | yborg wrote:
             | Or can't use ... I have a newish work account and
             | downloaded Gemini on a Pixel 8 Pro and get "Gemini isn't
             | available" and "Try again later" with no explanation of why
             | not and when.
        
               | petre wrote:
               | This is it. Not a phone app, did not install anything.
               | Maybe your account is not old enough? You're not missing
               | anything anyway.
               | 
               | https://gemini.google.com/
               | 
                | Look, it now has totally useless suggestions, like it
                | was trained on burned-out woke IT workers. I asked it
                | about the weather, sea temperature, and wave height and
                | period in Malaga, which is much less boring than the
                | choices it came up with. First it tried to talk me out
                | of it while waving away responsibility, then it provided
                | useful climate data that would otherwise have cost me
                | too much time in Google searches. I guess it's good for
                | checking the weather if you can put up with the
                | disclaimers. Also it knows
               | fishing for garfish in Denmark in May is not a total
               | waste of your time, a great way to experience local
               | culture and a sustainable activity.
               | 
               | I also asked it about the version: "I am currently
               | running on the Gemini Pro 1.01.5 model".
        
           | skybrian wrote:
           | I think the way to understand this is to realize that this
           | isn't targeted at a Hacker News audience and they don't care
           | what we think. The world doesn't revolve around us.
           | 
           | What's the goal? Maybe, being able to work with partners
           | without it being a secret project that will inevitably leak,
           | resulting in inaccurate stories in the press. What are non-
           | goals? Driving sales or creating anticipation with a mass
           | audience, like a movie trailer or an Apple product launch.
           | 
           | So they have to announce something, but most people don't
           | read Hacker News and won't even hear about it until later,
           | and that's fine with them.
        
         | jpeter wrote:
         | And not having to wait months if you live in the EU.
        
           | lxgr wrote:
           | What's worse is that I can't seem to find a way to let Google
           | know where I actually live (as opposed to where I am
           | temporarily traveling, what country my currently inserted SIM
           | card is from etc). And apparently there is no way to do this
           | at all without owning an Android device!
           | 
           | Apple at least lets me change this by moving my iTunes/App
           | Store account, which is its own ordeal and far from ideal,
           | but at least there's a defined process: Tell us where you
           | think you live, provide a form of payment from that place,
           | maybe we'll believe you.
        
             | TillE wrote:
             | Yeah Google aggressively uses geolocation throughout their
             | services, regardless of your language settings. The
             | flipside of that is that it's really easy to access the
             | latest Gemini or whatever by just using a VPN.
        
               | lxgr wrote:
               | Wait, does that mean if I subscribe to Gemini Pro in
               | country A where it's available (e.g. the US) but travel
               | to Europe, I can't use it?
               | 
               | I'm really frustrated by Google's attitude of "we know
               | better where you are than you do". People travel
               | sometimes and that's not the same thing as moving!
        
               | FergusArgyll wrote:
               | I signed up for all of their AI products when I was in
               | the US, some of them work while I'm out of country some
               | don't. I can't tell what the rule is...
        
               | lxgr wrote:
               | I really, really hate all of these geo heuristics. Sure,
               | don't advertise services to people outside of your
               | market, I get that. Do ask for a payment method from that
               | country too to provide your market-specific pricing if
               | you must.
               | 
               | But once I'm a paying customer, I want to use the thing
               | I'm paying for from where I am without jumping through
               | ridiculous hoops!
               | 
               | The worst variant of this I've seen is when you can
               | neither use _nor cancel the subscription_ from outside a
               | supported market.
        
               | FergusArgyll wrote:
               | To be clear, I didn't pay for any of them. I just signed
               | up for early access to every product that uses some form
               | of ML that can remotely be called "AI"...
               | 
               | Once I got accepted, some of them work outside of the US
               | and some don't
        
         | hobofan wrote:
         | Eh, I think it's about as bad as the OpenAI method of
         | officially announcing something and then "continuously rolling
         | it out to all subscribers" which may be anything between a few
         | days and months.
        
         | addandsubtract wrote:
         | Remember when Gmail was new and you needed an invite to join? I
         | guess Google is stuck in 2004.
        
           | bobchadwick wrote:
           | I'm embarrassed to admit that I bought a Gmail invite on eBay
           | for $6 when it was still invite-only.
        
             | agumonkey wrote:
             | Yielding a priceless anecdote
        
             | blagie wrote:
             | _shrug_ It probably gave you months of fun.
        
             | jprete wrote:
             | That's not entirely a waste, it would have given you a
             | better chance for an email address you wanted.
        
               | CydeWeys wrote:
               | Yeah. I ended up with an eight letter @gmail.com because
               | I dithered, but if I'd signed up by any means necessary
               | when I'd first heard of it, I would've gotten a four
               | letter one.
        
             | rocketbop wrote:
             | Nothing to be ashamed of. I think I might have bought a
             | Google Wave invite a couple of years later :/
        
             | spiffytech wrote:
             | I bartered on gmailswap.com, sending someone a bicentennial
              | 50¢ US coin in exchange for an invite.
             | 
             | The envelope made it to the recipient, but the coin fell
             | out in transit because I was young and had no idea how to
             | mail coinage. They graciously gave me the invite anyway.
        
               | ssteeper wrote:
               | Ah, to be young and clueless about coinage mailing.
        
             | LouisSayers wrote:
             | Well they did promise unlimited space - remember how it
             | kept growing? I guess until it didn't...
             | 
             | But still, compared to Hotmail etc the free storage space
             | (something like 1GB vs 10MB) was well worth $6
        
           | moffkalast wrote:
            | They don't seem to remember when that literally sank Google+
           | because people had no use for a social network without their
           | friends on it.
        
         | bachmeier wrote:
         | This is bad practice across the board IMO. There seems to be an
         | idea that this builds anticipation for new products. Sounds
         | good in a PowerPoint presentation by an MBA but doesn't work in
         | practice. Six months (or more!) after joining a waitlist, I'm
         | not seeing it for the first time, so I don't really care when
         | yet another email selling me something hits my inbox. I may not
         | even open the email. This could be mitigated somewhat by at
         | least offering a demo, but that's rare.
        
           | bushbaba wrote:
            | Likely they have limited capacity and are allotting access
            | to their highest-paying and strategic customers.
        
             | eitally wrote:
              | As someone who worked on Google Cloud's partnerships team,
              | I can say the Early Access Program, not to mention the
              | Alpha --> Beta --> GA launch process for AI products, is
              | really dysfunctional. Inevitably what happens is that a few
             | strategic customers or partners get exceptionally early
             | (Alpha) access and work directly with the product team to
             | refine things, fix bugs and iron out kinks. This is great
             | and the way market driven product development _should_
             | work.
             | 
             | The issues arise with the subsequent stagegate graduation
             | processes, requirements and launches to less restricted
             | markets. It's inconsistent, the QoS pre-GA customers
             | receive is often spotty and the products come with no SLAs,
             | and -- just like Gmail on the consumer side -- things
             | frequently stay in EAP/Beta phase for years with no
             | reliable timeline for launch. ... and then often they're
             | killed before they get to GA, even though they may have
             | been being used by EAP customers for upwards of 1-2 years.
             | 
             | I drafted a new EAP model a few years ago when Google's
             | Cloud AI & Industry Solutions org was in the process of
             | productizing things like the retail recommendation engine
             | and Manufacturing Data Engine, and had all the buy-ins from
             | stakeholders on the GTM side ... but the CAIIS GM never
             | signed off. Subsequently, both the GM & VP Product of that
             | org have been forced out.
             | 
             | In my opinion, this is something Microsoft does very well
             | and Google desperately needs to learn. If they pick up
             | anything from their hyperscaler competitors it should be 1)
             | how to successfully become a market driven engineering
             | company from MSFT and 2) how to never kill products (and
             | not punish employees for only doing KTLO work) from AMZN.
        
             | moralestapia wrote:
             | So tactical, wow. Meanwhile OpenAI and others will eat
             | their lunch _again_.
        
               | bushbaba wrote:
                | Agreed. OpenAI also doesn't need to reckon with
                | shareholders fearing a GDPR-like fine. Sadly, the larger
                | you are, the bigger the pain from small mistakes.
        
           | justrealist wrote:
           | One PM in 2005 knocked it out of the park with Gmail and
           | every Google PM since then has cargo-culted it.
        
         | kkzz99 wrote:
         | It's because they don't want you to actually use it and see
         | how far behind they are compared to other companies. These
         | announcements are meant to placate investors: "See, we are
         | doing a lot of SotA AI too".
        
           | Keyframe wrote:
           | You might be right, but other things from Google tell the
            | same story. For example, I recently tried to get ahold of a
            | Pixel 8 Pro. I had to import one from the UK, and when I
            | did, it turned out the new feature of using the thermometer
            | on humans isn't available outside the US. The process to
            | certify it outside the US doesn't even seem to be underway.
            | Sales and support just aren't a thing at Google the way they
            | are at Apple, which is a total shame. I know Google is
            | strong, if not the strongest, in the game of tech; they just
            | need to get their act together, and I believe they can, but
            | sales and support were never in their DNA. Not sure if that
            | can be changed.
           | 
            | I'm more than happy to transfer my monthly $20 from OpenAI
            | to Google, on top of my YouTube and Google One
            | subscriptions. It's up to Google to take it.
        
         | quatrefoil wrote:
         | It lets the company control the narrative, without the
         | distraction of fifty tech bloggers test-driving it and posting
         | divergent opinions or findings. Instead, the conversation is
         | anchored to what the company claims about the product.
         | 
         | It's interesting that it's the opposite of the gaming industry.
         | There, because the reviewers dictate the narrative, the
         | industry is better at ferreting out bogus claims. On the flip
         | side, loud voices sometimes steamroll over decent products
         | because of some ideological vendetta.
        
         | anonzzzies wrote:
         | And region based. Yawn.
        
         | mil22 wrote:
         | Totally agree with this. I can see the desire to show off, but
         | I don't understand how anyone can believe this is a good
         | marketing strategy. Any initial excitement I get from reading
         | such announcements will be immediately extinguished when I
         | discover I can't use the product yet. The primary impression I
         | receive of the product is "vaporware." By the time it does get
         | released I'll already have forgotten the details of the
         | announcement, lost enthusiasm, and invested my time in a
         | different product. When I'm choosing between AI services, I'll
         | be thinking "no, I can't choose Gemini Pro 1.5 because it's not
         | available yet, and who knows when it will be available or how
         | good it'll be." Then when they make their next announcement,
         | I'll be even less likely to give it any attention.
        
         | bobvanluijt wrote:
         | I have access and will share some learnings soon
        
         | whywhywhywhy wrote:
         | After the complete farce that was the last 90% faked video of
         | their tech, maybe just give us a text box we can talk to the
         | thing and see it working ourselves next time.
         | 
         | It's shocking to me. Is management really so clueless that
         | they don't realize how far behind they are? This isn't 2010
         | Google; you're not the company that made your success anymore,
         | and in a decade the only two surefire things that will still
         | exist are Android and Chrome. Search, Maps, and YouTube are
         | all in precarious positions from which the right team could
         | dethrone them.
        
         | summerlight wrote:
         | I believe this is a standard practice in Google whenever they
         | need to launch a change expected to consume huge resources and
         | they cannot reasonably predict the demand. Though I agree that
         | this is a bad PR practice; waitlist should be considered as a
         | compromise, not a PR technique.
        
         | crazygringo wrote:
         | These announcements are mainly for investors and other people
         | interested in planning purposes. It's important to know the
         | roadmap. More information is better.
         | 
         | I get that it's frustrating not to be able to play with it
         | immediately, but that's just life. Announcing things in advance
         | is still a valuable service for a lot of people.
         | 
         | Plus tons of people have been claiming that Google has somehow
         | fallen behind in the AI race, so it's important for them to
         | counteract that narrative. Making their roadmap more visible is
         | a legitimate strategy for that.
        
         | dpkirchner wrote:
         | I wrote off the PS5 because of waitlists. I was surprised to
         | learn just yesterday that they are now actually, honestly
         | purchasable (what I would consider "released").
         | 
         | I guess I let my original impression anchor my long-term
         | feelings about the product. Oh well.
        
         | TheFragenTaken wrote:
         | It's probably going to be dead/deprecated in a year, so maybe
         | there's a silver lining to how hard it is to get to use the
         | service. I, for one, wouldn't "build with Gemini".
        
         | animex wrote:
         | I don't think I've ever engaged with a product after "joining
         | their waitlist". By the time they end up utilizing that funnel,
         | competitors have already released feature upgrades or new
         | products cannibalizing their offering.
        
       | alphabetting wrote:
       | Massive "whoa, if true" from the technical report:
       | 
       | "Studying the limits of Gemini 1.5 Pro's long-context ability, we
       | find continued improvement in next-token prediction and near-
       | perfect retrieval (>99%) up to at least 10M tokens"
       | 
       | https://storage.googleapis.com/deepmind-media/gemini/gemini_...
        
         | stavros wrote:
         | Until I can talk to it, I care exactly zero.
        
           | peterisza wrote:
           | you can buy their stock if you think they'll make a lot of
           | money with their tech
        
             | HarHarVeryFunny wrote:
              | Well, that's really the right question ... what can, and
             | will, Google do with this that can move their corporate
             | earnings needle in a meaningful way? Obviously they can
             | sell API access and integrate it into their Google docs
             | suite, as well as their new Project IDX IDE, but do any of
             | these have potential to make a meaningful impact ?
             | 
             | It's also not obvious how these huge models will fare
             | against increasingly capable open source ones like Mixtral,
             | perhaps especially since Google are confirming here that
             | MoE is the path forward, which perhaps helps limit how big
             | these models need to be.
        
               | plaidfuji wrote:
               | In the long run it could move the needle in enterprise
               | market share of Workspace and GCP. They have a lot of
               | room to grow and IMO have a far superior product to
               | O365/Azure which could be exacerbated by strong AI
               | products. Only problem is this sales cycle can take a
               | decade or more, and Google hasn't historically been
               | patient or strategic about things like this.
        
         | megaman821 wrote:
         | So, will this outperform any RAG approach as long as the data
         | fits inside the context window?
        
           | ArcaneMoose wrote:
           | Cost would still be a big concern
        
           | saliagato wrote:
           | basically, yes. Pinecone? Dead. Azure AI Search? Dead.
            | Qdrant? Dead.
        
             | _boffin_ wrote:
             | Prompt token cost still a variable.
        
           | TheGeminon wrote:
            | Whether it outperforms depends on the RAG approach (and
            | this would be a RAG approach anyway; you can already do
            | this with smaller context sizes). It would probably beat a
            | simplistic one, but dumping in data that you don't need
            | dilutes the useful information, so I would imagine there
            | would be at least _some_ degradation.
           | 
            | But there is also a downside to "tuning" the RAG to return
            | fewer tokens: you may miss extra context that could be
            | useful to the model.
        
             | megaman821 wrote:
             | Doesn't their needle/haystack benchmark seem to suggest
             | there is almost no dilution? They pushed that demo out to
             | 10M tokens.
        
           | CuriouslyC wrote:
           | A perfect RAG system would probably outperform everything in
           | a larger context due to prompt dilution, but in the real
           | world putting everything in context will win a lot of the
           | time. The large context system will also almost certainly be
           | more usable due to elimination of retrieval latency. The
           | large context system might lose on price/performance though.
        
           | chasd00 wrote:
           | are you going to upload 10M tokens to Gemini on every
           | request? That's a lot of data moving around when the user is
            | expecting a near-realtime response. It seems like it would
            | still be better to set the context with only the information
            | relevant to the user's prompt, which is what plain RAG does.
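
A back-of-envelope sketch of the data-movement point above. The bytes-per-token and link-speed figures are assumptions for illustration, not numbers from the thread:

```python
# Rough cost of shipping a full 10M-token context with every request.
# Assumptions (illustrative only): ~4 bytes of UTF-8 text per token,
# a 100 Mbit/s uplink, no compression, no server-side caching.

TOKENS = 10_000_000
BYTES_PER_TOKEN = 4            # rough average for English text
UPLINK_BITS_PER_SEC = 100e6    # 100 Mbit/s

payload_mb = TOKENS * BYTES_PER_TOKEN / 1e6                      # ~40 MB
upload_seconds = TOKENS * BYTES_PER_TOKEN * 8 / UPLINK_BITS_PER_SEC

print(f"~{payload_mb:.0f} MB per request, ~{upload_seconds:.1f} s just to upload")
```

Even a few seconds of raw upload per request suggests resending the whole corpus only makes sense if the provider caches the context server-side.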
        
         | Workaccount2 wrote:
         | 10M tokens is absolutely jaw dropping. For reference, this is
         | approximately thirty books of 500 pages each.
         | 
         | Having 99% retrieval is nuts too. Models tend to unwind pretty
         | badly as the context (tokens) grows.
         | 
         | Put these together and you are getting into the territory of
         | dumping all your company documents, or all your departments
         | documents into a single GPT (or whatever google will call it)
         | and everyone working with that. Wild.
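
The book-count arithmetic can be checked directly. With rule-of-thumb conversion rates (assumed here: ~0.75 English words per token, ~350 words per printed page) the figure lands in the same tens-of-books ballpark; the exact count depends on the assumed words per page:

```python
# Sanity-check: how many 500-page books is a 10M-token context?
# Conversion rates are rough rules of thumb, not from the report.

tokens = 10_000_000
words = tokens * 0.75        # ~0.75 words per token -> 7.5M words
pages = words / 350          # ~350 words per printed page
books = pages / 500          # 500-page books

print(f"~{words/1e6:.1f}M words, ~{pages:,.0f} pages, ~{books:.0f} books")
```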
        
           | kranke155 wrote:
           | Seems like Google caught up. Demis is again showing an
           | incredible ability to lead a team to make groundbreaking
           | work.
        
             | huytersd wrote:
                | If any of this is remotely true, not only did it catch
                | up, it's wiping the floor with GPT-4 in terms of how
                | useful it can be. Not going to make a judgment until I
                | can actually try it out, though.
        
               | singularity2001 wrote:
                | In the demo videos Gemini needs about a minute to answer
               | long context questions. Which is better than reading
               | thousands of pages yourself. But if it has to compete
               | with classical search and skimming it might need some
               | optimization.
        
               | huytersd wrote:
               | That's a compute problem, something that involves just
               | throwing money at the problem.
        
               | a_wild_dandan wrote:
               | Replacing grep or `ctrl+F` with Gemini would be the
                | user's fault, not Gemini's. If classical search is
                | already a performant solution for the job, _use
                | classical search_.
               | Save your tokens for jobs worthy of solving with a
               | general intelligence!
        
               | IanCal wrote:
               | If you had this for your business could this approach be
               | faster than RAG?
               | 
               | Input is parsed one token at a time right? Can you cache
               | the state after the initial prompt has been provided?
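
The caching idea in that question can be pictured with a plain memoized function. This is purely illustrative: real inference engines cache per-layer key/value tensors for the shared prefix, not a Python value, and `encode_prefix` here is a hypothetical stand-in for the expensive forward pass:

```python
import functools

@functools.lru_cache(maxsize=8)
def encode_prefix(document: str) -> int:
    """Stand-in for the expensive one-time pass over the long document."""
    # A real engine would compute and store transformer KV tensors here.
    return hash(document)

def answer(document: str, question: str) -> str:
    state = encode_prefix(document)   # hits the cache after the first call
    # ...a real system would now run only the question tokens against
    # the cached state...
    return f"stub answer to {question!r} (state {state & 0xffff:04x})"

doc = "imagine 10M tokens of company documents here"
answer(doc, "What is our refund policy?")
answer(doc, "Who wrote the Q3 report?")
print(encode_prefix.cache_info())  # misses=1, hits=1: prefix paid for once
```

The payoff is the same shape as in the question: the cost of ingesting the huge shared prefix is paid once, and each follow-up query only processes its own tokens.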
        
         | matsemann wrote:
         | Could you (or someone) explain what this means?
        
           | FergusArgyll wrote:
           | The input you give it can be very long. This can
           | qualitatively change the experience. Imagine, for example,
            | copy-pasting the entire Lord of the Rings plus another 100
           | books you like and asking it to write a similar book...
        
             | teaearlgraycold wrote:
             | I doubt it's smart enough to write another (coherent, good)
             | book based on 103 books. But you could ask it questions
             | about the books and it would search and synthesize good
             | answers.
        
             | HarHarVeryFunny wrote:
             | I just googled it, and the LOTR trilogy apparently has a
             | total of 480,000 words, which brings home how huge 1M is!
             | It'd be fascinating to see how well Gemini could summarize
             | the plot or reason about it.
             | 
             | One point I'm unclear on is how these huge context sizes
             | are implemented by the various models. Are any of them the
             | actual raw "width of the model" that is propagated through
             | it, or are these all hierarchical summarization and chunk
             | embedding index lookup type tricks?
        
               | mburns wrote:
               | For another reference, Shakespeare's complete works are
               | ~885k words.
               | 
               | The Encyclopedia Britannica is ~44M words.
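
Converting those word counts into token counts (assuming a rough ~1.33 tokens per English word, a common rule of thumb, not a figure from the thread) shows what fits in which window:

```python
# Which famous corpora would fit in a 1M- vs 10M-token window?
# ~1.33 tokens per English word is an assumed rule of thumb.

TOKENS_PER_WORD = 1.33

works = {
    "LOTR trilogy": 480_000,
    "Complete Shakespeare": 885_000,
    "Encyclopedia Britannica": 44_000_000,
}

for name, words in works.items():
    tokens = words * TOKENS_PER_WORD
    print(f"{name}: ~{tokens/1e6:.2f}M tokens "
          f"(fits 1M: {tokens <= 1e6}, fits 10M: {tokens <= 1e7})")
```

By this estimate the LOTR trilogy fits comfortably in 1M tokens, the complete Shakespeare needs the 10M window, and Britannica overflows even that.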
        
             | staticman2 wrote:
             | Reading Lord of the Rings, and writing a quality book in
             | the same style, are almost wholly unrelated tasks. Over 150
             | million copies of Lord of the Rings have been sold, but few
             | readers are capable of "writing a similar book" in terms of
             | quality. There's no reason to think this would work well.
        
               | pfooti wrote:
               | I mean, Terry Brooks did it with the Sword of Shannara.
               | (/s)
        
           | ehsankia wrote:
           | It's how much text it can consider at a time when generating
           | a response. Basically the size of the prompt. A token is not
           | quite a word but you can think of it as roughly that.
           | Previously, the best most LLMs could do is around 32K. This
           | new model does 1M, and in testing they could put it up to 10M
           | with near perfect retrieval.
           | 
           | As the other comment mentions, you can paste the content of
           | entire books or documents and ask very pointed question about
           | it. Last year, Anthropic was showing off their 100K context
           | window, and that's exactly what they did, they gave it the
           | content of The Great Gatsby and asked it questions about
           | specific lines of the book.
           | 
           | Similarly, imagine giving it hundreds of documents and asking
           | it to spot some specific detail in there.
        
             | liamYC wrote:
             | Awesome explanation, thanks for the comparison
        
         | og_kalu wrote:
         | Another whoa for me
         | 
         | >Finally, we highlight surprising new capabilities of large
         | language models at the frontier; when given a grammar manual
         | for Kalamang, a language with fewer than 200 speakers
         | worldwide, the model learns to translate English to Kalamang at
         | a similar level to a person learning from the same content.
         | 
         | Results - https://imgur.com/a/qXcVNOM
        
           | usaar333 wrote:
            | I think this is mostly due to the ability to handle long
            | contexts better. Note how Claude 2.1 already strongly
            | outperforms GPT-4 on this task.
        
             | a_wild_dandan wrote:
             | GPT-4V turbo outperforms Claude on long contexts, IIRC.
             | Unless that's mistaken, I'd suspect a different explanation
             | for that task.
        
         | cchance wrote:
         | Did you watch the video of the Gemini 1.5 video recall after it
         | processed the 44 minute video... holy shit
        
       | ranulo wrote:
       | > This new generation also delivers a breakthrough in long-
       | context understanding. We've been able to significantly increase
       | the amount of information our models can process -- running up to
       | 1 million tokens consistently, achieving the longest context
       | window of any large-scale foundation model yet.
       | 
       | Sweet, this opens up so many possibilities.
        
       | tsunamifury wrote:
       | Google is like a nervous and insecure engineer -- blowing their
       | value by rushing the narrative and releasing too much too
       | confusingly fast.
        
         | sho_hn wrote:
         | When OpenAI raced through 3/3.5/4 it was "this team ships" and
         | excitement.
         | 
         | This cargo-cult hate train is getting tiresome. Half the
         | comments on anything Google-related are like this now, and it
         | doesn't add anything to the conversation.
        
           | epiccoleman wrote:
           | The difference, though, as someone who really doesn't have a
           | particular dog in this fight, is that I can go _use_ GPT-4
           | right now, and see for myself whether it 's as exciting as
           | the marketing materials say.
        
             | sho_hn wrote:
             | When OpenAI launched GPT-4, API access was initially behind
              | a waitlist. On launch day they also released multiple demo
              | stills of LMM capabilities that sat in a limited partner
              | program for months, only becoming generally available 7
              | months later.
             | 
             | I also want the shiny immediately when I read about it, but
             | I also know when I am acting entitled and don't go spam
             | comment threads about it.
             | 
             | But really, mostly I mean this: It's fine to criticize
             | things, but when half a dozen people have already raised a
             | point in a thread, we don't need more dupes. It really
             | changes signal-to-noise.
        
           | mynameisvlad wrote:
           | Gemini Ultra was announced two months ago. It just launched
           | in the last week. It literally is still the featured post on
           | the AI section of their blog, above this announcement.
           | https://blog.google/technology/ai/
           | 
           | There's "this team ships" and there's "ok maybe wait until at
           | least a few people have used your product before you change
           | it all".
        
             | sho_hn wrote:
             | OpenAI announced GPT-4 image input in mid-March 2023 and
             | made it generally available on the API in November 2023.
             | 
             | Google announced a fancy model two months early and
             | released it in the promised timeframe.
             | 
             | Seems par for the course.
        
               | mynameisvlad wrote:
               | Did OpenAI then announce GPT-5 two weeks after launching
               | GPT-4?
               | 
               | No, of course they didn't. And you're comparing one
               | specific feature (image input) and equating it to a whole
               | model's release date.
               | 
               | Maybe compare apples to apples next time.
               | 
               | People pointing out release/announcement burnout is a
               | reasonable thing; people in general can only deal with
               | the "next new thing" with some breaks to process
               | everything.
        
               | sho_hn wrote:
               | I made the comparison because both companies demonstrated
               | advanced/extended abilities (model size, image input) and
               | shipped it delayed.
        
           | moralestapia wrote:
           | >"this team ships"
           | 
           | Because they actually shipped ... (!)
        
             | moralestapia wrote:
             | ... and they literally just did it again.
             | 
             | https://openai.com/sora
        
       | SushiHippie wrote:
       | Does this mean gemini ultra 1.0 -> gemini ultra 1.5 is the same
       | as gpt-4 -> gpt-4-turbo?
        
         | hackerlight wrote:
         | There's no Gemini Ultra 1.5 yet. Gemini Pro 1.5 is a smaller
         | model than Gemini Ultra 1.0.
        
       | prakhar897 wrote:
       | Can anyone explain how context length is tested? Do they prompt
       | something like:
       | 
       | "Remember val="XXXX" .........10M tokens later....... Print val"
        
         | NhanH wrote:
         | Yep, that's actually a common one
        
          | blovescoffee wrote:
          | Very simplified: there are arrays (matrices) of length 10M
          | inside the model.
         | 
         | It's difficult to make that array longer because training time
         | explodes.
        
         | halflings wrote:
         | Yep that's pretty much it! That's what they call needle in a
         | haystack. See:
         | https://github.com/gkamradt/LLMTest_NeedleInAHaystack
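For illustration, a minimal sketch of such a needle-in-a-haystack harness (the `query_model` callable and the scoring are placeholders, not any benchmark's actual code):

```python
import random

def build_haystack(filler_sentences, needle, depth_fraction):
    """Bury `needle` at a chosen depth (0.0 = start, 1.0 = end)."""
    insert_at = int(len(filler_sentences) * depth_fraction)
    parts = (filler_sentences[:insert_at] + [needle]
             + filler_sentences[insert_at:])
    return " ".join(parts)

def run_trial(query_model, filler_sentences, depth_fraction, rng=random):
    """One trial: hide a secret, then check if the model retrieves it."""
    secret = f"val-{rng.randint(1000, 9999)}"
    needle = f"Remember: the magic value is {secret}."
    prompt = (build_haystack(filler_sentences, needle, depth_fraction)
              + "\n\nWhat is the magic value?")
    answer = query_model(prompt)  # hypothetical model call
    return secret in answer       # score: did it recall the needle?
```

Sweeping `depth_fraction` over many trials gives the recall-by-position heatmaps these reports usually show.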
        
          | cchance wrote:
          | Yep, they hide things throughout the prompt and then ask it
          | about that specific thing. Imagine hiding passwords in a
          | giant block of text and then being asked, "what was Bob's
          | password?" 10 million tokens later.
          | 
          | According to this it's remembering with 99% accuracy, which
          | if you think about it is NUTS. Can you imagine reading 22
          | 1000-page books and remembering every single word that was
          | said, with 99% accuracy? lol
        
           | foota wrote:
           | Interestingly, there's a decent chance I'd remember if there
           | was an out of context passage saying "the password is
           | FooBar". I wonder if it would be better to test with minor
           | edits? E.g., "what color shirt was X wearing when..."
        
              | ambichook wrote:
              | I feel you would recognise that more as a quirk of how
              | humans think; remember that LLMs think fundamentally
              | differently to you and I. I would be curious about
              | someone making a benchmark like that and using it to
              | compare as an experiment, however.
        
               | foota wrote:
               | I'm not trying to anthropomorphize the model, but it's
               | not hard to imagine that a model would attribute
               | significance to something completely out of context, and
               | hence "focus" on it when computing attention.
               | 
               | Another possible synthetic benchmark would be to present
               | a list of key value pairs and then ask it for the value
               | corresponding to different keys. Or present a long list
               | of distinct facts and then ask it about them. This latter
               | one could probably be sourced from something like a
               | trivia question and answers data set. I bet there's
               | something like that from Jeopardy.
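That key/value variant is easy to sketch as a generator (names here are illustrative, not taken from any published benchmark):

```python
import random

def make_kv_prompt(n_pairs, seed=0):
    """Build a long list of key/value facts plus a single probe question."""
    rng = random.Random(seed)
    pairs = {f"key-{i}": f"value-{rng.randint(0, 10**6)}"
             for i in range(n_pairs)}
    listing = "\n".join(f"{k} = {v}" for k, v in pairs.items())
    probe = rng.choice(sorted(pairs))  # pick one key to ask about
    prompt = f"{listing}\n\nWhat is the value for {probe}?"
    return prompt, pairs[probe]  # expected answer, for scoring the model
```

Unlike the out-of-context password, every pair looks alike, so the model can't succeed by noticing an anomalous sentence.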
        
             | kenjackson wrote:
             | I think instead you could just do a full doc of
             | relationships. "Tina and Chris have five children named
             | ..."
             | 
              | Then you can ask it: who is Tina's (great)^57
              | grandmother's twice-removed cousin on her father's side?
             | 
             | It would have to be able to remember the context of the
             | relationships up and down the document and there'd be
             | nothing to key into as you could ask about any
             | relationship.
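A single-chain simplification of that relationship idea can be sketched as follows (a hypothetical generator, not an existing benchmark):

```python
import random

def make_lineage_prompt(generations, seed=0):
    """Chain of parent facts; ask for the ancestor `generations` levels up."""
    rng = random.Random(seed)
    ids = rng.sample(range(10**5, 10**6), generations + 1)  # unique names
    names = [f"Person{n}" for n in ids]
    facts = [f"{names[i]} is the parent of {names[i + 1]}."
             for i in range(generations)]
    rng.shuffle(facts)  # shuffle so position in the document gives no hint
    prompt = (" ".join(facts)
              + f"\n\nWho is the ancestor {generations} generations above "
              + f"{names[-1]}?")
    return prompt, names[0]  # expected answer, for scoring
```

Because answering requires chaining facts scattered across the whole context, it tests multi-hop use of the window rather than simple retrieval of one sentence.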
        
       | phoe18 wrote:
       | The branding is very confusing, shouldn't this be Gemini Pro 1.5
       | since the most capable model is called Ultra 1.0?
        
         | macawfish wrote:
         | Extremely confusing!
        
           | butler14 wrote:
           | Maybe they use their own generative AI to do their branding
        
         | dkjaudyeqooe wrote:
         | Can anyone lay out the various models and their features or
         | point to a resource?
         | 
          | I asked the free model (whatever that is) and it wasn't very
          | helpful, alternating between being a sales bot for Ultra and
          | being somewhat confused itself.
         | 
         | Edit: apparently it goes 1.0 Pro, 1.0 Ultra, 1.5 Pro, 1.5 Ultra
         | and so on.
        
           | Alifatisk wrote:
            | Here are the models:
            | https://news.ycombinator.com/item?id=39304270
            | 
            | This is about Gemini Pro going from version 1.0 to 1.5,
            | nothing else. Gemini Ultra is still on version 1.0.
        
             | dkjaudyeqooe wrote:
             | That isn't right. The Pro/Ultra exists within each version.
             | 
             | If you look at the Gemini report it refers to "Gemini 1.5",
             | then refers to "Gemini 1.5 Pro" and "Gemini 1.0 Pro" and
             | "Gemini 1.5 Pro".
        
               | Alifatisk wrote:
                | Okay, so if I understand this correctly:
               | 
               | - Gemini 1.5 is the new version of the model Gemini.
               | 
               | - They are at the moment testing it on Gemini Pro and
               | calling it Gemini Pro 1.5
               | 
               | - The testing has shown that Gemini Pro 1.5 is delivering
               | the same quality as Gemini Ultra 1.0 while using less
               | computing power
               | 
               | - Gemini Ultra is still using Gemini 1.0 at the moment
        
              | lordswork wrote:
              | Here's an updated table, with version numbers included
              | and their status:
              | 
              |     Gemini Models       gemini.google.com
              |     ------------------------------------
              |     Gemini 1.0 Nano
              |     Gemini 1.0 Pro    -> Gemini (free)
              |     Gemini 1.0 Ultra  -> Gemini Advanced ($20/month)
              |     Gemini 1.5 Pro    -> announced on 2024-02-15 [1]
              |     Gemini 1.5 Ultra  -> no public announcements
              |                          (assuming it's coming)
             | 
             | [1]: https://storage.googleapis.com/deepmind-
             | media/gemini/gemini_...
             | 
             | For history of pre-Gemini models at Google, see:
             | https://news.ycombinator.com/item?id=39304441
        
               | Alifatisk wrote:
               | Oh, it's you again! Thanks for the update
        
         | UncleMeat wrote:
         | Google is somehow truly awful at this. I thought it was funny
         | when branding messes happened in 2017. I cried when they
         | announced "Google Meet (original)." Now I don't even know what
         | to do.
         | 
         | I'm stunned that Google hasn't appointed some "name veto
         | person" that can just say "no, you aren't allowed to have three
         | different things called 'Gemini Advanced', 'Gemini Pro', and
         | 'Gemini Ultra.'" Like surely it just takes Sundar saying "this
         | is the stupidest fucking thing I've ever seen" to some SVP to
         | fix this.
        
           | meowface wrote:
           | And somehow the more advanced one is still on 1.0 (for now)
           | and the less advanced one is on 1.5.
        
             | kccqzy wrote:
             | That's like saying it doesn't make sense for Apple to
             | release M3 Pro without simultaneously releasing M3 Ultra.
        
               | meowface wrote:
               | That's very different.
        
               | kccqzy wrote:
               | The only thing that's different is the standard people
               | apply to different companies due to their biases. There
               | are more Apple fanboys on HN than Google fans (Of course,
               | since Google's reputation has been going down for quite a
               | while). Therefore Apple gets a pass. Classic double
               | standard.
        
               | wrasee wrote:
               | It's different because Apple didn't release the M1 Ultra
               | at the same time as the M2 Pro. That would be confusing
               | to buyers because it wouldn't be immediately obvious
               | which one is the better purchase, both being new
               | offerings presented to customers at the same time.
               | 
               | It's understandable that later generations are better and
               | higher tiers are also better, but usually there is some
               | period of time in between generations to help
               | differentiate them. Here we have Google advancing
               | capability on two axes at the same time.
               | 
               | I give them a pass as this field is advancing rapidly. So
               | good for them. But I think it's a legitimate call that it
               | adds complexity to their branding. It is different.
        
         | seydor wrote:
         | We will ask what its real name is as soon as it becomes
         | sentient
        
         | iamdelirium wrote:
         | No? Do you call it the iPhone Pro 15 or the iPhone 15 Pro?
         | Their naming makes sense if you follow most consumer
         | technology.
        
          | summerlight wrote:
          | This is something close to CPU versioning. You have two
          | axes: performance branding and generation. Nano, Pro and
          | Ultra are something similar to i3, i5 and i7. The numbered
          | versions 1.0, 1.5, ... can be mapped to 13th gen, 14th gen,
          | ... and so on. And people usually don't need to understand
          | the generation part unless they're enthusiasts.
        
       | arange wrote:
       | signup on mobile too big, doesn't fit submit button :\
        
       | guybedo wrote:
       | looks interesting enough that i wanted to give Gemini a try and
       | join the waitlist.
       | 
       | And i thought it would be easy, what a rookie mistake.
       | 
        | Looks like "France" isn't on the list of available regions for
        | AI Studio?
        | 
        | Now i'm trying to use Vertex AI, not even sure what's the
        | difference with AI Studio, but it seems it's available.
       | 
       | So far i've been struggling for 15 minutes through a maze of
       | google cloud pages: console, docs, signups. No end in sight,
       | looks like i won't be able to try it out
        
         | IanCal wrote:
         | It's not available outside of a private preview yet. The page
         | says you can use 1.0 ultra in vertex but it's not available to
         | me in the UK.
         | 
         | I can't get on the waitlist, because the waitlist link
         | redirects to aistudio and I can't use that.
         | 
         | I should stop expecting that I can use literally anything
         | google announces.
        
       | simonw wrote:
       | I'd love to know how much a 1 million token prompt is likely to
       | cost - both in terms of cash and in terms of raw energy usage.
        
         | bearjaws wrote:
          | Cannot emphasize this enough: even with the improvements in
          | context handling, I imagine 128k tokens costs as much as 16k
          | tokens did previously.
         | 
         | So 1M tokens is going to be astronomical.
        
         | empath-nirvana wrote:
         | When you account for this, you have to consider how much it
         | would cost to have a human perform the same task.
        
       | foliveira wrote:
       | >"Gemini 1.5 Pro (...) matches or surpasses Gemini 1.0 Ultra's
       | state-of-the-art performance across a broad set of benchmarks."
       | 
       | So Pro is better than Ultra, but only if the version numbers are
       | higher?
        
         | denysvitali wrote:
         | Yes, but you'd have to wait for Gemini Pro Max next year to see
         | the real improvements
        
         | renewiltord wrote:
         | Isn't that usually the case with many products? Like the M3 Pro
         | CPU in the new Macs is more powerful than the M1 Max in the old
         | Macs.
         | 
         | The Nano < Pro < Ultra is an in-revision thing. For their LLMs
         | it's a size thing. Then there's newer releases of Nano, Pro,
         | and Ultra. Some Pro might be better than some older Ultra.
         | 
         | A lot of people seem confused about this but it feels so easy
         | to understand that it's confusing to me that anyone could have
         | trouble.
        
           | devindotcom wrote:
           | Apple didn't release the M3 Pro a week after the M1 Max
        
             | renewiltord wrote:
             | Adam Osborne's wife was one of my dad's patients so I'm not
             | unacquainted with the risk of early announcements. But
             | surely they do not prevent comprehension.
        
        | thiago_fm wrote:
        | I like that they are rushing with this, not caring enough to
        | call it Gemini 2 or even really release it; to me it looks
        | like they are eager to share progress.
        | 
        | Hope they do a good job, and that once OpenAI releases GPT-5
        | their offerings are competitive with it; that will be better
        | for everyone.
        
       | kaspermarstal wrote:
       | Incredible. RAG will be obsolete in a year or two.
        
         | hackernoteng wrote:
         | It's already obsolete. It doesn't work except for trivial cases
         | which have no real value.
        
          | jeanloolz wrote:
          | Obsolete only if you don't take cost into consideration.
          | Having 10 million tokens go through each layer of the LLM is
          | going to cost a lot of money each time. At GPT-4 rates that
          | could mean 200 dollars for each inference.
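A back-of-envelope version of that estimate, assuming a flat per-token input price (GPT-4's list price was around $0.03 per 1K input tokens at the time; real pricing varies by model and context tier):

```python
def prompt_cost_usd(n_tokens, usd_per_1k_tokens=0.03):
    """Cost of feeding n_tokens of input at a flat per-1K-token price."""
    return n_tokens / 1000 * usd_per_1k_tokens

# Every inference pays for the full context again:
print(prompt_cost_usd(1_000_000))    # 1M-token prompt
print(prompt_cost_usd(10_000_000))   # 10M-token prompt
```

At that assumed rate a 1M-token prompt is on the order of $30 and a 10M-token prompt on the order of $300, per call, before any output tokens.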
        
       | scarmig wrote:
       | The technical report: https://storage.googleapis.com/deepmind-
       | media/gemini/gemini_...
        
       | jpeter wrote:
       | OpenAI has no Moat
        
         | jklinger410 wrote:
         | They only have a head start, and the lead is closing
        
         | seydor wrote:
         | hence why it's Open
        
         | rvz wrote:
         | This. He's right you know.
         | 
         | OpenAI is extremely overvalued and Google is closing their lead
         | rapidly.
        
           | fnordpiglet wrote:
           | Is there any meaningful valuation on OpenAI? It's not for
           | sale, there is no market.
           | 
           | Google ... has no ability to commercialize anything. Their
           | only commercial successes are ads and YouTube. Doing
           | deceptive launches and flailing around with Gemini isn't
           | helping their product prospects. I wouldn't take a bet
           | between open ai and anyone, but I also wouldn't take a bet on
           | Google succeeding commercially on anything other than
           | pervasive surveillance and adware.
        
             | rvz wrote:
             | > Is there any meaningful valuation on OpenAI? It's not for
             | sale, there is no market.
             | 
              | Its shares are already for sale on private markets for
              | accredited investors, at a valuation of over $100BN led
              | by Thrive Capital.
             | 
             | > Google ... has no ability to commercialize anything.
             | 
             | Absolute nonsense.
             | 
             | So Google Cloud, Android (Play Store) are not already
             | commercialized? You well know that they are.
             | 
             | > Doing deceptive launches and flailing around with Gemini
             | isn't helping their product prospects.
             | 
             | Gemini already caught up to (and surpassed) GPT-4V. What is
             | your point?
             | 
             | > I wouldn't take a bet between open ai and anyone, but I
             | also wouldn't take a bet on Google succeeding commercially
             | on anything other than pervasive surveillance and adware.
             | 
             | OpenAI's greatest competitor is Google DeepMind which has
             | the advantage of Google's infrastructure to scale up their
             | models quickly and they have direct access to Google's
             | billions. OpenAI cannot afford to make mistakes or delay
             | anything and a single mistake can cost them hundreds of
             | millions of dollars. The majority of the investment from
             | Microsoft is in Azure credits and not in dollars. [0]
             | 
             | [0] https://www.semafor.com/article/11/18/2023/openai-has-
             | receiv...
        
         | wrsh07 wrote:
         | A reference to the good doc:
         | https://www.semianalysis.com/p/google-we-have-no-moat-and-ne...
         | 
         | While I'm linking semianalysis, though, it's probably worth
         | talking about how everyone except Google is GPU poor:
         | https://www.semianalysis.com/p/google-gemini-eats-the-world-...
         | (paid)
         | 
         | > Whether Google has the stomach to put these models out
         | publicly without neutering their creativity or their existing
         | business model is a different discussion.
         | 
         | Google has a serious GPU (well, TPU) build out, and the fact
         | that they're able to train moe models on it means there aren't
         | any technical barriers preventing them from competing at the
         | highest levels
        
           | Keyframe wrote:
           | they also have internet.zip and all of its repo history as
           | well as usenet and mails etc.. which others don't.
        
         | anonyfox wrote:
         | but GPT-4 is nearly a year old now, I'd wait for the next
         | release of OAI before judgement. Probably rather soonish now I
         | would expect.
        
           | jpeter wrote:
           | You were right i guess :)
        
       | gpjanik wrote:
       | 0 trust to what they put out until I see it live. After the last
       | "launch" video which was fundamentally a marketing edit not
       | showing the real product, I don't trust anything coming out of
       | Google that isn't an instantly testable input form.
        
          | tfsh wrote:
          | These demo videos have clearly learnt from that: they're
          | using a real live product, filmed on their computers, with
          | timers at the bottom showing how long the computations take.
        
         | frays wrote:
         | I completely share the same views as you after their last video
         | - and it appears that they've learnt their lesson this time.
         | 
         | If you watch the videos in the blog post, you can see it's a
         | screen recording on a computer without any editing/stitching of
         | different scenes together.
         | 
         | It's good to be sceptical but as engineers we should all remain
         | open.
        
         | replwoacause wrote:
         | 100%. Google continues to underwhelm. Not buying it until I can
         | try it.
        
         | dingclancy wrote:
         | Essentially, the focus seems to be on leveraging the media buzz
         | around Gemini 1.0 by highlighting the development of version
         | 1.5. While GPT-4's position relative to Gemini 1.5 remains
         | unclear, and the specifics of ChatGPT 4.5 are yet to be
         | disclosed, it's worth noting that no official release has taken
         | place until the functionality is directly accessible in user
         | chats.
         | 
         | Google appears to be making strides in catching up.
         | 
         | When it comes to my personal workflow and accomplishing tasks,
         | I still find ChatGPT to be the most effective tool. My
         | familiarity with its features has made it indispensable. The
         | integration of mentions and tailored GPTs seamlessly enhances
         | my workflow.
         | 
         | While Gemini may match the foundational capabilities of LLMs,
         | it falls short in delivering a product that efficiently aids in
         | task completion.
        
           | vacuumcl wrote:
           | I don't mean this in a bad way, but when I read a comment
           | like yours which includes phrases like "seamlessly enhances
           | my workflow" and "efficiently aids in task completion", I
           | can't help but feel like it's ChatGPT-generated, and if so I
           | think it's a shame, just write like yourself.
           | 
           | But maybe you do, and I am seeing patterns in sand.
        
              | icoder wrote:
              | Not OP, but if I wrote as myself, I don't think you'd
              | understand me so easily ;)
        
                | vacuumcl wrote:
                | I understand it just fine ;)
        
                | JoeEasy23 wrote:
                | #metoo
        
       | losvedir wrote:
       | If I understand correctly, they're releasing this for Pro but not
       | Ultra, which I think is akin to GPT 3.5 vs 4? Sigh, the naming is
       | confusing...
       | 
       | But my main takeaway is the huge context window! Up to a million,
       | with more than 100k tokens right now? Even just GPT 3.5 level
       | prediction with such a huge context window opens up a lot of
       | interesting capabilities. RAG can be super powerful with that
       | much to work with.
        
         | danpalmer wrote:
         | The announcement suggests that 1.5 Pro is similar to 1.0 Ultra.
        
            | benopal64 wrote:
            | I am reaching a bit; however, I think it's a bit of a
            | marketing technique. Comparing Pro 1.5 to the Ultra 1.0
            | model seems to imply that they will be releasing an Ultra
            | 1.5 model, which will presumably have similar
            | characteristics to the new Pro 1.5 model (MoE architecture
            | w/ a huge context window).
        
             | danpalmer wrote:
             | Apparently the technical report implies that Ultra 1.5 is a
             | step-up again, I'm not sure it's just context length, that
             | seems to be orthogonal in everything I've read so far.
        
          | ygouzerh wrote:
          | Pro and Ultra are, from my understanding, linked to the
          | number of parameters. More parameters means more reasoning
          | capability, but more compute needed.
          | 
          | So Pro is the light and fast version and Ultra the advanced
          | and expensive one.
        
         | cchance wrote:
         | It's sizes
         | 
         | Nano/Pro/Ultra are model SIZES. 1.0/1.5 is generations of the
         | architecture.
        
         | amf12 wrote:
         | Maybe this analogy would help: iPhone 15, iPhone Pro 15, iPhone
         | Pro Max 15 and then iPhone Pro 15.5
        
       | golergka wrote:
       | In one of the demos, it successfully navigates a threejs demo and
       | finds the place to change in response to a request.
       | 
       | How long until it shows similar results on middle-sized and large
       | codebases? And do the job adequately?
        
         | kypro wrote:
         | 1-2 years probably. There will still be a question around who
         | determines what "adequately" is for a while though. Presumably
         | even if an LLM can do something in theory you wouldn't actually
         | want it doing anything without human oversight.
         | 
          | And we should keep in mind that understanding a code change
          | in depth is often just as much work as making the change.
          | When reviewing PRs I don't really know exactly what every
          | change is doing. I certainly haven't tested it enough to be
          | 100% certain I understand it fully. I'm just checking that
          | the logic looks mostly right and that I don't see anything
          | clearly wrong, and even then I'll often need to ask for
          | clarification on why something was done.
         | 
         | I can't imagine LLMs being used in most large code bases for a
         | while yet. They'd probably need to be 99.9% reliable before we
         | can start trusting them to make changes without verifying every
         | line.
        
         | simon_kun wrote:
         | Today.
        
       | pryelluw wrote:
       | Gemini (or whatever google ai) will be all about ads. I'm not
       | adopting this shit. Their whole business model is ads. Why would
       | I adopt a product from a company that only cares about selling
       | more ads?
        
         | Alifatisk wrote:
         | Google One's business model is not ads?
         | 
         | I mention Google One because you can access Gemini Ultra
         | through it.
        
           | imp0cat wrote:
           | All their services are just a way to get more information
           | about their users so they can serve them ads.
           | 
           | Those Gemini queries will be no exception.
        
             | sodality2 wrote:
             | Not true - Gemini looks to be marketed towards companies,
             | where it's far more profitable to just charge thousands of
             | dollars. Ads wouldn't fund AI usage anyway. GPU's are
             | extremely expensive (even Google's fancy TPU's).
        
                | imp0cat wrote:
                | I find that hard to believe. Ads most probably already
                | funded all the research, development and manufacturing
                | required to produce those TPUs.
               | 
               | But we'll see, maybe Gemini will become profitable
               | eventually.
        
         | snapcaster wrote:
         | Agreed, people continually forget that Google has fundamentally
         | failed at everything besides selling ads despite decades of
         | moonshots and other attempts to shift the business. Very
         | skeptical that any company getting 80% revenue from ads will be
         | able to resist the pressure to advertise
        
       | royletron wrote:
       | Is there a reason this isn't available in the
       | UK/France/Germany/Spain but is in available in Jersey... and
       | Tuvalu?
        
         | onlyrealcuzzo wrote:
         | EU regulations and fines.
        
         | vibrolax wrote:
         | Probably because EU/national governments have regulations with
         | respect to the safety and privacy of the users, and the
         | purveyors must evaluate the performance of their products
         | against the regulatory standards.
        
       | seydor wrote:
       | Onwards to a billion tokens
        
       | fernandotakai wrote:
       | i saw this announcement on twitter and i was excited to check it
       | out, only to see that "we're offering a limited preview of 1.5
       | Pro to developers and enterprise customers via AI Studio and
       | Vertex AI".
       | 
       | please google, only announce things when people can actually use
       | it.
        
       | xyzzy_plugh wrote:
       | I miss when I didn't have to scroll to read a single tweet.
        
         | ComputerGuru wrote:
         | Twitter has that functionality natively now, but I don't know
         | if you have to be a pro user to access. It's the book icon in
         | the upper-right corner of the first tweet in a series. Links to
         | this, but it looks different when I view it in incognito vs
         | logged in:
         | https://twitter.com/JeffDean/thread/1758146022726041615
        
           | xyzzy_plugh wrote:
           | The functionality I'm talking about is tweets not being walls
           | of text that require scrolling to read. I have no idea what
           | you're describing.
        
       | og_kalu wrote:
       | >Finally, we highlight surprising new capabilities of large
       | language models at the frontier; when given a grammar manual for
       | Kalamang, a language with fewer than 200 speakers worldwide, the
       | model learns to translate English to Kalamang at a similar level
       | to a person learning from the same content.
       | 
       | Results - https://imgur.com/a/qXcVNOM
       | 
       | From the technical report
       | https://storage.googleapis.com/deepmind-media/gemini/gemini_...
        
         | poulpy123 wrote:
         | > at a similar level to a person learning from the same
         | content.
         | 
         | That's an incredibly low bar
        
           | ithkuil wrote:
           | It's incredible how fast goalposts are moving.
           | 
           | The same feat one year ago would have been almost
           | unbelievable.
        
           | KeplerBoy wrote:
           | Since when are we expecting super-human capabilities?
        
             | andsoitis wrote:
             | And in fact it already is super human. Show me a single
             | human who can translate amongst 10+ languages across
             | specialized domains in the blink of an eye.
        
               | empath-nirvana wrote:
               | Chat GPT has been super human in a lot of tasks even
               | since 3.5.
               | 
               | People point out mistakes it makes that no human would
               | make, but that doesn't negate the super-human performance
               | it has at other tasks -- and the _breadth_ of what it can
               | do is far beyond any single person.
        
               | KeplerBoy wrote:
               | Where exactly does it have super-human performance? Above
               | average and expert-level? Sure, I'd agree, but I haven't
               | experienced anything above that.
        
                | newzisforsukas wrote:
                | Indeed, or a human who can analyze a hundred-page text
                | document in less than a minute and provide answers in
                | less than a second.
                | 
                | The issue remains accuracy. I think a human in that
                | scenario is still more accurate with their responses,
                | and I do not yet see that being overcome in this
                | multi-year LLM battle.
        
             | coffeebeqn wrote:
             | The model does already have superhuman ability by knowing
             | hundreds of languages
        
           | elevatedastalt wrote:
           | :muffled sounds of goalposts being shifted in the distance:
           | 
           | Just a few years ago we used to clap if an NLP model could
           | handle negation reliably or could generate even a paragraph
           | of text in English that was natural sounding.
           | 
           | Now we are at a stage where it is basically producing reams
           | of natural sounding text, performing surprisingly well on
           | reasoning problems and translation of languages with barely
              | any data despite being a Markov chain on steroids, and
              | what does it hear? "That's an incredibly low bar".
        
             | glenstein wrote:
             | I'm going to keep beating this dead horse, but if you were
             | a philosophy nerd in the 80s, 90s, 00s etc you may know
             | that debates RAGED over whether computers could ever, even
             | in principle do things that are now being accomplished on a
             | weekly basis.
             | 
             | And as you say, the goalposts keep getting moved. It used
             | to be claimed that computers could never play chess at the
             | highest levels because that required "insight". And
             | whatever a computer could do, it could never to that extra
             | special thing, that could only be described in magical
             | undefined terms.
             | 
             | I just hope there's a moment of reckoning for decades upon
             | decades of arguments, deemed academically respectable, that
             | insisted that days like these would never come.
        
               | elevatedastalt wrote:
               | Honestly. I am ok with having greater and greater goals
               | to accomplish but this sort of dismissive attitude really
               | puts me off.
        
               | empath-nirvana wrote:
               | Forget goalpost shifting, people frequently refuse to
               | admit that it can do things that it obviously does,
               | because they've never used it themselves.
        
               | mewpmewp2 wrote:
               | Listen, you little ...
        
           | zacmps wrote:
           | > The author (the human learner) has some formal experience
           | in linguistics and has studied a variety of languages both
           | formally and informally, though no Austronesian or Papuan
           | languages
           | 
           | From the language benchmark (parentheses mine).
        
           | JyB wrote:
            | Jarring that you're not adding more context to your comment.
        
           | newzisforsukas wrote:
           | you are insane if you actually think this.
        
           | lukemelas wrote:
           | Author here (of the Kalamang paper). One of my coauthors was
           | the human baseline and he spent many months reading the
           | grammar book (and is incredibly talented at learning
           | languages). It's really a very high bar.
        
         | seydor wrote:
         | what if we ask it to translate an undeciphered language
        
           | dougmwne wrote:
           | It produces basically random translations. This is covered in
           | the 0-shot case where no translation manual was included in
           | the context. Due to how rare this language is, it's
           | essentially untranslated in the training corpus.
        
           | og_kalu wrote:
           | If you mean to dump random passages of text with no parallel
           | corpora or grammar instructions then it won't do better than
           | random.
           | 
            | That said, if you gave an LLM text in that language to
            | predict during training, I believe that even with no
            | parallel corpora in the training data, we could have an LLM
            | that could still translate that language to some other
            | language it also trained on.
        
             | seydor wrote:
             | What if we added a bunch of linguistic analysis books or
             | something
        
           | lukemelas wrote:
           | Author of the Kalamang paper here. We've thought about this a
           | good amount (e.g. there are interesting Mesoamerican
           | scripts), but ultimately we decided to work on low-resource
           | languages, as they're much more useful. It's also possible to
           | use them as an evaluation benchmark, which isn't really
           | possible if nobody speaks the language. We'd like to expand
           | the scope beyond Kalamang, though, and maybe at some point
           | we'll investigate an undeciphered language as well.
        
       | uptownfunk wrote:
       | Google is a public company. Anything and everything will be
        | scrutinized very heavily by shareholders. Of course, how Zuck
        | operates is very different from how Sundar does.
       | 
       | What are they doing with their free cash is my question. Are they
       | waiting for the LLM bubble to pop to buy some of these companies
       | at a discount?
        
       | ComputerGuru wrote:
        | The context window size - if it really works as advertised - is
        | pretty ground-breaking. It would remove the need for RAG or
        | fine-tuning for one-off (or few-off) analys{is,es} of input
        | streams, cheaper and faster. I wonder how they got past the
        | input token stuffing problems everyone else runs into.
        
         | jcuenod wrote:
         | It won't remove the use of RAG at all. That's like saying,
         | "wow, now that I've upgraded my 128GB HDD to 1TB, I'll never
         | run out of space again."
        
           | madisonmay wrote:
           | It's more like saying "I've upgraded to 128GB of RAM, I'll
           | never use my disk again".
        
           | sebzim4500 wrote:
           | 10 TB for an accurate proportion.
           | 
           | And I think people who buy a laptop with a 1TB SSD generally
           | don't run out of space, at least I don't.
        
         | lumost wrote:
         | They are almost certainly using some form of sparse attention.
         | If you linearize the attention operation, you can scale up to
         | around 1-10M tokens depending on hardware before hitting memory
          | constraints. Linearization works off the assumption that for
          | a subsequence of X tokens out of M tokens, where M is much
          | greater than X, there are likely only K tokens which are
          | useful for the attention operation.
         | 
         | There are a bunch of techniques to do this, but it's unclear
         | how well any of them scale.
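For illustration only, here is a toy version of the top-K idea described above. Nothing in it comes from Google's report; `topk_sparse_attention` is a made-up name, and a real sparse method would avoid materializing the full score matrix, which this sketch still does:

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k=4):
    """For each query, attend only to the k highest-scoring keys
    instead of all M of them. Illustrative only: real sparse methods
    also avoid computing the full (N, M) score matrix below."""
    scores = Q @ K.T                            # (N, M) similarity scores
    out = np.zeros((Q.shape[0], V.shape[1]))
    for i, row in enumerate(scores):
        top = np.argpartition(row, -k)[-k:]     # indices of the k best keys
        w = np.exp(row[top] - row[top].max())
        w /= w.sum()                            # softmax over the k survivors
        out[i] = w @ V[top]                     # weighted mix of their values
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(8, 16)) for _ in range(3))
print(topk_sparse_attention(Q, K, V).shape)     # (8, 16)
```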
        
           | ein0p wrote:
           | Not "almost", but certainly. Dense attention is quadratic,
           | not even Google would be able to run it at an acceptable
           | speed. Their model is not recurrent - they did not have the
           | time yet (or resources - believe it or not, Google of 2023-24
           | is very compute constrained) to train newer SSM or recurrent
           | based models at practical parameter counts. Then there's the
           | fact that those models are far harder to train due to
           | instabilities, which is one of the reasons why you don't yet
           | see FOSS recurrent/SSM models that are SOTA at their size or
           | tokens/sec. With sparse attention, however, long context
           | recall will be far from perfect, and the longer the context
           | the worse the recall. That's better than no recall at all (as
           | in a fully dense attention model which will simply lop off
           | the preceding parts of the conversation), but not by a hell
           | of a lot.
        
             | kiraaa wrote:
             | maybe they are using ring attention, on top of their 128k
             | model.
        
               | ein0p wrote:
               | More likely some clever take on RAG. There's no way that
               | 1M context is all available at all times. More likely
               | parts of it are retrievable on demand. Hence the
               | retrieval-like use cases you see in the demos. The goal
               | is to find a thing, not to find patterns at a distance
        
               | kiraaa wrote:
               | could be true, we can only speculate.
        
         | popinman322 wrote:
         | vs RAG: RAG is good for searching across >billions of tokens
         | and providing up-to-date information to a static model. Even
         | with huge context lengths it's a good idea to submit high
         | quality inputs to prevent the model from going off on tangents,
         | getting stuck on contradictory information, etc..
         | 
         | vs fine tuning: smaller, fine-tuned models can perform better
         | than huge models in a decent number of tasks. Not strictly
         | fine-tuning, but for throughput limited tasks it'll likely
         | still be better to prune a 70B model down to 2B, keeping only
         | the components you need for accurate inference.
         | 
         | I can see this model being good for taking huge inputs and
         | compressing them down for smaller models to use.
        
         | nbardy wrote:
         | RAG will stick around, at some point you want to retrieve
         | grounded information samples to inject in the context window.
         | RAG+long context just gives you more room for grounded context.
         | 
         | Think building huge relevant context on topics before
         | answering.
        
         | torginus wrote:
         | Tbh, I haven't read the paper, but I think it's pretty self-
         | evident that large contexts aren't cheap - the AI has to comb
         | through every word of the context for each successive generated
         | token at least once, so it's going to be at least linear.
        
         | Havoc wrote:
         | Saw testing earlier that suggested the context does indeed work
         | right
        
       | Alifatisk wrote:
        | I remember one of the biggest advantages with Google Bard was
        | the heavily limited context window. I am glad Google is now
        | actually delivering some exciting news with Gemini and this
        | gigantic token size.
       | 
        | Sure, it's a bummer that they slap on a "Join the waiting
        | list", but it's still interesting to read about their progress
        | and competition with ClosedAi (OpenAI).
       | 
        | One last thing I hope they fix is the heavy-handed moral and
        | ethical guardrails; sometimes I can barely ask proper questions
        | without triggering Gemini to educate me about what's right and
        | wrong. And when I try the same prompt with ChatGPT and Bing AI,
        | they happily answer.
        
         | elevatedastalt wrote:
         | "biggest advantages with Google Bard"
         | 
         | Did you mean disadvantages?
        
           | Alifatisk wrote:
           | Yes, thanks.
        
       | CrypticShift wrote:
       | Most data accumulates gradually (e.g., one email at a time, one
        | line of text at a time across various documents). Is this huge
        | 10M-token context window relevant to a gradual, yet constant,
        | influx of data (like a prompt over a whole Google Workspace)?
        
       | Imnimo wrote:
       | This is the first time I've been legitimately impressed by one of
       | Google's LLMs (with the obvious caveat that I'm taking the
       | results reported in their tech report at face value).
        
         | replwoacause wrote:
         | It's just marketing at this point, nothing to be impressed by.
         | It's a mistake to take at face value.
        
       | sremani wrote:
        | I have Gemini Advanced, do I get access to this? Google is
        | giving Microsoft a run for its money on branding confusion.
        
         | Alifatisk wrote:
         | Not yet, Gemini advanced is using Gemini Ultra, not Gemini pro.
        
           | Ecstatify wrote:
           | Gemini advanced is terrible.
           | 
           | I asked it to rephrase "Are the original stated objectives
           | still relevant?"
           | 
            | It starts going on about Ukraine and Russia.
           | 
           | https://g.co/gemini/share/ddb3887f79e2
        
             | Alifatisk wrote:
              | I think it took the whole context of the conversation
              | into consideration; you should create a new conversation
              | instead and see if it responds differently.
             | 
             | Or you could be more specific, like "Rephrase the following
             | sentence: 'Are the original stated objectives still
             | relevant?' in a formal way, respond with one option only."
        
               | Ecstatify wrote:
               | It was a new conversation. I've never mentioned Russia or
               | Ukraine in any conversation ever.
        
               | Alifatisk wrote:
               | That's so weird, yet interesting. What happens if you
               | open a new convo again and enter the same prompt?
        
               | Ecstatify wrote:
               | Now it gives a normal answer. I rated the response as
               | 'Bad Response' so maybe that had an impact.
        
           | piva00 wrote:
           | I thought I wouldn't but I'm getting really, really confused
           | with the naming and branding of what Gemini is a model and
           | which is a product. Advanced, Pro, Ultra, seemingly Pro is
           | getting better than Ultra? And Advanced is the product using
           | the Ultra underlying model?
           | 
           | Ugh, my brain.
        
           | tapoxi wrote:
           | I've read this sentence three times, wow what horrible
           | branding.
        
       | vessenes wrote:
       | The white paper is worth a read. The things that stand out to me
       | are:
       | 
       | 1. They don't talk about how they get to 10M token context
       | 
       | 2. They don't talk about how they get to 10M token context
       | 
       | 3. The 10M context ability wipes out most RAG stack complexity
       | immediately. (I imagine creating caching abilities is going to be
       | important for a lot of long token chatting features now, though).
       | This is going to make things much, much simpler for a lot of use
       | cases.
       | 
       | 4. They are pretty clear that 1.5 Pro is better than GPT-4 in
       | general, and therefore we have a new LLM-as-judge leader, which
       | is pretty interesting.
       | 
       | 5. It seems like 1.5 Ultra is going to be highly capable. 1.5 Pro
       | is already very very capable. They are running up against very
       | high scores on many tests, and took a minute to call out some
       | tests where they scored badly as mostly returning false
       | negatives.
       | 
       | Upshot, 1.5 Pro looks like it _should_ set the bar for a bunch of
        | workflow tasks, if we can ever get our hands on it. I've found
       | 1.0 Ultra to be very capable, if a bit slow. Open models
       | downstream should see a significant uptick in quality using it,
       | which is great.
       | 
        | Time to dust off my coding test again, I think, which is: "here
       | is a tarball of a repository. Write a new module that does X".
       | 
       | I really want to know how they're getting to 10M context, though.
       | There are some intriguing clues in their results that this isn't
       | just a single ultra-long vector; for instance, their audio and
       | video "needle" tests, which just include inserting an image that
       | says "the magic word is: xxx", or an audio clip that says the
       | same thing, have perfect recall across up to 10M tokens. The text
       | insertion occasionally fails. I'd speculate that this means there
       | is some sort of compression going on; a full video frame with
       | text on it is going to use a lot more tokens than the text
       | needle.
        
         | CharlieDigital wrote:
         | > The 10M context ability wipes out most RAG stack complexity
         | immediately.
         | 
         | Remains to be seen.
         | 
         | Large contexts are not always better. For starters, it takes
         | longer to process. But secondly, even with RAG and the large
         | context of GPT4 Turbo, providing it a more relevant and
         | accurate context always yields better output.
         | 
         | What you get with RAG is faster response times and more
         | accurate answers by pre-filtering out the noise.
        
           | behnamoh wrote:
           | Don't forget that Gemini also has access to the internet, so
           | a lot of RAGging becomes pointless anyway.
        
             | beppo wrote:
             | Internet search _is_ a form of RAG, though. 10M tokens is
              | very impressive, but you're not fitting a database, let
              | alone the entire internet, into a prompt anytime soon.
        
               | behnamoh wrote:
               | You shouldn't fit an entire database in the context
               | anyway.
               | 
               | btw, 10M tokens is 78 times more context window than the
               | newest GPT-4-turbo (128K). In a way, you don't need 78
               | GPT-4 API calls, only one batch call to Gemini 1.5.
        
               | rvnx wrote:
               | Well it's nice, just sad nobody can use it
        
               | cchance wrote:
               | I don't get this why is it people think that you need to
               | put an entire database in the short-term memory of the AI
               | to be useful? When you work with a DB are you memorizing
               | the entire f*cking database, no, you know the summaries
               | of it and how to access and use it.
               | 
                | People also seem to forget that the average person
                | reads about 1B words in their entire LIFETIME, and at
                | 10M with nearly 100% recall that's pretty damn amazing;
                | I'm pretty sure I don't have perfect recall of 10M
                | words myself lol
        
               | Qwero wrote:
               | It increases the use cases.
               | 
               | It can also be a good alternative for fine-tuning.
               | 
                | And the use case of a code base is a good example: if
                | the AI understands the whole context, it can do
                | basically everything.
                | 
                | Let me pay 5 EUR for an Android app rewritten for iOS.
        
               | choilive wrote:
               | You certainly don't need that much context for it to be
               | useful, but it definitely opens up a LOT more
               | possibilities without the compromises of implementing
               | some type of RAG. In addition, don't we want our AI to
               | have superhuman capabilities? The ability to work on 10M+
               | tokens of context at a time could enable superhuman
               | performance in many tasks. Why stop at 10M tokens?
               | Imagine if AI could work on 1B tokens of context like you
               | said?
        
             | CharlieDigital wrote:
             | This may be useful in a generalized use case, but a problem
             | is that many of those results again will add noise.
             | 
             | For any use case where you want contextual results, you
             | need to be able to either filter the search scope or use
             | RAG to pre-define the acceptable corpus.
        
               | panarky wrote:
               | _> you need to be able to either filter the search scope
               | or use RAG ..._
               | 
               | Unless you can get nearly perfect recall with millions of
               | tokens, which is the claim made here.
        
           | killerstorm wrote:
            | Hopefully we can get a better RAG out of it. Currently
            | people do incredibly primitive stuff like splitting text
            | into fixed-size chunks and adding them to a vector DB.
           | 
           | An actually useful RAG would be to convert text to Q&A and
           | use Q's embeddings as an index. Large context can make use of
           | in-context learning to make better Q&A.
        
             | mediaman wrote:
             | A lot of people in RAG already do this. I do this with my
             | product: we process each page and create lists of potential
             | questions that the page would answer, and then embed that.
             | 
             | We also embed the actual text, though, because I found that
             | only doing the questions resulted in inferior performance.
        
               | CharlieDigital wrote:
               | So in this case, what your workflow might look like is:
               | 1. Get text from page/section/chunk         2. Generate
               | possible questions related to the page/section/chunk
               | 3. Generate an embedding using { each possible question +
               | page/section/chunk }         4. Incoming question targets
               | the embedding and matches against { question + source }
               | 
               | Is this roughly it? How many questions do you generate?
               | Do you save a separate embedding for each question? Or
               | just stuff all of the questions back with the
               | page/section/chunk?
        
               | mediaman wrote:
               | Right now I just throw the different questions together
               | in a single embedding for a given chunk, with the idea
               | that there's enough dimensionality to capture them all.
               | But I haven't tested embedding each question, matching on
               | that vector, and then returning the corresponding chunk.
               | That seems like it'd be worth testing out.
        
         | cs702 wrote:
         | > 1. They don't talk about how they get to 10M token context
         | 
         | > 2. They don't talk about how they get to 10M token context
         | 
         | Yes. I wonder if they're using a "linear RNN" type of model
         | like Linear Attention, Mamba, RWKV, etc.
         | 
          | Like Transformers with standard attention, these models train
          | efficiently in parallel, but their compute is O(N) instead of
          | O(N^2), so _in theory_ they can be extended to much longer
          | sequences much more efficiently. They have shown a lot of
          | promise recently at smaller model sizes.
         | 
         | Does anyone here have any insight or knowledge about the
         | internals of Gemini 1.5?
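As a toy illustration of why these "linear RNN" formulations are O(N): causal linear attention can be written as a recurrence over a fixed-size running state, so each new token costs the same regardless of sequence length. This is a generic sketch of that idea, not anything known about Gemini's internals:

```python
import numpy as np

def linear_attention(Q, K, V):
    """Causal linear attention as a recurrence: instead of an O(N^2)
    score matrix, carry a running d x d state S = sum_t k_t v_t^T and
    a normalizer z = sum_t k_t. Cost is O(N) in sequence length."""
    phi = lambda x: np.maximum(x, 0) + 1e-6   # simple positive feature map
    d = Q.shape[1]
    S = np.zeros((d, d))
    z = np.zeros(d)
    out = []
    for q, k, v in zip(phi(Q), phi(K), V):
        S += np.outer(k, v)                   # accumulate key-value memory
        z += k
        out.append((q @ S) / (q @ z))         # attend via the running state
    return np.array(out)

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(16, 8)) for _ in range(3))
print(linear_attention(Q, K, V).shape)        # (16, 8)
```

The recall question raised below is exactly about this fixed-size state: everything the model remembers must fit in S, however long the input.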
        
           | candiodari wrote:
           | They do give a hint:
           | 
           | "This includes making Gemini 1.5 more efficient to train and
           | serve, with a new Mixture-of-Experts (MoE) architecture."
           | 
            | One thing you could do with MoE is give each expert
            | different subsets of the input tokens. And that would
            | definitely do what they claim here: it would allow search.
            | If you want to find where someone said "the password is X"
            | in a 50-hour audio file, this would be perfect.
           | 
            | If your question is "what is the first AND last thing
            | person X said" ... it's going to suck badly. Anything that
            | requires taking 2 things into account that aren't right
            | next to each other is just not going to work.
        
             | declaredapple wrote:
             | > One thing you could do with MoE is giving each expert
             | different subsets of the input tokens.
             | 
             | Don't MoE's route tokens to experts after the attention
             | step? That wouldn't solve the n^2 issue the attention step
             | has.
             | 
             | If you split the tokens _before_ the attention step, that
             | would mean those tokens would have no relationship to each
             | other - it would be like inferring two prompts in parallel.
             | That would defeat the point of a 10M context
        
             | deskamess wrote:
             | Is MOE then basically divide and conquer? I have no deep
             | knowledge of this so I assumed MOE was where each expert
             | analyzed the problem in a different way and then there was
             | some map-reduce like operation on the generated expert
             | results. Kinda like random forest but for inference.
        
               | declaredapple wrote:
               | > I assumed MOE was where each expert analyzed the
               | problem in a different way
               | 
               | Uh sorta but not like parent described at all. You have
               | multiple "experts" and you have a routing layer(s) that
               | decide which expert to send it to. Usually every token is
               | sent to at least 2. You can't just send half the tokens
               | to one expert and half to another.
               | 
               | Also the "experts" are not "domain experts" - there is
               | not a "programming expert" and an "essay expert".
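A minimal sketch of the per-token top-2 routing described above (generic MoE shape, not Gemini's actual architecture; the "experts" here are just small linear maps for illustration):

```python
import numpy as np

def moe_layer(x, experts, router_w, top_k=2):
    """Top-k token routing: every token is scored by a router and sent
    to its k best experts; outputs are combined with softmax gates."""
    out = np.zeros_like(x)
    for i, tok in enumerate(x):
        logits = router_w @ tok                     # one score per expert
        top = np.argsort(logits)[-top_k:]           # pick the k best experts
        gates = np.exp(logits[top] - logits[top].max())
        gates /= gates.sum()                        # softmax over the winners
        for g, e in zip(gates, top):
            out[i] += g * experts[e](tok)           # gate-weighted expert mix
    return out

rng = np.random.default_rng(0)
d, n_experts = 8, 4
Ws = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda t, W=W: W @ t for W in Ws]        # toy linear "experts"
router_w = rng.normal(size=(n_experts, d))
tokens = rng.normal(size=(6, d))
print(moe_layer(tokens, experts, router_w).shape)   # (6, 8)
```

Note the routing happens per token inside the feed-forward part of a layer; as the comment above says, it doesn't change the attention step's cost.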
        
             | spott wrote:
              | > Anything that requires taking 2 things into account that
              | aren't right next to each other is just not going to work.
             | 
             | They kinda address that in the technical report[0]. On page
             | 12 they show results from a "multiple needle in a haystack"
             | evaluation.
             | 
             | https://storage.googleapis.com/deepmind-
             | media/gemini/gemini_...
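The "needle in a haystack" evaluation referenced here is simple to sketch: bury a known fact at varying depths in filler text and check whether the model repeats it back. The model call is stubbed out below (any real harness would call an actual LLM API):

```python
# Minimal needle-in-a-haystack harness. fake_model is a stand-in for a
# real LLM call; a model with perfect recall answers correctly at any depth.
def make_haystack(filler, needle, depth, total_lines=1000):
    """Place the needle at a fractional depth (0.0 = start, 1.0 = end)."""
    lines = [filler] * total_lines
    lines[int(depth * (total_lines - 1))] = needle
    return "\n".join(lines)

def fake_model(context, question):
    # stand-in: a perfect-recall "model" just scans for the needle
    for line in context.splitlines():
        if "magic number" in line:
            return line
    return "not found"

needle = "The magic number is 7481."
for depth in (0.0, 0.5, 1.0):
    ctx = make_haystack("The grass is green.", needle, depth)
    answer = fake_model(ctx, "What is the magic number?")
    print(depth, answer == needle)   # True at every depth
```

The "multiple needles" variant on page 12 of the report just inserts several distinct facts and requires retrieving all of them.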
        
           | sebzim4500 wrote:
           | The fact they are getting perfect recall with millions of
           | tokens rules out any of the existing linear attention
           | methods.
        
             | cs702 wrote:
             | I wouldn't be so sure perfect recall rules out linear RNNs,
             | because I haven't seen any conclusive data on their ability
             | to recall. Have you?
        
         | usaar333 wrote:
         | > They are pretty clear that 1.5 Pro is better than GPT-4 in
         | general, and therefore we have a new LLM-as-judge leader, which
         | is pretty interesting.
         | 
         | They try to push that, but it's not the most convincing. Look
         | at Table 8 for text evaluations (math, etc.) - they don't even
         | attempt a comparison with GPT-4.
         | 
         | GPT-4 is higher than any Gemini model on both MMLU and GSM8K.
         | Gemini Pro seems slightly better than GPT-4 original in Human
         | Eval (67->71). Gemini Pro does crush naive GPT-4 on math
         | (though not with code interpreter and this is the original
         | model).
         | 
          | All in all, 1.5 Pro seems maybe a bit better than 1.0 Ultra.
          | Given that in the wild people seem to find GPT-4 better for,
          | say, coding than Gemini Ultra, my current update is Pro 1.5
          | is about equal to GPT-4.
         | 
         | But we'll see once released.
        
           | cchance wrote:
            | I mean, I don't see GPT-4 watching a 44-minute movie and
            | being able to exactly pinpoint a guy taking a paper out of
            | his pocket..
        
           | panarky wrote:
           | _> people seem to find GPT-4 better for say coding than
           | Gemini Ultra_
           | 
           | For my use cases, Gemini Ultra performs significantly better
           | than GPT-4.
           | 
           | My prompts are long and complex, with a paragraph or two
           | about the general objective followed by 15 to 20 numbered
           | requirements. Often I'll include existing functions the new
           | code needs to work with, or functions that must be refactored
           | to handle the new requirements.
           | 
           | I took 20 prompts that I'd run with GPT-4 and fed them to
           | Gemini Ultra. Gemini gave a clearly better result in 16 out
           | of 20 cases.
           | 
           | Where GPT-4 might miss one or two requirements, Gemini
           | usually got them all. Where GPT-4 might require multiple chat
           | turns to point out its errors and omissions and tell it to
           | fix them, Gemini often returned the result I wanted in one
           | shot. Where GPT-4 hallucinated a method that doesn't exist,
           | or had been deprecated years ago, Gemini used correct
           | methods. Where GPT-4 called methods of third-party packages
           | it assumed were installed, Gemini either used native code or
           | explicitly called out the dependency.
           | 
           | For the 4 out of 20 prompts where Gemini did worse, one was a
           | weird rejection where I'd included an image in the prompt and
           | Gemini refused to work with it because it had unrecognizable
           | human forms in the distance. Another was a simple bash script
           | to split a text file, and it came up with a technically
           | correct but complex one-liner, while GPT-4 just used split
           | with simple options to get the same result.
           | 
           | For now I subscribe to both. But I'm using Gemini for almost
           | all coding work, only checking in with GPT-4 when Gemini
           | stumbles, which isn't often. If I continue to get solid
           | results I'll drop the GPT-4 subscription.
        
             | sho_hn wrote:
             | I have a very similar prompting style to yours and share
             | this experience.
             | 
             | I am an experienced programmer and usually have a fairly
             | exact idea of what I want, so I write detailed requirements
             | and use the models more as typing accelerators.
             | 
             | GPT-4 is useful in this regard, but I also tried about a
             | dozen older prompts on Gemini Advanced/Ultra recently and
             | in every case preferred the Ultra output. The code was
             | usually more complete and prod-ready, with higher
             | sophistication in its construction and somewhat higher
             | density. It was just closer to what I would have hand-
             | written.
             | 
              | It's increasingly clear, though, that LLM use has a
              | couple of different major modes among end-user behavior:
              | knowledge base vs. reasoning, exploratory vs. completion,
              | instruction following vs. getting suggestions, etc.
             | 
             | For programming I want an obedient instruction-following
             | completer with great reasoning. Gemini Ultra seems to do
             | this better than GPT-4 for me.
        
               | sjwhevvvvvsj wrote:
               | I'm going to have to try Gemini for code again. It just
               | occurred to me as a Xoogler that if they used Google's
               | code base as the training data it's going to be
               | unbeatable. Now did they do that? No idea, but quality
               | wins over quantity, even with LLM.
        
               | barrkel wrote:
               | There is no way NTK data is in the training set, and
               | google3 is NTK.
        
               | cpeterso wrote:
               | What is "NTK"?
        
               | mjamaloney wrote:
               | "Need To Know" I.e. data that isn't open within the
               | company.
        
               | sjwhevvvvvsj wrote:
               | I dunno, leadership is desperate and they can de-NTK if
               | and when they feel like it.
        
               | lyu07282 wrote:
               | It constantly hallucinates APIs for me, I really wonder
               | why people's perceptions are so radically different. For
               | me it's basically unusable for coding. Perhaps I'm
               | getting a cheaper model because I live in a poorer
               | country.
        
               | sho_hn wrote:
               | Are you using Gemini Advanced? (The paid tier.) The free
               | one is indeed very bad.
        
               | oceanplexian wrote:
               | I asked Gemini Advanced, the paid one, to "Write a script
               | to delete some files" and it told me that it couldn't do
               | that because deleting files was unethical. At that point
               | I cancelled my subscription since even GPT-4 with all its
               | problems isn't nearly as broken as Gemini.
        
               | panarky wrote:
               | If you share your prompt I'm sure people here can help
               | you.
               | 
                | Here's a prompt I used and got a script that not only
               | accomplishes the objective, but even has an option to
               | show what files will be deleted and asks for confirmation
               | before deleting them.
               | 
               |  _Write a bash script to delete all files with the
               | extension .log in the current directory and all
               | subdirectories of the current directory._
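For reference, a Python sketch of what such a deletion script needs to do: a dry-run listing plus a confirmation prompt before anything is removed. This is an illustrative reconstruction, not Gemini's actual output; the names `find_log_files` and `delete_logs` are chosen here:

```python
"""Delete every .log file under a directory tree, with a dry-run
listing and a confirmation prompt. Illustrative sketch only, not the
model's actual output."""
from pathlib import Path

def find_log_files(root="."):
    # Recursively collect *.log files under `root`.
    return sorted(Path(root).rglob("*.log"))

def delete_logs(root=".", dry_run=True):
    files = find_log_files(root)
    for f in files:
        print(f)  # show what would be deleted
    if dry_run or not files:
        return files
    # Ask for confirmation before actually deleting anything.
    if input(f"Delete these {len(files)} files? [y/N] ").lower() == "y":
        for f in files:
            f.unlink()
    return files
```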
        
               | belter wrote:
               | Spent a few hours comparing Gemini Advanced with GPT-4.
               | 
               | Gemini Advanced is nowhere even close to GPT-4, either
               | for text generation, code generation or logical
               | reasoning.
               | 
                | Gemini Advanced is constantly asking for directions
                | ("What are your thoughts on this approach?") even when
                | creating a short task list of 10 items, and it keeps
                | doing this even after being told several times to
                | provide the full list and not stop at every three or
                | four items to ask for directions. It also constantly
                | gives moral lessons or finishes its results with
                | annoying marketing-style comments of the type "Let's
                | make this an awesome product!"
               | 
               | Code is more generic, solutions are less sophisticated.
               | On a discussion of Options Trading strategies Gemini
               | Advanced got core risk management strategies wrong and
               | apologized when errors were made clear to the model.
               | GPT-4 provided answers with no errors, and even went into
               | the subtleties of some exotic risk scenarios with no
               | mistakes.
               | 
               | Maybe 1.5 will be it, or maybe Google realized this quite
               | quickly and are trying the increased token size as a Hail
               | Mary to catch up. Why release so soon?
               | 
               | Quite curious to try the same prompts on 1.5.
        
             | Dayshine wrote:
             | Is there any chance you could share an example of the kind
             | of prompt you're writing?
             | 
             | I'm always reluctant to write long prompts because I often
             | find GPT4 just doesn't get it, and then I've wasted ten
              | minutes writing a prompt.
        
             | qingcharles wrote:
             | I've found Gemini generally equal with the .Net and HTML
             | coding I've been doing.
             | 
             | I've never had Gemini give me a better result than GPT,
             | though, so it does not surpass it for my needs.
             | 
             | The UI is more responsive, though, which is worth
             | something.
        
             | TaylorAlexander wrote:
             | How do you interact with Gemini for coding work? I am
             | trying to paste my code in the web interface and when I hit
             | submit, the interface says "something went wrong" and the
             | code does not appear in the chat window. I signed up for
             | Gemini Advanced and that didn't help. Do you use AI Studio?
             | I am just looking in to that now.
        
             | koreth1 wrote:
             | > My prompts are long and complex, with a paragraph or two
             | about the general objective followed by 15 to 20 numbered
             | requirements. Often I'll include existing functions the new
             | code needs to work with, or functions that must be
             | refactored to handle the new requirements.
             | 
             | I guess this is a tough request if you're working on a
             | proprietary code base, but I would _love_ to see some
             | concrete examples of the prompts and the code they produce.
             | 
             | I keep trying this kind of prompting with various LLM tools
             | including GPT-4 (haven't tried Gemini Ultra yet, I admit)
             | and it nearly always takes me longer to explain the
             | detailed requirements and clean up the generated code than
             | it would have taken me to write the code directly.
             | 
             | But plenty of people seem to have an experience more like
             | yours, so I really wonder whether (a) we're just asking it
             | to write very different kinds of code, or (b) I'm bad at
             | writing LLM-friendly requirements.
        
               | vineyardmike wrote:
               | Not OP but here is a verbatim prompt I put into these
                | LLMs. I'm learning to make flutter apps, and I like to
                | try making various UIs so I can learn how to compose some
               | things. I agree that Gemini Ultra (aka the paid
               | "advanced" mode) is def better than ChatGPT-4 for this
               | prompt. Mine is a bit more terse than OP's huge prompt
               | with numbered requirements, but I still got a super valid
               | and meaningful response from Gemini, while GPT4 told me
                | it was a tricky problem and gave me some generic code
                | snippets that explicitly don't solve the problem asked.
               | 
               | > I'm building a note-taking app in flutter. I want to
               | create a way to link between notes (like a web hyperlink)
               | that opens a different note when a user clicks on it.
               | They should be able to click on the link while editing
               | the note, without having to switch modalities (eg. no
               | edit-save-view flow nor a preview page). How can I
               | accomplish this?
               | 
               | I also included a follow-up prompt after getting the
               | first answer, which again for Gemini was super
               | meaningful, and already included valid code to start
               | with. Gemini also showed me many more projects and
               | examples from the broader internet.
               | 
               | > Can you write a complete Widget that can implement this
               | functionality? Please hard-code the note text below:
               | <redacted from HN since its long>
        
               | koreth1 wrote:
               | This is useful, thanks. Since you're using this for
               | learning, would it be fair to characterize this as asking
               | the LLM to write code you don't already know how to write
               | on your own?
               | 
               | I've definitely had success using LLMs as a learning
               | tool. They hallucinate, but most often the output will at
               | least point me in a useful direction.
               | 
               | But my day-to-day work usually involves non-exploratory
               | coding where I already know exactly how to do what I
               | need. Those are the tasks where I've struggled to find
               | ways to make LLMs save me any time or effort.
        
               | vineyardmike wrote:
               | > would it be fair to characterize this as asking the LLM
               | to write code you don't already know how to write on your
               | own?
               | 
               | Yea absolutely. I also use it to just write code I
               | understand but am too lazy to write, but it's definitely
               | effective at "show me how this works" type learning too.
               | 
               | > Those are the tasks where I've struggled to find ways
               | to make LLMs save me any time or effort
               | 
               | Github CoPilot has an IDE integration where it can output
               | directly into your editor. This is great for "// TODO:
               | Unit Test for add(x, y) method when x < 0" and it'll dump
               | out the full test for you.
               | 
               | Similarly useful for things like "write me a method that
               | loops through a sorted list, and finds anything with
               | <condition> and applies a transformation and saves it in
                | a Map". Basically all those random helper methods can
                | be written for you.
        
               | koreth1 wrote:
               | That last one is an interesting example. If I needed to
               | do that, I would write something like this (in Kotlin, my
                | daily-driver language):
                | 
                |     fun foo(list: List<Bar>) =
                |         list.filter { condition(it) }.associateWith { transform(it) }
               | 
               | which would take me less time to write than the prompt
               | would.
               | 
               | However, if I didn't know Kotlin very well, I might have
               | had to go look in the docs to find the associateWith
               | function (or worse, I might not have even thought to look
               | for it) at which point the prompt would have saved me
               | time and taught me that the function exists.
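For comparison, the same helper in Python is a one-line dict comprehension; as in the Kotlin snippet, `condition` and `transform` are stand-ins for whatever the caller needs:

```python
# Python analogue of the Kotlin one-liner above: keep the items that
# satisfy a condition and map each one to its transformed value.
def foo(items, condition, transform):
    return {x: transform(x) for x in items if condition(x)}
```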
        
           | spott wrote:
           | > Gemini Pro seems slightly better than GPT-4 original in
           | Human Eval (67->71).
           | 
           | Though they talk a bunch about how hard it was to filter out
           | Human Eval, so this probably doesn't matter much.
        
         | swalsh wrote:
         | "The 10M context ability wipes out most RAG stack complexity
         | immediately."
         | 
          | I'm skeptical. My past experience is that just because the
          | context has room to stuff in whatever you want, the more you
          | stuff into it the less accurate your results are. There seems
         | to be this balance of providing enough that you'll get high
         | quality answers, but not too much that the model is
         | overwhelmed.
         | 
          | I think a large part of developing better models is not just
          | better architectures that support larger and larger context
         | sizes, but also capable models that can properly leverage that
         | context. That's the test for me.
        
           | HereBePandas wrote:
           | They explicitly address this in page 11 of the report.
           | Basically perfect recall for up to 1M tokens; way better than
           | GPT-4.
        
             | westoncb wrote:
             | I don't think recall really addresses it sufficiently: the
             | main issue I see is answers getting "muddy". Like it's
             | getting pulled in too many directions and averaging.
        
               | a_wild_dandan wrote:
               | I'd urge caution in extending generalizations about
               | "muddiness" to a new context architecture. Let's use the
               | thing first.
        
               | westoncb wrote:
               | I'm not saying it applies to the new architecture, I'm
               | saying that's a big issue I've observed in existing
               | models and that so far we have no info on whether it's
               | solved in the new one (i.e. accurate recall doesn't imply
               | much in that regard).
        
               | westoncb wrote:
               | Would be awesome if it is solved but seems like a much
               | deeper problem tbh.
        
               | a_wild_dandan wrote:
               | Ah, apologies for the misunderstanding. What tests would
               | you suggest to evaluate "muddiness"?
               | 
               | What comes to my mind: run the usual gamut of tests, but
               | with the excess context window saturated with
               | irrelevant(?) data. Measure test answer
               | accuracy/verbosity as a function of context saturation
               | percentage. If there's little correlation between these
               | two variables (e.g. 9% saturation is just as
               | accurate/succinct as 99% saturation), then "muddiness"
               | isn't an issue.
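A sketch of that saturation test, hedged: `ask_model` is a hypothetical stand-in for a real LLM API call, and checking whether the known answer appears in the reply is only a crude accuracy proxy:

```python
def muddiness_curve(ask_model, question, answer, filler, levels):
    """Run the same question at increasing context-saturation levels
    and record whether the known answer still comes back.

    `ask_model` is a hypothetical stand-in for a real LLM call;
    `filler` is a list of irrelevant words used as padding;
    `levels` are fractions of the filler to prepend to the prompt.
    """
    results = []
    for frac in levels:
        padding = " ".join(filler[: int(len(filler) * frac)])
        reply = ask_model(padding + "\n\n" + question)
        # Crude accuracy proxy: did the expected answer appear at all?
        results.append((frac, 1.0 if answer in reply else 0.0))
    return results
```

Little correlation between the saturation fraction and the accuracy column would suggest "muddiness" isn't an issue for a given model.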
        
               | danielmarkbruce wrote:
               | Manual testing on complex documents. A big legal contract
               | for example. An issue can be referred to in 7 different
               | places in a 100 page document. Does it give a coherent
               | answer?
               | 
               | A handful of examples show whether it can do it. For
               | example, GPT-4 turbo is downright awful at something like
               | that.
        
               | somenameforme wrote:
               | You need to use relevant data. The question isn't random
               | sorting/pruning, but being able to apply large numbers of
               | related hints/references/definitions in a meaningful way.
               | To me this would be the entire point of a large context
               | window. For entirely different topics you can always just
               | start a new instance.
        
               | smeagull wrote:
               | I believe that's a limitation of using vectors of high
               | dimensions. It'll be muddy.
        
               | Aeolun wrote:
               | Not unlike trying to keep the whole contents of the
               | document in your own mind :)
        
               | sirsinsalot wrote:
               | It's amazing we are in 2024 discussing the degree a
               | machine can reason over millions of tokens of context.
               | The degree, not the possibility.
        
               | razodactyl wrote:
               | Haha. This was my thinking this morning. Like: "Oh
               | cool... a talking computer.... but can it read a 2000
               | page book, give me the summary and find a sentence out
               | of... it can? Oh... well it's lame anyway."
               | 
               | The Sora release is even more mind blowing - not the
               | video generation in my mind but the idea that it can
               | infer properties of reality that it has to learn and
               | constrain in its weights to properly generate realistic
               | video. A side effect of its ability is literally a small
               | universe of understanding.
               | 
               | I was thinking that I want to play with audio to audio
               | LLMs. Not text to speech and reverse but literally sound
               | in sound out. It clears away the problem of document
               | layout etc. and leaves room for experimentation on the
               | properties of a cognitive being.
        
               | caesil wrote:
               | Unfortunately Google's track record with language models
               | is one of overpromising and underdelivering.
        
               | chaxor wrote:
                | This is only true of their web-interface LLMs over the
                | past few years. The statement doesn't hold for their
                | overall history: their word2vec-based language models
                | and the early BERT/Transformer models (publicly
                | available, though not via a web interface) were _far_
                | ahead of the curve, since Google produced those
                | innovations in the first place. Effectively,
                | DeepMind/Google are academics (where the _real_
                | innovations are made), but they struggle to produce
                | corporate products (where OpenAI shines).
        
               | mlsu wrote:
               | I am skeptical of benchmarks in general, to be honest. It
               | seems to be extremely difficult to come up with
                | benchmarks for these things (it may be intrinsic to
                | intelligence as a quality...). It's almost an anti-signal
               | to proclaim good results on benchmarks. The best
               | barometer of model quality has been vibes, in places like
               | /r/localllama where cracked posters are actively testing
               | the newest models out.
               | 
               | Based on Google's track record in the area of text
               | chatbots, I am extremely skeptical of their claims about
               | coherency across a 1M+ context window.
               | 
               | Of course none of this even matters anyway because the
               | weights are closed the architecture is closed nobody has
               | access to the model. I'll believe it when I see it.
        
               | leegao wrote:
               | Their in-context long-sequence understanding "benchmark"
               | is pretty interesting.
               | 
               | There's a language called Kalamang with only 200 native
               | speakers left. There's a set of grammar books for this
               | language that adds up to ~250K tokens. [1]
               | 
               | They set up a test of in-context learning capabilities at
               | long context - they asked 3 long-context models (GPT 4
               | Turbo, Claude 2.1, Gemini 1.5) to perform various
               | Kalamang -> English and English -> Kalamang translation
               | tasks. These are done either 0-shot (no prior training
               | data for kgv in the models), half-book (half of the kgv
               | grammar/wordlists - 125k tokens - are fed into the model
               | as part of the prompt), and full-book (the whole 250k
               | tokens are fed into the model). Finally, they had human
               | raters check these translations.
               | 
               | This is a really neat setup, it tests for various things
               | (e.g. did the model really "learn" anything from these
               | massive grammar books) beyond just synthetic memorize-
               | this-phrase-and-regurgitate-it-later tests.
               | 
               | It'd be great to make this and other reasoning-at-long-
               | ctx benchmarks a standard affair for evaluating context
               | extension. I can't tell which of the many context-
               | extension methods (PI, E2 LLM, PoSE, ReRoPE, SelfExtend,
               | ABF, NTK-Aware ABF, NTK-by-parts, Giraffe, YaRN, Entropy
               | ABF, Dynamic YaRN, Dynamic NTK ABF, CoCA, Alibi, FIRE, T5
               | Rel-Pos, NoPE, etc etc) is really SoTA since they all use
               | different benchmarks, meaningless benchmarks, or
               | drastically different methodologies that there's no fair
               | comparison.
               | 
               | [1] from https://storage.googleapis.com/deepmind-
               | media/gemini/gemini_...
               | 
               | The available resources for Kalamang are: field
                | linguistics documentation comprising a ~500 page
               | reference grammar, a ~2000-entry bilingual wordlist, and
               | a set of ~400 additional parallel sentences. In total the
               | available resources for Kalamang add up to around ~250k
               | tokens.
        
               | andy_ppp wrote:
                | Did you think the extraction of information from the
               | Buster Keaton film was muddy? I thought it was incredibly
               | impressive to be this precise.
        
             | tcdent wrote:
             | Page 8 of the technical paper [1] is especially
             | informative.
             | 
             | The first chart (Cumulative Average NLL for Long Documents)
             | shows a deviation from the trend and an increase in
             | accuracy when working with >=1M tokens. The 1.0 graph is
             | overlaid and supports the experience of 'muddiness'.
             | 
             | [1] https://storage.googleapis.com/deepmind-
             | media/gemini/gemini_...
        
           | swyx wrote:
            | also, costs are always based on context tokens; you don't
            | want to put in 10M of context for every request (it's just
            | nice to have that option when you want to do big things
            | that don't scale)
        
             | 1024core wrote:
             | How much would a lawyer charge to review your 10M-token
             | legal document?
        
               | hereonout2 wrote:
                | 10M tokens is something like 14 copies of War and Peace,
                | or maybe the entire Harry Potter series seven times over.
               | That'd be some legal document!
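The arithmetic roughly checks out using the common rule of thumb of ~1.3 tokens per English word; both figures below are approximations:

```python
# Sanity-checking the War and Peace comparison above with rough numbers.
WAR_AND_PEACE_WORDS = 587_000  # approximate English translation length
TOKENS_PER_WORD = 1.3          # common rule of thumb for English text

copies = 10_000_000 / (WAR_AND_PEACE_WORDS * TOKENS_PER_WORD)
print(round(copies, 1))  # roughly 13 copies, in the comment's ballpark
```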
        
               | xp84 wrote:
               | Hmm I don't know but I feel like the U.S. Congress has
               | bills that would push that limit.
        
           | chuckcode wrote:
           | Would like to see the latency and cost of parsing entire 10M
           | context before throwing out the RAG stack which is relatively
           | cheap and fast.
        
           | tkellogg wrote:
            | costs rise on a per-token basis. So you _CAN_ use 10M tokens,
            | but it's probably not usually a good idea. A database lookup
           | is still better than a few billion math operations.
        
             | sjwhevvvvvsj wrote:
             | I think the unspoken goal is to just lay off your employees
             | and dump every doc and email they've ever written as one
             | big context.
             | 
             | Now that Google has tasted the previously forbidden fruit
             | of layoffs themselves, I think their primary goal in ML is
             | now headcount reduction.
        
               | goatlover wrote:
               | Somehow I just don't see the execs or managers being able
               | to make this work well for them without help. Plus,
               | documents still need to be generated. Are they going to
               | be spending all day prompting LLMs?
        
           | theolivenbaum wrote:
           | Also unless they significantly change their pricing model,
           | we're talking about 0.5$ per API call at current prices
        
           | aik wrote:
            | Have to consider cost for all of this. A big value of RAG,
            | even given the size of GPT-4's largest context window, is
            | that it decreases cost very significantly.
        
           | patja wrote:
           | I think there are also a lot of people who are only
           | interested in RAG if they can self-host and keep their
           | documents private.
        
             | jimmySixDOF wrote:
             | Yes and the ability to have direct attribution matters so
             | you know exactly where your responses come from. And costs
             | as others point out, but RAG is not gone in fact it just
             | got easier and a lot more powerful.
        
           | koliber wrote:
            | LLMs are able to utilize "all the world's" knowledge during
           | training and give seemingly magical answers. While providing
           | context in the query is different than training models, is it
           | possible that more context will give more materials to the
           | LLM and it will be able to pick out the relevant bits on its
           | own?
           | 
           | What if it was possible, with each query, to fine tune the
           | model on the provided context, and then use that JIT fine-
           | tuned model to answer the query?
        
         | freedomben wrote:
         | Is 10M token context correct? The blog post I see 1M but I'm
         | not sure if these are different things
         | 
         | Edit: Ah, I see, it's 1M reliably in production, up to 10M in
         | research:
         | 
         | > _Through a series of machine learning innovations, we've
         | increased 1.5 Pro's context window capacity far beyond the
         | original 32,000 tokens for Gemini 1.0. We can now run up to 1
         | million tokens in production._
         | 
         | > _This means 1.5 Pro can process vast amounts of information
         | in one go -- including 1 hour of video, 11 hours of audio,
         | codebases with over 30,000 lines of code or over 700,000 words.
         | In our research, we've also successfully tested up to 10
         | million tokens._
        
           | huytersd wrote:
           | I know how I'm going to evaluate this model. Upload my
           | codebase and ask it to "find all the bugs".
        
           | p1esk wrote:
            | How could one hour of video fit in 1M tokens? 1 hour at
            | 30fps is 3600*30=108k frames. Each frame is converted into
            | 256 tokens. So either they are not processing each frame,
            | or each frame is converted into fewer tokens.
        
             | KTibow wrote:
             | The model can probably perform fine at 1 frame per second
             | (3600*256=921600 tokens), and they could probably use some
             | sort of compression.
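The frame-rate arithmetic from the two comments above, using their stated assumption of 256 tokens per frame:

```python
TOKENS_PER_FRAME = 256  # per-frame token count assumed in the thread

def video_tokens(seconds, fps=1.0):
    """Tokens needed to represent a video at a given sampling rate."""
    return int(seconds * fps * TOKENS_PER_FRAME)

one_hour_sampled = video_tokens(3600)       # 1 fps: 921,600 tokens, fits in 1M
one_hour_full = video_tokens(3600, fps=30)  # 30 fps: 27,648,000 tokens, far too many
```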
        
         | tbruckner wrote:
         | How do you know it isn't RAG?
        
         | tveita wrote:
         | > The 10M context ability wipes out most RAG stack complexity
         | immediately.
         | 
         | The video queries they show take around 1 minute each, this
         | probably burns a ton of GPU. I appreciate how clearly they
         | highlight that the video is sped up though, they're clearly
         | trying to avoid repeating the "fake demo" fiasco from the
         | original Gemini videos.
        
         | theGnuMe wrote:
         | For #1 and #2 it is some version of mixture of experts. This is
         | mentioned in the blog post. So each expert only sees a subset
         | of the tokens.
         | 
         | I imagine they have some new way to route tokens to the experts
         | that probably computes a global context. One scalable way to
         | compute a global context is by a state space model. This would
         | act as a controller and route the input tokens to the MoEs.
         | This can be computed by convolution if you make some
         | simplifying assumptions. They may also still use transformers
         | as well.
         | 
         | I could be wrong but there are some Mamba-MoEs papers that
         | explore this idea.
        
         | resouer wrote:
         | > The 10M context ability wipes out most RAG stack complexity
         | immediately.
         | 
          | This may not be true. My experience is that the complexity of
          | RAG lies in how to properly connect to various unstructured
          | data sources and run a data transformation pipeline over
          | large-scale data sets (meaning GB, TB or even PB). It's on
          | the critical path rather than a "nice to have", because the
          | quality of the data and the pipeline is a major factor in the
          | final generated result. I.e., in RAG, the importance of
          | R >>> G.
        
         | jorvi wrote:
         | I just hope at some point we get access to mostly uncensored
         | models. Both GPT-4 and Gemini are extremely shackled, and a
         | slightly inferior model that hasn't been hobbled by a very
         | restricting preprompt would handily outperform them.
        
           | ShamelessC wrote:
           | You can customize the system prompt with ChatGPT or via the
           | completions API, just fyi.
        
         | ren_engineer wrote:
         | RAG would still be useful for cost savings assuming they charge
         | per token, plus I'm guessing using the full-context length
         | would be slower than using RAG to get what you need for a
         | smaller prompt
        
           | nostrebored wrote:
           | This is going to be the real differentiator.
           | 
           | HN is very focused on technical feasibility (which remains to
           | be seen!), but in every LLM opportunity, the CIO/CFO/CEO are
           | going to be concerned with the cost modeling.
           | 
           | The way that LLMs are billed now, if you can densely pack the
           | context with relevant information, you will come out ahead
           | commercially. I don't see this changing with the way that LLM
           | inference works.
           | 
           | Maybe this changes with managed vector search offerings that
           | are opaque to the user. The context goes to a preprocessing
           | layer, an efficient cache understands which parts haven't
           | been embedded (new bloom filter use case?), embeds the other
           | chunks, and extracts the intent of the prompt.
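The caching idea in the comment above could be sketched as follows: before paying to embed a chunk, check a Bloom filter of previously embedded chunks. A hit means "probably embedded already" (false positives possible, false negatives impossible). The sizes, hash counts, and function names here are illustrative, not tuned:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter over arbitrary strings."""
    def __init__(self, size=8192, hashes=4):
        self.size, self.hashes = size, hashes
        self.bits = bytearray(size)

    def _positions(self, item):
        # Derive `hashes` bit positions from salted SHA-256 digests.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def probably_contains(self, item):
        return all(self.bits[p] for p in self._positions(item))

def chunks_to_embed(bloom, chunks):
    """Return only the chunks we haven't (probably) embedded yet."""
    return [c for c in chunks if not bloom.probably_contains(c)]
```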
        
             | mediaman wrote:
             | Agreed with this.
             | 
             | The leading ability AI (in terms of cognitive power) will,
             | generally, cost more per token than lower cognitive power
             | AI.
             | 
             | That means that at a given budget you can choose more
             | cognitive power with fewer tokens, or less cognitive power
             | with more tokens. For most use cases, there's no real point
             | in giving up cognitive power to include useless tokens that
             | have no hope of helping with a given question.
             | 
             | So then you're back to the question of: how do we reduce
             | the number of tokens, so that we can get higher cognitive
             | power?
             | 
             | And that's the entire field of information retrieval, which
             | is the most important part of RAG.
        
             | golol wrote:
              | > _The way that LLMs are billed now, if you can densely
              | pack the context with relevant information, you will
              | come out ahead commercially. I don't see this changing
              | with the way that LLM inference works._
             | 
             | Really? Because to my understanding the compute necessary
             | to generate a token grows linearly with the context, and
              | doesn't the OpenAI billing reflect that by separating
             | prompt and output tokens?
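A toy cost model makes the billing point concrete; the per-1K-token prices below are invented for illustration and are not any provider's real pricing:

```python
def request_cost(prompt_tokens, output_tokens,
                 prompt_price=0.01, output_price=0.03):
    """Cost of one call under separate prompt/output token pricing.
    Prices are hypothetical, expressed per 1K tokens."""
    return (prompt_tokens / 1000) * prompt_price \
         + (output_tokens / 1000) * output_price

dense = request_cost(4_000, 500)        # curated RAG context: $0.055
stuffed = request_cost(1_000_000, 500)  # whole corpus in context: $10.015
```

Even at these made-up rates, stuffing the full window costs two orders of magnitude more per call than a densely packed retrieved context, for the same output length.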
        
         | cchance wrote:
          | The YouTube video of the multimodal analysis of a video is
         | insane, imagine feeding in movies or tv shows and being able to
         | autosummary or find information about them dynamically, how the
         | hell is all this possible already? AI is moving insanely fast.
        
           | vineyardmike wrote:
           | > imagine feeding in movies or tv shows
           | 
           | Google themselves have such a huge footprint of various
           | businesses, that they alone would be an amazing customer for
           | this, never mind all the other cool opportunities from third
           | parties...
           | 
           | Imagine that they can ingest the entirety of YouTube and then
           | dump that into Google Search's index AND use it to generate
           | training data for their next LLM.
           | 
           | Imagine that they can hook it up to your security cameras
           | (Nest Cam), and then ask questions about what happened last
           | night.
           | 
           | Imagine that you can ask Gemini how to do something (eg. fix
           | appliance), and it can go and look up a YouTube video on how
           | to accomplish that ask, and explain it to you.
           | 
           | Imagine that it can apply summarization and descriptions to
           | every photo AND video in your personal Google Photos library.
           | You can ask it to find a video of your son's first steps, or
           | a graduation/diploma walk for your 3rd child (by name) and it
           | can actually do that.
           | 
           | Imagine that Google Meet video calls can have the entire
           | convo itself fed into an LLM (live?), instead of just a
           | transcription. You can have an AI assistant there with you
           | that can interject and discuss, based on both the audio and
           | video feed.
        
             | anhner wrote:
             | I'd love to see that applied to the Google ecosystem, the
             | question is - why haven't they already done this?
        
               | is_true wrote:
               | IMO, they aren't sure how to monetize it, Google is run
               | by the ads team.
               | 
               | Problem is they are jeopardizing their moat.
               | 
               | Google is still in a great position, they have the
               | knowledge and lots of data to pull this off. They just
               | have to take the risk of losing some ad revenue for a
               | while.
        
               | vineyardmike wrote:
               | Well, they just announced publicly that the technology is
                | available. Maybe it's just too new to have been
               | productized so far.
        
         | zitterbewegung wrote:
         | RAG doesn't go away at 10 million tokens if you use esoteric
         | sources like Shodan API queries.
        
         | kylerush wrote:
         | I assume using this large a context window instead of RAG
         | would mean consuming many orders of magnitude more GPU
         | compute.
        
         | karmasimida wrote:
         | Even 1M tokens eliminates the need for RAG, unless it's for
         | cost reasons.
        
           | sroussey wrote:
           | Or accuracy
        
           | 7734128 wrote:
           | 1 million might sound like a lot, but it's only a few
           | megabytes. I would want RAG, somehow, to be able to process
           | gigabytes or terabytes of material in a streaming fashion.
        
             | karmasimida wrote:
              | RAG will not change how many tokens an LLM can produce at
              | once.
              | 
              | Longer context, on the other hand, could put some RAG use
              | cases to rest: if your instructions are, like, literally
              | as long as a manual, then there is no need for RAG.
        
               | 7734128 wrote:
                | I think RAG could be used to do that. If you have a one-
                | time retrieval in the beginning, basically amending the
                | prompt, then I agree with you. But there are projects (a
                | classmate is doing his master's thesis on one
                | implementation of this) that retrieve once every few
                | tokens and make the retrieved information available to
                | the generation somehow. That would not take a toll on the
                | context window.
        
         | localhost wrote:
         | Re: RAG -- they haven't released pricing, but if input tokens
         | are priced at GPT-4 levels ($0.01/1K), then sending 10M tokens
         | will cost you $100.
        
           | s-macke wrote:
           | If you think the current APIs will stay that way, then you're
           | right. But when they start offering dedicated chat instances
           | or caching options, you could be back in the penny region.
           | 
           | You probably need a couple GB to cache a conversation. That's
           | not so easy at the moment because you have to transfer that
           | data to and from the GPUs and store the data somewhere.
        
             | localhost wrote:
             | The tokens need to be fed into the model along with the
             | prompt and this takes time. Naive attention is O(N^2). They
             | probably use at least flash attention, and likely something
             | more exotic to their hardware.
             | 
             | You'll notice in their video [1] that they never show the
             | prompts running interactively. This is for a roughly 800K
             | context. They claim that "the model took around 60s to
             | respond to each of these prompts".
             | 
             | This is not really usable as an interactive experience. I
             | don't want to wait 1 minute for an answer each time I have
             | a question.
             | 
             | [1] https://www.youtube.com/watch?v=SSnsmqIj1MI
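The quadratic blow-up the parent mentions can be made concrete with a back-of-the-envelope estimate (a sketch only: the 2·N²·d cost model and the d=4096 model dimension are illustrative assumptions, not Gemini's actual figures):

```python
# Rough cost model for naive self-attention: scoring every token
# against every other token costs ~2 * N^2 * d multiply-adds.
def naive_attention_flops(n_tokens: int, d_model: int = 4096) -> int:
    return 2 * n_tokens ** 2 * d_model

short = naive_attention_flops(128_000)  # a GPT-4 Turbo-sized context
long = naive_attention_flops(800_000)   # the ~800K-token demo context
print(long / short)  # 39.0625 -- ~39x the attention compute for 6.25x the tokens
```

This is why sub-quadratic schemes like FlashAttention (which reduces memory traffic, not asymptotic FLOPs) or more exotic architectures become necessary at these lengths.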
        
           | campers wrote:
           | In the announcements today they also halved the pricing of
           | Gemini 1.0 Pro to $0.000125 / 1K characters, which is a
         | quarter of GPT-3.5 Turbo's, so it could potentially be a bit
           | lower than GPT-4 pricing.
        
         | TweedBeetle wrote:
         | Regarding how they're getting to 10M context, I think it's
         | possible they are using the new SAMBA architecture.
         | 
         | Here's the paper: https://arxiv.org/abs/2312.00752
         | 
         | And here's a great podcast episode on it:
         | https://www.cognitiverevolution.ai/emergency-pod-mamba-memor...
        
           | LightMachine wrote:
           | As a Brazilian, I approve that choice. Vambora amigos!
        
         | renonce wrote:
         | > They don't talk about how they get to 10M token context
         | 
         | I don't know how either but maybe
         | https://news.ycombinator.com/item?id=39367141
         | 
         | Anyway I mean, there is plenty of public research on this so
         | it's probably just a matter of time for everyone else to catch
         | up
        
           | albertzeyer wrote:
           | Why do you think this specific variant (RingAttention)? There
           | are so many different variants for this.
           | 
           | As far as I know, the problem in most cases is that while the
           | context length might be high in theory, the actual ability to
           | use it is still limited. E.g. recurrent networks even have
           | infinite context, but they actually only use 10-20 frames as
           | context (longer only in very specific settings; or maybe if
           | you scale them up).
        
             | renonce wrote:
             | There are ways to test the neural network's ability to
             | recall from a very long sequence. For example, if you
             | insert a random sentence like "X is Sam Altman" somewhere
             | in the text, will the model be able to answer the question
             | "Who is X?", or maybe somewhat indirectly "Who is X (in
             | another language)" or "Which sentence was inserted out of
             | context?" "Which celebrity was mentioned in the text?"
             | 
             | Anyways the ability to generalize to longer context length
             | is evidenced by such tests. If every token of the model's
             | output is able to answer questions in such a way that any
             | sentence from the input would be taken into account, this
             | gives evidence that the full context window indeed matters.
             | Currently I find Claude 2 to perform very well on such
              | tasks, so that sets my expectation of what a language model
              | with an extremely long context window should look like.
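A minimal sketch of how such a recall probe can be constructed (the filler sentence, needle, and question here are made-up examples, not from any published benchmark harness):

```python
import random

def build_needle_prompt(filler_sentences, needle, question, seed=0):
    """Insert one out-of-context 'needle' sentence at a random position
    in a long haystack, then ask the model to recall it."""
    rng = random.Random(seed)
    haystack = list(filler_sentences)
    pos = rng.randrange(len(haystack) + 1)
    haystack.insert(pos, needle)
    return " ".join(haystack) + f"\n\nQuestion: {question}", pos

filler = ["The weather report mentioned light rain."] * 1000
prompt, pos = build_needle_prompt(filler, "X is Sam Altman.", "Who is X?")
# Grade the model's answer by whether it contains "Sam Altman";
# sweeping `pos` and the haystack length maps recall over the window.
```

Varying where the needle lands is the important part: many models recall the start and end of the context far better than the middle.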
        
         | AaronFriel wrote:
         | There will always be more data that _could_ be relevant than
         | fits in a context window, and especially for multi-turn
         | conversations, huge contexts incur huge costs.
         | 
         | GPT-4 Turbo, using its full 128k context, costs around $1.28
         | per API call.
         | 
         | At that pricing, 1m tokens is $10, and 10m tokens is an eye-
         | watering $100 per API call.
         | 
         | Of course prices will go down, but the price advantage of
         | working with less will remain.
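Those per-call figures follow directly from the quoted input-token rate (a sketch using a GPT-4 Turbo-style $0.01/1K input price; output tokens are billed separately and would cost extra):

```python
def prompt_cost_usd(n_tokens: int, usd_per_1k: float = 0.01) -> float:
    """Input-token cost at a rate of $0.01 per 1K tokens."""
    return round(n_tokens / 1000 * usd_per_1k, 2)

print(prompt_cost_usd(128_000))     # 1.28 -- a full GPT-4 Turbo window
print(prompt_cost_usd(1_000_000))   # 10.0
print(prompt_cost_usd(10_000_000))  # 100.0
```

And in a chat loop, that full-context cost recurs on every turn, since the whole window is resubmitted with each message.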
        
           | 7734128 wrote:
           | Would the price really increase linearly? Isn't the demands
           | on compute and memory increasing steeper than that as a
           | function of context length?
        
           | elorant wrote:
           | I don't see a problem with this pricing. At 1m tokens you can
           | upload the whole proceedings of a trial and ask it to draw an
           | analysis. Paying $10 for that sounds like a steal.
        
             | AaronFriel wrote:
             | Of course, if you get exactly the answer you want in the
             | first reply.
        
             | staticman2 wrote:
             | While it's hard to say what's possible on the cutting edge,
             | historically models tend to get dumber as the context size
             | gets bigger. So you'd get a much more intelligent analysis
             | of a 10,000 token excerpt of the trial than a million token
             | complete transcript of the trial. I have not spent the
             | money testing big token sizes in GPT 4 turbo, but it would
              | not surprise me if it gets dumber. Think of it this way:
              | if the model is limited to 3,000-token replies and an
              | analysis would require a more detailed response than that,
              | it cannot provide it, it'll just give you insufficient
             | information. What it'll probably do is ignore parts of the
             | trial transcript because it can't analyze all that
             | information in 3,000 tokens. And asking a followup question
             | is another million tokens.
        
             | ithkuil wrote:
             | Unfortunately the whole context has to be reprocessed fully
             | for each query, which means that if you "chat" with the
              | model you'll incur that $10 fee for every interaction,
              | which quickly adds up.
             | 
             | It may still be worth it for some use cases
        
         | qwerty_clicks wrote:
         | FYI, MM is the standard for million: 10MM, not 10M. I was
         | reading all these comments confused as heck about why you're
         | excited about 10M tokens.
        
           | MichaelNolan wrote:
           | Maybe for accountants, but for everyone else a single M is
           | much more common.
        
         | a_vanderbilt wrote:
         | After their giant fib with the Gemini video a few weeks back
         | I'm not believing anything til I see it used by actual people.
         | I hope it's that much better than GPT-4, but I'm not holding
         | my breath that there isn't an asterisk or trick hiding
         | somewhere.
        
         | nborwankar wrote:
         | Re RAG: aren't you ignoring the fact that no one wants to put
         | confidential company data into such LLMs? Private RAG
         | infrastructure remains a need for the same reason that privacy
         | of data of all sorts remains a need. Huge context solves the
         | problem for large open source context material but that's only
         | part of the picture.
        
         | outside1234 wrote:
         | It takes 60 seconds to process all of that context in their
         | three.js demo, which is, I will say, not super interactive. So
         | there is still room for RAG and other faster alternatives to
         | narrow the context.
        
         | aubanel wrote:
         | > They are pretty clear that 1.5 Pro is better than GPT-4 in
         | general, and therefore we have a new LLM-as-judge leader, which
         | is pretty interesting
         | 
         | I fully disagree: they compare Gemini 1.5 Pro and GPT-4 only
         | on context length; on other tasks they compare it only to
         | other Gemini models, which is a strange self-own.
         | 
         | I'm convinced that if they do not show the results against
         | GPT4/Claude, it is because they do not look good.
        
         | kristjansson wrote:
         | For other's reference, the paper:
         | https://storage.googleapis.com/deepmind-media/gemini/gemini_...
        
         | joshsabol46 wrote:
         | > The 10M context ability wipes out most RAG stack complexity
         | immediately.
         | 
       | RAG is needed for the same reason you don't `SELECT *` in all
       | of your queries.
        
         | nestorD wrote:
         | Regarding the 10M tokens context, RingAttention has been shown
         | [0] recently (by researchers, not ML engineers in a FAANG) to
         | be able to scale to comparable (1M) context sizes (it does take
         | work and a _lot_ of GPUs).
         | 
         | [0]: https://news.ycombinator.com/item?id=39367141
        
           | jebarker wrote:
           | > researchers, not ML engineers in a FAANG
           | 
           | Why did you point out this distinction?
        
             | nestorD wrote:
             | It means they have significantly less means (to get a lot
             | of GPUs letting them scale up in context length) and are
             | likely less well-versed in optimization (which also helps
             | with scaling up)[0].
             | 
             | I believe those two things together are likely enough to
             | explain the difference between a 1M context length and a
             | 10M context length.
             | 
             | [0]: Which is not looking down on that particular research
             | team, the vast majority of people have less means and
             | optimization know-how than Google.
        
             | vineyardmike wrote:
              | Probably to indicate that it's research and not productized?
        
         | bschne wrote:
         | > The 10M context ability wipes out most RAG stack complexity
         | immediately.
         | 
         | 1. People mention accuracy issues with longer contexts.
         | 
         | 2. People mention processing time issues with longer contexts.
         | 
         | 3. Something people haven't mentioned in this thread is cost --
         | even though prompt tokens are usually cheaper than generated
         | tokens, and Gemini seems to be cheaper than GPT-4, putting a
         | whole knowledge base or 80-page document in the context is
         | going to make every run of that prompt quite expensive.
        
         | lqcfcjx wrote:
         | This might be a stupid question: even if there's no quality
         | degradation from a 10M context, will it be extremely slow at
         | inference time?
        
         | Havoc wrote:
         | >3. The 10M context ability wipes out most RAG stack complexity
         | immediately.
         | 
         | I'd imagine RAG would still be much more efficient
         | computationally
        
         | oblio wrote:
         | What's RAG?
        
           | girvo wrote:
           | Retrieval augmented generation.
           | 
           | > Retrieval Augmented Generation (RAG) is a technique where
           | the capabilities of a large language model (LLM) are
           | augmented by retrieving information from other systems and
           | inserting them into the LLM's context window via a prompt.
           | 
           | (stolen from: https://github.com/psychic-api/rag-stack)
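A minimal illustration of that retrieve-then-prompt loop (a toy sketch: the word-overlap scoring stands in for the embedding-similarity search a real RAG stack would use, and the documents are made up):

```python
def retrieve(query: str, docs: list, k: int = 2) -> list:
    """Toy retriever: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(q & set(d.lower().split())))[:k]

def build_prompt(query: str, docs: list) -> str:
    """Insert the retrieved snippets into the LLM's context via the prompt."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nAnswer using only the context above: {query}"

docs = [
    "The fridge compressor hums loudly when the start relay is worn.",
    "Football scores from last weekend's matches.",
    "Replace the fridge start relay after unplugging the unit.",
]
print(build_prompt("why does my fridge compressor hum", docs))
```

Only the top-k snippets ever enter the context window, which is the whole point: relevance filtering happens outside the model.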
        
           | ohmyiv wrote:
           | Retrieval Augmented Generation. In basic terms, it optimizes
           | output of LLMs by using additional external data sources
           | before answering queries. (That actually might be too basic
           | of a description)
           | 
           | Here:
           | 
           | https://blogs.nvidia.com/blog/what-is-retrieval-augmented-
           | ge...
        
             | ssd532 wrote:
              | Is it the same as embedding? Is embedding a RAG method?
        
         | DebtDeflation wrote:
         | >The 10M context ability wipes out most RAG stack complexity
         | immediately
         | 
         | From a technology standpoint, maybe. From an economics
         | standpoint, it seems like it would be quite expensive to jam
         | the entire corpus into every single prompt.
        
         | shostack wrote:
         | Wake me when I can get access without handing over my texts and
         | contacts. I opened the Gemini app on Android and that onerous
         | privacy policy was the first experience. Worse, I didn't seem
         | able to move past accepting giving Google the ability to hoover
         | up my data to disable that in the settings so I just gave up
         | and went back to ChatGPT where I at least generally have
         | control over the data I give it.
        
         | tomaskafka wrote:
         | "I really want to know how they're getting to 10M context,
         | though."
         | 
         | My $5 says it's a RAG or a similar technique (hierarchical RAG
         | comes to mind), just like all other large context LLMs.
        
       | cubefox wrote:
       | I think Anthropic and OpenAI could also have offered a one
       | million context window a while ago. The relevant architecture
       | breakthrough was probably when a linear increase in context
       | length only required a linear increase in inference compute
       | instead of a quadratic one. Anthropic and then OpenAI achieved
       | linear context compute scaling before an architecture for it was
       | published publicly (MAMBA paper).
        
         | bearjaws wrote:
         | The problem is, the 128k window performed terribly and showed
         | that attention was mostly limited to the first and last 20%.
         | 
         | Increasing it to 1M just means even more data is ignored.
        
           | cubefox wrote:
           | Maybe their architecture wasn't as good as MAMBA and Google
           | could use the better architecture thanks to being late to the
           | game...
        
       | zippothrowaway wrote:
       | I've always been suspicious of any announcement from Demis
       | Hassabis since way back in his video game days when he did a
       | monthly article in Edge magazine about the game he was
       | developing. "Infinite Polygons" became a running joke in the
       | industry because of his obvious snake-oil. The game itself,
       | Republic [1], was an uninteresting failure.
       | 
       | He learned how to promote himself from working for Peter "Project
       | Milo" Molyneux and I see similar patterns of hype.
       | 
       | [1]
       | https://en.wikipedia.org/wiki/Republic:_The_Revolution#Marke...
        
         | pradn wrote:
         | The line between delusional and visionary is thin! I know I'm
         | too grounded in "expected value" math to do super outlier stuff
         | like starting a video game company...
        
         | Qwero wrote:
         | Funny read about his game.
         | 
         | Nonetheless, while Gemini is still underwhelming in comparison
         | to GPT-4 (excluding this announcement, as I haven't tried it
         | yet), AlphaGo, AlphaZero, and especially AlphaFold were
         | tremendous!
        
         | COAGULOPATH wrote:
         | Yeah, it's funny. I used to think "Demis Hassabis...where have
         | I heard that name before?" And then I realized I saw him in the
         | manuals for old Bullfrog games.
        
         | rreichman wrote:
         | And yet - AlphaGo, AlphaZero, AlphaFold...
        
       | obblekk wrote:
       | Very impressive if the benchmarks replicate. Some questions:
       | 
       | * token cost? In multiples of Gemini pro 1
       | 
       | * memory usage? Does already scarce GPU memory become even more
       | of a bottleneck?
       | 
       | * video resolution? Sherlock Jr (1924) is their test video -
       | black and white, 45min, low res
       | 
       | Most curious about the video... I wonder if RAG within video will
       | become the next battlefront
        
       | technics256 wrote:
       | Does anyone actually have access to Ultra yet? It's a lame blog
       | post where it says "it's available!" but the fine print says "by
       | whitelist".
       | 
       | Ok, whatever that means.
       | 
       | OpenAI at least releases it all at once, to everyone.
        
         | Szpadel wrote:
         | oh, openai had a lot of waitlists also, gpt4 API, large context
         | versions etc
        
       | sonium wrote:
       | I just watched the demo with the Apollo 11 transcript. (sidenote:
       | maybe Gemini is named after the space program?).
       | 
       | Wouldn't the transcript or at least a timeline of Apollo 11 be
       | part of the training corpus? So even without the 400 pages in the
       | context window just given the drawing I would assume a prompt
       | like "In the context of Apoll 11, what moment does the drawing
       | refer to?" would yield the same result.
        
         | technics256 wrote:
         | Gemini is named that way because of the collaboration between
         | Google brain and deep mind
        
         | singularity2001 wrote:
         | Correct except that it spits out the timestamp
        
         | torginus wrote:
         | Gemini is named after the spacecraft that put the second person
         | into orbit - pretty aptly named, but not sure if this was the
         | intention.
        
           | DrNosferatu wrote:
           | Google needs their Apollo.
        
           | d0mine wrote:
            | The second person was put up by MR-3 (Mercury, not Gemini):
            | https://en.m.wikipedia.org/wiki/Timeline_of_space_travel_by_...
        
         | empath-nirvana wrote:
         | i asked chatgpt4 to identify three humorous moments in the
         | apollo 11 transcript and it hallucinated all 3 of them (i think
         | -- i can't find what it's referring to). Presumably it's in
          | its corpus, too.
         | 
         | > The "Snoopy" Moment: During the mission, the crew had a
         | small, black-and-white cartoon Snoopy doll as a semi-official
         | mascot, representing safety and mission success. At one point,
         | Collins joked about "Snoopy" floating into his view in the
         | spacecraft, which was a light moment reflecting the camaraderie
         | and the use of humor to ease the intense focus required for
         | their mission.
         | 
         | The "Biohazard" Joke: After the successful moon landing and
         | upon preparing for re-entry into Earth's atmosphere, the crew
         | humorously discussed among themselves the potential of being
         | quarantined back on Earth due to unknown lunar pathogens. They
         | joked about the extensive debriefing they'd have to go through
         | and the possibility of being a biohazard. This was a light-
         | hearted take on the serious precautions NASA was taking to
         | prevent the hypothetical contamination of Earth with lunar
         | microbes.
         | 
         | The "Mailbox" Comment: In the midst of their groundbreaking
         | mission, there was an exchange where one of the astronauts
         | joked about expecting to find a mailbox on the Moon, or asking
         | where they should leave a package, playing on the surreal
         | experience of being on the lunar surface, far from the ordinary
         | elements of Earthly life. This comment highlighted the
         | astronauts' ability to find humor in the extraordinary
         | circumstances of their journey.
        
       | htrp wrote:
       | > Gemini 1.5 delivers dramatically enhanced performance. It
       | represents a step change in our approach, building upon research
       | and engineering innovations across nearly every part of our
       | foundation model development and infrastructure. This includes
       | making Gemini 1.5 more efficient to train and serve, with a new
       | Mixture-of-Experts (MoE) architecture.
       | 
       | Looks like they fine-tuned across use cases and grabbed the
       | Mixtral architecture?
        
         | sebzim4500 wrote:
         | There's no way that's all it is; scaling Mixtral to a context
         | length of 10M while maintaining any level of reasoning ability
         | would be extremely slow. If the only purpose of the model was
         | to produce this report then maybe that's possible, but if they
         | plan on actually deploying this to end users then there is no
         | way they can run quadratic attention on 10M tokens.
        
       | joak wrote:
       | <<We'll also introduce 1.5 Pro with a standard 128,000 token
       | context window when the model is ready for a wider release>>
       | 
       | So actually they are lagging: their 128k model is yet to be
       | released while OpenAI released theirs some months ago.
        
         | joak wrote:
         | Their 10M tokens demo is impressive though. They "released" a
         | demo. Confusing...
        
         | kyrra wrote:
         | See: https://blog.google/technology/ai/google-gemini-next-
         | generat...
         | 
         | > Gemini 1.5 Pro comes with a standard 128,000 token context
         | window. But starting today, a limited group of developers and
         | enterprise customers can try it with a context window of up to
         | 1 million tokens via AI Studio and Vertex AI in private
         | preview.
        
           | joak wrote:
           | Gemini 1.5 Pro is not yet released: <<Starting today, we're
           | offering a limited preview of 1.5 Pro to developers and
           | enterprise customers via AI Studio and Vertex AI>>
           | 
           | Something like an alpha version.
           | 
           |  _Limited preview_ in their jargon.
        
       | iamgopal wrote:
       | The AI race is amazing: Nvidia is reaping the benefits now, but
       | soon the whole world will.
        
       | cubefox wrote:
       | The whitepaper says the Buster Keaton film was reduced to 1 FPS
       | before being fed in. Apparently multi-modal language models can
       | only read individual pictures, so videos have to be reduced to a
       | series of frames. I assume animal brains are more efficient than
       | that. E.g. by only feeding the "changes/difference over time"
       | instead of a sequence of time slices.
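That 1 FPS reduction amounts to keeping one frame per second of footage; a sketch of the index arithmetic (the 24 fps source frame rate is an assumption for illustration):

```python
def sample_frame_indices(total_frames: int, native_fps: float,
                         target_fps: float = 1.0) -> list:
    """Indices of the frames kept when downsampling a video stream,
    e.g. from 24 fps to the 1 fps the whitepaper describes."""
    step = native_fps / target_fps
    return [round(i * step) for i in range(int(total_frames / step))]

# A 45-minute film at 24 fps collapses to ~2,700 still images,
# each of which the multimodal model reads as an individual picture.
frames = sample_frame_indices(total_frames=45 * 60 * 24, native_fps=24.0)
print(len(frames))  # 2700
```

Each kept frame then becomes its own block of image tokens in the context, which is why even a heavily downsampled film consumes hundreds of thousands of tokens.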
        
         | riku_iki wrote:
         | It will probably eventually be improved by adding some encoder
         | on top of the LLM, which will encode 60 frames into 1 while
         | attempting to preserve information.
        
       | freedomben wrote:
       | > _Our teams continue pushing the frontiers of our latest models
       | with safety at the core._
       | 
       | They're not kidding, Gemini (at least what's currently available)
       | is so safe that it's not all that useful.
       | 
       | The "safety" permeates areas where you wouldn't even expect it,
       | like refusing to answer questions about "unsafe" memory
       | management in C. It interjects lectures about safety in answers
       | when you didn't even ask it to do that in the question.
       | 
       | For example, I clicked on _one of the four example questions that
       | Gemini proposes to help you get started_ and it was something
       | like  "Write an SMS calling in sick. It's a big presentation day
       | and I'm sad to let the team down." Gemini decided to tell me that
       | it can't impersonate positions of trust like medical
       | professionals or employers (which is not at all what I was
       | asking it to do).
       | 
       | The other things I asked it, it gave me wrong and obviously wrong
       | answers. The funniest (though glad it was obviously wrong) was
       | when I asked it "I'm flying from Karachi to Denver. Will I need
       | to pick up my bags in Newark?" and it told me "no, because
       | Karachi to Newark is a domestic flight"
       | 
       | Unless they stop putting "safety at the core," or figure out how
       | to do it in a way that isn't unnecessarily inhibiting, annoying,
       | and frankly insulting (protip: humans don't like to be accused of
       | asking for unethical things, especially when they weren't asking
       | for them. when other humans do that to us, we call that assuming
       | the worst and it's a negative personality trait), any
       | announcements/releases/breakthroughs from Google are going to be
       | a "meh" for me.
        
       | dghlsakjg wrote:
       | This is incredible if it isn't just hype!
       | 
       | I hope the demos aren't fudged/scripted like Google did with
       | Gemini 1.0
        
         | amf12 wrote:
         | These demos seem to be videos from AI Studio, which display
         | the time in seconds. Hopefully not fudged.
        
       | EZ-E wrote:
       | Remember AI Dungeon and how frustrating it was that it would
       | forget what happened previously? With a 10M context window, am I
       | right to assume it would be possible to weave a story spanning
       | multiple books' worth of content (more or less 1,400 pages)?
        
         | dougmwne wrote:
         | Pretty much! Check out this demo of finding a scene in a 1400
         | page book based on a stick figure drawing. Mind blowing, right?
         | 
         | https://twitter.com/JeffDean/status/1758148159942091114
        
           | EZ-E wrote:
           | In theory it would be possible to drop a book and just say
           | "hey Google, create a sequel"
           | 
            | But I doubt it is /that/ good; it's not like we can test it
            | either.
        
         | VikingCoder wrote:
         | Dear Google,
         | 
         | Teach Gemini how to be a Dungeon Master, and run free
         | adventures at Comic Con.
         | 
         | Then offer it up as a subscription.
         | 
         | Sincerely,
         | 
         | Everyone
        
         | og_kalu wrote:
         | 10M tokens is about 25,000 pages. 10M tokens is also never
         | coming to production; it's solely for research testing.
         | 
         | 1M tokens is what they've said will be available for production
         | and is about 2,500 pages.
        
       | eigenvalue wrote:
       | Based on what I've seen so far, I think the probability that this
       | is actually better than GPT4 on the kind of real world coding
       | tasks that I use it for is less than 1%. Literally everything
       | from Google on this has been vaporware or laughably bad in actual
       | practice in my personal experience. Which is totally insane to me
       | given their financial resources, human resources, and multi-year
       | lead in AI/DL research, but that's what seems to have happened. I
       | certainly hope that they can develop and actually release a
       | capable model, but at this point, I think you have to be deeply
       | skeptical of everything they say until such a model is available
       | for real by the public and you can try it on actual, real tasks
       | and not fake benchmark nonsense and waitlists.
        
       | scarmig wrote:
       | One interesting tidbit from the technical report:
       | 
       | >HumanEval is an industry standard open-source evaluation
       | benchmark (Chen et al., 2021), but we found controlling for
       | accidental leakage on webpages and open-source code repositories
       | to be a non-trivial task, even with conservative filtering
       | heuristics. An analysis of the test data leakage of Gemini 1.0
       | Ultra showed that continued pretraining on a dataset containing
       | even a single epoch of the test split for HumanEval boosted
       | scores from 74.4% to 89.0%, highlighting the danger of data
       | contamination. We found that this sharp increase persisted even
       | when examples were embedded in extraneous formats (e.g. JSON,
       | HTML). We invite researchers assessing coding abilities of these
       | models head-to-head to always maintain a small set of truly held-
       | out test functions that are written in-house, thereby minimizing
       | the risk of leakage. The Natural2Code benchmark, which we
       | announced and used in the evaluation of Gemini 1.0 series of
       | models, was created to fill this gap. It follows the exact same
       | format of HumanEval but with a different set of prompts and
       | tests.
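       | 
       | The "truly held-out test functions" idea can be sketched as a
       | tiny harness. This is a hypothetical illustration: the problem
       | set is invented, and `generate_solution` is a stand-in for a
       | real model call.

```python
# Minimal sketch of a held-out coding eval in the spirit of HumanEval:
# keep the problems private (written in-house) so they cannot leak into
# pretraining data. `generate_solution` is a placeholder for a model call.

def generate_solution(prompt: str) -> str:
    # Placeholder: a real harness would query the model here.
    return prompt + "    return a + b\n"

HELD_OUT_PROBLEMS = [
    {
        "prompt": "def add(a, b):\n",
        "entry_point": "add",
        "tests": [((1, 2), 3), ((-1, 1), 0)],
    },
]

def run_eval(problems) -> float:
    passed = 0
    for p in problems:
        code = generate_solution(p["prompt"])
        namespace = {}
        try:
            exec(code, namespace)  # sandbox this in real use!
            fn = namespace[p["entry_point"]]
            if all(fn(*args) == want for args, want in p["tests"]):
                passed += 1
        except Exception:
            pass  # failed generations simply don't count as passes
    return passed / len(problems)

print(f"pass rate: {run_eval(HELD_OUT_PROBLEMS):.0%}")  # pass rate: 100%
```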
        
       | llm_trw wrote:
       | Yeah. I'll believe that when I can use it.
        
       | DidISayTooMuch wrote:
       | How can I fine-tune these models for my use? Their docs aren't
       | clear on whether the Gemini models are fine-tunable.
        
       | qwertox wrote:
       | As a sidenote, it's worth clicking the play button and then
       | checking how they're highlighting the current paragraph and word
       | in the inspector.
        
       | aubanel wrote:
       | For reference, here is the technical report:
       | https://storage.googleapis.com/deepmind-media/gemini/gemini_...
        
       | dumbmachine wrote:
       | It would probably be cost-prohibitive to use the 10M context to
       | its fullest each time.
       | 
       | I instead hope for an API that exposes the context as a
       | datastore, so that, like RAG, we can control what to store, but
       | unlike RAG all the data stays within the context.
        
       | killthebuddha wrote:
       | 10M tokens is an absolute game changer, especially if there's no
       | noticeable decay in quality with prompt size. We're going to see
       | things like entire domain specific languages embedded in prompts.
       | IMO people will start thinking of the prompt itself as a sort of
       | runtime rather than a static input.
       | 
       | Back when OpenAI still supported raw text completion with text-
       | davinci-003 I spent some time experimenting with tiny prompt-
       | embedded DSLs. The results were very, very interesting, IMO. In
       | a lot of ways, text-davinci-003 with embedded functions still
       | feels to me like the "smartest" language model I've ever
       | interacted with.
       | 
       | I'm not sure how close we are to "superintelligence" but for
       | baseline general intelligence we very well could have already
       | made the prerequisite technological breakthroughs.
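       | 
       | As a flavor of what a "prompt-embedded DSL" means here, a toy
       | example (the DSL and prompt text are invented for illustration;
       | with text-davinci-003 you would send the resulting string to the
       | raw completions endpoint):

```python
# Toy illustration of a prompt-embedded DSL: the "functions" exist only
# as definitions inside the prompt, and the completion model acts as the
# interpreter. The DSL itself is invented for illustration.

DSL_PROMPT = """\
You interpret a tiny DSL. Definitions:

  (shout x)  -> uppercase x and append "!"
  (mirror x) -> reverse the characters of x

Evaluate the expression and print only the result.

(shout hello) =>
"""

def build_prompt(expression: str) -> str:
    # Swap the example expression for the one we want evaluated;
    # the text the model completes after "=>" is the "return value".
    return DSL_PROMPT.replace("(shout hello)", expression)

print(build_prompt("(mirror stressed)"))
```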
        
         | empath-nirvana wrote:
         | It's pretty slow, though: it looks like up to 60 seconds for
         | some of the answers, and it uses god knows how much compute,
         | so there are probably going to be some trade-offs. You're
         | going to want to make sure that that much context is actually
         | useful for what you want.
        
           | drusepth wrote:
           | TBF: when talking about the first "superintelligence", I'd
           | expect it to take unreasonable amounts of compute and/or be
           | slow -- that can always be optimized. Bringing it into
           | existence in the first place is the hardest part.
        
             | unshavedyak wrote:
              | Yea. Of course for some tasks we need speed, but I've
              | been kinda surprised that we haven't seen very slow
              | models which perform far better than faster models. We're
              | treading new territory, and everyone seems to make models
              | that are "fast enough".
             | 
             | I wanna see how far this tech can scale, regardless of
             | speed. I don't care if it takes 24h to formulate a
             | response. Are there "easy" variables which drastically
             | improve output?
             | 
             | I suspect not. I imagine people have tried that. Though i'm
             | still curious as to why.
        
               | TaylorAlexander wrote:
               | I think the problem is that 24 hours of compute to run a
               | response would be incredibly expensive. I mean hell how
               | would that even be trained.
        
       | Yusefmosiah wrote:
       | I see a lot of talk about retrieval over long context. Some even
       | think this replaces RAG.
       | 
       | I don't care if the model can tell me which page in the book or
       | which code file has a particular concept. RAG already does this.
       | I want the model to notice how a concept is distributed
       | throughout a text, and be able to connect, compare, contrast,
       | synthesize, and understand all the ways that a book touches on a
       | theme, or to rewrite multiple code files in one pass, without
       | introducing bugs.
       | 
       | How does Gemini 1.5's reasoning compare to GPT-4? GPT-4 already
       | has superhuman memory; its bottleneck is its relatively weak
       | reasoning.
        
         | sinuhe69 wrote:
         | In my experience (I work mostly and deeply with Bard/Gemini),
         | the reasoning capability of Gemini is quite good. Gemini Pro is
         | already much better than ChatGPT 3.5, but they still make quite
         | a few mistakes along the way. What is more worrying is that
         | when these models made mistakes, they tried really hard to
         | justify their reasoning (errors), practically misleading the
         | users. Because of their high mimicry ability, users really have
         | to pay attention to validate and eventually spot the errors. Of
         | course, this is still far below the human level, so I'm not
         | sure whether they add value or are more of a burden.
        
         | og_kalu wrote:
         | The most impressive demonstration of long context is this in my
         | opinion,
         | 
         | https://imgur.com/a/qXcVNOM
         | 
         | Testing language translation abilities of an extremely obscure
         | language after passing in one grammar book as context.
        
       | petargyurov wrote:
       | Version number suggests they're waiting to announce something
       | bigger already?
        
       | bloopernova wrote:
       | Hooray for competition.
        
       | luke-stanley wrote:
       | Still no Ultra model API available to UK devs? Considering
       | Deepmind's London base, this is kinda strange. Maybe they could
       | ask Ultra how to roll it out faster?
        
       | bobvanluijt wrote:
       | Demo with Google AI Studio:
       | https://twitter.com/bobvanluijt/status/1758185143116730875
        
       | processing wrote:
       | Just wade through documentation to access it?
       | 
       | Clicking on the AI Studio link doesn't show me the app page; it
       | redirects to a document on early access. I do as required, go
       | back, and try clicking on the AI Studio link again, only to be
       | redirected to the same document on early access.
       | 
       | Frustrating.
        
       | robertlagrant wrote:
       | Slightly surprisingly I can't get to AI Studio from the UK. It is
       | available in quite a few countries, but not here.
        
       | ChildOfChaos wrote:
       | Is this just more nonsense from Google, though? I expect big
       | things from Google, but they need to shut up and actually
       | release stuff instead of saying how amazing their stuff is and
       | then releasing potato AI. Nothing they have done in the AI
       | space recently has lived up to any of the hype. They should
       | stay silent for a bit and then release something that kills
       | GPT-4 if they honestly are able, but instead they are just full
       | of hype.
        
         | sinuhe69 wrote:
         | Yeah, their Gemini demo was a disaster. But they have
         | released their Ultra model to the general audience, so you
         | can test it yourself. Talking about killing the competitor is
         | a little funny, considering they are all generative LLMs
         | based on the same principles (and general architecture), with
         | their inherent flaws and shortcomings. None of them can even
         | execute a basic plan like a cheap human assistant, so their
         | value is very limited.
         | 
         | Breakthrough will only come with a next generation
         | architecture. LLM for special domains is currently the most
         | promising approach.
        
           | ChildOfChaos wrote:
           | Yeah, but even with Ultra they kept saying how it was
           | better than GPT-4, and then when it actually got released
           | it was awful.
        
       | jeffbee wrote:
       | A little off-topic I guess, but is anyone else seeing what I am
       | seeing: a total inability to actually upgrade to paid Gemini?
       | Every time I try to sign up it serves me an error page: "We're
       | sorry - Google One storage plans aren't available right now."
        
       | DrNosferatu wrote:
       | Did they say a general availability date?
       | 
       | (a bit confused)
        
       | dang wrote:
       | There's also
       | https://twitter.com/JeffDean/status/1758146022726041615
       | 
       | (via https://news.ycombinator.com/item?id=39383593, but we merged
       | those comments hither)
        
       | topicseed wrote:
       | 1 million tokens?? This is wild and a lot of RAG can be removed.
        
       | topicseed wrote:
       | Is this going to be only for consumer Gemini app or for
       | API/Vertex too? The context window is..... Simply lovely.
        
       | summerlight wrote:
       | One interesting proposal here is a multiple-needle NIAH
       | (needle-in-a-haystack) retrieval benchmark. When they put in
       | 100 needles, the recall rate becomes considerably lower,
       | somewhere around 60-70%. I'm not sure what the exact
       | configuration of this benchmark is, but intuitively this makes
       | sense, and it should be a critical metric for the model's
       | reliability.
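       | 
       | Scoring such a benchmark is simple in principle. A rough sketch
       | (the needle format and the exact-substring matching rule are
       | assumptions, not the report's actual configuration):

```python
# Rough sketch of multi-needle NIAH scoring: insert N known "needle"
# facts into a long haystack, ask the model to repeat them all back,
# and compute recall = fraction of needles found in the response.

def score_recall(needles: list[str], model_response: str) -> float:
    found = sum(1 for n in needles if n in model_response)
    return found / len(needles)

needles = [f"The magic number for city {i} is {i * 7}." for i in range(100)]
# Pretend the model recovered only 65 of the 100 needles:
response = " ".join(needles[:65])
print(f"recall: {score_recall(needles, response):.0%}")  # recall: 65%
```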
        
       | reissbaker wrote:
       | The long context length is of course incredible, but I'm more
       | shocked that the _Pro_ model is now on par with Ultra (~GPT-4,
       | at least the original release). That implies that when they
       | release 1.5 Ultra, we'll finally have a GPT-4 killer. And
       | assuming that 1.5 Pro is priced similarly to the current Pro,
       | that's a 4x price advantage per-token.
       | 
       | Not surprising that OpenAI shipped a blog post today about their
       | video generation -- I think they're feeling considerable heat
       | right now.
        
         | topicseed wrote:
         | Gemini 1.0 Ultra was also said to be on par with GPT-4, and
         | it's not really there, so let's see for ourselves when we can
         | get our hands on it.
        
           | reissbaker wrote:
           | Ultra benchmarked around the original release of GPT-4, not
           | the current model. My understanding is that was fairly
           | accurate -- it's close to current GPT-4 but not quite equal.
           | However, close-to-GPT-4 but 4x cheaper and 10x context length
           | would be very impressive and IMO useful.
        
             | refulgentis wrote:
              | No, it benchmarked around the original release of GPT-4
              | _given 32 attempts_ versus GPT-4's 5.
        
         | tigershark wrote:
         | Feeling the heat? Did you actually watch the videos? That was
         | a huge leap forward compared to anything existing at the
         | moment: orders of magnitude away from a blog post discussing
         | a model that maybe will finally be on par with GPT-4...
        
           | mupuff1234 wrote:
           | The OpenAI announcement is also more or less a blog post,
           | isn't it?
           | 
           | Do we know how much time or money it takes to create a
           | movie clip?
        
             | tigershark wrote:
              | There was Sam Altman taking live prompt requests on
              | Twitter and generating videos. They were not the same
              | quality as some of the ones on the website, but they
              | were still incredibly impressive.
        
               | mupuff1234 wrote:
               | And how much compute were those requests using?
        
       | m3kw9 wrote:
       | Imagine sending 5-10 MB over the network per request, plus the
       | cost per token. You may accidentally go broke after a big lag.
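       | 
       | A rough back-of-the-envelope, with a purely hypothetical
       | per-token price (Gemini 1.5 pricing wasn't public at
       | announcement); the point is just that cost and payload scale
       | linearly with the context used:

```python
# Back-of-the-envelope cost for a long-context request. Both constants
# are illustrative assumptions, not real Gemini figures.

BYTES_PER_TOKEN = 4          # rough average for English text
PRICE_PER_1K_INPUT = 0.01    # USD per 1K input tokens, hypothetical

def request_cost(context_tokens: int) -> tuple[float, float]:
    payload_mb = context_tokens * BYTES_PER_TOKEN / 1e6
    cost_usd = context_tokens / 1000 * PRICE_PER_1K_INPUT
    return payload_mb, cost_usd

mb, usd = request_cost(1_000_000)
print(f"~{mb:.0f} MB uploaded, ~${usd:.2f} per request")
```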
        
       | system2 wrote:
       | Let's hope this lowers the pricing of GPT-4 to GPT-3.5 levels.
       | Because of OpenAI's ridiculous pricing, we can't use it
       | regularly, as it would cost us thousands of dollars per month.
        
       | stolsvik wrote:
       | So, this has native image/video modality. I wonder whether that
       | gives it an edge in physical / world understanding? That is,
       | handling and navigating our 3/4 dimensions? Cause and effect and
       | so on?
        
       | animanoir wrote:
       | Google is so finished, they are so late on this.
        
       | tmaly wrote:
       | Is there a $20 a month option for 1.5 Ultra?
       | 
       | If there is, where do I sign up?
        
       | ancorevard wrote:
       | Is this a blog post or did they actually ship?
        
       | jstummbillig wrote:
       | Imagine a day when your new record-setting 10M-token-context
       | model is not enough to make it to HN #1.
       | 
       | Wild times.
        
       | thot_experiment wrote:
       | I gotta say, I've been trying out Gemini recently and it's
       | embarrassingly bad. I can't take anything google puts out
       | seriously when their current offerings are so so much worse than
       | ChatGPT (or even local llama!).
       | 
       | As a particularly egregious example, yesterday night I gave
       | Gemini a list of drinks and other cocktail ingredients I had
       | laying around and asked for some recommendations for cute drinks
       | that I could make. It's response:
       | 
       | > I'm just a language model, so I can't help you with that.
       | 
       | ChatGPT 3.5 came up with several delicious options with clear
       | instructions. But it's not just this instance: I've NEVER
       | gotten a response from Gemini that I even felt was _more
       | useful_ than just a freaking Bing search, much less better than
       | ChatGPT. I'm just going to assume they're using cherry-picked
       | metrics to make themselves feel better until proven otherwise.
       | I have zero confidence in Google's AI plays, and I assume all
       | their competent talent is now at OpenAI or Anthropic.
        
         | samspenc wrote:
         | My experiences are similar, but I think we are talking about
         | the Gemini free model, available on the Google Gemini website.
         | I think the rest of the comments are saying the paid versions
         | (Pro / Ultra) are significantly better, though I haven't tested
         | it myself to compare.
        
           | shellfishgene wrote:
           | I have the 2 months trial for the paid version, and find
           | myself going back to free ChatGPT often. Gemini loves to put
           | everything in bullet point lists and short paragraphs with
           | subheadings for example, even when asking for a letter. I'm
           | not a heavy user, but it seems to not quite get what I want
           | often. Not important but annoying: It starts almost every
           | answer with "Absolutely!", even when it doesn't match the
           | question (e.g. "How does x work?").
        
         | staticman2 wrote:
         | I don't think "I'm just a language model, I can't help you
         | with that" comes from Gemini. Google has a separate
         | censorship model that blocks you from receiving Gemini's
         | response in certain situations.
         | 
         | When Gemini (Ultra) refuses to do something itself, it is
         | more verbose and specific as to why it won't do it, in my
         | experience.
        
       | mikeweiss wrote:
       | Does anyone know what kinds of GPUs/Chips Google is using for
       | Gemini? They aren't using Nvidia correct?
        
         | sackfield wrote:
         | TPUs: https://cloud.google.com/tpu?hl=en
        
           | mikeweiss wrote:
            | So Google doesn't rely on Nvidia at all? How come they
            | are the only ones that manage to use non-Nvidia chips and
            | compete with OpenAI?
        
             | fragmede wrote:
             | They offer Nvidia GPUs on GCP so they use them at some
             | level.
        
             | Andrex wrote:
             | Google's been making TPU chips optimized for machine
             | learning and using them in data centers for almost a
             | decade[0]. They were well-poised to capitalize on AI from a
             | lot of angles.
             | 
              | 0. https://en.wikipedia.org/wiki/Tensor_Processing_Unit#History
        
       | jakub_g wrote:
       | Very off-topic, but I can't help it: the pace of change
       | reminds me of the "Bates 4000" sketch from The Onion Movie:
       | 
       | https://m.youtube.com/watch?v=fw7FniaeaSo
        
       | TaylorAlexander wrote:
       | Does anyone know how to get Gemini to help refactor code? I'm
       | trying to paste in my code file and the web page says "an error
       | has occurred" and the code does not show up in the code window. I
       | tried signing up for Gemini Advanced and that didn't help. I also
       | tried pointing it to the file on GitHub and it said it couldn't
       | access it. People here are saying Gemini is great for code
       | refactoring. How do you do that?
        
       | noisy_boy wrote:
       | I feel sad for those who are in law school right now.
        
       | arthur_sav wrote:
       | zzZZzzzZ
        
       | ylluminate wrote:
       | Has anyone given an idea of the release timeline for 1.5?
        
       | newzisforsukas wrote:
       | ah, so google found a moat.
        
       | Aeolun wrote:
       | This got me trying Gemini, but doing so is such a hassle that
       | I'm almost ready to give up. Trying out ChatGPT is as simple as
       | signing up (either for Pro or the API) and getting a single API
       | key.
       | 
       | Google requires me to navigate their absolutely insane console
       | (seriously, I thought the AWS console was bad, but GCP takes
       | the cake), only to tell me there is not even a way to get an
       | API key... I had to ask Gemini through the built-in interface
       | to figure that out.
        
         | fakedang wrote:
         | API keys are fairly straightforward in GCP though - there's an
         | entire section for that, and even if you're stuck, the search
         | console works.
        
         | is_true wrote:
         | https://aistudio.google.com/
         | 
         | Unfortunately there's a waitlist for the 1.5 architecture
        
       | needlesslygrim wrote:
       | Personally, I've given up on Gemini, as it seems to have been
       | censored to the point of uselessness. I asked it yesterday [0]
       | about C++ 20 Concepts, and it refused to give actual code because
       | I'm under 18 (I'm 17, and AFAIK that's what the age on my Google
       | account is set to). I just checked again, and it gave a similar
       | answer [1]. When I tried ChatGPT 3.5, it did give an answer,
       | although it was a little confused, and the code wasn't completely
       | correct.
       | 
       | This seems to be a common experience, as apparently it refuses to
       | give advice on copying memory in C# [2], and I tried to do what
       | was suggested in this comment [3], but by the next prompt it was
       | refusing again, so I had to stick to ChatGPT.
       | 
       | [0] https://g.co/gemini/share/238032386438
       | 
       | [1] https://g.co/gemini/share/6880989ddfaf
       | 
       | [2] https://news.ycombinator.com/item?id=39312896
       | 
       | [3] https://news.ycombinator.com/item?id=39313567
        
         | wrasee wrote:
         | From your first link [0]
         | 
         | > Concepts are an advanced feature of C++ that introduces
         | potential risks, and I want to prioritize your safety.
         | 
         | Brilliant.
        
         | GaryNumanVevo wrote:
         | I'm with Gemini on this one, 17 years old is too young to be
         | learning about unsafe C++ features, best stick with a memory
         | safe language until you're old enough
        
         | AaronFriel wrote:
         | If you can dig in further and prompt-engineer out the prompt
         | for minors, that would be fascinating to see reported.
        
       | shantnutiwari wrote:
       | No faked videos this time?
       | 
       | I find it hard to trust google nowadays.
        
       | _heimdall wrote:
       | These announcements always make it clear how little companies
       | releasing new AI models actually care about the risks and
       | concerns of developing artificial intelligence.
       | 
       | CEOs love to talk about how important regulation is, how their
       | company needs to develop it before the "wrong people" do, and how
       | they are concerned about what could happen if AI development goes
       | wrong.
       | 
       | Then they announce the latest model that is aimed at expanding
       | both the accuracy and breadth of use cases across multiple
       | modalities. Sure the release links to a security and ethics page,
       | but that page reads more like a company's internal "pillars of
       | success" document with vague phrases that define little to
       | nothing in the way of real, specific concerns or measures to
       | mitigate them. It basically boils down to "Don't be evil" with no
       | clear definition of what that would mean or how they prevent the
       | new, more powerful and broad reaching system from being used in
       | ways that are "evil".
        
         | firtoz wrote:
         | I don't see any other way, personally.
         | 
         | We can argue that a lot of people have done pretty bad things
         | using the internet, but should it have been regulated in
         | advance?
        
           | _heimdall wrote:
           | Regulation doesn't have to be the answer though. The very
           | same people talking out of both sides of their mouths here
           | are the ones who can choose to just not invest in it.
           | 
            | Lock up the hardware in an offline facility and
            | experiment there, if they really think it's important.
            | Hell, even just skipping the doublespeak would be a big
            | step. If they really aren't concerned with the risk, then
            | own it; don't tell me it's risky while also releasing a
            | new, more powerful version every 6-12 months.
        
         | btbuildem wrote:
         | CEOs are the "wrong people": leaders of obscenely large
         | organizations, unchecked by law or ethics, wielding and
         | gatekeeping what amounts to superpowers. They are the literal
         | supervillains, not some shadowy "terrorists" or another
         | made-up bogeyman.
        
           | _heimdall wrote:
           | I'd argue we could get a long way by removing legal
           | protections and incentives for such large organizations.
           | Those protections and incentives seem to me to be the root
           | cause behind such large centralizations of power.
           | 
           | If companies and their leadership couldn't operate so
           | unchecked by our existing laws and public opinion we may not
           | have executives worth worrying about.
           | 
            | For example, if taxes weren't so easy to dodge and the
            | public actually had a chance to sue large corporations
            | for damages, they might not get so large. If, when losing
            | a lawsuit, companies couldn't shuffle funds around and
            | spin off dummy companies to dodge the pain, and if they
            | weren't often allowed to pay pennies on the dollar for
            | lost suits, they might think twice about doing some
            | things. When you know your entire business is actually on
            | the line, you have to be more careful.
           | 
           | Throw in election and lobbying reform and we could at least
           | be having a much different conversation about corporate
           | power.
        
         | d0mine wrote:
         | It has been lobotomized enough already. Look at C++20 Concepts
         | example https://news.ycombinator.com/item?id=39395020
        
           | _heimdall wrote:
            | Limiting features in the public API isn't quite the same
            | as limiting the technical feasibility of developing an
            | artificial intelligence or consciousness.
           | 
           | Limiting public features will help a bit with concerns over
           | how someone might use a public GPT API, but the technology
           | advancements will be made either way and ultimately companies
           | won't be able to gate keep who can use it with 100% accuracy.
           | The boom for GPU hardware similarly is pushing us further
           | down the road to AI development and all the moral and ethical
           | questions that go along with it, even if AI companies were to
           | keep use of their GPUs and models private entirely.
        
       | nycdatasci wrote:
       | > This includes making Gemini 1.5 more efficient to train and
       | serve, with a new Mixture-of-Experts (MoE) architecture.
       | 
       | Wait. They are abandoning PaLM 2, which was just announced 9
       | months ago?
        
       ___________________________________________________________________
       (page generated 2024-02-16 23:02 UTC)