[HN Gopher] Our next-generation model: Gemini 1.5
___________________________________________________________________
Our next-generation model: Gemini 1.5
Author : todsacerdoti
Score : 746 points
Date : 2024-02-15 15:02 UTC (7 hours ago)
(HTM) web link (blog.google)
(TXT) w3m dump (blog.google)
| crakenzak wrote:
| Technical report: https://storage.googleapis.com/deepmind-
| media/gemini/gemini_...
|
| The 1 million token context window + Gemini 1.0 Ultra level
| performance seems like it'll unlock a wide range of incredible
| use cases!
|
| HN, what are you going to use/build with this?
| volkk wrote:
| was this posted by an AI bot
| crakenzak wrote:
| Lol nope I'm a normal person. Gimme a captcha and I'll
| (hopefully) solve it ;)
| scarmig wrote:
| Just gotta make sure the captcha requires a >1M token
| context length to solve...
| throwaway918274 wrote:
| How do we know you're not an AI bot that figured out how to
| hire someone from fiverr to solve captchas for you?
| mrkstu wrote:
| No, they're just applying their Twitter style engagement
| strategy to HN for some reason...
| code51 wrote:
| Dear Google, please fix your names and versioning.
|
| Gemini Pro, Gemini Ultra... but was 1.0?
|
| now upgraded but again Gemini Pro? jumping from 1.0 to 1.5?
|
| wait but not Gemini Pro 1.5... Gemini "1.5" Pro
|
| What actually happened between 1.0 and 1.5?
| lairv wrote:
| This naming is terrible, if I understand correctly this is the
| release of Gemini 1.5 Pro, but not Gemini 1.5 Ultra right ?
| goalonetwo wrote:
| Looks like the former PM of chat at google found a new job.
| cchance wrote:
| How is that hard to understand? Yes, it's Gemini 1.5 Pro;
| they haven't released Ultra or Nano. Like, this isn't rocket
| science, they didn't introduce Gemini 1.5 ProLight or
| something, lol, it's the Pro-size model's 1.5 version.
| lairv wrote:
| The name of the blog post is "Our next-generation model:
| Gemini 1.5", so how am I supposed to infer from this that it
| is only 1.5 Pro and not Ultra?
| jjbinx007 wrote:
| They can't decide on a single name for a chat application so I
| think expecting them to come up with a sensible naming
| suggestion is optimistic at best.
| aqme28 wrote:
| Furthermore, is a minor version upgrade two months later really
| "next generation"?
| philote wrote:
| Well if it's from 1 to 1.5 then it's really 5 minor version
| upgrades at once. And since 1.5 is halfway to 2 and you round
| up, it's next generation!
| AndroTux wrote:
| Maybe it's not a "next generation" model, but rather their
| next model for text generation ;)
| cchance wrote:
| I mean, I don't see any other models watching and answering
| questions about a 44-minute video lol
| nkozyra wrote:
| Their inability to name things sensibly has been called out for
| years and it doesn't look like they care?
|
| I'm not sure what the deal is; it has to be a marketing
| hindrance as every major tech company is trying to claw their
| way up the AI service mountain. Seems like the first step would
| be cogent naming.
| data-ottawa wrote:
| It would have been better as Gemini Lite, Gemini, Gemini Pro,
| and then v1, v1.5 for model bumps.
|
| Ultra vs Pro vs Nano, with Ultra unlocked by buying Gemini
| Advanced, is confusing.
|
| I'm also not sure why they make base Gemini available after
| you have Advanced, because presumably there's no reason to
| use a worse model.
| Alifatisk wrote:
| I understood the transition as follows.
|
| Google Bard to Google Gemini is what they call Gemini 1.0.
|
| Gemini consists of Gemini Nano, Gemini Pro, & Gemini Ultra.
|
| Gemini Nano is for embedded and portable devices I guess? The
| free version of Gemini (gemini.google.com) is Gemini Pro. The
| paid version, called Gemini Advanced is using Gemini Ultra.
|
| What we're reading now is about Gemini Pro version 1.0
| switching to version 1.5 as of today.
| meowface wrote:
| That just made my head spin even more. (Like, I get it, but
| it's just a very tortuous naming system.) The free version is
| called Pro, Gemini Advanced is actually Gemini Ultra, the
| less powerful version upgraded to the more powerful model but
| the more powerful version is on the less powerful model.
|
| People make fun of OpenAI for not using product names and
| just calling it "GPT" but at least it's straightforward: 2,
| 3, 3.5, 4. (On the API side it's a little more complicated
| since there's "turbo" and "instruct" but that isn't exposed
| to users, and turbo is basically the default.)
| kweingar wrote:
| But you don't pay for GPT-4, you pay for a product called
| ChatGPT Plus, which allows you to write 40 messages to
| GPT-4 within a three-hour time window, after which you need
| to switch to 3.5 in the menu.
| code51 wrote:
| but if Vertex AI is using Gemini Ultra, then why is MakerSuite
| (AI Studio now? hmmm) showing only "Gemini 1.0 Pro 001" (001:
| a version inside a version)?
|
| and why have MakerSuite/AI Studio in the first place, if
| Vertex AI is the center for all things AI? and why
| aitestkitchen?
|
| I'm seeing only Gemini 1.0 Pro on Vertex AI. So even though I
| enabled Google Gemini Advanced (Ultra?) and enabled Vertex AI
| API access, I still have to be blessed by Google to access
| advanced APIs.
|
| It seems paying for their service doesn't mean anything to
| Google at this point. As a developer, you have to jump
| through hoops first.
| Alifatisk wrote:
| I think this answers why you can't see Ultra.
|
| "Gemini 1.0 Ultra, our most sophisticated and capable model
| for complex tasks, is now generally available on Vertex AI
| for customers via allowlist."
|
| https://cloud.google.com/blog/products/ai-machine-
| learning/g...
| growt wrote:
| It was probably not a wise choice to give the model itself
| and the product the same name: "Gemini Advanced is using
| Gemini Ultra". Also, "the free version ... is Gemini Pro" is
| not what you usually see out there.
| sho_hn wrote:
| It's not that difficult.
|
| Their LLM brand is now Gemini. Gemini comes in three different
| sizes, Nano/Pro/Ultra.
|
| They recently released 1.0 versions of each, most recently (a
| few months after Nano and Pro) Ultra.
|
| Today they are introducing version 1.5, starting with the Pro
| size. They say 1.5 Pro offers comparable performance to 1.0
| Ultra, along with new abilities (token window size).
|
| (I agree Small/Medium/Large would be better.)
| apwell23 wrote:
| What you described is difficult.
| mcmcmc wrote:
| It's really not. Substitute Gemini for iPhone. Apple
| releases an iPhone model in mini, standard, and pro lines.
| They announce iPhone model+1 but are releasing the pro
| version first. Still difficult?
| apwell23 wrote:
| > Apple releases an iPhone model in mini, standard, and
| pro lines.
|
| not an iPhone user, but I just looked at the iPhone 15. Don't
| see any mini version. I'm guessing 'standard' is called just
| 'iPhone'? Is Pro the same thing as Plus?
|
| https://www.apple.com/shop/buy-iphone/iphone-15
|
| > Still difficult?
|
| yes your example made it even more confusing.
| mcmcmc wrote:
| Now you're being intentionally difficult. Do you want it
| to be cars? Last year $Automaker released $Sedan 2023 in
| basic, standard, and luxury trims. This year $Automaker
| announced $Sedan 2024 but so far has only announced the
| standard trim. If I had meant the iPhone 15 specifically
| I would've said iPhone 15. I think the 12 was the last
| mini? The point is product families are often released in
| generations (versions in the case of Gemini) and with
| different available specs (ultra/pro/nano etc) that may
| not all be released at the same time.
| dpkirchner wrote:
| Apple discontinued mini phones two generations back,
| unfortunately.
| sho_hn wrote:
| I think it's the "iPhone +1 Mini is as fast as the old
| Standard" that confuses people here. This is obvious and
| expected but not how it's usually marketed I guess ...
| chatmasta wrote:
| So Google will be upgrading the version number of each
| model at the same time? Based on other comments here,
| that's not the case - some are 1.5 and some are 1?
|
| Apple doesn't announce the iPhone 12 Mini and compare it
| to the iPhone 11 Pro.
| iamdelirium wrote:
| Uhh, yes they do?
|
| Did you watch the announcements for the M2 and M3 pros?
| They compared it to the previous generations all the
| time.
| huytersd wrote:
| How? Three models Nano/Pro/Ultra currently at 1.0. New
| upgrades just increment the version number.
| Alifatisk wrote:
| They should remove the name Gemini Advanced and just stick to
| one name
| sho_hn wrote:
| Agreed.
|
| Gemini Advanced seems to be the brand name for the higher
| price tier of the end-user frontend that gets you Ultra
| access, similar to how ChatGPT Plus gets you GPT-4.
|
| I get it, but it does beg the question whether you will
| need Advanced now to get 1.5 Pro. Or does everyone get Pro,
| making it useless to pay for 1.0 Ultra?
|
| I still don't think it's _confusing_, but that part is
| definitely messy.
| OJFord wrote:
| > , starting with the Pro size
|
| This is where it gets confusing IMO.
|
| It's like if Apple announced macOS Blabahee, starting with the
| Mini, not long after releasing the Pro and Air touting the
| benefits of Sonoma.
|
| Also, just.. this is how TFA _begins_:
|
| > Last week, we rolled out our most capable model, Gemini 1.0
| Ultra, [...] Our teams continue pushing the frontiers of our
| latest models with safety at the core. They are making rapid
| progress. [...] 1.5 Pro achieves comparable quality to 1.0
| Ultra
|
| Last week! And now we have next generation. And the wow is
| that it's comparable to the best of the previous generation.
| Ok fine at a smaller size, but also that's all we get anyway.
| Oh, and the _most_ capable remains the last-generation one. As
| long as it's the biggest one.
| crazygringo wrote:
| It's almost exactly like Apple, actually, with their M1 and
| M2 chips available in different sizes, launching at
| different times in different products.
|
| It's really not that confusing. There are different sizes
| and different generations, coming out at different times.
| This pattern is practically as old as computing itself.
|
| I can't even imagine what alternative naming scheme would
| be an improvement.
| OJFord wrote:
| Don't go thinking I'm an Apple 'fanboy', I don't have any
| Apple devices at the moment, but I really can't imagine
| them launching a next gen product that isn't better than
| the best of the last gen.
|
| I doubt they launched M2 MBAs while the MBP was running
| M1, for example. Or more directly, a low-mid spec M3 MBP
| while the top-spec M2 MBP (I assume that would out-
| benchmark it?) was still for sale and no comparable M3
| chip existed yet.
|
| It's not having the matrix of size/power & generation
| that's confusing, it's the 'next generation' one
| initially launched not being the best. I think that's
| mainly it for me anyway.
| crazygringo wrote:
| > _but I really can't imagine them launching a next gen
| product that isn't better than the best of the last gen._
|
| But they have. The baseline M2 is significantly less
| powerful than the M1 Max.
|
| What Google's doing is basically exactly like that. It
| happens all the time that the mid tier of the next
| generation isn't as good as the top tier of the previous
| generation. It might even be the norm.
| matwood wrote:
| > Last week! And now we have next generation.
|
| Google got caught completely flat-footed by OpenAI. I'm
| going to cut them some slack that they want to show the
| world a bit of flex with their AI chops as soon as they
| have results.
| Keyframe wrote:
| What's Advanced then, chat? Also, by that, 1.5 Ultra is then
| still to come and it'll show even bigger guns.
| sho_hn wrote:
| Yes, my understanding is also there will be a 1.5 Ultra.
|
| It's however nowhere explicitly said that I could find. The
| Technical Report PDF also avoids even hinting at it.
|
| Advanced is a price/service tier for the end-user frontend.
| At the moment it gets you 1.0 Ultra access vs. 1.0 Pro for
| the free version. Similar to how ChatGPT Plus gives you 4
| instead of 3.5.
|
| I agree this part is messy. Does everyone who had Pro
| already get 1.5 Pro? If 1.5 Pro is better than 1.0 Ultra,
| why pay for Advanced? Is 1.5 Pro behind the Advanced
| paywall? etc.
| Keyframe wrote:
| Ok, so from what I've gathered then from all of the
| comments so far, primary confusion is that both Chat
| service and llm models are named the same.
|
| There are three models: nano/pro/ultra and all are at
| v1.0
|
| There are two tiers of chat service: basic and advanced.
|
| There is AIStudio from google through which you can
| interact with / use directly gemini llms.
|
| Chat service Gemini basic (free) uses Gemini Pro 1.0 llm.
|
| Chat service Gemini advanced uses Gemini Ultra 1.0 llm.
|
| What was shown is ~~Ultra~~ Pro 1.5 LLM which is / will
| be available to select few for preview to be used via
| AIStudio.
|
| That leaves a question: what's Nano for, and is it only
| used via AIStudio/API?
|
| Jesus, Google..
| sho_hn wrote:
| No, what they showed is Pro 1.5. Only via API and on a
| waitlist.
|
| How this relates to the end-user chat service/price tiers
| is still unknown.
|
| The best scenario would be that they just move Gemini
| free and Advanced tiers to Pro 1.5 and Ultra 1.5, I
| guess.
| Keyframe wrote:
| Yes, you are right. I meant Pro. Let's see then.
| j16sdiz wrote:
| Nano is the on-device (Pixel phone) model.
| mvkel wrote:
| So there's Nano 1.0, Pro 1.5, Ultra 1.0, but Pro 1.5 can only
| be accessed if you're a Vertex AI user (wtf is Vertex)?
|
| That's very difficult.
| sho_hn wrote:
| It's a bit similar to how new OpenAI stuff is initially
| usually partner-only or waitlisted.
|
| Vertex AI is their developer API platform.
|
| I agree OpenAI is a bit better at launching for customers
| on ChatGPT alongside API.
| pentagrama wrote:
| Thank you, is more clear to me now. But I also read in some
| Google announcement about "Gemini Advanced", do you know what
| is that and the relation with the Nano/Pro/Ultra levels?
| sho_hn wrote:
| Gemini is also the brand name for the end-user web and
| phone chatbot apps, think ChatGPT (app) vs. GPT-# (model).
|
| Gemini Advanced is the paid subscription service tier that
| at the moment gets you access to the Ultra model, similar
| to how a ChatGPT Plus subscription gets you access to
| GPT-4.
|
| Honestly, they should have called this part Gemini Chat and
| Gemini Chat Plus, but of course ego won't let them follow
| the competitor's naming scheme.
| screye wrote:
| Gemini Ultra 1.0 never went GA. So it is weird that they'd
| release 1.5 when most can't even get their hands on 1.0
| Ultra.
| gkbrk wrote:
| Isn't the paid version on https://gemini.google.com Gemini
| 1.0 Ultra?
| gmuslera wrote:
| Maybe they should take a hint on Windows versions name scheme
| and call the next version Gemini Meh.
| apapapa wrote:
| Are you talking about Xbox one?
| mring33621 wrote:
| No. Gemini Purple Plus Platinum Advanced Home Version
| 11.P17
| kccqzy wrote:
| Dear OpenAI please fix your names and versioning. Why do you
| have GPT-3 and GPT-3.5? What happened between 3 and 3.5? And
| why isn't GPT-3 a single model? Why are there variations like
| GPT-3-6.7B and GPT-3-175b? And why is there now a turbo
| version? How does turbo compare to 4? And what's the
| relationship between the end-user product ChatGPT and a
| specific GPT model?
|
| You see this problem isn't unique to Google.
| lordswork wrote:
| See https://news.ycombinator.com/item?id=39385230
| cchance wrote:
| This just means we'll be getting a Nano 1.5 and Ultra 1.5
|
| and if Pro 1.5 is this good holy shit what will Ultra be...
|
| Nano/Pro/Ultra are the model sizes, 1.0 or 1.5 is the version
| hamburga wrote:
| "One of the key differentiators of this model is its incredibly
| long context capabilities, supporting millions of tokens of
| multimodal input. The multimodal capabilities of the model means
| you can interact in sophisticated ways with entire books, very
| long document collections, codebases of hundreds of thousands of
| lines across hundreds of files, full movies, entire podcast
| series, and more."
| skywhopper wrote:
| This is nice, but it's hard to judge how nice without knowing
| more about how much compute and memory is involved in that
| level of processing. Obviously Google isn't going to tell us,
| but without having some idea it's impossible to judge whether
| this is an economically sustainable technology on which to
| start building dependencies in my own business.
| criddell wrote:
| Sustainable? The countdown to cancellation on this project is
| already underway.
|
| "Does it make sense today?" is really the only question you
| can ask and then build dependencies with the understanding
| that the entire thing will go away in 3-7 years.
| freediver wrote:
| It would do Google a lot of service if every such announcement
| were not met with 'join the waitlist' and 'talk to your Vertex
| AI team'.
| baq wrote:
| Yeah compared to e.g. Apple's 'here's our new iWidget 42 pro,
| you can buy it now' it's at best disappointing.
| apozem wrote:
| Apple is good about only announcing real products you can
| buy. They don't do tech demos. It's always, "here's a
| problem. the new apple watch solves it. here're five other
| things the watch does. $399."
| erkt wrote:
| The jury is still out on the Vision Pro, but otherwise
| your point stands.
| amarant wrote:
| Apple is indeed masterful at advertising. Google, somewhat
| ironically, is really bad at it.
| matwood wrote:
| Apple is masterful at product, not just the advertising
| part. Google builds cool technology, then fails on the
| product side.
| xnx wrote:
| I agree that Apple does a better job, but wasn't Apple
| Vision Pro announced 240 days before you could get it? I
| think it's a pretty safe bet that Gemini 1.5 (or something
| better) will be available to anyone who wants to use it in the
| next 240 days.
| nacs wrote:
| AI software release cycles are incredibly short right
| now. Every month, there is some major development
| released in a _usable right now_ form.
|
| The first-of-its-kind AR/VR hardware has, understandably,
| a longer release cycle. Also, Apple announced early to
| drive up developer interest.
| manquer wrote:
| AVP was the exception rather than the norm.
|
| Apple aggressively keeps products under wraps before
| launch, and fires employees and vendors for leaking any
| sort of news to the press.
|
| Also, a hardware product that is miles ahead of the
| competition in terms of components, and that needs a
| complex setup workflow (for head and eyes) that Apple has
| not done before, shipping 7-8 months after its
| announcement is not really comparable to a SaaS API in
| terms of delays.
| brianjking wrote:
| 100%, I can't even use Imagen despite being an early tester of
| Vertex.
| belval wrote:
| They can't do that because only they are the incorruptible
| stewards empowered with the ability to develop these models,
| making them accessible to the unwashed masses would be
| irresponsible!
| ethanbond wrote:
| The victim complex on this topic is getting really old.
|
| They're an enterprise software company doing an enterprise
| sales motion.
| belval wrote:
| If that was true, they wouldn't have named it Gemini 1.5 to
| follow the half-point increment of ChatGPT, they
| desperately want "people" to care about their product to
| gain back their mindshare.
|
| Anthropic's Claude targets mostly business use cases and
| you don't see them writing self-congratulatory articles about
| Claude v2.1; they just pushed the product.
| eropple wrote:
| Mindshare is part of enterprise sales, yes.
|
| I work at a very large company and everyone knows about
| ChatGPT and Gemini (in part because we for our sins have
| a good chunk of GCP stuff), but I doubt anyone here not
| doing some LLM-flavored development has ever even heard
| of Anthropic, let alone Claude.
| KirinDave wrote:
| And look at how well it's going for Claude. Their primary
| claim to fame is being called "an annoying coworker" and
| that's it.
|
| Why would anyone look to form a contract with Anthropic
| right now? I'd say they're in danger here, because their
| models and offerings don't have clear value propositions
| to customers.
| ac29 wrote:
| Claude 2.1 certainly got a news post when it was
| released: https://www.anthropic.com/news/claude-2-1
|
| Seems reasonably similar in tone to the Google post.
| dkjaudyeqooe wrote:
| > They're an enterprise software company
|
| Really? Someone ought to tell them.
| stavros wrote:
| I'm generally an excited early adopter, but this kills my
| excitement immediately. I don't know if Gemini is out (or which
| Gemini is out) because I've associated Google with "you can't
| try their stuff", so I've learned to just ignore everything
| about Gemini.
| hbn wrote:
| Google is really good at diluting any possible anticipation
| hardcore users might have for new stuff they do. 10 years ago
| I loved when there was a big update to one of their Android
| apps and I could sideload the apk from the internet to try it
| out early. Then they made all those changes A/B tests
| controlled by server side flags that would randomly turn
| themselves on and off, and there was no way to opt in or out.
| That was one of the (many) moves that contributed to my
| becoming disenchanted with Android.
| petre wrote:
| There is a Gemini service that you can use with your Google
| account, but it's kind of meh, as it repeats your input and
| makes all sorts of assumptions. I am confused about the
| version as well. There's a link to another premium version
| (1.5?) on its page, which I don't have access to without
| completing a quest which likely ends with a credit card
| input. That kills it for me.
| yborg wrote:
| Or can't use ... I have a newish work account and
| downloaded Gemini on a Pixel 8 Pro and get "Gemini isn't
| available" and "Try again later" with no explanation of why
| not and when.
| petre wrote:
| This is it. Not a phone app, did not install anything.
| Maybe your account is not old enough? You're not missing
| anything anyway.
|
| https://gemini.google.com/
|
| Look, it now has totally useless suggestions like it was
| trained on burned out woke IT workers. I asked it about
| the weather, sea temperature and wave height and period
| in Malaga, which is much less boring than the choices it
| came up with. First it tried to talk me out of it waving
| away responsibility, then it provided useful climate
| data, which I would have wasted too much time doing a
| Google search on. I guess it's good for checking on the
| weather if you can put up with the waivers. Also it knows
| fishing for garfish in Denmark in May is not a total
| waste of your time, a great way to experience local
| culture and a sustainable activity.
|
| I also asked it about the version: "I am currently
| running on the Gemini Pro 1.01.5 model".
| skybrian wrote:
| I think the way to understand this is to realize that this
| isn't targeted at a Hacker News audience and they don't care
| what we think. The world doesn't revolve around us.
|
| What's the goal? Maybe, being able to work with partners
| without it being a secret project that will inevitably leak,
| resulting in inaccurate stories in the press. What are non-
| goals? Driving sales or creating anticipation with a mass
| audience, like a movie trailer or an Apple product launch.
|
| So they have to announce something, but most people don't
| read Hacker News and won't even hear about it until later,
| and that's fine with them.
| jpeter wrote:
| and not having to wait months if you live in the EU
| lxgr wrote:
| What's worse is that I can't seem to find a way to let Google
| know where I actually live (as opposed to where I am
| temporarily traveling, what country my currently inserted SIM
| card is from etc). And apparently there is no way to do this
| at all without owning an Android device!
|
| Apple at least lets me change this by moving my iTunes/App
| Store account, which is its own ordeal and far from ideal,
| but at least there's a defined process: Tell us where you
| think you live, provide a form of payment from that place,
| maybe we'll believe you.
| TillE wrote:
| Yeah Google aggressively uses geolocation throughout their
| services, regardless of your language settings. The
| flipside of that is that it's really easy to access the
| latest Gemini or whatever by just using a VPN.
| lxgr wrote:
| Wait, does that mean if I subscribe to Gemini Pro in
| country A where it's available (e.g. the US) but travel
| to Europe, I can't use it?
|
| I'm really frustrated by Google's attitude of "we know
| better where you are than you do". People travel
| sometimes and that's not the same thing as moving!
| FergusArgyll wrote:
| I signed up for all of their AI products when I was in
| the US, some of them work while I'm out of country some
| don't. I can't tell what the rule is...
| lxgr wrote:
| I really, really hate all of these geo heuristics. Sure,
| don't advertise services to people outside of your
| market, I get that. Do ask for a payment method from that
| country too to provide your market-specific pricing if
| you must.
|
| But once I'm a paying customer, I want to use the thing
| I'm paying for from where I am without jumping through
| ridiculous hoops!
|
| The worst variant of this I've seen is when you can
| neither use _nor cancel the subscription_ from outside a
| supported market.
| FergusArgyll wrote:
| To be clear, I didn't pay for any of them. I just signed
| up for early access to every product that uses some form
| of ML that can remotely be called "AI"...
|
| Once I got accepted, some of them work outside of the US
| and some don't
| hobofan wrote:
| Eh, I think it's about as bad as the OpenAI method of
| officially announcing something and then "continuously rolling
| it out to all subscribers" which may be anything between a few
| days and months.
| addandsubtract wrote:
| Remember when Gmail was new and you needed an invite to join? I
| guess Google is stuck in 2004.
| bobchadwick wrote:
| I'm embarrassed to admit that I bought a Gmail invite on eBay
| for $6 when it was still invite-only.
| agumonkey wrote:
| Yielding a priceless anecdote
| blagie wrote:
| _shrug_ It probably gave you months of fun.
| jprete wrote:
| That's not entirely a waste, it would have given you a
| better chance for an email address you wanted.
| CydeWeys wrote:
| Yeah. I ended up with an eight letter @gmail.com because
| I dithered, but if I'd signed up by any means necessary
| when I'd first heard of it, I would've gotten a four
| letter one.
| rocketbop wrote:
| Nothing to be ashamed of. I think I might have bought a
| Google Wave invite a couple of years later :/
| spiffytech wrote:
| I bartered on gmailswap.com, sending someone a bicentennial
| 50¢ US coin in exchange for an invite.
|
| The envelope made it to the recipient, but the coin fell
| out in transit because I was young and had no idea how to
| mail coinage. They graciously gave me the invite anyway.
| ssteeper wrote:
| Ah, to be young and clueless about coinage mailing.
| LouisSayers wrote:
| Well they did promise unlimited space - remember how it
| kept growing? I guess until it didn't...
|
| But still, compared to Hotmail etc the free storage space
| (something like 1GB vs 10MB) was well worth $6
| moffkalast wrote:
| They don't seem to remember when that literally sank Google+
| because people had no use for a social network without their
| friends on it.
| bachmeier wrote:
| This is bad practice across the board IMO. There seems to be an
| idea that this builds anticipation for new products. Sounds
| good in a PowerPoint presentation by an MBA but doesn't work in
| practice. Six months (or more!) after joining a waitlist, I'm
| not seeing it for the first time, so I don't really care when
| yet another email selling me something hits my inbox. I may not
| even open the email. This could be mitigated somewhat by at
| least offering a demo, but that's rare.
| bushbaba wrote:
| Likely they have limited capacity and are allotting things for
| the highest-paying and strategic customers.
| eitally wrote:
| As someone who worked in Google Cloud's partnerships team,
| the way the Early Access Program, not to mention the Alpha
| --> Beta --> GA launch process for AI products, works, is
| really dysfunctional. Inevitably what happens is that a few
| strategic customers or partners get exceptionally early
| (Alpha) access and work directly with the product team to
| refine things, fix bugs and iron out kinks. This is great
| and the way market driven product development _should_
| work.
|
| The issues arise with the subsequent stagegate graduation
| processes, requirements and launches to less restricted
| markets. It's inconsistent, the QoS pre-GA customers
| receive is often spotty and the products come with no SLAs,
| and -- just like Gmail on the consumer side -- things
| frequently stay in EAP/Beta phase for years with no
| reliable timeline for launch. ... and then often they're
| killed before they get to GA, even though they may have
| been being used by EAP customers for upwards of 1-2 years.
|
| I drafted a new EAP model a few years ago when Google's
| Cloud AI & Industry Solutions org was in the process of
| productizing things like the retail recommendation engine
| and Manufacturing Data Engine, and had all the buy-ins from
| stakeholders on the GTM side ... but the CAIIS GM never
| signed off. Subsequently, both the GM & VP Product of that
| org have been forced out.
|
| In my opinion, this is something Microsoft does very well
| and Google desperately needs to learn. If they pick up
| anything from their hyperscaler competitors it should be 1)
| how to successfully become a market driven engineering
| company from MSFT and 2) how to never kill products (and
| not punish employees for only doing KTLO work) from AMZN.
| moralestapia wrote:
| So tactical, wow. Meanwhile OpenAI and others will eat
| their lunch _again_.
| bushbaba wrote:
| Agreed. OpenAI also doesn't need to grapple with
| shareholders fearing a GDPR-like fine. Sadly, the larger
| you are, the bigger the pain is from small mistakes.
| justrealist wrote:
| One PM in 2005 knocked it out of the park with Gmail and
| every Google PM since then has cargo-culted it.
| kkzz99 wrote:
| It's because they don't want you to actually use it and see
| how far behind they are compared to other companies. These
| announcements are meant to placate investors. "See, we are
| doing a lot of SotA AI too."
| Keyframe wrote:
| You might be right, but other things from Google tell the
| same story. For example, I recently tried to get hold of a
| Pixel 8 Pro. I had to import one from the UK, and when I did,
| it turned out the new feature of using the thermometer on
| humans isn't available outside the US. It doesn't even seem
| that the process to certify it outside the US is underway.
| Sales and support just aren't a thing at Google the way they
| are at Apple, by contrast. Which is a total shame. I know
| Google is strong, if not the strongest, in the game of tech;
| they just need to get their act together, and I believe they
| will succeed in that, but sales and support were never in
| their DNA. Not sure if that can be changed.
|
| I'm more than happy to transfer my monthly $20 from OpenAI
| to Google, on top of my YouTube and Google One subscriptions.
| It's up to Google to take it.
| quatrefoil wrote:
| It lets the company control the narrative, without the
| distraction of fifty tech bloggers test-driving it and posting
| divergent opinions or findings. Instead, the conversation is
| anchored to what the company claims about the product.
|
| It's interesting that it's the opposite of the gaming industry.
| There, because the reviewers dictate the narrative, the
| industry is better at ferreting out bogus claims. On the flip
| side, loud voices sometimes steamroll over decent products
| because of some ideological vendetta.
| anonzzzies wrote:
| And region based. Yawn.
| mil22 wrote:
| Totally agree with this. I can see the desire to show off, but
| I don't understand how anyone can believe this is good
| marketing strategy. Any initial excitement I get from reading
| such announcements will be immediately extinguished when I
| discover I can't use the product yet. The primary impression I
| receive of the product is "vaporware." By the time it does get
| released I'll already have forgotten the details of the
| announcement, lost enthusiasm, and invested my time in a
| different product. When I'm choosing between AI services, I'll
| be thinking "no, I can't choose Gemini Pro 1.5 because it's not
| available yet, and who knows when it will be available or how
| good it'll be." Then when they make their next announcement,
| I'll be even less likely to give it any attention.
| bobvanluijt wrote:
| I have access and will share some learnings soon
| whywhywhywhy wrote:
| After the complete farce that was the last video of their
| tech, 90% of which was faked, maybe next time just give us a
| text box where we can talk to the thing and see it working
| ourselves.
|
| Like, it's shocking to me: is management really so clueless
| they don't realize how far behind they are? This isn't 2010
| Google; you're not the company that made your success
| anymore, and in a decade the only two sure-fire things that
| will still exist are Android and Chrome. Search, Maps, and
| YouTube are all in precarious positions that the right team
| could dethrone.
| summerlight wrote:
| I believe this is a standard practice in Google whenever they
| need to launch a change expected to consume huge resources and
| they cannot reasonably predict the demand. Though I agree that
| this is a bad PR practice; a waitlist should be considered a
| compromise, not a PR technique.
| crazygringo wrote:
| These announcements are mainly for investors and other people
| interested in planning purposes. It's important to know the
| roadmap. More information is better.
|
| I get that it's frustrating not to be able to play with it
| immediately, but that's just life. Announcing things in advance
| is still a valuable service for a lot of people.
|
| Plus tons of people have been claiming that Google has somehow
| fallen behind in the AI race, so it's important for them to
| counteract that narrative. Making their roadmap more visible is
| a legitimate strategy for that.
| dpkirchner wrote:
| I wrote off the PS5 because of waitlists. I was surprised to
| learn just yesterday that they are now actually, honestly
| purchasable (what I would consider "released").
|
| I guess I let my original impression anchor my long-term
| feelings about the product. Oh well.
| TheFragenTaken wrote:
| It's probably going to be dead/deprecated in a year, so maybe
| there's a silver lining to how hard it is to get to use the
| service. I, for one, wouldn't "build with Gemini".
| animex wrote:
| I don't think I've ever engaged with a product after "joining
| their waitlist". By the time they end up utilizing that funnel,
| competitors have already released feature upgrades or new
| products cannibalizing their offering.
| alphabetting wrote:
| Massive whoa if true from technical report
|
| "Studying the limits of Gemini 1.5 Pro's long-context ability, we
| find continued improvement in next-token prediction and near-
| perfect retrieval (>99%) up to at least 10M tokens"
|
| https://storage.googleapis.com/deepmind-media/gemini/gemini_...
| stavros wrote:
| Until I can talk to it, I care exactly zero.
| peterisza wrote:
| you can buy their stock if you think they'll make a lot of
| money with their tech
| HarHarVeryFunny wrote:
| Well that's really the right question .. what can, and
| will, Google do with this that can move their corporate
| earnings needle in a meaningful way? Obviously they can
| sell API access and integrate it into their Google docs
| suite, as well as their new Project IDX IDE, but do any of
| these have potential to make a meaningful impact ?
|
| It's also not obvious how these huge models will fare
| against increasingly capable open source ones like Mixtral,
| perhaps especially since Google are confirming here that
| MoE is the path forward, which perhaps helps limit how big
| these models need to be.
| plaidfuji wrote:
| In the long run it could move the needle in enterprise
| market share of Workspace and GCP. They have a lot of
| room to grow and IMO have a far superior product to
| O365/Azure, an advantage that could be amplified by strong AI
| products. The only problem is this sales cycle can take a
| decade or more, and Google hasn't historically been
| patient or strategic about things like this.
| megaman821 wrote:
| So, will this outperform any RAG approach as long as the data
| fits inside the context window?
| ArcaneMoose wrote:
| Cost would still be a big concern
| saliagato wrote:
| basically, yes. Pinecone? Dead. Azure AI Search? Dead.
| Quadrant? Dead.
| _boffin_ wrote:
| Prompt token cost is still a variable.
| TheGeminon wrote:
| Outperforming depends on the RAG approach (and this would
| be a RAG approach anyway; you can already do this with
| smaller context sizes). A simplistic one, probably, but
| dumping in data that you don't need dilutes the useful
| information, so I would imagine there would be at least
| _some_ degradation.
|
| But there is also the downside that by "tuning" the RAG to
| return fewer tokens you will miss extra context that could
| be useful to the model.
| megaman821 wrote:
| Doesn't their needle/haystack benchmark seem to suggest
| there is almost no dilution? They pushed that demo out to
| 10M tokens.
| CuriouslyC wrote:
| A perfect RAG system would probably outperform everything in
| a larger context due to prompt dilution, but in the real
| world putting everything in context will win a lot of the
| time. The large context system will also almost certainly be
| more usable due to elimination of retrieval latency. The
| large context system might lose on price/performance though.
| chasd00 wrote:
| are you going to upload 10M tokens to Gemini on every
| request? That's a lot of data moving around when the user is
| expecting a near-realtime response. Seems like it would still
| be better to only set the context with information relevant
| to the user's prompt, which is what plain RAG does.
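|
| To make the contrast concrete, here's a toy sketch of plain
| RAG vs. shipping the whole corpus (`embed`, `similarity` and
| the eventual model call are hypothetical stand-ins, not real
| APIs):
|
|     # embed() and similarity() are hypothetical helpers
|     def rag_prompt(question, chunks, k=3):
|         # keep only the k chunks most relevant to the question
|         ranked = sorted(chunks, key=lambda c: -similarity(
|             embed(c), embed(question)))
|         return "\n".join(ranked[:k]) + "\n\nQ: " + question
|
|     def long_context_prompt(question, chunks):
|         # ship everything; the model does the "retrieval"
|         return "\n".join(chunks) + "\n\nQ: " + question
|
| The RAG variant uploads a few KB per request; the long-
| context variant re-uploads the corpus on every call.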
| Workaccount2 wrote:
| 10M tokens is absolutely jaw dropping. For reference, this is
| approximately thirty books of 500 pages each.
|
| Having 99% retrieval is nuts too. Models tend to unwind pretty
| badly as the context (tokens) grows.
|
| Put these together and you are getting into the territory of
| dumping all your company documents, or all your department's
| documents, into a single GPT (or whatever Google will call
| it) and everyone working with that. Wild.
| kranke155 wrote:
| Seems like Google caught up. Demis is again showing an
| incredible ability to lead a team to make groundbreaking
| work.
| huytersd wrote:
| If any of this is remotely true, not only did it catch up,
| it's wiping the floor with GPT-4 in terms of how useful it
| can be. Not going to make a judgement until I can actually
| try it out though.
| singularity2001 wrote:
| In the demo videos Gemini needs about a minute to answer
| long-context questions. Which is better than reading
| thousands of pages yourself. But if it has to compete
| with classical search and skimming it might need some
| optimization.
| huytersd wrote:
| That's a compute problem, something that can be solved by
| just throwing money at it.
| a_wild_dandan wrote:
| Replacing grep or `ctrl+F` with Gemini would be the
| user's fault, not Gemini's. If classical search is already
| a performant solution for a job, _use classical search_.
| Save your tokens for jobs worthy of solving with a
| general intelligence!
| matsemann wrote:
| Could you (or someone) explain what this means?
| FergusArgyll wrote:
| The input you give it can be very long. This can
| qualitatively change the experience. Imagine, for example,
| copy pasting the entire lord of the rings plus another 100
| books you like and asking it to write a similar book...
| teaearlgraycold wrote:
| I doubt it's smart enough to write another (coherent, good)
| book based on 103 books. But you could ask it questions
| about the books and it would search and synthesize good
| answers.
| HarHarVeryFunny wrote:
| I just googled it, and the LOTR trilogy apparently has a
| total of 480,000 words, which brings home how huge 1M is!
| It'd be fascinating to see how well Gemini could summarize
| the plot or reason about it.
|
| One point I'm unclear on is how these huge context sizes
| are implemented by the various models. Are any of them the
| actual raw "width of the model" that is propagated through
| it, or are these all hierarchical summarization and chunk
| embedding index lookup type tricks?
| mburns wrote:
| For another reference, Shakespeare's complete works are
| ~885k words.
|
| The Encyclopedia Britannica is ~44M words.
| staticman2 wrote:
| Reading Lord of the Rings, and writing a quality book in
| the same style, are almost wholly unrelated tasks. Over 150
| million copies of Lord of the Rings have been sold, but few
| readers are capable of "writing a similar book" in terms of
| quality. There's no reason to think this would work well.
| ehsankia wrote:
| It's how much text it can consider at a time when generating
| a response. Basically the size of the prompt. A token is not
| quite a word but you can think of it as roughly that.
| Previously, the best most LLMs could do was around 32K. This
| new model does 1M, and in testing they pushed it up to 10M
| with near-perfect retrieval.
|
| As the other comment mentions, you can paste the content of
| entire books or documents and ask very pointed questions about
| them. Last year, Anthropic was showing off their 100K context
| window, and that's exactly what they did, they gave it the
| content of The Great Gatsby and asked it questions about
| specific lines of the book.
|
| Similarly, imagine giving it hundreds of documents and asking
| it to spot some specific detail in there.
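|
| As a rough sketch of the scale (using the common heuristic
| of about 0.75 English words per token):
|
|     def estimate_tokens(word_count: int) -> int:
|         # ~0.75 words per token for typical English text
|         return round(word_count / 0.75)
|
|     print(estimate_tokens(480_000))    # LOTR trilogy: 640000
|     print(estimate_tokens(9_000_000))  # ~75 novels: 12000000
|
| So 1M tokens comfortably holds the whole LOTR trilogy, and
| 10M approaches a small bookshelf.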
| og_kalu wrote:
| Another whoa for me
|
| >Finally, we highlight surprising new capabilities of large
| language models at the frontier; when given a grammar manual
| for Kalamang, a language with fewer than 200 speakers
| worldwide, the model learns to translate English to Kalamang at
| a similar level to a person learning from the same content.
|
| Results - https://imgur.com/a/qXcVNOM
| usaar333 wrote:
| I think this is mostly due to the ability to handle long
| contexts better. Note how Claude 2.1 already strongly
| outperforms GPT-4 on this task.
| a_wild_dandan wrote:
| GPT-4V turbo outperforms Claude on long contexts, IIRC.
| Unless that's mistaken, I'd suspect a different explanation
| for that task.
| cchance wrote:
| Did you watch the video of Gemini 1.5's recall after it
| processed the 44-minute video... holy shit
| ranulo wrote:
| > This new generation also delivers a breakthrough in long-
| context understanding. We've been able to significantly increase
| the amount of information our models can process -- running up to
| 1 million tokens consistently, achieving the longest context
| window of any large-scale foundation model yet.
|
| Sweet, this opens up so many possibilities.
| tsunamifury wrote:
| Google is like a nervous and insecure engineer -- blowing their
| value by rushing the narrative and releasing too much too
| confusingly fast.
| sho_hn wrote:
| When OpenAI raced through 3/3.5/4 it was "this team ships" and
| excitement.
|
| This cargo-cult hate train is getting tiresome. Half the
| comments on anything Google-related are like this now, and it
| doesn't add anything to the conversation.
| epiccoleman wrote:
| The difference, though, as someone who really doesn't have a
| particular dog in this fight, is that I can go _use_ GPT-4
| right now, and see for myself whether it's as exciting as
| the marketing materials say.
| sho_hn wrote:
| When OpenAI launched GPT-4, API access was initially behind
| a waitlist. And they released multiple demos of multimodal
| capabilities on launch day that sat in a limited partner
| program for months, becoming generally available only 7
| months later.
|
| I also want the shiny immediately when I read about it, but
| I also know when I am acting entitled and don't go spam
| comment threads about it.
|
| But really, mostly I mean this: It's fine to criticize
| things, but when half a dozen people have already raised a
| point in a thread, we don't need more dupes. It really
| changes signal-to-noise.
| mynameisvlad wrote:
| Gemini Ultra was announced two months ago. It just launched
| in the last week. It literally is still the featured post on
| the AI section of their blog, above this announcement.
| https://blog.google/technology/ai/
|
| There's "this team ships" and there's "ok maybe wait until at
| least a few people have used your product before you change
| it all".
| sho_hn wrote:
| OpenAI announced GPT-4 image input in mid-March 2023 and
| made it generally available on the API in November 2023.
|
| Google announced a fancy model two months early and
| released it in the promised timeframe.
|
| Seems par for the course.
| mynameisvlad wrote:
| Did OpenAI then announce GPT-5 two weeks after launching
| GPT-4?
|
| No, of course they didn't. And you're comparing one
| specific feature (image input) and equating it to a whole
| model's release date.
|
| Maybe compare apples to apples next time.
|
| People pointing out release/announcement burnout is a
| reasonable thing; people in general can only deal with
| the "next new thing" with some breaks to process
| everything.
| sho_hn wrote:
| I made the comparison because both companies demonstrated
| advanced/extended abilities (model size, image input) and
| shipped them delayed.
| moralestapia wrote:
| >"this team ships"
|
| Because they actually shipped ... (!)
| moralestapia wrote:
| ... and they literally just did it again.
|
| https://openai.com/sora
| SushiHippie wrote:
| Does this mean gemini ultra 1.0 -> gemini ultra 1.5 is the same
| as gpt-4 -> gpt-4-turbo?
| hackerlight wrote:
| There's no Gemini Ultra 1.5 yet. Gemini Pro 1.5 is a smaller
| model than Gemini Ultra 1.0.
| prakhar897 wrote:
| Can anyone explain how context length is tested? Do they prompt
| something like:
|
| "Remember val="XXXX" .........10M tokens later....... Print val"
| NhanH wrote:
| Yep, that's actually a common one
| blovescoffee wrote:
| Very simplified: there are arrays (matrices) of length 10M
| inside the model.
|
| It's difficult to make those arrays longer because training
| time explodes.
| halflings wrote:
| Yep that's pretty much it! That's what they call needle in a
| haystack. See:
| https://github.com/gkamradt/LLMTest_NeedleInAHaystack
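|
| A minimal version of such a test might look like this
| sketch, where `generate` is a hypothetical stand-in for
| whatever model API is being evaluated:
|
|     import random
|
|     def generate(prompt: str) -> str:
|         raise NotImplementedError  # swap in a real model call
|
|     filler = "The market was quiet that day. " * 300_000
|     needle = "The secret passphrase is mellon. "
|     i = random.randrange(len(filler))
|     haystack = filler[:i] + needle + filler[i:]
|
|     answer = generate(haystack +
|                       "\nWhat is the secret passphrase?")
|     print("retrieved" if "mellon" in answer else "missed")
|
| Run across many needle positions and context lengths, this
| yields retrieval-accuracy grids like the ones in the
| technical report.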
| cchance wrote:
| yep, they hide things throughout the prompt and then ask it
| about that specific thing. Imagine hiding passwords in a
| giant block of text and then being like, "what was Bob's
| password" 10 million tokens later.
|
| According to this it's remembering with 99% accuracy, which
| if you think about it is NUTS. Can you imagine reading 22
| 1,000-page books and remembering every single word that was
| said with 99% accuracy lol
| foota wrote:
| Interestingly, there's a decent chance I'd remember if there
| was an out-of-context passage saying "the password is
| FooBar". I wonder if it would be better to test with minor
| edits? E.g., "what color shirt was X wearing when..."
| phoe18 wrote:
| The branding is very confusing, shouldn't this be Gemini Pro 1.5
| since the most capable model is called Ultra 1.0?
| macawfish wrote:
| Extremely confusing!
| butler14 wrote:
| Maybe they use their own generative AI to do their branding
| dkjaudyeqooe wrote:
| Can anyone lay out the various models and their features or
| point to a resource?
|
| I asked the free model (whatever that is) and it wasn't very
| helpful, alternating between being a sales bot for Ultra and
| being somewhat confused itself.
|
| Edit: apparently it goes 1.0 Pro, 1.0 Ultra, 1.5 Pro, 1.5 Ultra
| and so on.
| Alifatisk wrote:
| Here's the models,
| https://news.ycombinator.com/item?id=39304270 This is about
| Gemini Pro going from version 1.0 to 1.5, nothing else.
|
| Gemini Ultra is still on version 1.0.
| dkjaudyeqooe wrote:
| That isn't right. The Pro/Ultra exists within each version.
|
| If you look at the Gemini report, it refers to "Gemini 1.5",
| then refers to "Gemini 1.5 Pro" and "Gemini 1.0 Pro".
| Alifatisk wrote:
| Okey, so if I understand this correctly:
|
| - Gemini 1.5 is the new version of the model Gemini.
|
| - They are at the moment testing it on Gemini Pro and
| calling it Gemini Pro 1.5
|
| - The testing has shown that Gemini Pro 1.5 is delivering
| the same quality as Gemini Ultra 1.0 while using less
| computing power
|
| - Gemini Ultra is still using Gemini 1.0 at the moment
| lordswork wrote:
| Here's an updated table, with version numbers included and
| their status:
|
|     Gemini Model        gemini.google.com status
|     --------------------------------------------------------
|     Gemini 1.0 Nano
|     Gemini 1.0 Pro   -> Gemini (free)
|     Gemini 1.0 Ultra -> Gemini Advanced ($20/month)
|     Gemini 1.5 Pro   -> announced on 2024-02-15 [1]
|     Gemini 1.5 Ultra -> no public announcements (assuming
|                         it's coming)
|
| [1]: https://storage.googleapis.com/deepmind-
| media/gemini/gemini_...
|
| For history of pre-Gemini models at Google, see:
| https://news.ycombinator.com/item?id=39304441
| Alifatisk wrote:
| Oh, it's you again! Thanks for the update
| UncleMeat wrote:
| Google is somehow truly awful at this. I thought it was funny
| when branding messes happened in 2017. I cried when they
| announced "Google Meet (original)." Now I don't even know what
| to do.
|
| I'm stunned that Google hasn't appointed some "name veto
| person" that can just say "no, you aren't allowed to have three
| different things called 'Gemini Advanced', 'Gemini Pro', and
| 'Gemini Ultra.'" Like surely it just takes Sundar saying "this
| is the stupidest fucking thing I've ever seen" to some SVP to
| fix this.
| meowface wrote:
| And somehow the more advanced one is still on 1.0 (for now)
| and the less advanced one is on 1.5.
| kccqzy wrote:
| That's like saying it doesn't make sense for Apple to
| release M3 Pro without simultaneously releasing M3 Ultra.
| meowface wrote:
| That's very different.
| kccqzy wrote:
| The only thing that's different is the standard people
| apply to different companies due to their biases. There
| are more Apple fanboys on HN than Google fans (Of course,
| since Google's reputation has been going down for quite a
| while). Therefore Apple gets a pass. Classic double
| standard.
| seydor wrote:
| We will ask what its real name is as soon as it becomes
| sentient
| iamdelirium wrote:
| No? Do you call it the iPhone Pro 15 or the iPhone 15 Pro?
| Their naming makes sense if you follow most consumer
| technology.
| summerlight wrote:
| This is something close to CPU versioning. You have two axis;
| performance branding and its generation. Nano, Pro and Ultra is
| something similar to i3, i5 and i7. The numbered versions 1.0,
| 1.5, ... can be mapped to 13th gen, 14th gen, ... so on. And
| people usually don't need to understand the generation part
| this unless they're enthusiasts.
| arange wrote:
| the signup form on mobile is too big; the submit button
| doesn't fit :\
| guybedo wrote:
| looks interesting enough that I wanted to give Gemini a try
| and join the waitlist.
|
| And I thought it would be easy. What a rookie mistake.
|
| Looks like "France" isn't on the list of available regions for
| AI Studio?
|
| Now I'm trying to use Vertex AI. I'm not even sure what the
| difference with AI Studio is, but it seems to be available.
|
| So far I've been struggling for 15 minutes through a maze of
| Google Cloud pages: console, docs, signups. No end in sight;
| looks like I won't be able to try it out.
| IanCal wrote:
| It's not available outside of a private preview yet. The page
| says you can use 1.0 ultra in vertex but it's not available to
| me in the UK.
|
| I can't get on the waitlist, because the waitlist link
| redirects to aistudio and I can't use that.
|
| I should stop expecting that I can use literally anything
| google announces.
| simonw wrote:
| I'd love to know how much a 1 million token prompt is likely to
| cost - both in terms of cash and in terms of raw energy usage.
| bearjaws wrote:
| Cannot emphasize enough, even with the improvements in context
| handling I imagine 128k tokens costs as much as 16k tokens did
| previously.
|
| So 1M tokens is going to be astronomical.
| empath-nirvana wrote:
| When you account for this, you have to consider how much it
| would cost to have a human perform the same task.
| foliveira wrote:
| >"Gemini 1.5 Pro (...) matches or surpasses Gemini 1.0 Ultra's
| state-of-the-art performance across a broad set of benchmarks."
|
| So Pro is better than Ultra, but only if the version numbers are
| higher?
| denysvitali wrote:
| Yes, but you'd have to wait for Gemini Pro Max next year to see
| the real improvements
| renewiltord wrote:
| Isn't that usually the case with many products? Like the M3 Pro
| CPU in the new Macs is more powerful than the M1 Max in the old
| Macs.
|
| The Nano < Pro < Ultra is an in-revision thing. For their LLMs
| it's a size thing. Then there's newer releases of Nano, Pro,
| and Ultra. Some Pro might be better than some older Ultra.
|
| A lot of people seem confused about this but it feels so easy
| to understand that it's confusing to me that anyone could have
| trouble.
| devindotcom wrote:
| Apple didn't release the M3 Pro a week after the M1 Max
| renewiltord wrote:
| Adam Osborne's wife was one of my dad's patients so I'm not
| unacquainted with the risk of early announcements. But
| surely they do not prevent comprehension.
| thiago_fm wrote:
| I like that they are rushing with this and don't care enough
| to make it Gemini 2 or even really release it; to me it looks
| like they are anxious to share progress.
|
| I hope they do a good job, and that once OpenAI releases
| GPT-5 their offerings are competitive with it. It will be
| better for everyone.
| kaspermarstal wrote:
| Incredible. RAG will be obsolete in a year or two.
| hackernoteng wrote:
| It's already obsolete. It doesn't work except for trivial cases
| which have no real value.
| jeanloolz wrote:
| Obsolete only if you don't take cost into consideration.
| Having 10 million tokens go through each layer of the LLM is
| going to cost a lot of money each time. At GPT-4 rates that
| could mean around $200 for each inference.
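|
| Back-of-the-envelope, using OpenAI's published GPT-4-class
| input prices as a stand-in (Gemini's actual long-context
| pricing hasn't been announced):
|
|     tokens = 10_000_000
|     for name, usd_per_1k in [("gpt-4-turbo", 0.01),
|                              ("gpt-4", 0.03)]:
|         cost = tokens / 1000 * usd_per_1k
|         print(f"{name}: ${cost:,.0f} per prompt")
|
|     # gpt-4-turbo: $100 per prompt
|     # gpt-4:       $300 per prompt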
| scarmig wrote:
| The technical report: https://storage.googleapis.com/deepmind-
| media/gemini/gemini_...
| jpeter wrote:
| OpenAI has no Moat
| jklinger410 wrote:
| They only have a head start, and the lead is closing
| seydor wrote:
| hence why it's Open
| rvz wrote:
| This. He's right you know.
|
| OpenAI is extremely overvalued and Google is closing their lead
| rapidly.
| fnordpiglet wrote:
| Is there any meaningful valuation on OpenAI? It's not for
| sale, there is no market.
|
| Google ... has no ability to commercialize anything. Their
| only commercial successes are ads and YouTube. Doing
| deceptive launches and flailing around with Gemini isn't
| helping their product prospects. I wouldn't take a bet
| between open ai and anyone, but I also wouldn't take a bet on
| Google succeeding commercially on anything other than
| pervasive surveillance and adware.
| wrsh07 wrote:
| A reference to the good doc:
| https://www.semianalysis.com/p/google-we-have-no-moat-and-ne...
|
| While I'm linking semianalysis, though, it's probably worth
| talking about how everyone except Google is GPU poor:
| https://www.semianalysis.com/p/google-gemini-eats-the-world-...
| (paid)
|
| > Whether Google has the stomach to put these models out
| publicly without neutering their creativity or their existing
| business model is a different discussion.
|
| Google has a serious GPU (well, TPU) build-out, and the fact
| that they're able to train MoE models on it means there
| aren't any technical barriers preventing them from competing
| at the highest levels.
| Keyframe wrote:
| they also have internet.zip and all of its repo history, as
| well as Usenet and emails etc., which others don't.
| anonyfox wrote:
| but GPT-4 is nearly a year old now; I'd wait for the next OAI
| release before judgement. Probably rather soon now, I would
| expect.
| gpjanik wrote:
| Zero trust in what they put out until I see it live. After
| the last "launch" video, which was fundamentally a marketing
| edit not showing the real product, I don't trust anything
| coming out of Google that isn't an instantly testable input
| form.
| losvedir wrote:
| If I understand correctly, they're releasing this for Pro but not
| Ultra, which I think is akin to GPT 3.5 vs 4? Sigh, the naming is
| confusing...
|
| But my main takeaway is the huge context window! Up to a million,
| with more than 100k tokens right now? Even just GPT 3.5 level
| prediction with such a huge context window opens up a lot of
| interesting capabilities. RAG can be super powerful with that
| much to work with.
| danpalmer wrote:
| The announcement suggests that 1.5 Pro is similar to 1.0 Ultra.
| benopal64 wrote:
| I am reaching a bit, but I think it's a bit of a marketing
| technique. Comparing Pro 1.5 to the Ultra 1.0 model seems to
| imply that they will be releasing an Ultra 1.5 model, which will
| presumably have characteristics similar to the new Pro 1.5 model
| (MoE architecture w/ a huge context window).
| danpalmer wrote:
| Apparently the technical report implies that Ultra 1.5 is a
| step up again. I'm not sure it's just context length; that
| seems to be orthogonal in everything I've read so far.
| ygouzerh wrote:
| So Pro and Ultra are, from my understanding, linked to the number
| of parameters. More parameters means more reasoning
| capability, but more compute needed.
|
| So Pro is the light and fast version and Ultra the
| advanced and expensive one.
| cchance wrote:
| It's sizes
|
| Nano/Pro/Ultra are model SIZES. 1.0/1.5 are generations of the
| architecture.
| amf12 wrote:
| Maybe this analogy would help: iPhone 15, iPhone 15 Pro, iPhone
| 15 Pro Max, and then iPhone 15.5 Pro.
| golergka wrote:
| In one of the demos, it successfully navigates a threejs demo and
| finds the place to change in response to a request.
|
| How long until it shows similar results on middle-sized and large
| codebases? And do the job adequately?
| kypro wrote:
| 1-2 years probably. There will still be a question around who
| determines what "adequately" is for a while though. Presumably
| even if an LLM can do something in theory you wouldn't actually
| want it doing anything without human oversight.
|
| And we should keep in mind that understanding a code change in
| depth is often just as much work as making the change. When
| reviewing PRs I don't really know exactly what every change is
| doing. I certainly haven't tested it to be 100% certain I
| understand fully. I'm just checking that the logic looks mostly
| right and that I don't see anything clearly wrong, and even
| then I'll often need to ask for clarification on why something
| was done.
|
| I can't imagine LLMs being used in most large code bases for a
| while yet. They'd probably need to be 99.9% reliable before we
| can start trusting them to make changes without verifying every
| line.
| simon_kun wrote:
| Today.
| pryelluw wrote:
| Gemini (or whatever google ai) will be all about ads. I'm not
| adopting this shit. Their whole business model is ads. Why would
| I adopt a product from a company that only cares about selling
| more ads?
| Alifatisk wrote:
| Google One's business model is not ads?
|
| I mention Google One because you can access Gemini Ultra
| through it.
| imp0cat wrote:
| All their services are just a way to get more information
| about their users so they can serve them ads.
|
| Those Gemini queries will be no exception.
| sodality2 wrote:
| Not true - Gemini looks to be marketed towards companies,
| where it's far more profitable to just charge thousands of
| dollars. Ads wouldn't fund AI usage anyway. GPUs are
| extremely expensive (even Google's fancy TPUs).
| snapcaster wrote:
| Agreed, people continually forget that Google has fundamentally
| failed at everything besides selling ads despite decades of
| moonshots and other attempts to shift the business. Very
| skeptical that any company getting 80% revenue from ads will be
| able to resist the pressure to advertise
| royletron wrote:
| Is there a reason this isn't available in the
| UK/France/Germany/Spain but is available in Jersey... and
| Tuvalu?
| onlyrealcuzzo wrote:
| EU regulations and fines.
| vibrolax wrote:
| Probably because EU/national governments have regulations with
| respect to the safety and privacy of the users, and the
| purveyors must evaluate the performance of their products
| against the regulatory standards.
| seydor wrote:
| Onwards to a billion tokens
| fernandotakai wrote:
| I saw this announcement on Twitter and was excited to check it
| out, only to see that "we're offering a limited preview of 1.5
| Pro to developers and enterprise customers via AI Studio and
| Vertex AI".
|
| Please, Google, only announce things when people can actually
| use them.
| xyzzy_plugh wrote:
| I miss when I didn't have to scroll to read a single tweet.
| ComputerGuru wrote:
| Twitter has that functionality natively now, but I don't know
| if you have to be a pro user to access it. It's the book icon in
| the upper-right corner of the first tweet in a series. Links to
| this, but it looks different when I view it in incognito vs
| logged in:
| https://twitter.com/JeffDean/thread/1758146022726041615
| og_kalu wrote:
| >Finally, we highlight surprising new capabilities of large
| language models at the frontier; when given a grammar manual for
| Kalamang, a language with fewer than 200 speakers worldwide, the
| model learns to translate English to Kalamang at a similar level
| to a person learning from the same content.
|
| Results - https://imgur.com/a/qXcVNOM
|
| From the technical report
| https://storage.googleapis.com/deepmind-media/gemini/gemini_...
| poulpy123 wrote:
| > at a similar level to a person learning from the same
| content.
|
| That's an incredibly low bar
| ithkuil wrote:
| It's incredible how fast goalposts are moving.
|
| The same feat one year ago would have been almost
| unbelievable.
| KeplerBoy wrote:
| Since when are we expecting super-human capabilities?
| andsoitis wrote:
| And in fact it already is superhuman. Show me a single
| human who can translate amongst 10+ languages across
| specialized domains in the blink of an eye.
| empath-nirvana wrote:
| ChatGPT has been superhuman at a lot of tasks ever
| since 3.5.
|
| People point out mistakes it makes that no human would
| make, but that doesn't negate the super-human performance
| it has at other tasks -- and the _breadth_ of what it can
| do is far beyond any single person.
| KeplerBoy wrote:
| Where exactly does it have super-human performance? Above
| average and expert-level? Sure, I'd agree, but I haven't
| experienced anything above that.
| elevatedastalt wrote:
| :muffled sounds of goalposts being shifted in the distance:
|
| Just a few years ago we used to clap if an NLP model could
| handle negation reliably or could generate even a paragraph
| of text in English that was natural sounding.
|
| Now we are at a stage where it is basically producing reams
| of natural sounding text, performing surprisingly well on
| reasoning problems and translation of languages with barely
| any data despite being a Markov chain on steroids, and what
| does it hear? "That's an incredibly low bar".
| glenstein wrote:
| I'm going to keep beating this dead horse, but if you were
| a philosophy nerd in the 80s, 90s, 00s etc you may know
| that debates RAGED over whether computers could ever, even
| in principle do things that are now being accomplished on a
| weekly basis.
|
| And as you say, the goalposts keep getting moved. It used
| to be claimed that computers could never play chess at the
| highest levels because that required "insight". And
| whatever a computer could do, it could never do that extra
| special thing that could only be described in magical,
| undefined terms.
|
| I just hope there's a moment of reckoning for decades upon
| decades of arguments, deemed academically respectable, that
| insisted that days like these would never come.
| elevatedastalt wrote:
| Honestly. I am ok with having greater and greater goals
| to accomplish but this sort of dismissive attitude really
| puts me off.
| empath-nirvana wrote:
| Forget goalpost shifting, people frequently refuse to
| admit that it can do things that it obviously does,
| because they've never used it themselves.
| mewpmewp2 wrote:
| Listen, you little ...
| zacmps wrote:
| > The author (the human learner) has some formal experience
| in linguistics and has studied a variety of languages both
| formally and informally, though no Austronesian or Papuan
| languages
|
| From the language benchmark (parentheses mine).
| JyB wrote:
| It's jarring that you're not adding more context to your comment.
| seydor wrote:
| what if we ask it to translate an undeciphered language
| dougmwne wrote:
| It produces basically random translations. This is covered in
| the 0-shot case where no translation manual was included in
| the context. Due to how rare this language is, it's
| essentially untranslated in the training corpus.
| og_kalu wrote:
| If you mean dumping random passages of text with no parallel
| corpora or grammar instructions, then it won't do better than
| random.
|
| That said, I believe that even if no parallel corpora exist
| during training, an LLM that was given text in a language to
| predict during training could still translate that language
| into some other language it also trained on.
| seydor wrote:
| What if we added a bunch of linguistic analysis books or
| something
| uptownfunk wrote:
| Google is a public company. Anything and everything will be
| scrutinized very heavily by shareholders. Of course, how Zuck
| operates is very different from how Sundar does.
|
| What they are doing with their free cash is my question. Are they
| waiting for the LLM bubble to pop to buy some of these companies
| at a discount?
| ComputerGuru wrote:
| The context window size - if it really works as advertised - is
| pretty ground-breaking. It would replace the need for RAG or
| fine-tuning for one-off (or few-off) analys{is,es} of input
| streams, cheaper and faster. I wonder how they got past the input
| token stuffing problems everyone else runs into.
| jcuenod wrote:
| It won't remove the use of RAG at all. That's like saying,
| "wow, now that I've upgraded my 128GB HDD to 1TB, I'll never
| run out of space again."
| madisonmay wrote:
| It's more like saying "I've upgraded to 128GB of RAM, I'll
| never use my disk again".
| sebzim4500 wrote:
| 10 TB for an accurate proportion.
|
| And I think people who buy a laptop with a 1TB SSD generally
| don't run out of space, at least I don't.
| lumost wrote:
| They are almost certainly using some form of sparse attention.
| If you linearize the attention operation, you can scale up to
| around 1-10M tokens depending on hardware before hitting memory
| constraints. Linearization works off the assumption that for a
| subsequence of X tokens out of M tokens, where M is much greater
| than X, there are likely only K tokens which are useful for the
| attention operation.
|
| There are a bunch of techniques to do this, but it's unclear
| how well any of them scale.
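|
| A minimal sketch of that top-K idea in Python/NumPy (purely
| illustrative; the thresholding scheme here is a toy of mine, not
| anything Google has described):
|     import numpy as np
|
|     def topk_sparse_attention(Q, K_mat, V, k=8):
|         # Dense scores first, as in ordinary attention.
|         d = Q.shape[-1]
|         scores = Q @ K_mat.T / np.sqrt(d)          # (queries, M)
|         # Keep only each query's k best keys (ties may keep more).
|         kth = np.partition(scores, -k, axis=-1)[:, -k:]
|         kth = kth.min(axis=-1, keepdims=True)
|         scores = np.where(scores >= kth, scores, -np.inf)
|         # Softmax over the surviving keys, then mix the values.
|         w = np.exp(scores - scores.max(axis=-1, keepdims=True))
|         w /= w.sum(axis=-1, keepdims=True)
|         return w @ V
|
|     Q = np.random.randn(4, 64)             # 4 queries
|     K_mat = np.random.randn(1000, 64)      # M = 1000 keys
|     V = np.random.randn(1000, 64)
|     out = topk_sparse_attention(Q, K_mat, V)
|
| Note this toy still computes all M scores; real sparse-attention
| schemes avoid even that, e.g. via locality or learned routing.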
| ein0p wrote:
| Not "almost", but certainly. Dense attention is quadratic,
| not even Google would be able to run it at an acceptable
| speed. Their model is not recurrent - they did not have the
| time yet (or resources - believe it or not, Google of 2023-24
| is very compute constrained) to train newer SSM or recurrent
| based models at practical parameter counts. Then there's the
| fact that those models are far harder to train due to
| instabilities, which is one of the reasons why you don't yet
| see FOSS recurrent/SSM models that are SOTA at their size or
| tokens/sec. With sparse attention, however, long context
| recall will be far from perfect, and the longer the context
| the worse the recall. That's better than no recall at all (as
| in a fully dense attention model which will simply lop off
| the preceding parts of the conversation), but not by a hell
| of a lot.
| popinman322 wrote:
| vs RAG: RAG is good for searching across >billions of tokens
| and providing up-to-date information to a static model. Even
| with huge context lengths it's a good idea to submit high
| quality inputs to prevent the model from going off on tangents,
| getting stuck on contradictory information, etc..
|
| vs fine tuning: smaller, fine-tuned models can perform better
| than huge models in a decent number of tasks. Not strictly
| fine-tuning, but for throughput limited tasks it'll likely
| still be better to prune a 70B model down to 2B, keeping only
| the components you need for accurate inference.
|
| I can see this model being good for taking huge inputs and
| compressing them down for smaller models to use.
| nbardy wrote:
| RAG will stick around; at some point you want to retrieve
| grounded information samples to inject into the context window.
| RAG+long context just gives you more room for grounded context.
|
| Think building huge relevant context on topics before
| answering.
| torginus wrote:
| Tbh, I haven't read the paper, but I think it's pretty self-
| evident that large contexts aren't cheap - the AI has to comb
| through every word of the context for each successive generated
| token at least once, so it's going to be at least linear.
| Alifatisk wrote:
| I remember one of the biggest advantages with Google Bard was the
| heavily limited context window. I am glad Google is now actually
| delivering some exciting news with Gemini and this gigantic
| token count.
|
| Sure, it's a bummer that they slap on the "Join the waiting
| list", but it's still interesting to read about their progress
| and competition with ClosedAI (OpenAI).
|
| One last thing I hope they fix is the heavy moral and ethical
| guardrails; sometimes I can barely ask proper questions without
| triggering Gemini to educate me about what's right and wrong.
| And when I try the same prompt with ChatGPT and Bing AI, they
| happily answer.
| elevatedastalt wrote:
| "biggest advantages with Google Bard"
|
| Did you mean disadvantages?
| CrypticShift wrote:
| Most data accumulates gradually (e.g., one email at a time, one
| line of text at a time across various documents). Is this huge
| 10M-scale context window relevant to a gradual, yet constant,
| influx of data (like a prompt over a whole Google Workspace)?
| Imnimo wrote:
| This is the first time I've been legitimately impressed by one of
| Google's LLMs (with the obvious caveat that I'm taking the
| results reported in their tech report at face value).
| sremani wrote:
| I have Gemini Advanced; do I get access to this? Google is giving
| Microsoft a run for its money in branding confusion.
| Alifatisk wrote:
| Not yet, Gemini advanced is using Gemini Ultra, not Gemini pro.
| Ecstatify wrote:
| Gemini advanced is terrible.
|
| I asked it to rephrase "Are the original stated objectives
| still relevant?"
|
| It starts going on about Ukraine and Russia.
|
| https://g.co/gemini/share/ddb3887f79e2
| Alifatisk wrote:
| I think it took the whole context of the conversation into
| consideration; you should create a new conversation instead
| and see if it responds differently.
|
| Or you could be more specific, like "Rephrase the following
| sentence: 'Are the original stated objectives still
| relevant?' in a formal way, respond with one option only."
| Ecstatify wrote:
| It was a new conversation. I've never mentioned Russia or
| Ukraine in any conversation ever.
| Alifatisk wrote:
| That's so weird, yet interesting. What happens if you
| open a new convo again and enter the same prompt?
| piva00 wrote:
| I thought I wouldn't, but I'm getting really, really confused
| by the naming and branding of which Gemini is a model and
| which is a product. Advanced, Pro, Ultra; seemingly Pro is
| getting better than Ultra? And Advanced is the product using
| the Ultra underlying model?
|
| Ugh, my brain.
| tapoxi wrote:
| I've read this sentence three times, wow what horrible
| branding.
| vessenes wrote:
| The white paper is worth a read. The things that stand out to me
| are:
|
| 1. They don't talk about how they get to 10M token context
|
| 2. They don't talk about how they get to 10M token context
|
| 3. The 10M context ability wipes out most RAG stack complexity
| immediately. (I imagine creating caching abilities is going to be
| important for a lot of long token chatting features now, though).
| This is going to make things much, much simpler for a lot of use
| cases.
|
| 4. They are pretty clear that 1.5 Pro is better than GPT-4 in
| general, and therefore we have a new LLM-as-judge leader, which
| is pretty interesting.
|
| 5. It seems like 1.5 Ultra is going to be highly capable. 1.5 Pro
| is already very very capable. They are running up against very
| high scores on many tests, and took a minute to call out some
| tests where they scored badly as mostly returning false
| negatives.
|
| Upshot, 1.5 Pro looks like it _should_ set the bar for a bunch of
| workflow tasks, if we can ever get our hands on it. I've found
| 1.0 Ultra to be very capable, if a bit slow. Open models
| downstream should see a significant uptick in quality using it,
| which is great.
|
| Time to dust off my coding test again, I think, which is: "here
| is a tarball of a repository. Write a new module that does X".
|
| I really want to know how they're getting to 10M context, though.
| There are some intriguing clues in their results that this isn't
| just a single ultra-long vector; for instance, their audio and
| video "needle" tests, which just include inserting an image that
| says "the magic word is: xxx", or an audio clip that says the
| same thing, have perfect recall across up to 10M tokens. The text
| insertion occasionally fails. I'd speculate that this means there
| is some sort of compression going on; a full video frame with
| text on it is going to use a lot more tokens than the text
| needle.
| CharlieDigital wrote:
| > The 10M context ability wipes out most RAG stack complexity
| immediately.
|
| Remains to be seen.
|
| Large contexts are not always better. For starters, it takes
| longer to process. But secondly, even with RAG and the large
| context of GPT4 Turbo, providing it a more relevant and
| accurate context always yields better output.
|
| What you get with RAG is faster response times and more
| accurate answers by pre-filtering out the noise.
| behnamoh wrote:
| Don't forget that Gemini also has access to the internet, so
| a lot of RAGging becomes pointless anyway.
| beppo wrote:
| Internet search _is_ a form of RAG, though. 10M tokens is
| very impressive, but you're not fitting a database, let
| alone the entire internet into a prompt anytime soon.
| behnamoh wrote:
| You shouldn't fit an entire database in the context
| anyway.
|
| btw, 10M tokens is 78 times the context window of the
| newest GPT-4 Turbo (128K). In a way, you don't need 78
| GPT-4 API calls, only one batch call to Gemini 1.5.
| rvnx wrote:
| Well it's nice, just sad nobody can use it
| cchance wrote:
| I don't get this. Why do people think that you need to
| put an entire database in the short-term memory of the AI
| for it to be useful? When you work with a DB, are you memorizing
| the entire f*cking database? No, you know the summaries
| of it and how to access and use it.
|
| People also seem to forget that the average person reads about
| 1B words in their entire LIFETIME, and at 10M with nearly 100%
| recall that's pretty damn amazing; I'm pretty sure I don't have
| perfect recall of 10M words myself lol
| Qwero wrote:
| It increases the use cases.
|
| It can also be a good alternative to fine-tuning.
|
| And the use case of a code base is a good example: if the
| AI understands the whole context, it can do basically
| everything.
|
| Let me pay 5 EUR for an Android app rewritten for iOS.
| CharlieDigital wrote:
| This may be useful in a generalized use case, but a problem
| is that many of those results again will add noise.
|
| For any use case where you want contextual results, you
| need to be able to either filter the search scope or use
| RAG to pre-define the acceptable corpus.
| panarky wrote:
| _> you need to be able to either filter the search scope
| or use RAG ..._
|
| Unless you can get nearly perfect recall with millions of
| tokens, which is the claim made here.
| killerstorm wrote:
| Hopefully we can get a better RAG out of it. Currently people
| do incredibly primitive stuff like chunking text into chunks
| of a fixed size and adding them to vector DB.
|
| An actually useful RAG would be to convert text to Q&A and
| use Q's embeddings as an index. Large context can make use of
| in-context learning to make better Q&A.
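|
| A sketch of what I mean (toy embedding, with generate_questions
| standing in for an LLM call; both are hypothetical):
|     import numpy as np
|
|     def embed(text, dim=256):
|         # Stand-in for a real embedding model: hashed bag-of-words.
|         v = np.zeros(dim)
|         for word in text.lower().split():
|             v[hash(word) % dim] += 1.0
|         n = np.linalg.norm(v)
|         return v / n if n else v
|
|     def generate_questions(chunk):
|         # Placeholder for an LLM writing questions the chunk answers.
|         if "cache" in chunk:
|             return ["How often is the cache invalidated?"]
|         return ["How do retries back off?"]
|
|     chunks = ["The cache is invalidated every 60 seconds.",
|               "Retries use exponential backoff with jitter."]
|     # Index the questions, but map each back to its source chunk.
|     index = [(embed(q), c) for c in chunks
|              for q in generate_questions(c)]
|
|     def retrieve(query, top_n=1):
|         qv = embed(query)
|         ranked = sorted(index, key=lambda e: -float(e[0] @ qv))
|         return [c for _, c in ranked[:top_n]]
|
|     print(retrieve("How often is the cache invalidated?"))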
| mediaman wrote:
| A lot of people in RAG already do this. I do this with my
| product: we process each page and create lists of potential
| questions that the page would answer, and then embed that.
|
| We also embed the actual text, though, because I found that
| only doing the questions resulted in inferior performance.
| CharlieDigital wrote:
| So in this case, what your workflow might look like is:
| 1. Get text from page/section/chunk
| 2. Generate possible questions related to the page/section/chunk
| 3. Generate an embedding using { each possible question +
|    page/section/chunk }
| 4. Incoming question targets the embedding and matches against
|    { question + source }
|
| Is this roughly it? How many questions do you generate?
| Do you save a separate embedding for each question? Or
| just stuff all of the questions back with the
| page/section/chunk?
| cs702 wrote:
| > 1. They don't talk about how they get to 10M token context
|
| > 2. They don't talk about how they get to 10M token context
|
| Yes. I wonder if they're using a "linear RNN" type of model
| like Linear Attention, Mamba, RWKV, etc.
|
| Like Transformers with standard attention, these models train
| efficiently in parallel, but their compute is O(N) instead of
| O(N^2), so _in theory_ they can be extended to much longer
| sequences much more efficiently. They have shown a lot of promise
| recently at smaller model sizes.
|
| Does anyone here have any insight or knowledge about the
| internals of Gemini 1.5?
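|
| For reference, the core trick in those models looks roughly like
| this (a sketch of non-causal linear attention in the style of
| Katharopoulos et al. 2020; no claim about Gemini's internals):
|     import numpy as np
|
|     def phi(x):
|         # elu(x) + 1: a positive feature map for the kernel trick.
|         return np.where(x > 0, x + 1.0, np.exp(x))
|
|     def linear_attention(Q, K, V):
|         Qf, Kf = phi(Q), phi(K)            # (N, d)
|         S = Kf.T @ V                       # (d, d_v) summary, O(N)
|         z = Kf.sum(axis=0)                 # (d,) normalizer
|         # No N x N score matrix is ever formed.
|         return (Qf @ S) / (Qf @ z)[:, None]
|
|     N, d = 10_000, 64
|     Q, K, V = (np.random.randn(N, d) for _ in range(3))
|     out = linear_attention(Q, K, V)
|
| The causal version keeps running sums of S and z instead, which
| is what makes these models look like RNNs at inference time.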
| candiodari wrote:
| They do give a hint:
|
| "This includes making Gemini 1.5 more efficient to train and
| serve, with a new Mixture-of-Experts (MoE) architecture."
|
| One thing you could do with MoE is giving each expert
| different subsets of the input tokens. And that would
| definitely do what they claim here: it would allow search.
| You want to find where someone said "the password is X" in a
| 50 hour audio file, this would be perfect.
|
| If your question is "what is the first AND last thing person
| X said" ... it's going to suck badly. Anything that requires
| taking 2 things into account that aren't right next to
| each other is just not going to work.
| declaredapple wrote:
| > One thing you could do with MoE is giving each expert
| different subsets of the input tokens.
|
| Don't MoE's route tokens to experts after the attention
| step? That wouldn't solve the n^2 issue the attention step
| has.
|
| If you split the tokens _before_ the attention step, that
| would mean those tokens would have no relationship to each
| other - it would be like inferring two prompts in parallel.
| That would defeat the point of a 10M context.
| deskamess wrote:
| Is MoE then basically divide and conquer? I have no deep
| knowledge of this, so I assumed MoE was where each expert
| analyzed the problem in a different way and then there was
| some map-reduce-like operation on the generated expert
| results. Kinda like random forest, but for inference.
| spott wrote:
| > Anything that requires taking 2 things into account that
| aren't right next to eachother is just not going to work.
|
| They kinda address that in the technical report[0]. On page
| 12 they show results from a "multiple needle in a haystack"
| evaluation.
|
| https://storage.googleapis.com/deepmind-
| media/gemini/gemini_...
| sebzim4500 wrote:
| The fact they are getting perfect recall with millions of
| tokens rules out any of the existing linear attention
| methods.
| usaar333 wrote:
| > They are pretty clear that 1.5 Pro is better than GPT-4 in
| general, and therefore we have a new LLM-as-judge leader, which
| is pretty interesting.
|
| They try to push that, but it's not the most convincing. Look
| at Table 8 for text evaluations (math, etc.) - they don't even
| attempt a comparison with GPT-4.
|
| GPT-4 is higher than any Gemini model on both MMLU and GSM8K.
| Gemini Pro seems slightly better than GPT-4 original in Human
| Eval (67->71). Gemini Pro does crush naive GPT-4 on math
| (though not with code interpreter and this is the original
| model).
|
| All in all, 1.5 Pro seems maybe a bit better than 1.0 Ultra.
| Given that in the wild people seem to find GPT-4 better than
| Gemini Ultra for, say, coding, my current read is that Pro 1.5
| is about equal to GPT-4.
|
| But we'll see once released.
| cchance wrote:
| I mean, I don't see GPT-4 watching a 44-minute movie and being
| able to exactly pinpoint a guy taking a paper out of his
| pocket...
| panarky wrote:
| _> people seem to find GPT-4 better for say coding than
| Gemini Ultra_
|
| For my use cases, Gemini Ultra performs significantly better
| than GPT-4.
|
| My prompts are long and complex, with a paragraph or two
| about the general objective followed by 15 to 20 numbered
| requirements. Often I'll include existing functions the new
| code needs to work with, or functions that must be refactored
| to handle the new requirements.
|
| I took 20 prompts that I'd run with GPT-4 and fed them to
| Gemini Ultra. Gemini gave a clearly better result in 16 out
| of 20 cases.
|
| Where GPT-4 might miss one or two requirements, Gemini
| usually got them all. Where GPT-4 might require multiple chat
| turns to point out its errors and omissions and tell it to
| fix them, Gemini often returned the result I wanted in one
| shot. Where GPT-4 hallucinated a method that doesn't exist,
| or had been deprecated years ago, Gemini used correct
| methods. Where GPT-4 called methods of third-party packages
| it assumed were installed, Gemini either used native code or
| explicitly called out the dependency.
|
| For the 4 out of 20 prompts where Gemini did worse, one was a
| weird rejection where I'd included an image in the prompt and
| Gemini refused to work with it because it had unrecognizable
| human forms in the distance. Another was a simple bash script
| to split a text file, and it came up with a technically
| correct but complex one-liner, while GPT-4 just used split
| with simple options to get the same result.
|
| For now I subscribe to both. But I'm using Gemini for almost
| all coding work, only checking in with GPT-4 when Gemini
| stumbles, which isn't often. If I continue to get solid
| results I'll drop the GPT-4 subscription.
| sho_hn wrote:
| I have a very similar prompting style to yours and share
| this experience.
|
| I am an experienced programmer and usually have a fairly
| exact idea of what I want, so I write detailed requirements
| and use the models more as typing accelerators.
|
| GPT-4 is useful in this regard, but I also tried about a
| dozen older prompts on Gemini Advanced/Ultra recently and
| in every case preferred the Ultra output. The code was
| usually more complete and prod-ready, with higher
| sophistication in its construction and somewhat higher
| density. It was just closer to what I would have hand-
| written.
|
| It's increasingly clear, though, that LLM use has a couple of
| different major modes among end-user behavior. Knowledge
| base vs. reasoning, exploratory vs. completion, instruction
| following vs. getting suggestions, etc.
|
| For programming I want an obedient instruction-following
| completer with great reasoning. Gemini Ultra seems to do
| this better than GPT-4 for me.
| sjwhevvvvvsj wrote:
| I'm going to have to try Gemini for code again. It just
| occurred to me as a Xoogler that if they used Google's
| code base as the training data it's going to be
| unbeatable. Now did they do that? No idea, but quality
| wins over quantity, even with LLMs.
| barrkel wrote:
| There is no way NTK data is in the training set, and
| google3 is NTK.
| Dayshine wrote:
| Is there any chance you could share an example of the kind
| of prompt you're writing?
|
| I'm always reluctant to write long prompts because I often
| find GPT4 just doesn't get it, and then I've wasted ten
| minutes writing a prompt
| spott wrote:
| > Gemini Pro seems slightly better than GPT-4 original in
| Human Eval (67->71).
|
| Though they talk a bunch about how hard it was to filter out
| Human Eval, so this probably doesn't matter much.
| swalsh wrote:
| "The 10M context ability wipes out most RAG stack complexity
| immediately."
|
| I'm skeptical. My past experience is that just because the
| context has room to stuff in whatever you want, the more you
| stuff into it, the less accurate your results are. There seems
| to be a balance: provide enough that you'll get high-quality
| answers, but not so much that the model is overwhelmed.
|
| I think a large part of developing better models is not just
| better architectures that support larger and larger context
| sizes, but also capable models that can properly leverage that
| context. That's the test for me.
| HereBePandas wrote:
| They explicitly address this on page 11 of the report:
| basically perfect recall for up to 1M tokens, way better than
| GPT-4.
| westoncb wrote:
| I don't think recall really addresses it sufficiently: the
| main issue I see is answers getting "muddy". Like it's
| getting pulled in too many directions and averaging.
| a_wild_dandan wrote:
| I'd urge caution in extending generalizations about
| "muddiness" to a new context architecture. Let's use the
| thing first.
| westoncb wrote:
| I'm not saying it applies to the new architecture, I'm
| saying that's a big issue I've observed in existing
| models and that so far we have no info on whether it's
| solved in the new one (i.e. accurate recall doesn't imply
| much in that regard).
| westoncb wrote:
| Would be awesome if it is solved but seems like a much
| deeper problem tbh.
| a_wild_dandan wrote:
| Ah, apologies for the misunderstanding. What tests would
| you suggest to evaluate "muddiness"?
|
| What comes to my mind: run the usual gamut of tests, but
| with the excess context window saturated with
| irrelevant(?) data. Measure test answer
| accuracy/verbosity as a function of context saturation
| percentage. If there's little correlation between these
| two variables (e.g. 9% saturation is just as
| accurate/succinct as 99% saturation), then "muddiness"
| isn't an issue.
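|
| A crude sketch of that harness (ask_model is a hypothetical
| stand-in for whichever API is under test):
|     import random
|
|     def ask_model(prompt: str) -> str:
|         raise NotImplementedError("plug a real model in here")
|
|     FILLER = "The quick brown fox jumps over the lazy dog. "
|     FACT = "The deployment password is swordfish-42."
|     QUESTION = "\n\nWhat is the deployment password?"
|
|     def accuracy_at(filler_chars, trials=20):
|         hits = 0
|         for _ in range(trials):
|             noise = FILLER * max(1, filler_chars // len(FILLER))
|             # Bury the fact at a random spot in the noise.
|             cut = random.randrange(len(noise))
|             prompt = noise[:cut] + FACT + noise[cut:] + QUESTION
|             hits += "swordfish-42" in ask_model(prompt)
|         return hits / trials
|
|     # Sweep saturation; flat accuracy means no "muddiness":
|     # for n in (10_000, 100_000, 1_000_000):
|     #     print(n, accuracy_at(n))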
| danielmarkbruce wrote:
| Manual testing on complex documents. A big legal contract
| for example. An issue can be referred to in 7 different
| places in a 100 page document. Does it give a coherent
| answer?
|
| A handful of examples show whether it can do it. For
| example, GPT-4 turbo is downright awful at something like
| that.
| smeagull wrote:
| I believe that's a limitation of using vectors of high
| dimensions. It'll be muddy.
| swyx wrote:
| Also, costs are always based on context tokens; you don't want
| to put in 10M of context for every request (it's just nice to
| have that option when you want to do big things that don't
| scale).
| 1024core wrote:
| How much would a lawyer charge to review your 10M-token
| legal document?
| chuckcode wrote:
| Would like to see the latency and cost of parsing the entire 10M
| context before throwing out the RAG stack, which is relatively
| cheap and fast.
| tkellogg wrote:
| costs rise on a per-token basis. So you _CAN_ use 10M tokens,
| but it's probably not usually a good idea. A database lookup
| is still better than a few billion math operations.
| sjwhevvvvvsj wrote:
| I think the unspoken goal is to just lay off your employees
| and dump every doc and email they've ever written as one
| big context.
|
| Now that Google has tasted the previously forbidden fruit
| of layoffs themselves, I think their primary goal in ML is
| now headcount reduction.
| theolivenbaum wrote:
| Also, unless they significantly change their pricing model,
| we're talking about $0.50 per API call at current prices.
| aik wrote:
| You have to consider cost for all of this. A big value of RAG,
| even given the size of GPT-4's largest context, is that it
| decreases cost very significantly.
| freedomben wrote:
| Is 10M token context correct? The blog post I see 1M but I'm
| not sure if these are different things
|
| Edit: Ah, I see, it's 1M reliably in production, up to 10M in
| research:
|
| > _Through a series of machine learning innovations, we've
| increased 1.5 Pro's context window capacity far beyond the
| original 32,000 tokens for Gemini 1.0. We can now run up to 1
| million tokens in production._
|
| > _This means 1.5 Pro can process vast amounts of information
| in one go -- including 1 hour of video, 11 hours of audio,
| codebases with over 30,000 lines of code or over 700,000 words.
| In our research, we've also successfully tested up to 10
| million tokens._
| huytersd wrote:
| I know how I'm going to evaluate this model. Upload my
| codebase and ask it to "find all the bugs".
| tbruckner wrote:
| How do you know it isn't RAG?
| tveita wrote:
| > The 10M context ability wipes out most RAG stack complexity
| immediately.
|
| The video queries they show take around 1 minute each, this
| probably burns a ton of GPU. I appreciate how clearly they
| highlight that the video is sped up though, they're clearly
| trying to avoid repeating the "fake demo" fiasco from the
| original Gemini videos.
| theGnuMe wrote:
| For #1 and #2 it is some version of mixture of experts. This is
| mentioned in the blog post. So each expert only sees a subset
| of the tokens.
|
| I imagine they have some new way to route tokens to the experts
| that probably computes a global context. One scalable way to
| compute a global context is by a state space model. This would
| act as a controller and route the input tokens to the MoEs.
| This can be computed by convolution if you make some
| simplifying assumptions. They may also still use transformers
| as well.
|
| I could be wrong, but there are some Mamba-MoE papers that
| explore this idea.
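|
| For background, vanilla top-k MoE routing (Switch/Mixtral style)
| looks like the sketch below; whether Gemini's router adds a
| global-context step as speculated above is unknown:
|     import numpy as np
|
|     def moe_layer(x, gate_W, experts, k=2):
|         # x: (tokens, d); gate_W: (d, n_experts)
|         logits = x @ gate_W
|         top = np.argsort(logits, axis=-1)[:, -k:]
|         out = np.zeros_like(x)
|         for t in range(x.shape[0]):
|             sel = logits[t, top[t]]
|             w = np.exp(sel - sel.max())
|             w /= w.sum()               # softmax over chosen experts
|             for wi, e in zip(w, top[t]):
|                 # Each token visits only k of the n experts.
|                 out[t] += wi * experts[e](x[t])
|         return out
|
|     d, n = 16, 8
|     rng = np.random.default_rng(0)
|     experts = [(lambda W: (lambda v: np.tanh(v @ W)))(
|         rng.normal(size=(d, d))) for _ in range(n)]
|     y = moe_layer(rng.normal(size=(4, d)),
|                   rng.normal(size=(d, n)), experts)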
| resouer wrote:
| > The 10M context ability wipes out most RAG stack complexity
| immediately.
|
| This may not be true. In my experience, the complexity of RAG
| lies in how to properly connect to various unstructured data
| sources and run a data transformation pipeline over large-scale
| data sets (GBs, TBs, or even PBs). It's on the
| critical path rather than a "nice to have", because the quality
| of the data and the pipeline is a major factor in the final
| generated result. I.e., in RAG, the importance of R >>> G.
| jorvi wrote:
| I just hope at some point we get access to mostly uncensored
| models. Both GPT-4 and Gemini are extremely shackled, and a
| slightly inferior model that hasn't been hobbled by a very
| restrictive preprompt would handily outperform them.
| ShamelessC wrote:
| You can customize the system prompt with ChatGPT or via the
| completions API, just fyi.
| ren_engineer wrote:
| RAG would still be useful for cost savings assuming they charge
| per token, plus I'm guessing using the full context length
| would be slower than using RAG to get what you need into a
| smaller prompt.
| nostrebored wrote:
| This is going to be the real differentiator.
|
| HN is very focused on technical feasibility (which remains to
| be seen!), but in every LLM opportunity, the CIO/CFO/CEO are
| going to be concerned with the cost modeling.
|
| The way that LLMs are billed now, if you can densely pack the
| context with relevant information, you will come out ahead
| commercially. I don't see this changing with the way that LLM
| inference works.
|
| Maybe this changes with managed vector search offerings that
| are opaque to the user. The context goes to a preprocessing
| layer, an efficient cache understands which parts haven't
| been embedded (new bloom filter use case?), embeds the other
| chunks, and extracts the intent of the prompt.
| mediaman wrote:
| Agreed with this.
|
| The leading ability AI (in terms of cognitive power) will,
| generally, cost more per token than lower cognitive power
| AI.
|
| That means that at a given budget you can choose more
| cognitive power with fewer tokens, or less cognitive power
| with more tokens. For most use cases, there's no real point
| in giving up cognitive power to include useless tokens that
| have no hope of helping with a given question.
|
| So then you're back to the question of: how do we reduce
| the number of tokens, so that we can get higher cognitive
| power?
|
| And that's the entire field of information retrieval, which
| is the most important part of RAG.
| golol wrote:
| > The way that LLMs are billed now, if you can densely pack
| the context with relevant information, you will come out
| ahead commercially. I don't see this changing with the way
| that LLM inference works.
|
| Really? Because to my understanding the compute necessary
| to generate a token grows linearly with the context, and
| doesn't OpenAI's billing reflect that by separating
| prompt and output tokens?
| cchance wrote:
| The YouTube video of the multimodal analysis of a movie is
| insane. Imagine feeding in movies or TV shows and being able to
| auto-summarize or find information about them dynamically. How
| the hell is all this possible already? AI is moving insanely fast.
| zitterbewegung wrote:
| RAG doesn't go away at 10 million tokens if you use esoteric
| sources like Shodan API queries.
| kylerush wrote:
| I assume using this large a context window instead of RAG
| would mean the consumption of many orders of magnitude more
| GPU time.
| karmasimida wrote:
| Even 1M tokens eliminates the need for RAG, unless it is for
| cost.
| sroussey wrote:
| Or accuracy
| 7734128 wrote:
| 1 million might sound like a lot, but it's only a few
| megabytes. I would want RAG, somehow, to be able to process
| gigabytes or terabytes of material in a streaming fashion.
| karmasimida wrote:
| RAG will not change how many tokens the LLM can produce at
| once.
|
| Longer context, on the other hand, could put some RAG use
| cases to sleep: if your instructions are literally a manual
| long, then there is no need for RAG.
| localhost wrote:
| RE: RAG - they haven't released pricing, but if input tokens
| are priced at GPT-4 levels - $0.01/1K then sending 10M tokens
| will cost you $100.
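|
| The arithmetic, for anyone checking (the $0.01/1K input rate is
| an assumption borrowed from GPT-4 Turbo):
|     RATE = 0.01  # dollars per 1K input tokens (assumed)
|     for ctx in (128_000, 1_000_000, 10_000_000):
|         print(f"{ctx:>10,} tokens -> ${ctx / 1000 * RATE:,.2f}")
|     # 128,000 -> $1.28; 1,000,000 -> $10.00; 10,000,000 -> $100.00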
| s-macke wrote:
| If you think the current APIs will stay that way, then you're
| right. But when they start offering dedicated chat instances
| or caching options, you could be back in the penny region.
|
| You probably need a couple GB to cache a conversation. That's
| not so easy at the moment because you have to transfer that
| data to and from the GPUs and store the data somewhere.
| TweedBeetle wrote:
| Regarding how they're getting to 10M context, I think it's
| possible they are using the new Mamba architecture.
|
| Here's the paper: https://arxiv.org/abs/2312.00752
|
| And here's a great podcast episode on it:
| https://www.cognitiverevolution.ai/emergency-pod-mamba-memor...
| LightMachine wrote:
| As a Brazilian, I approve of that choice. Let's go, friends!
| renonce wrote:
| > They don't talk about how they get to 10M token context
|
| I don't know how either but maybe
| https://news.ycombinator.com/item?id=39367141
|
| Anyway I mean, there is plenty of public research on this so
| it's probably just a matter of time for everyone else to catch
| up
| albertzeyer wrote:
| Why do you think this specific variant (RingAttention)? There
| are so many different variants for this.
|
| As far as I know, the problem in most cases is that while the
| context length might be high in theory, the actual ability to
| use it is still limited. E.g. recurrent networks even have
| infinite context, but they actually only use 10-20 frames as
| context (longer only in very specific settings; or maybe if
| you scale them up).
| AaronFriel wrote:
| There will always be more data that _could_ be relevant than
| fits in a context window, and especially for multi-turn
| conversations, huge contexts incur huge costs.
|
| GPT-4 Turbo, using its full 128k context, costs around $1.28
| per API call.
|
| At that pricing, 1m tokens is $10, and 10m tokens is an eye-
| watering $100 per API call.
|
| Of course prices will go down, but the price advantage of
| working with less will remain.
| 7734128 wrote:
| Would the price really increase linearly? Aren't the demands
| on compute and memory increasing more steeply than that as a
| function of context length?
| elorant wrote:
| I don't see a problem with this pricing. At 1m tokens you can
| upload the whole proceedings of a trial and ask it to draw an
| analysis. Paying $10 for that sounds like a steal.
| AaronFriel wrote:
| Of course, if you get exactly the answer you want in the
| first reply.
| staticman2 wrote:
| While it's hard to say what's possible on the cutting edge,
| historically models tend to get dumber as the context size
| gets bigger. So you'd get a much more intelligent analysis
| of a 10,000 token excerpt of the trial than a million token
| complete transcript of the trial. I have not spent the
| money testing big token sizes in GPT 4 turbo, but it would
| not surprise me if it gets dumber. Think of it this way: if
| the model is limited to 3,000-token replies and an analysis
| would require a more detailed response than 3,000 tokens,
| it cannot provide it; it'll just give you insufficient
| information. What it'll probably do is ignore parts of the
| trial transcript because it can't analyze all that
| information in 3,000 tokens. And asking a follow-up question
| is another million tokens.
| qwerty_clicks wrote:
| FYI, MM is the standard for million: 10MM, not 10M. I was
| reading all these comments confused as heck about why you're
| excited about 10M tokens.
| a_vanderbilt wrote:
| After their giant fib with the Gemini video a few weeks back,
| I'm not believing anything until I see it used by actual people.
| I hope it's that much better than GPT-4, but I'm not holding my
| breath that there isn't an asterisk or trick hiding somewhere.
| nborwankar wrote:
| Re: RAG, aren't you ignoring the fact that no one wants to put
| confidential company data into such LLMs? Private RAG
| infrastructure remains a need for the same reason that privacy
| of data of all sorts remains a need. Huge context solves the
| problem for large open-source context material, but that's only
| part of the picture.
| outside1234 wrote:
| It takes 60 seconds to process all of that context in their
| three.js demo, which is, I will say, not super interactive. So
| there is still room for RAG and other faster alternatives to
| narrow the context.
| aubanel wrote:
| > They are pretty clear that 1.5 Pro is better than GPT-4 in
| general, and therefore we have a new LLM-as-judge leader, which
| is pretty interesting
|
| I fully disagree: they compare Gemini 1.5 Pro and GPT-4 only on
| context length. On other tasks they compare it only to other
| Gemini models, which is a strange self-own.
|
| I'm convinced that if they do not show the results against
| GPT4/Claude, it is because they do not look good.
| kristjansson wrote:
| For other's reference, the paper:
| https://storage.googleapis.com/deepmind-media/gemini/gemini_...
| joshsabol46 wrote:
| > The 10M context ability wipes out most RAG stack complexity
| immediately.
|
| RAG is needed for the same reason you don't `SELECT *` in all of
| your queries.
| cubefox wrote:
| I think Anthropic and OpenAI could also have offered a one
| million context window a while ago. The relevant architecture
| breakthrough was probably when a linear increase in context
| length only required a linear increase in inference compute
| instead of a quadratic one. Anthropic and then OpenAI achieved
| linear context compute scaling before an architecture for it was
| published publicly (MAMBA paper).
| bearjaws wrote:
| The problem is, the 128k window performed terribly and showed
| that attention was mostly limited to the first and last 20%.
|
| Increasing it to 1M just means even more data is ignored.
| cubefox wrote:
| Maybe their architecture wasn't as good as Mamba, and Google
| could use the better architecture thanks to being late to the
| game...
| zippothrowaway wrote:
| I've always been suspicious of any announcement from Demis
| Hassabis since way back in his video game days when he did a
| monthly article in Edge magazine about the game he was
| developing. "Infinite Polygons" became a running joke in the
| industry because of his obvious snake-oil. The game itself,
| Republic [1], was an uninteresting failure.
|
| He learned how to promote himself from working for Peter "Project
| Milo" Molyneux and I see similar patterns of hype.
|
| [1]
| https://en.wikipedia.org/wiki/Republic:_The_Revolution#Marke...
| pradn wrote:
| The line between delusional and visionary is thin! I know I'm
| too grounded in "expected value" math to do super outlier stuff
| like starting a video game company...
| Qwero wrote:
| Funny read about his game.
|
| Nonetheless, while Gemini is still underwhelming in comparison
| to GPT-4 (excluding this announcement, as I haven't tried it
| yet), AlphaGo, AlphaZero, and especially AlphaFold were
| tremendous!
| obblekk wrote:
| Very impressive if the benchmarks replicate. Some questions:
|
| * token cost? In multiples of Gemini pro 1
|
| * memory usage? Does already scarce GPU memory become even more
| of a bottleneck?
|
| * video resolution? Sherlock Jr (1924) is their test video -
| black and white, 45min, low res
|
| Most curious about the video... I wonder if RAG within video will
| become the next battlefront
| technics256 wrote:
| Does anyone actually have access to Ultra yet? It's a lame blog
| post where it says "it's available!" but the fine print says "by
| whitelist".
|
| Ok, whatever that means.
|
| OpenAI at least releases it all at once, to everyone.
| Szpadel wrote:
| Oh, OpenAI had a lot of waitlists also: GPT-4 API, large-context
| versions, etc.
| sonium wrote:
| I just watched the demo with the Apollo 11 transcript. (sidenote:
| maybe Gemini is named after the space program?).
|
| Wouldn't the transcript or at least a timeline of Apollo 11 be
| part of the training corpus? So even without the 400 pages in the
| context window, just given the drawing, I would assume a prompt
| like "In the context of Apollo 11, what moment does the drawing
| refer to?" would yield the same result.
| technics256 wrote:
| Gemini is named that way because of the collaboration between
| Google Brain and DeepMind.
| singularity2001 wrote:
| Correct except that it spits out the timestamp
| torginus wrote:
| Gemini is named after the spacecraft that put the second person
| into orbit - pretty aptly named, but not sure if this was the
| intention.
| empath-nirvana wrote:
| I asked ChatGPT-4 to identify three humorous moments in the
| Apollo 11 transcript and it hallucinated all 3 of them (I think
| -- I can't find what it's referring to). Presumably it's in
| its corpus, too.
|
| > The "Snoopy" Moment: During the mission, the crew had a
| small, black-and-white cartoon Snoopy doll as a semi-official
| mascot, representing safety and mission success. At one point,
| Collins joked about "Snoopy" floating into his view in the
| spacecraft, which was a light moment reflecting the camaraderie
| and the use of humor to ease the intense focus required for
| their mission.
|
| The "Biohazard" Joke: After the successful moon landing and
| upon preparing for re-entry into Earth's atmosphere, the crew
| humorously discussed among themselves the potential of being
| quarantined back on Earth due to unknown lunar pathogens. They
| joked about the extensive debriefing they'd have to go through
| and the possibility of being a biohazard. This was a light-
| hearted take on the serious precautions NASA was taking to
| prevent the hypothetical contamination of Earth with lunar
| microbes.
|
| The "Mailbox" Comment: In the midst of their groundbreaking
| mission, there was an exchange where one of the astronauts
| joked about expecting to find a mailbox on the Moon, or asking
| where they should leave a package, playing on the surreal
| experience of being on the lunar surface, far from the ordinary
| elements of Earthly life. This comment highlighted the
| astronauts' ability to find humor in the extraordinary
| circumstances of their journey.
| htrp wrote:
| > Gemini 1.5 delivers dramatically enhanced performance. It
| represents a step change in our approach, building upon research
| and engineering innovations across nearly every part of our
| foundation model development and infrastructure. This includes
| making Gemini 1.5 more efficient to train and serve, with a new
| Mixture-of-Experts (MoE) architecture.
|
| Looks like they fine-tuned across use cases and grabbed the
| Mixtral architecture?
| sebzim4500 wrote:
| There's no way that's all it is, scaling mixtral to a context
| length of 10M while maintaining any level of reasoning ability
| would be extremely slow. If the only purpose of the model was
| to produce this report then maybe that's possible, but if they
| plan on actually deploying this to end users then there is no
| way they can run quadratic attention on 10M tokens.
| joak wrote:
| <<We'll also introduce 1.5 Pro with a standard 128,000 token
| context window when the model is ready for a wider release>>
|
| So actually they are lagging: their 128k model is yet to be
| released while OpenAI released theirs some months ago.
| joak wrote:
| Their 10M tokens demo is impressive though. They "released" a
| demo. Confusing...
| kyrra wrote:
| See: https://blog.google/technology/ai/google-gemini-next-
| generat...
|
| > Gemini 1.5 Pro comes with a standard 128,000 token context
| window. But starting today, a limited group of developers and
| enterprise customers can try it with a context window of up to
| 1 million tokens via AI Studio and Vertex AI in private
| preview.
| joak wrote:
| Gemini 1.5 Pro is not yet released: <<Starting today, we're
| offering a limited preview of 1.5 Pro to developers and
| enterprise customers via AI Studio and Vertex AI>>
|
| Something like an alpha version.
|
| _Limited preview_ in their jargon.
| iamgopal wrote:
| The AI race is amazing. Nvidia is reaping the benefits now, but
| soon the whole world will.
| cubefox wrote:
| The whitepaper says the Buster Keaton film was reduced to 1 FPS
| before being fed in. Apparently multi-modal language models can
| only read individual pictures, so videos have to be reduced to a
| series of frames. I assume animal brains are more efficient than
| that, e.g. by feeding in only the changes/differences over time
| instead of a sequence of time slices.
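|
| The reduction itself is trivial; a sketch with OpenCV (the report
| doesn't say how they sample, this is just the obvious way):
|     import cv2
|
|     def sample_one_fps(path):
|         cap = cv2.VideoCapture(path)
|         fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
|         frames, i = [], 0
|         while True:
|             ok, frame = cap.read()
|             if not ok:
|                 break
|             if i % int(round(fps)) == 0:
|                 frames.append(frame)   # one frame per second
|             i += 1
|         cap.release()
|         return frames  # each frame is then tokenized as an image
|
|     # ~45 min of film -> roughly 2,700 frames at 1 FPS:
|     # frames = sample_one_fps("sherlock_jr_1924.mp4")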
| riku_iki wrote:
| It will probably eventually be improved by adding an encoder
| on top of the LLM which encodes 60 frames into one while
| attempting to preserve the information.
| freedomben wrote:
| > _Our teams continue pushing the frontiers of our latest models
| with safety at the core._
|
| They're not kidding, Gemini (at least what's currently available)
| is so safe that it's not all that useful.
|
| The "safety" permeates areas where you wouldn't even expect it,
| like refusing to answer questions about "unsafe" memory
| management in C. It interjects lectures about safety in answers
| when you didn't even ask it to do that in the question.
|
| For example, I clicked on _one of the four example questions that
| Gemini proposes to help you get started_ and it was something
| like "Write an SMS calling in sick. It's a big presentation day
| and I'm sad to let the team down." Gemini decided to tell me that
| it can't impersonate positions of trust like medical
| professionals or employers (which is not at all what I was
| asking it to do).
|
| The other things I asked it, it gave me wrong and obviously wrong
| answers. The funniest (though glad it was obviously wrong) was
| when I asked it "I'm flying from Karachi to Denver. Will I need
| to pick up my bags in Newark?" and it told me "no, because
| Karachi to Newark is a domestic flight"
|
| Unless they stop putting "safety at the core," or figure out how
| to do it in a way that isn't unnecessarily inhibiting, annoying,
| and frankly insulting (protip: humans don't like to be accused of
| asking for unethical things, especially when they weren't asking
| for them. when other humans do that to us, we call that assuming
| the worst and it's a negative personality trait), any
| announcements/releases/breakthroughs from Google are going to be
| a "meh" for me.
| dghlsakjg wrote:
| This is incredible if it isn't just hype!
|
| I hope the demos aren't fudged/scripted like Google's
| Gemini 1.0 demo was.
| amf12 wrote:
| These demos seem to be videos from AI Studio, which display
| the time in seconds. Hopefully not fudged.
| EZ-E wrote:
| Remember AI Dungeon and how frustrating it was that it would
| forget what happened previously? With a 10M context window, am I
| right to assume it would be possible to weave a story which would
| span multiple books' worth of content (more or less
| 1,400 pages)?
| dougmwne wrote:
| Pretty much! Check out this demo of finding a scene in a 1400
| page book based on a stick figure drawing. Mind blowing, right?
|
| https://twitter.com/JeffDean/status/1758148159942091114
| VikingCoder wrote:
| Dear Google,
|
| Teach Gemini how to be a Dungeon Master, and run free
| adventures at Comic Con.
|
| Then offer it up as a subscription.
|
| Sincerely,
|
| Everyone
| eigenvalue wrote:
| Based on what I've seen so far, I think the probability that this
| is actually better than GPT4 on the kind of real world coding
| tasks that I use it for is less than 1%. Literally everything
| from Google on this has been vaporware or laughably bad in actual
| practice in my personal experience. Which is totally insane to me
| given their financial resources, human resources, and multi-year
| lead in AI/DL research, but that's what seems to have happened. I
| certainly hope that they can develop and actually release a
| capable model, but at this point, I think you have to be deeply
| skeptical of everything they say until such a model is available
| for real by the public and you can try it on actual, real tasks
| and not fake benchmark nonsense and waitlists.
| scarmig wrote:
| One interesting tidbit from the technical report:
|
| >HumanEval is an industry standard open-source evaluation
| benchmark (Chen et al., 2021), but we found controlling for
| accidental leakage on webpages and open-source code repositories
| to be a non-trivial task, even with conservative filtering
| heuristics. An analysis of the test data leakage of Gemini 1.0
| Ultra showed that continued pretraining on a dataset containing
| even a single epoch of the test split for HumanEval boosted
| scores from 74.4% to 89.0%, highlighting the danger of data
| contamination. We found that this sharp increase persisted even
| when examples were embedded in extraneous formats (e.g. JSON,
| HTML). We invite researchers assessing coding abilities of these
| models head-to-head to always maintain a small set of truly held-
| out test functions that are written in-house, thereby minimizing
| the risk of leakage. The Natural2Code benchmark, which we
| announced and used in the evaluation of Gemini 1.0 series of
| models, was created to fill this gap. It follows the exact same
| format of HumanEval but with a different set of prompts and
| tests.
| llm_trw wrote:
| Yeah. I'll believe that when I can use it.
| DidISayTooMuch wrote:
| How can I fine-tune these models for my use? Their docs aren't
| clear on whether the Gemini models are fine-tunable.
| qwertox wrote:
| As a sidenote, it's worth clicking the play button and then
| checking how they're highlighting the current paragraph and word
| in the inspector.
| aubanel wrote:
| For reference, here is the technical report:
| https://storage.googleapis.com/deepmind-media/gemini/gemini_...
| dumbmachine wrote:
| It would probably be cost-prohibitive to use the 10M context to
| its fullest each time.
|
| I instead hope for an API that exposes the context as a
| datastore, so that, as with RAG, we can control what to store,
| but unlike RAG, all data stays within the context.
| killthebuddha wrote:
| 10M tokens is an absolute game changer, especially if there's no
| noticeable decay in quality with prompt size. We're going to see
| things like entire domain-specific languages embedded in prompts.
| IMO people will start thinking of the prompt itself as a sort of
| runtime rather than a static input.
|
| Back when OpenAI still supported raw text completion with text-
| davinci-003 I spent some time experimenting with tiny prompt-
| embedded DSLs. The results were very, very interesting, IMO. In a
| lot of ways, text-davinci-003 with embedded functions still feels
| to me like the "smartest" language model I've ever interacted
| with.
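|
| For the curious, a prompt-embedded DSL can be as small as a few
| lines. Here's a made-up illustration of the kind of thing I mean
| -- the language is invented, and the model itself acts as its
| interpreter:
|
|     # A tiny invented stack language, defined entirely in the
|     # prompt -- the "prompt as runtime" idea.
|     DSL_PROMPT = """You are an interpreter for a tiny stack
|     language with two operations:
|       PUSH <n>  -- push the integer n onto the stack
|       ADD       -- pop two values, push their sum
|     After running the program, output only the final stack.
|
|     Program:
|     PUSH 2
|     PUSH 3
|     ADD
|     PUSH 10
|     Stack:"""
|
|     # Sent to a raw completion endpoint (as text-davinci-003
|     # offered), you'd expect something like "[5, 10]" back.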
|
| I'm not sure how close we are to "superintelligence" but for
| baseline general intelligence we very well could have already
| made the prerequisite technological breakthroughs.
| empath-nirvana wrote:
| It's pretty slow, though -- it looks like up to 60 seconds for
| some of the answers -- and uses god knows how much compute, so
| there are probably going to be some trade-offs: you'll want to
| make sure that that much context is actually useful for what
| you want.
| drusepth wrote:
| TBF: when talking about the first "superintelligence", I'd
| expect it to take unreasonable amounts of compute and/or be
| slow -- that can always be optimized. Bringing it into
| existence in the first place is the hardest part.
| unshavedyak wrote:
| Yea. Of course for some tasks we need speed, but I've been
| kinda surprised that we haven't seen very slow models which
| perform far better than faster models. We're treading new
| territory, and everyone seems to make models that are "fast
| enough".
|
| I wanna see how far this tech can scale, regardless of
| speed. I don't care if it takes 24h to formulate a
| response. Are there "easy" variables which drastically
| improve output?
|
| I suspect not. I imagine people have tried that. Though I'm
| still curious as to why.
| Yusefmosiah wrote:
| I see a lot of talk about retrieval over long context. Some even
| think this replaces RAG.
|
| I don't care if the model can tell me which page in the book or
| which code file has a particular concept. RAG already does this.
| I want the model to notice how a concept is distributed
| throughout a text, and be able to connect, compare, contrast,
| synthesize, and understand all the ways that a book touches on a
| theme, or to rewrite multiple code files in one pass, without
| introducing bugs.
|
| How does Gemini 1.5's reasoning compare to GPT-4? GPT-4 already
| has superhuman memory; its bottleneck is its relatively weak
| reasoning.
| sinuhe69 wrote:
| In my experience (I work mostly and deeply with Bard/Gemini),
| the reasoning capability of Gemini is quite good. Gemini Pro is
| already much better than ChatGPT 3.5, but it still makes quite
| a few mistakes along the way. What is more worrying is that
| when these models make mistakes, they try really hard to
| justify their reasoning (errors), practically misleading the
| users. Because of their high mimicry ability, users really have
| to pay attention to validate and eventually spot the errors. Of
| course, this is still far below the human level, so I'm not
| sure whether they add value or are more of a burden.
| og_kalu wrote:
| The most impressive demonstration of long context, in my
| opinion, is this:
|
| https://imgur.com/a/qXcVNOM
|
| Testing the model's translation abilities on an extremely
| obscure language after passing in a single grammar book as
| context.
| petargyurov wrote:
| The version number suggests they're already waiting to announce
| something bigger?
| bloopernova wrote:
| Hooray for competition.
| luke-stanley wrote:
| Still no Ultra model API available to UK devs? Considering
| DeepMind's London base, this is kinda strange. Maybe they could
| ask Ultra how to roll it out faster?
| bobvanluijt wrote:
| Demo with Google AI Studio:
| https://twitter.com/bobvanluijt/status/1758185143116730875
| processing wrote:
| Just wade through documentation to access it?
|
| Clicking on the AI Studio link doesn't show me the app page --
| it redirects to a document on early access. I do as required,
| go back, and try clicking on the AI Studio link again, and I'm
| redirected to the same early-access document.
|
| Frustrating.
| robertlagrant wrote:
| Slightly surprisingly, I can't get to AI Studio from the UK. It
| is available in quite a few countries, but not here.
| ChildOfChaos wrote:
| Is this just more nonsense from Google, though? I expect big
| things from Google, but they need to shut up and actually
| release stuff instead of saying how amazing their stuff is and
| then releasing potato AI. Nothing they have done in the AI space
| recently has lived up to any of the hype. They should stay
| silent for a bit and then release something that kills GPT-4, if
| they honestly are able, but instead they are just full of hype.
| sinuhe69 wrote:
| Yeah, their Gemini demo was a disaster. But they have released
| their Ultra model to the general audience, so you can test it
| yourself. Talking about killing the competitor is a little
| funny, considering they are all generative LLMs based on the
| same principles (and general architecture), with their inherent
| flaws and shortcomings. None of them can execute even a basic
| plan the way a cheap human assistant can, so their value is
| very limited.
|
| A breakthrough will only come with a next-generation
| architecture. LLMs for special domains are currently the most
| promising approach.
| ChildOfChaos wrote:
| Yeah, but even with Ultra they kept saying how it was better
| than GPT-4, and then when it actually got released it was
| awful.
| jeffbee wrote:
| A little off-topic I guess, but is anyone else seeing what I am
| seeing: a total inability to actually upgrade to paid Gemini?
| Every time I try to sign up it serves me an error page: "We're
| sorry - Google One storage plans aren't available right now."
| DrNosferatu wrote:
| Did they give a general availability date?
|
| (a bit confused)
| dang wrote:
| There's also
| https://twitter.com/JeffDean/status/1758146022726041615
|
| (via https://news.ycombinator.com/item?id=39383593, but we merged
| those comments hither)
| topicseed wrote:
| 1 million tokens?? This is wild -- a lot of RAG can be removed.
| topicseed wrote:
| Is this going to be only for the consumer Gemini app, or for
| the API/Vertex too? The context window is... simply lovely.
| summerlight wrote:
| One interesting proposal here is a multiple-needle NIAH (needle
| in a haystack) retrieval benchmark. When they put in 100
| needles, the recall rate becomes considerably lower, somewhere
| around 60~70%. Not sure what the exact configuration of this
| benchmark is, but intuitively this makes sense, and it should
| be a critical metric for the model's reliability.
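|
| For anyone unfamiliar with the setup: plant N facts at random
| offsets in a long filler text, ask for all of them back, and
| score recall. A sketch of the general shape (my guess at the
| setup, not Google's exact configuration; ask_model is a
| stand-in):
|
|     import random
|
|     def build_haystack(filler: str, needles: list[str]) -> str:
|         # Scatter the needles at random paragraph boundaries.
|         paragraphs = filler.split("\n\n")
|         for needle in needles:
|             paragraphs.insert(random.randrange(len(paragraphs)), needle)
|         return "\n\n".join(paragraphs)
|
|     def recall(needles: list[str], answer: str) -> float:
|         # Fraction of planted facts the model repeated back.
|         return sum(n in answer for n in needles) / len(needles)
|
|     needles = [f"The secret code for city {i} is {1000 + i}."
|                for i in range(100)]
|     context = build_haystack(open("filler.txt").read(), needles)
|     # answer = ask_model(context + "\n\nList every secret code.")
|     # print(recall(needles, answer))  # ~0.6-0.7 at 100 needles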
| reissbaker wrote:
| The long context length is of course incredible, but I'm more
| shocked that the _Pro_ model is now on par with Ultra (~GPT-4, at
| least the original release). That implies when they release 1.5
| Ultra, we'll finally have a GPT-4 killer. And assuming that 1.5
| Pro is priced similarly to the current Pro, that's a 4x price
| advantage per-token.
|
| Not surprising that OpenAI shipped a blog post today about their
| video generation -- I think they're feeling considerable heat
| right now.
| topicseed wrote:
| Gemini 1.0 Ultra was also said to be on par with GPT-4, and
| it's not really there, so let's see for ourselves when we can
| get our hands on it.
| reissbaker wrote:
| Ultra benchmarked around the original release of GPT-4, not
| the current model. My understanding is that was fairly
| accurate -- it's close to current GPT-4 but not quite equal.
| However, close-to-GPT-4 but 4x cheaper and with 10x the context
| length would be very impressive and, IMO, useful.
| m3kw9 wrote:
| Imagine sending 5-10 MB over the network per request, and the
| cost per token. You may accidentally go broke after a big lag.
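|
| Back-of-the-envelope, using GPT-4 Turbo's roughly $0.01 per 1K
| input tokens as a stand-in price (Gemini 1.5's pricing hasn't
| been published):
|
|     # Rough cost of one full-context request under an assumed
|     # price -- the real Gemini 1.5 rate is unknown at this point.
|     price_per_1k_input = 0.01   # USD; assumption, not a quote
|     context_tokens = 1_000_000  # one full 1M-token request
|
|     cost = context_tokens / 1000 * price_per_1k_input
|     print(f"${cost:.2f} per request")  # $10.00 -- retries add up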
| system2 wrote:
| Let's hope this lowers the pricing of GPT-4 to GPT-3.5 levels.
| Because of OpenAI's ridiculous pricing, we can't use it
| regularly, as it would cost us thousands of dollars per month.
| stolsvik wrote:
| So, this has native image/video modality. I wonder whether that
| gives it an edge in physical / world understanding? That is,
| handling and navigating our 3/4 dimensions? Cause and effect and
| so on?
| animanoir wrote:
| Google is so finished, they are so late on this.
| tmaly wrote:
| Is there a $20 a month option for 1.5 Ultra?
|
| If there is, where do I sign up?
| ancorevard wrote:
| Is this a blog post or did they actually ship?
| jstummbillig wrote:
| Imagine a day when your new record-setting 10M-token-context
| model is not enough to make it to HN #1.
|
| Wild times.
| thot_experiment wrote:
| I gotta say, I've been trying out Gemini recently and it's
| embarrassingly bad. I can't take anything google puts out
| seriously when their current offerings are so so much worse than
| ChatGPT (or even local llama!).
|
| As a particularly egregious example, yesterday night I gave
| Gemini a list of drinks and other cocktail ingredients I had
| laying around and asked for some recommendations for cute drinks
| that I could make. It's response:
|
| > I'm just a language model, so I can't help you with that.
|
| ChatGPT 3.5 came up with several delicious options with clear
| instructions. But it's not just this instance -- I've NEVER gotten
| a response from Gemini that I even felt was _more useful_ than
| just a freaking Bing search! Much less better than ChatGPT. I'm
| just going to assume they're using cherrypicked metrics to make
| themselves feel better until proven otherwise. I have zero
| confidence in Google's AI plays, and I assume all their competent
| talent is now at OpenAI or Anthropic.
| mikeweiss wrote:
| Does anyone know what kinds of GPUs/chips Google is using for
| Gemini? They aren't using Nvidia, correct?
| sackfield wrote:
| TPUs: https://cloud.google.com/tpu?hl=en
| jakub_g wrote:
| Very off-topic, but I can't help it: the pace of change reminds
| me of the "Bates 4000" sketch from The Onion Movie:
|
| https://m.youtube.com/watch?v=fw7FniaeaSo
___________________________________________________________________
(page generated 2024-02-15 23:00 UTC)