[HN Gopher] Gemini 2.0 is now available to everyone
___________________________________________________________________
Gemini 2.0 is now available to everyone
Author : meetpateltech
Score : 361 points
Date : 2025-02-05 16:03 UTC (6 hours ago)
(HTM) web link (blog.google)
(TXT) w3m dump (blog.google)
| lowmagnet wrote:
| Here's me not using Gemini 1, because the only use case I have
| for the old Assistant is setting a timer, and there are reports
| that Gemini is randomly incapable of setting one.
| progbits wrote:
| How does the release of an LLM API relate to the assistant?
| w0m wrote:
| Pixels replaced Assistant with Gemini a while back and it was
| horrendous; it would answer questions but not perform the basic
| tasks you actually used Assistant for (setting timers,
| navigating, home control, etc.).
|
| It seems like they're approaching parity (finally) months and
| months later (alarms/TV control at least work now), but losing
| basic, oft-used functionality is a serious fumble.
| KeplerBoy wrote:
| It's not just Pixels. That feature rolled out to billions of
| Android phones.
| VenturingVole wrote:
| I feel as though some of the rigour of systems engineering is
| missing from AI model development/integration. Not a negative
| per se, as velocity is _incredibly_ important: but it seems a
| lot of lessons have to be learned again.
|
| I sometimes forget - it is still very early days relatively
| speaking.
|
| As a user of Gemini 2.0, so far I have been very impressed
| for the most part.
| progbits wrote:
| Thanks, didn't know, never really used these voice
| assistants.
|
| It's a weird choice. I suppose the endless handcrafted rules and
| tools don't scale across languages and use cases, but then LLMs
| are not good at reliability. And what's the point of using an
| assistant that will not do the task reliably? If you have to
| double-check, you are better off not using it...
| w0m wrote:
| The issue wasn't inconsistency, it was "had no home integration
| at all" at launch. They rushed to roll out the 'new' assistant
| and didn't bother waiting for the basic feature set first.
|
| Today it works ~perfectly for TV control/alarm setting - I can't
| think of it not working first try in the last month or so for
| me. Maybe more consistent than before?
|
| The rollout was simply borked from the PM/decision-making side.
| butlike wrote:
| Flash is back, baby.
|
| Next release should be called Gemini Macromedia
| VenturingVole wrote:
| Or perhaps it'll help people to weave their dreams together and
| so it should be called.. ahh I feel old all of a sudden.
| benob wrote:
| You made me feel old ;)
| VenturingVole wrote:
| Everyone old is new again!
|
| On a serious note - LLMs have actually brought me a lot of
| joy lately and elevated my productivity substantially
| within the domains in which I choose to use them. When
| witnessing the less experienced more readily accept outputs
| without understanding the nuances, there's definitely additional
| value in being... experienced.
| sbruchmann wrote:
| Google Frontpage?
| VenturingVole wrote:
| I feel seen!
|
| Also just had to explain to the better half why I suddenly
| shuddered and pulled such a face of despair.
| butlike wrote:
| Why the partition?
| lamuswawir wrote:
| Dreamweaver!
| drewda wrote:
| Google Gemini MX 2026
| ChocolateGod wrote:
| This is going to send shockwaves through the industry.
| seydor wrote:
| .SWF is all we need
| sho_hn wrote:
| How about Gemini Director for the next agentic stuff.
| weatherlite wrote:
| Gemini Applets
| sho_hn wrote:
| Anyone have a take on how the coding performance (quality and
| speed) of the 2.0 Pro Experimental compares to o3-mini-high?
|
| The 2 million token window sure feels exciting.
| mohsen1 wrote:
| I don't know what those "needle in haystack" benchmarks are
| testing for, because in my experience dumping a big amount of
| code into the context doesn't work as you'd expect. It works
| better if you keep the context small.
| airstrike wrote:
| I think the sweet spot is to include _some_ context that is
| limited to the scope of the problem and benefit from the longer
| context window to keep longer conversations going. I often go
| back to an earlier message in that thread and rewrite it with
| the understanding gained from that longer conversation, so that
| I can continue to manage the context window.
| cma wrote:
| Claude works well for me when loading code up to around 80% of
| its 200K context and then asking for changes. If the whole
| project can't fit, I try to at least get the headers in and then
| the most relevant files. It doesn't seem to degrade. If you are
| using something like an AI IDE, a lot of the time they don't
| really give you the full 200K context.
| TuxSH wrote:
| Bad (though I haven't tested autocompletion). It's
| underperforming other models on livebench.ai.
|
| With Copilot Pro and DeepSeek's website, I ran "find logic
| bugs" on a 1200 LOC file I actually needed code review for:
|
| - DeepSeek R1 found like 7 real bugs out of 10 suggested with
| the remaining 3 being acceptable false positives due to missing
| context
|
| - Claude was about the same with fewer remaining bugs; no
| hallucinations either
|
| - Meanwhile, Gemini had a 100% false positive rate, with many
| hallucinations and unhelpful answers to the prompt.
|
| I understand Gemini 2.0 is not a reasoning model, but
| DeepClaude remains the most effective LLM combo so far.
| gwern wrote:
| 2.0 Pro Experimental seems like the big news here?
|
| > Today, we're releasing an experimental version of Gemini 2.0
| Pro that responds to that feedback. It has the strongest coding
| performance and ability to handle complex prompts, with better
| understanding and reasoning of world knowledge, than any model
| we've released so far. It comes with our largest context window
| at 2 million tokens, which enables it to comprehensively analyze
| and understand vast amounts of information, as well as the
| ability to call tools like Google Search and code execution.
| Tiberium wrote:
| It's not _that_ big a piece of news, because they already had
| gemini-exp-1206 on the API - they just didn't say it was Gemini
| 2.0 Pro until today. Now AI Studio marks it as 2.0 Pro
| Experimental - basically an older snapshot; the newer one is
| gemini-2.0-pro-exp-02-05.
| Alifatisk wrote:
| Oh so the previous model gemini-exp-1206 is now
| gemini-2.0-pro-experimental on aistudio? Is it better than
| gemini-2.0-flash-thinking-exp?
| mohsen1 wrote:
| > available via the Gemini API in Google AI Studio and Vertex AI.
|
| > Gemini 2.0, 2.0 Pro and 2.0 Pro Experimental, Gemini 2.0 Flash,
| Gemini 2.0 Flash Lite
|
| 3 different ways of accessing the API, more than 5 different but
| extremely similarly named models. Benchmarks only comparing to
| their own models.
|
| Can't be more "Googley"!
| sho_hn wrote:
| I think this is a good summary:
| https://storage.googleapis.com/gweb-developer-goog-blog-asse...
| vdfs wrote:
| - Experimental(tm)
|
| - Preview(tm)
|
| - Coming soon(tm)
| esafak wrote:
| Don't forget the OG, "Beta".
|
| https://en.wikipedia.org/wiki/History_of_Gmail#Extended_bet
| a...
| llm_trw wrote:
| You missed the first sentence of the release:
|
| >In December, we kicked off the agentic era by releasing an
| experimental version of Gemini 2.0 Flash
|
| I guess I wasn't building AI agents in February last year.
| soulofmischief wrote:
| Yeah, some of us have been working predominantly on agents for
| years now, but at least people are finally paying attention.
| Can't wait to be told how I'm following a hype cycle again.
| raverbashing wrote:
| Honestly naming conventions in the AI world have been appalling
| regardless of the company
| belval wrote:
| Google isn't even the worst, in my opinion. Off the top of my
| head:
|
| Anthropic:
|
| Claude 1, Claude Instant 1, Claude 2, Claude Haiku 3, Claude
| Sonnet 3, Claude Opus 3, Claude Haiku 3.5, Claude Sonnet 3.5,
| Claude Sonnet 3.5 v2
|
| OpenAI:
|
| GPT-3.5, GPT-4, GPT-4o-2024-08-06, GPT-4o, GPT-4o-mini, o1,
| o3-mini, o1-mini
|
| Fun times when you try to set up throughput provisioning.
| jorvi wrote:
| I don't understand why, if they're gonna use shorthands to make
| the tech seem cooler, they can't at least use mnemonic
| shorthands.
|
| Imagine if it went like this, with mnemonics m(ini),
| r(easoning), t(echnical):
|
|     Claude 3m
|     Claude 3mr
|     Claude 3mt
|     Claude 3mtr
|     Claude 3r
|     Claude 3t
|     Claude 3tr
| jug wrote:
| Google is the least confusing to me. Old-school version numbers,
| and Pro is better than Flash, which is fast and for "simple"
| stuff (which can be effortless intermediate-level coding at this
| point).
|
| OpenAI is crazy. There may be a day when we have o5 that is
| reasoning and 5o that is not, where they belong to different
| generations too, and where "o" meant "Omni" despite o1-o3 not
| being audiovisual anymore like 4o.
|
| Anthropic is crazy too. Sonnets and Haikus, just why... and a
| 3.5 Sonnet released in October that was better than 3.5 Sonnet.
| (Not a typo.) And no one knows why there never was a 3.5 Opus.
| NitpickLawyer wrote:
| > And no one knows why there never was a 3.5 Opus.
|
| If you read between the lines it's been pretty clear. The
| top labs are keeping the top models in house and use them
| to train the next generation (either SotA or faster/cheaper
| etc).
| esafak wrote:
| 4o is a more advanced model than o1 or o3, right!?
| dpkirchner wrote:
| Mistral vs mistral.rs, Llama and llama.cpp and ollama, groq
| and grok. It's all terrible.
| danielbln wrote:
| Claude Sonnet 3.5...no, not that 3.5, the new 3.5. o3-mini,
| no not o2. yes there was o1, yes it's better than gpt-4o.
| seanhunter wrote:
| I don't know why you're finding it confusing. There's Duff,
| Duff Lite and now there's also all-new Duff Dry.
| whynotminot wrote:
| I tend to prefer Duff Original Dry and Lite, but that's just
| me
| justanotheratom wrote:
| They actually have two "studios"
|
| Google AI Studio and Google Cloud Vertex AI Studio
|
| And both have their own documentation, different ways of
| "tuning" the model.
|
| Talk about shipping the org chart.
| bn-l wrote:
| > Talk about shipping the org chart.
|
| Good expression. I've been thinking about a way to say
| exactly this.
| lelandfe wrote:
| A pithy reworking of Conway's Law
| https://en.wikipedia.org/wiki/Conway%27s_law
| echelon wrote:
| I love this phrase so much.
|
| Google still has some unsettled demons.
| jiggawatts wrote:
| > Talk about shipping the org chart.
|
| To be fair, Microsoft has shipped like five AI portals in the
| last two years. Maybe four -- I don't even know any more.
| I've lost track of the renames and product (re)launches.
| felixg3 wrote:
| They made a new one to unite them all: Microsoft Fabric.
|
| https://xkcd.com/927/
| itissid wrote:
| I wonder what the changelogs of the two studio products tell us
| about internal org fights (strife)?
| ssijak wrote:
| Working with Google APIs is often an exercise in frustration. I
| actually like their base cloud offering the best, but their
| additional APIs can be all over the place. These AI-related ones
| are the worst.
| butz wrote:
| Does "everyone" here means "users with google accounts"?
| esafak wrote:
| Benchmarks or it didn't happen. Anything better than
| https://lmarena.ai/?leaderboard?
|
| My experience with the Gemini 1.5 models has been positive. I
| think Google has caught up.
| og_kalu wrote:
| Livebench is better. LMArena is a vibes benchmark.
| Hrun0 wrote:
| Some of my saved bookmarks:
|
| - https://aider.chat/docs/leaderboards/
|
| - https://www.prollm.ai/leaderboard
|
| - https://www.vellum.ai/llm-leaderboard
|
| - https://lmarena.ai/?leaderboard
| yogthos wrote:
| I wish the blog mentioned whether they backported DeepSeek ideas
| into their model to make it more efficient.
| weatherlite wrote:
| Only DeepSeek is allowed to take ideas from everyone else?
| singhrac wrote:
| What is the model I get at gemini.google.com (i.e. through my
| Workspace subscription)? It says "Gemini Advanced" but there are
| no other details. No model selection option.
|
| I find the lack of clarity very frustrating. If I want to try
| Google's "best" model, should I be purchasing something? AI
| Studio seems focused around building an LLM wrapper app, but I
| just want something to answer my questions.
|
| Edit: what I've learned through Googling: (1) if you search "is
| gemini advanced included with workspace" you get an AI Overview
| answer that seems to be incorrect, since they now include Gemini
| Advanced (?) with every Workspace subscription. (2) A page
| exists telling you to buy the add-on (Gemini for Google
| Workspace), but clicking on it says this is no longer available
| because of the above. (3) gemini.google.com says "Gemini
| Advanced" (no idea which model) at the top, but
| gemini.google.com/advanced redirects me to what I have deduced
| is the consumer site (?), which tells me that Gemini Advanced is
| another $20/month.
|
| The problem, Google PMs if you're reading this, is that the
| gemini.google.com page does not have ANY information about what
| is going on. What model is this? What are the limits? Do I get
| access to "Deep Research"? Does this subscription give me
| something in aistudio? What about code artifacts? The settings
| option tells me I can change to dark mode (thanks!).
|
| Edit 2: I decided to use aistudio.google.com since it has a
| dropdown for me on my workspace plan.
| ysofunny wrote:
| Hmm, did you try clicking where it says 'Gemini Advanced'? I
| find it opens a dropdown.
| singhrac wrote:
| I just tried it but nothing happens when I click on that.
| You're talking about the thing on the upper left next to the
| open/close menu button?
| easychris wrote:
| Yes, very frustrating for me as well. I'm now considering
| purchasing Gemini Advanced with another non-Workspace
| account. :-(
|
| I also found this [1]: " Important:
|
| A chat can only use one model. If you switch between models
| in an existing chat, it automatically starts a new chat. If
| you're using Gemini Apps with a work or school Google
| Account, you can't switch between models. Learn more about
| using Gemini Apps with a work or school account."
|
| I have no idea why the Workspace accounts are so restricted.
|
| [1] https://support.google.com/gemini/answer/14517446?hl=en
| &co=G...
| rickette wrote:
| "what model are you using, exact name please" is usually the
| first prompt I enter when trying out something.
| lxgr wrote:
| You'd be surprised at how confused some models are about who
| they are.
| freedomben wrote:
| Indeed, asking the model which model it is might be one of
| the worst ways to find that information out
| mynameisvlad wrote:
| Gemini 2.0 Flash Thinking responds with
|
| > I am currently running on the Gemini model.
|
| Gemini 1.5 Flash responds with
|
| > I'm using Gemini 2.0 Flash.
|
| I'm not even going out on a limb here in saying that question
| isn't going to give you an accurate response.
| jug wrote:
| Yeah, they need something in their system prompt to tell them
| their name, or else they have absolutely no idea what they are
| and will hallucinate 100% based on training data. If you're
| lucky, the AI just might guess right under these circumstances.
|
| It's not unusual for AIs to think they're OpenAI/ChatGPT because
| it's become so popular that it's leaked into the buzz they're
| trained on.
| miyuru wrote:
| Changes must be rolling out now; I can see 3 Gemini 2.0 models
| in the dropdown, with blue "new" badges.
|
| screenshot: https://beeimg.com/images/g25051981724.png
| singhrac wrote:
| This works on my personal Google account, but not on my
| workspace one. So I guess there's no access to 2.0 Pro then?
| I'm ok trying out Flash for now and see if it fixes the
| mistakes I ran into yesterday.
|
| Edit: it does not. It continues to miss the fact that I'm
| (incorrectly) passing in a scaled query tensor to
| scaled_dot_product_attention. o3-mini-high gets this right.
| vel0city wrote:
| As someone with over a decade of Google Apps management history,
| my experience is that Workspace customers are practically always
| the last to get the shiny new features. Quite frustrating.
| basch wrote:
| Isn't that generally how it goes? Windows Vista was tested on
| consumers to make 7 enterprise-appropriate?
| panarky wrote:
| If you subscribe to Gemini the menu looks like this, with the
| addition of 2.0 Pro.
|
| https://imgur.com/a/xZ7hzag
| glerk wrote:
| It doesn't for workspace users. No dropdown appears.
| behnamoh wrote:
| The number one reason I don't use Google Gemini is that they
| truncate the input text, so I can't simply paste long documents
| or other kinds of things as raw text into the prompt box.
| radeeyate wrote:
| If you have the need to paste long documents, why don't you
| just upload the file at that point?
| heavyarms wrote:
| The last time I checked (a few days ago) it only had an
| "Upload Image" option... and I have been playing with
| Gemini on and off for months and I have never been able to
| actually upload an image.
|
| It's basically what I've come to expect from most Google
| products at this point: half-baked, buggy, confusing, not
| intuitive.
| Xiol32 wrote:
| Friction.
| johnisgood wrote:
| Claude automatically uploads it as "Pasted text" if what you
| paste into the textarea is too long. Works either way anyways.
| behnamoh wrote:
| Because sometimes the text is the result of my Whisper
| transcription.
| nudpiedo wrote:
| Today I wasted an hour looking into how to use or where to find
| "Deep Research".
|
| I could not find it. I have the Workspace Business Standard
| plan, which includes Gemini Advanced; I'm not sure whether I
| need a VPN, to pay for a separate AI product, or even to pay for
| a higher Workspace tier, or what the heck is going on at all.
|
| There are so many confusingly interrelated products and such a
| lack of focus everywhere that I really don't know anymore
| whether it is worth it as an AI provider.
| gerad wrote:
| You need to pay for Gemini to access it. In my experience,
| it's not worth it. So much potential in the experience, but
| the AI isn't good enough.
|
| I'm curious about the OpenAI alternative, but am not willing
| to pay $200/month.
| nudpiedo wrote:
| If it would do whole market research on products and companies I
| would gladly pay for it... but I'm a bit unsure from Europe,
| where it seems everything is restricted due to political
| boundaries.
| A_D_E_P_T wrote:
| I have OpenAI Deep Research access in Europe and it is
| extremely good. It's also particularly good at niche
| market research in products and companies.
|
| Happy to give you a demo. If you want to send me a
| prompt, I can share a link to the resulting output.
| coolgoose wrote:
| Plus one on this; it's so stupid, but also mandatory in a way.
| Sigh.
| vldmrs wrote:
| It's funny how bad the UI is on some of the websites that are
| considered the best. Today I tried to find prices for Mistral
| models but I couldn't. Their pricing page leads to a 404...
| PhilippGille wrote:
| Just in case you're still interested in their pricing, it's
| towards the bottom of [1], section "How to buy", when
| changing the selection from "Self-hosted" to "Mistral Cloud".
|
| [1] https://mistral.ai/en/products/la-plateforme
| behnamoh wrote:
| if only these models were good at web development and could be
| used in agentic frameworks to build high-quality websites...
| wait...
| gallerdude wrote:
| Is there really no standalone app, like ChatGPT/Claude/DeepSeek,
| available yet for Gemini?
| bangaladore wrote:
| Presumably any app that is API agnostic works fine.
|
| I'm not sure why you would want an app for each anyways.
| silvajoao wrote:
| The standalone app is at https://gemini.google.com/app, and is
| similar to ChatGPT.
|
| You can also use https://aistudio.google.com to use base models
| directly.
| browningstreet wrote:
| What do you mean by an app? I have a Gemini app on my iPhone.
| pmayrgundter wrote:
| I tried voice chat. It's very good, except for the politics
|
| We started talking about my plans for the day, and I said I was
| making chili. G asked if I have a recipe or if I needed one. I
| said, I started with Obama's recipe many years ago and have
| worked on it from there.
|
| G gave me a form response that it can't talk politics.
|
| Oh, I'm not talking politics, I'm talking chili.
|
| G then repeated the form response and tried to change the
| conversation, and as long as I didn't use the O word, we were
| allowed to proceed. Phew.
| xnorswap wrote:
| I find it horrifying and dystopian that the part where it
| "Can't talk politics" is just accepted and your complaint is
| that it interrupts your ability to talk chilli.
|
| "Go back to bed America." "You are free, to do as we tell you"
|
| https://youtu.be/TNPeYflsMdg?t=143
| falcor84 wrote:
| Hear, hear!
|
| There has to be a better way about it. As I see it, to be
| productive, AI agents have to be able to talk about politics,
| because at the end of the day politics are everywhere. So
| following up on what they do already, they'll have to define
| a model's political stance (whatever it is), and to have it
| hold its ground, voicing an opinion or abstaining from voicing
| one, but continuing the conversation as a person would (at least
| a person who doesn't rage-quit a conversation upon hearing
| something slightly controversial).
| freedomben wrote:
| There aren't many mono-cultures as strong as silicon valley
| politics. Where this intersects with my beliefs I love it,
| but where it doesn't it is maddening. I suspect that's how
| most people feel.
|
| But anyway, when one is rarely or never challenged on their
| beliefs, they become rusty. Do you trust them to do a good
| job training their own views into the model, let alone
| training in the views of someone on the opposite side of
| the spectrum?
| falcor84 wrote:
| I don't know if I trust them as such, but they're doing
| it anyway, so I'd appreciate it being more explicit.
|
| Also, as long as it's not training the whole model on the
| fly as with the Tay fiasco, I'd actually be quite
| interested in an LLM that would debate you and possibly
| be convinced and change its stance for the rest of that
| conversation with you. "Strong opinions weakly held" and
| all.
| xnorswap wrote:
| Indeed, you can facilitate talking politics without having
| a set opinion.
|
| It's a fine line, but it is something the BBC managed to do
| for a very long time. The BBC does not itself present an
| opinion on Politics yet facilitates political discussion
| through shows like Newsnight and The Daily Politics (rip).
| ImHereToVote wrote:
| BBC is great at talking about the Gaza situation. Makes
| it seem like people are just dying from natural causes
| all the time.
| jay_kyburz wrote:
| Australia's ABC makes it fairly clear who is killing who
| but also manages to avoid taking sides.
| freedomben wrote:
| I agree it's ridiculous that the mention of a politician
| triggers the block, so it feels overly tightened (which is the
| story of existence for Gemini), but the alternative is that the
| model will have the politics of its creators/trainers. Is that
| preferable to you? (I suppose that depends on how well your
| politics align with Silicon Valley.)
| danenania wrote:
| I think it will still have the politics of its creators
| even if it's censored with a superficial "no politics"
| rule. Politics is adjacent to almost everything, so it's
| going to leak through no matter what you talk about.
| duxup wrote:
| Online the idea of "no politics" is often used as a way to
| try to stifle / silence discussion too. It's disturbingly
| fitting to the Gemini example.
|
| I was a part of a nice small forum online. Most posts were
| everyday life posts / personal. The person who ran it seemed
| well meaning. Then a "no politics" rule appeared. It was fine
| for a while. I understood what they meant and even I only
| want so much outrage in my small forums.
|
| Yet one person posted about how their plans to adopt were in
| jeopardy over their state's new rules about who could adopt
| what child. This was a deeply important and personal topic
| for that individual.
|
| As you can guess, the "no politics" rule put a stop to that. The
| folks who supported laws like the ones being proposed of course
| thought it shouldn't be discussed because it is "politics";
| others felt that this was that individual talking about their
| rights and life, and it wasn't "just politics". The whole forum
| fell apart after that debacle.
|
| Gemini's response here sadly fits internet discourse... in a bad
| way.
| FeepingCreature wrote:
| To be honest, the limiting factor is often competent
| moderation.
| duxup wrote:
| Yup.
|
| I sometimes wish magically there could be a social
| network of:
|
| 1. Real people / real validated names and faces.
|
| 2. Paid for by the users...
|
| 3. Competent professional moderation.
|
| Don't get me wrong, I like my slices of anonymity and free
| services, but my positive impression of such products is waning
| fast. Over time I want more real...
| redcobra762 wrote:
| Eh, OP isn't stopped from talking politics, Gemini('s owner,
| Google) is merely exercising its right to avoid talking about
| politics with OP. That said, the restriction seems too tight,
| since merely mentioning Obama ought not count as "politics".
| From a technical perspective that should be fixed.
|
| OP can go talk politics until he's blue in the face with
| someone willing to talk politics with them.
| bigstrat2003 wrote:
| There's nothing wrong with (and in fact much to be said in
| favor of) a "no politics" rule. When I was growing up it was
| common advice to not discuss politics/religion in mixed
| company. At one point I thought that was stupid fuddy-duddy
| advice, because people are adults and can act reasonably even
| if they disagree. But as I get older, I realize that I was
| wrong: people really, really can't control their emotions
| when politics comes up and it gets ugly. Turns out that the
| older generation was correct, and you really shouldn't talk
| politics in mixed company.
|
| Obviously in this specific case the user isn't trying to talk
| politics, but the rule isn't dystopian in and of itself. It's
| simply a reflection of human nature, and that someone at
| Google knows it's going to be a lot of trouble for no gain if
| the bot starts to get into politics with users.
| avar wrote:
| From an outsider's perspective: this aspect of American culture
| seems self-reinforcing.
|
| It's not like things can't get heated when people in much
| of the rest of the world discuss politics.
|
| But if the subject isn't entirely verboten, adults will
| have some practice in agreeing to disagree, and moving on.
|
| With AI this particular cultural export has gone from a
| quaint oddity, to something that, as a practical matter,
| can be really annoying sometimes.
| petre wrote:
| I find it kind of useless due to the no-politics rule, and I
| usually quickly lose my patience with it. Same with DeepSeek.
| Meanwhile you can have a decent conversation with Mistral,
| Claude, pi.ai and other LLMs. Even ChatGPT, although the
| patronizing, apologizing tone is annoying.
| greenavocado wrote:
| Can censorship damage to LLMs be mitigated with LoRA fine-
| tuning?
| everdrive wrote:
| This is AI. Someone else decides what topics and what answers
| are acceptable.
| leetharris wrote:
| These names are unbelievably bad. Flash, Flash-Lite? How do these
| AI companies keep doing this?
|
| Sonnet 3.5 v2
|
| o3-mini-high
|
| Gemini Flash-Lite
|
| It's like a competition to see who can make the goofiest naming
| conventions.
|
| Regarding model quality, we experiment with Google models
| constantly at Rev and they are consistently the worst of all the
| major players. They always benchmark well and consistently fail
| in real tasks. If this is just a small update to the gemini-
| exp-1206 model, then I think they will still be in last place.
| falcor84 wrote:
| > It's like a competition to see who can make the goofiest
| naming conventions.
|
| I'm still waiting for one of them to overflow from version 360
| down to One.
| cheeze wrote:
| Just wait for One X, S, Series X, Series X Pro, Series X Pro
| with Super Fast Charging 2.0
| kridsdale3 wrote:
| Meanwhile:
|
| Playstation
|
| Playstation 2
|
| Playstation 3
|
| Playstation 4
|
| Playstation 5
| miohtama wrote:
| Unlike Sony, Google tries to confuse people into using its
| limited models, since they are free to run and most people won't
| pay.
| tremarley wrote:
| Thankfully the names aren't as bad as Sony's names for products
| like earphones.
| Skunkleton wrote:
| Haiku/sonnet/opus are easily the best named models imo.
| throwaway314155 wrote:
| you mean sonnet-3.5 (first edition, second edition)?
| kridsdale3 wrote:
| I have a signed copy of the first-edition
| risho wrote:
| As a person who thought they were arbitrary names when I first
| discovered them and spent an hour trying to figure out the
| difference, I disagree. It gets even more confusing when you
| realize that Opus, which according to their silly naming scheme
| is supposed to be the biggest and best model they offer, is
| seemingly abandoned, and that title has been given to Sonnet,
| which is supposed to be the middle-of-the-road model.
| lamuswawir wrote:
| Flash Lite is the least bad.
| vok wrote:
| https://www.smbc-comics.com/comic/version
| mtaras wrote:
| Updates to the Gemini models will always be exciting to me
| because of how generous the free API tier is; I barely run into
| limits for personal use. The huge context window is a big
| advantage for personal projects, too.
| silvajoao wrote:
| Try out the new models at https://aistudio.google.com.
|
| It's a great way to experiment with all the Gemini models that
| are also available via the API.
|
| If you haven't yet, try also Live mode at
| https://aistudio.google.com/live.
|
| You can have a live conversation with Gemini and have the model
| see the world via your phone camera (or see your desktop via
| screenshare on the web), and talk about it. It's quite a cool
| experience! It made me feel the joy of programming and using
| computers that I had had so many times before.
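|
| For reference, calling these same models from Python looks
| roughly like the sketch below. It assumes the google-genai SDK
| ("pip install google-genai") and an API key from AI Studio; the
| model id and SDK details are assumptions, so check the current
| docs.
|
|     # Minimal sketch: one-off text generation against the Gemini API.
|     # Assumes the google-genai SDK and an AI Studio API key.
|     from google import genai
|
|     client = genai.Client(api_key="YOUR_AI_STUDIO_KEY")
|     response = client.models.generate_content(
|         model="gemini-2.0-flash",  # assumed id; see AI Studio's model list
|         contents="Summarize the difference between Flash and Flash-Lite.",
|     )
|     print(response.text)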
| Ninjinka wrote:
| Pricing is CRAZY.
|
| Audio input is $0.70 per million tokens on 2.0 Flash, $0.075 for
| 2.0 Flash-Lite and 1.5 Flash.
|
| For gpt-4o-mini-audio-preview, it's $10 per million tokens of
| audio input.
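|
| Back-of-the-envelope math on those list prices (the ~32 audio
| tokens per second rate is an assumption taken from Gemini's
| audio docs, and OpenAI tokenizes audio differently, so the last
| figure only illustrates the per-token gap):
|
|     # Rough cost per hour of audio at the quoted per-million-token rates.
|     # Assumes ~32 audio tokens/second for Gemini; verify against the docs.
|     TOKENS_PER_HOUR = 32 * 3600  # ~115,200 tokens
|
|     def cost_per_hour(usd_per_million_tokens: float) -> float:
|         return TOKENS_PER_HOUR / 1_000_000 * usd_per_million_tokens
|
|     print(f"Gemini 2.0 Flash audio:      ${cost_per_hour(0.70):.3f}/hour")
|     print(f"Gemini 2.0 Flash-Lite audio: ${cost_per_hour(0.075):.4f}/hour")
|     print(f"Same token count at $10/M:   ${cost_per_hour(10.00):.2f}/hour")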
| sunaookami wrote:
| Sadly: "Gemini can only infer responses to English-language
| speech."
|
| https://ai.google.dev/gemini-api/docs/audio?lang=rest#techni...
| KTibow wrote:
| The increase is likely because 1.5 Flash was actually cheaper
| than all other STT services. I wrote about this a while ago at
| https://ktibow.github.io/blog/geminiaudio/.
| radeeyate wrote:
| I feel that the audio interpreting aspects of the Gemini
| models aren't just STT. If you give it something like a song,
| it can give you information about it.
| denysvitali wrote:
| When will they release Gemini 2.0 Pro Max?
| msuvakov wrote:
| Gemini 2.0 works great with large contexts. A few hours ago, I
| posted a Show HN about parsing an entire book in a single
| prompt. The goal was to extract characters, relationships, and
| descriptions that could then be used for image generation:
|
| https://news.ycombinator.com/item?id=42946317
| Alifatisk wrote:
| Which Gemini model is notebooklm using atm? Have they switched
| yet?
| msuvakov wrote:
| Not sure. I am using models/API keys from
| https://aistudio.google.com. They just added new models, e.g.,
| gemini-2.0-pro-exp-02-05. Exp models are free of charge with
| some daily quota, depending on the model.
| jbarrow wrote:
| I've been very impressed by Gemini 2.0 Flash for multimodal
| tasks, including object detection and localization[1], plus
| document tasks. But the 15-requests-per-minute limit was a
| severe constraint while it was experimental. I'm really excited
| to be able to actually _do_ things with the model.
|
| In my experience, I'd reach for Gemini 2.0 Flash over 4o in a lot
| of multimodal/document use cases. Especially given the
| differences in price ($0.10/million input and $0.40/million
| output versus $2.50/million input and $10.00/million output).
|
| That being said, Qwen2.5 VL 72B and 7B seem even better at
| document image tasks and localization.
|
| [1]
| https://notes.penpusher.app/Misc/Google+Gemini+101+-+Object+...
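|
| For anyone curious what prompt-based localization looks like,
| here is a rough sketch. It assumes the google-genai SDK, and the
| box_2d / 0-1000 normalized [ymin, xmin, ymax, xmax] convention
| is taken from Google's spatial-understanding examples, so treat
| both as assumptions and verify against the current docs.
|
|     # Sketch: ask Gemini 2.0 Flash for labeled bounding boxes in an image.
|     # SDK usage and the coordinate convention are assumptions; check the docs.
|     from google import genai
|     from google.genai import types
|
|     client = genai.Client(api_key="YOUR_AI_STUDIO_KEY")
|     with open("shelf.jpg", "rb") as f:
|         image_bytes = f.read()
|
|     response = client.models.generate_content(
|         model="gemini-2.0-flash",
|         contents=[
|             types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
|             "Detect every product on the shelf. Return JSON like "
|             '[{"label": "...", "box_2d": [ymin, xmin, ymax, xmax]}] '
|             "with coordinates normalized to 0-1000.",
|         ],
|     )
|     print(response.text)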
| Alifatisk wrote:
| > In my experience, I'd reach for Gemini 2.0 Flash over 4o
|
| Why not use o1-mini?
| jbarrow wrote:
| Mostly because OpenAI's vision offerings aren't particularly
| compelling:
|
| - 4o can't really do localization, and ime is worse than
| Gemini 2.0 and Qwen2.5 at document tasks
|
| - 4o mini isn't cheaper than 4o for images because it uses a
| _lot_ of tokens per image compared to 4o (~5,600/tile vs
| ~170/tile, where each tile is 512x512)
|
| - o1 has support for vision but is wildly expensive and slow
|
| - o3-mini doesn't yet have support for vision, and o1-mini
| never did
| SuperHeavy256 wrote:
| It sucks, btw. I tried scheduling an event in Google Calendar
| through Gemini, and it got the date wrong, the time wrong, and
| the timezone wrong. It set an event that's supposed to be
| tomorrow to next year.
| karaterobot wrote:
| > Gemini 2.0 is now being forced on everyone.
| rzz3 wrote:
| It's funny, I've never actually used Gemini and, though this may
| be incorrect, I automatically assume it's awful. I assume it's
| awful because the AI summaries at the top of Google Search are so
| awful, and that's made me never give Google AI a chance.
| Alifatisk wrote:
| The huge context window is a big selling point.
| blihp wrote:
| I don't think your take is incorrect. I give it a try from time
| to time and it's always been inferior to other offerings for me
| every time I've tested it. Which I find a bit strange as
| NotebookLM (until recently) had been great to use. Whatever...
| there are plenty of other good options out there.
| leonidasv wrote:
| That 1M-token context window alone is going to kill a lot of RAG
| use cases. Crazy to see how we went from 4K-token context
| windows (2023 ChatGPT-3.5) to 1M in less than 2 years.
| Alifatisk wrote:
| Gemini can in theory handle 10M tokens; I remember them saying
| so in one of their presentations.
| monsieurbanana wrote:
| Maybe someone knows: what's the usual recommendation regarding
| big context windows? Is it safe to use them to the max, or will
| performance degrade, meaning we should adapt the maximum to our
| use case?
| Topfi wrote:
| We have heard this before, when 100k and 200k were first being
| normalized by Anthropic way back when, and I tend to be
| skeptical in general when it comes to such predictions, but in
| this case, I have to agree.
|
| Having used the previews for the last few weeks with different
| tasks and personally designed challenges, what I found is that
| these models are not only capable of processing larger context
| windows on paper, but are also far better at actually handling
| long, dense, complex documents in full: referencing back to
| something upon specific request, doing extensive rewrites in
| full whilst handling previous context, etc. These models have
| also handled my private needle-in-haystack-type challenges
| without issues as of yet, though those have been limited to
| roughly 200k in fairness. Neither Anthropic's, OpenAI's,
| DeepSeek's nor previous Google models handled even 75k+ in any
| comparable manner.
|
| Cost will of course remain a factor and will keep RAG a viable
| choice for a while, but for the first time I am tempted to agree
| that someone has delivered a solution which showcases that a
| larger context window can in many cases work reliably and far
| more seamlessly.
|
| It's also the first time a Google model actually surprised me
| (positively); neither Bard, nor AI answers, nor any previous
| Gemini model had any appeal to me, even when testing
| specifically for what others claimed to be strengths (such as
| Gemini 1.5's alleged Flutter expertise, which got beaten by both
| OpenAI's and Anthropic's equivalents at the time).
| torginus wrote:
| That's not really my experience. The error rate goes up the more
| stuff you cram into the context, and processing gets both slower
| and more expensive with the number of input tokens.
|
| I'd say it makes sense to do RAG even if your stuff fits into
| the context comfortably.
| lamuswawir wrote:
| Try exp-1206. That thing works well on large contexts.
| Alifatisk wrote:
| Exciting news to see these models being released in the Gemini
| app. I wish my preference for which model to default to were
| saved across sessions.
|
| How many tokens can gemini.google.com handle as input? How large
| is the context window before it forgets? A quick search said
| it's a 128k-token window, but that applies to Gemini 1.5 Pro, so
| how is it now?
|
| My assumption is that "Gemini 2.0 Flash Thinking Experimental"
| is just "Gemini 2.0 Flash" with reasoning, and "Gemini 2.0 Flash
| Thinking Experimental with apps" is just "Gemini 2.0 Flash
| Thinking Experimental" with access to the web and Google's other
| services, right? So sticking to "Gemini 2.0 Flash Thinking
| Experimental with apps" should be the optimal choice.
|
| Is there any reason why Gemini 1.5 Flash is still an option? It
| feels like it should be removed unless it does something better
| than the others.
|
| I have difficulty understanding where each variant of the Gemini
| model is best suited. Looking at aistudio.google.com, they have
| already updated the available models.
|
| Is "Gemini 2.0 Flash Thinking Experimental" on gemini.google.com
| just "gemini-exp-1206", or is it the "Gemini 2.0 Flash Thinking
| Experimental" from aistudio.google.com?
|
| I have a note in my notes app where I rank every LLM based on
| instruction following and math, and to this day I've had
| difficulty knowing where to place each Gemini model. I know
| there is a little popup when you hover over each model that
| tries to explain what it does and which tasks it is best suited
| for, but these explanations have been very vague to me. And I
| haven't even started on the Gemini Advanced series, or whatever
| I should call it.
|
| The available models on aistudio are now:
|
| - Gemini 2.0 Flash (gemini-2.0-flash)
|
| - Gemini 2.0 Flash Lite Preview (gemini-2.0-flash-lite-
| preview-02-05)
|
| - Gemini 2.0 Pro Experimental (gemini-2.0-pro-exp-02-05)
|
| - Gemini 2.0 Flash Thinking Experimental (gemini-2.0-flash-
| thinking-exp-01-21)
|
| If I had to sort these from most likely to fulfill my needs to
| least likely, it would probably be:
|
| gemini-2.0-flash-thinking-exp-01-21 > gemini-2.0-pro-exp-02-05 >
| gemini-2.0-flash-lite-preview-02-05 > gemini-2.0-flash
|
| Why? Because aistudio describes gemini-2.0-flash-thinking-
| exp-01-21 as being able to tackle the most complex and difficult
| tasks, while gemini-2.0-pro-exp-02-05 and gemini-2.0-flash-lite-
| preview-02-05 only differ in how much context they can handle.
|
| So with that out of the way, how does gemini-2.0-flash-thinking-
| exp-01-21 compare against o3-mini, Qwen 2.5 Max, Kimi k1.5,
| DeepSeek R1, DeepSeek V3 and Sonnet 3.5?
|
| My current list of benchmarks is artificialanalysis.ai,
| lmarena.ai, livebench.ai and aider.chat's polyglot benchmark,
| but still, the whole Gemini suite is difficult to reason about
| and sort out.
|
| I feel like this trend of having many different models with the
| same name but different suffixes is starting to become an
| obstacle to my mental model.
| simonw wrote:
| I upgraded my llm-gemini plugin to handle this, and shared the
| results of my "Generate an SVG of a pelican riding a bicycle"
| benchmark here: https://simonwillison.net/2025/Feb/5/gemini-2/
|
| The pricing is interesting: Gemini 2.0 Flash-Lite is 7.5c/million
| input tokens and 30c/million output tokens - half the price of
| OpenAI's GPT-4o mini (15c/60c).
|
| Gemini 2.0 Flash isn't much more: 10c/million for text/image
| input, 70c/million for audio input, 40c/million for output.
| Again, cheaper than GPT-4o mini.
| zamadatix wrote:
| Is there a way to see/compare the shared results for all of the
| LLMs you've tested this prompt on in one place? The 2.0 Pro
| result seems decent, but I don't have a baseline to tell whether
| that's because it actually is, or whether the other 2 are just
| "extremely bad" or something.
| nolist_policy wrote:
| Search by tag: https://simonwillison.net/tags/pelican-riding-
| a-bicycle/
| iimaginary wrote:
| The only benchmark worth paying attention to.
| qingcharles wrote:
| Not a bad pelican from 2.0 Pro! The singularity is almost upon
| us :)
| mattlondon wrote:
| The SVGs are starting to look actually recognisable! You'll
| need a new benchmark soon :)
| serjester wrote:
| For anyone parsing PDFs, this is a game changer in terms of
| price - I wrote a blog post about it [1]. I think a lot of
| people were nervous about pricing since they released the beta,
| and although it's slightly more expensive than 1.5 Flash, this
| is still incredibly cost-effective. Looking forward to also
| benchmarking the Lite version.
|
| [1] https://www.sergey.fyi/articles/gemini-flash-2
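|
| As a point of reference, feeding a PDF straight to the model
| looks roughly like the sketch below. It assumes the google-genai
| SDK and Gemini's native PDF input; the model id, prompt and
| field names are illustrative, so check the current docs and the
| linked post.
|
|     # Sketch: extract structured data from a PDF with Gemini 2.0 Flash.
|     # Assumes the google-genai SDK and native PDF input support.
|     import pathlib
|     from google import genai
|     from google.genai import types
|
|     client = genai.Client(api_key="YOUR_AI_STUDIO_KEY")
|     pdf_bytes = pathlib.Path("statement.pdf").read_bytes()
|
|     response = client.models.generate_content(
|         model="gemini-2.0-flash",
|         contents=[
|             types.Part.from_bytes(data=pdf_bytes, mime_type="application/pdf"),
|             "Extract every line item as JSON with date, description, amount.",
|         ],
|     )
|     print(response.text)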
| mmanfrin wrote:
| It sure is cool that people who joined Google's Pixel Pass
| continue to be unable to give them money to access Advanced.
| staticman2 wrote:
| I have a fun query in AI Studio where I pasted an 800,000-token
| Wuxia martial arts novel and ask it worldbuilding questions.
|
| 1.5 Pro and the old 2.0 Flash Experimental generated responses
| in AI Studio, but the new 2.0 models respond with blank answers.
|
| I wonder if it's timing out or if some sort of newer censorship
| model is preventing 2.0 from answering my query. The novel is
| PG-13 at most, but references to "bronze skinned southern
| barbarians", "courtesans", "drugs", "demonic sects" and murder
| could, I guess, set it off.
| barrenko wrote:
| If you're Google and you're reading this, please offer
| fine-tuning on multi-part dialogue.
| crowcroft wrote:
| Worth noting that with 2.0 they're now offering free search tool
| use for 1,500 queries per day.
|
| Their search costs 7x Perplexity Sonar's, but I imagine a lot of
| people will start with Google given they can get a pretty decent
| amount of search for free now.
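|
| A sketch of what turning that search tool on looks like in the
| API, assuming the google-genai SDK's Google Search grounding
| config; the exact tool/config shape is an assumption, so verify
| it against the grounding docs.
|
|     # Sketch: Gemini 2.0 Flash with Google Search grounding enabled as a tool.
|     # The tool/config shape is an assumption; verify against the current docs.
|     from google import genai
|     from google.genai import types
|
|     client = genai.Client(api_key="YOUR_AI_STUDIO_KEY")
|     response = client.models.generate_content(
|         model="gemini-2.0-flash",
|         contents="What did Google announce about Gemini 2.0 pricing this week?",
|         config=types.GenerateContentConfig(
|             tools=[types.Tool(google_search=types.GoogleSearch())],
|         ),
|     )
|     print(response.text)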
| mistrial9 wrote:
| Why does no one mention that you must log in with a Google
| account, with all of the record keeping, cross-correlation and
| third-party access implied there...
| m_ppp wrote:
| I'm interested to know how well video processing works here. I
| ran into some problems when I was using Vertex to serve longer
| YouTube videos.
| bionhoward wrote:
| I always get to, "You may not use the Services to develop models
| that compete with the Services (e.g., Gemini API or Google AI
| Studio)." [1] and exit
|
| - [1] https://ai.google.dev/gemini-api/terms
| foresto wrote:
| Not to be confused with Project Gemini.
|
| https://geminiprotocol.net/
| user3939382 wrote:
| I wonder how common this is, but my interest in this product is 0
| simply because my level of trust and feeling of goodwill for
| Google almost couldn't be lower.
| sylware wrote:
| This is a lie, since I don't have a Google account and cannot
| search on Google anymore since noscript/basic (X)HTML browser
| interop was broken a few weeks ago.
| tmaly wrote:
| I am on the iOS app and I see Gemini 2.0 and Gemini 1.5 as
| options in the dropdown. I am on the free tier.
| dtquad wrote:
| Try the Gemini webapp. It has a powerful reasoning model with
| Google Search and Maps integration.
| CSMastermind wrote:
| Is it still the case that it doesn't really support video input?
|
| As in, I have a video file that I want to send to the model and
| get a response about - not their 'live stream' or whatever
| functionality.
___________________________________________________________________
(page generated 2025-02-05 23:00 UTC)