[HN Gopher] Gemini 2.5 Flash
       ___________________________________________________________________
        
       Gemini 2.5 Flash
        
       Author : meetpateltech
       Score  : 407 points
       Date   : 2025-04-17 19:03 UTC (3 hours ago)
        
 (HTM) web link (developers.googleblog.com)
 (TXT) w3m dump (developers.googleblog.com)
        
       | xnx wrote:
       | 50% price increase from Gemini 2.0 Flash. That sounds like a lot,
       | but Flash is still so cheap when compared to other models of this
       | (or lesser) quality. https://developers.googleblog.com/en/start-
       | building-with-gem...
        
         | akudha wrote:
         | Is this cheaper than DeepSeek? Am I reading this right?
        
         | Tiberium wrote:
         | del
        
           | Havoc wrote:
            | You may want to consult Gemini on those percentage calcs:
            | .10 to .15 is not 25%, it's a 50% increase
            | ((0.15 - 0.10) / 0.10 = 0.5).
        
         | swyx wrote:
          | priced pretty much in line with the price/Elo Pareto frontier
         | https://x.com/swyx/status/1912959140743586206/photo/1
        
           | xnx wrote:
           | Love that chart! Am I imagining that I saw a version of that
           | somewhere that even showed how the boundary has moved out
           | over time?
        
             | swyx wrote:
             | https://x.com/swyx/status/1882933368444309723
             | 
             | https://x.com/swyx/status/1830866865884991999 (scroll up)
        
       | byefruit wrote:
        | It's interesting that there's nearly a 6x price difference
       | between reasoning and no reasoning.
       | 
       | This implies it's not a hybrid model that can just skip reasoning
       | steps if requested.
       | 
       | Anyone know what else they might be doing?
       | 
        | Reasoning means contexts will be longer (for thinking tokens),
        | and inference costs more with a longer context, but it's not
        | going to be 6x.
       | 
       | Or is it just market pricing?
        
         | vineyardmike wrote:
         | Based on their graph, it does look explicitly priced along
         | their "Pareto Frontier" curve. I'm guessing that is guiding the
         | price more than their underlying costs.
         | 
         | It's smart because it gives them room to drop prices later and
          | compete once other companies actually get to a similar quality.
        
         | jsnell wrote:
         | > This implies it's not a hybrid model that can just skip
         | reasoning steps if requested.
         | 
         | It clearly is, since most of the post is dedicated to the
         | tunability (both manual and automatic) of the reasoning budget.
         | 
         | I don't know what they're doing with this pricing, and the blog
         | post does not do a good job explaining.
         | 
         | Could it be that they're not counting thinking tokens as output
         | tokens (since you don't get access to the full thinking trace
          | anyway), and this is basically amortizing the thinking-token
          | spend over the actual output tokens? Doesn't make sense
         | either, because then the user has no incentive to use anything
         | except 0/max thinking budgets.
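          | 
          | For reference, this is roughly how the budget knob is exposed
          | in the google-genai Python SDK (the model name, prompt, and
          | budget value below are just placeholders, not recommendations):
          | 
          |     from google import genai
          |     from google.genai import types
          | 
          |     client = genai.Client(api_key="YOUR_API_KEY")  # placeholder
          | 
          |     response = client.models.generate_content(
          |         model="gemini-2.5-flash-preview-04-17",
          |         contents="Summarize this bug report in one sentence: ...",
          |         config=types.GenerateContentConfig(
          |             # 0 disables thinking entirely; a positive value caps
          |             # the number of thinking tokens the model may spend
          |             thinking_config=types.ThinkingConfig(thinking_budget=1024),
          |         ),
          |     )
          |     print(response.text)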
        
         | RobinL wrote:
         | Does anyone know how this pricing works? Supposing I have a
         | classification prompt where I need the response to be a binary
         | yes/no. I need one token of output, but reasoning will
         | obviously add far more than 6 additional tokens. Is it still a
          | 6x price multiplier? That doesn't seem to make sense, but
          | neither does paying 6x more for every token, including the
          | reasoning ones.
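          | 
          | As a back-of-the-envelope sketch of the two readings (all
          | numbers below are made up for illustration, roughly matching
          | the ~6x gap people are quoting; the real rates may differ):
          | 
          |     # Hypothetical output rates: ~$0.60/M tokens with thinking
          |     # off vs ~$3.50/M with thinking on (assumed, not official).
          |     RATE_NO_THINK = 0.60 / 1_000_000
          |     RATE_THINK = 3.50 / 1_000_000
          | 
          |     answer_tokens = 1      # the binary yes/no
          |     thinking_tokens = 500  # guessed reasoning overhead
          | 
          |     # Reading 1: thinking tokens are billed as output, at ~6x
          |     cost_all = (answer_tokens + thinking_tokens) * RATE_THINK
          | 
          |     # Reading 2: only the visible answer is billed, at ~6x
          |     cost_answer_only = answer_tokens * RATE_THINK
          | 
          |     print(f"all tokens billed:  ${cost_all:.6f}")
          |     print(f"answer only billed: ${cost_answer_only:.6f}")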
        
       | punkpeye wrote:
        | This is cool, but the rate limits on all of these preview models
        | are a PITA
        
         | Layvier wrote:
         | Agreed, it's not even possible to run an eval dataset. If
          | someone from Google sees this, please at least increase the
          | burst rate limit.
        
           | punkpeye wrote:
           | It is not without rate limits, but we do have elevated limits
           | for our accounts through:
           | 
           | https://glama.ai/models/gemini-2.5-flash-preview-04-17
           | 
           | So if you just want to run evals, that should do it.
           | 
           | Though the first couple of days after a model comes out are
            | usually pretty rough because everyone tries to run their evals.
        
             | punkpeye wrote:
             | What I am noticing with every new Gemini model that comes
             | out is that the time to first token (TTFT) is not great. I
              | guess it is because they gradually transfer compute power
             | from old models to new models as the demand increases.
        
               | Filligree wrote:
               | If you're imagining that 2.5Pro gets dynamically loaded
               | during the time to first token, then you're vastly
               | overestimating what's physically possible.
               | 
               | It's more likely a latency-throughput tradeoff. Your
               | query might get put inside a large batch, for example.
        
             | Layvier wrote:
             | That's very interesting, thanks for sharing!
        
       | arnaudsm wrote:
        | Gemini Flash models get the least hype, but in my experience they
        | have the best bang for the buck and the best multimodal tooling
        | in production.
       | 
       | Google is silently winning the AI race.
        
         | belter wrote:
         | > Google is silently winning the AI race.
         | 
          | That is what we keep hearing here... I cancelled my account on
          | the last Gemini, and can't help noticing the new one they are
          | offering for free...
        
           | arnaudsm wrote:
           | Sorry I was talking of B2B APIs for my YC startup. Gemini is
           | still far behind for consumers indeed.
        
             | JeremyNT wrote:
             | I use Gemini almost exclusively as a normal user. What am I
             | missing out on that they are far behind on?
             | 
             | It seems shockingly good and I've watched it get much
             | better up to 2.5 Pro.
        
               | arnaudsm wrote:
               | Mostly brand recognition and the earlier Geminis had more
               | refusals.
               | 
               | As a consumer, I also really miss the Advanced voice mode
               | of ChatGPT, which is the most transformative tech in my
               | daily life. It's the only frontier model with true audio-
               | to-audio.
        
               | wavewrangler wrote:
                | What do you mean miss? You don't have the budget to keep
                | something you truly miss for $20? What am I missing here?
                | I don't mean to criticize, I am just curious is all. I
                | would reword but I have to go.
        
         | Layvier wrote:
         | Absolutely. So many use cases for it, and it's so
         | cheap/fast/reliable
        
           | danielbln wrote:
            | I want to use these almost-too-cheap-to-meter models like
            | Flash more. What are some interesting use cases for them?
        
           | SparkyMcUnicorn wrote:
           | And stellar OCR performance. Flash 2.0 is cheaper and more
           | accurate than AWS Textract, Google Document AI, etc.
           | 
           | Not only in benchmarks[0], but in my own production usage.
           | 
           | [0] https://getomni.ai/ocr-benchmark
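            | 
            | For anyone curious, a minimal sketch of pointing Flash at a
            | document with the google-genai Python SDK (file name, prompt
            | and model string are placeholders; a real pipeline would add
            | batching and validation):
            | 
            |     from google import genai
            |     from google.genai import types
            | 
            |     client = genai.Client(api_key="YOUR_API_KEY")  # placeholder
            | 
            |     with open("scanned_invoice.pdf", "rb") as f:  # placeholder file
            |         pdf_bytes = f.read()
            | 
            |     response = client.models.generate_content(
            |         model="gemini-2.0-flash",
            |         contents=[
            |             types.Part.from_bytes(data=pdf_bytes,
            |                                   mime_type="application/pdf"),
            |             "Transcribe all text in this document as markdown.",
            |         ],
            |     )
            |     print(response.text)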
        
         | Fairburn wrote:
          | Sorry, but no. Gemini isn't the fastest horse, yet. And its use
          | within their ecosystem means it isn't geared to the masses
         | outside of their bubble. They are not leading the race but they
         | are a contender.
        
         | spruce_tips wrote:
          | i have a high volume task i wrote an eval for and was
          | pleasantly surprised at 2.0 flash's cost-to-value ratio,
          | especially compared to gpt-4.1-mini/nano (prices per 1M
          | tokens):
          | 
          |     model                   accuracy   input    output
          |     Gemini Flash 2.0 Lite   67%        $0.075   $0.30
          |     Gemini Flash 2.0        93%        $0.10    $0.40
          |     GPT-4.1-mini            93%        $0.40    $1.60
          |     GPT-4.1-nano            43%        $0.10    $0.40
          | 
          | excited to try out 2.5 flash
        
           | jay_kyburz wrote:
            | Can I ask a serious question: what task are you working on
            | where it's OK to get a 7% error rate? I can't get my head
            | around how this can be used.
        
             | spruce_tips wrote:
              | low stakes text classification, but it's something that
              | needs to be done and couldn't be done in reasonable time
              | frames or at reasonable price points by humans
        
             | omneity wrote:
             | In my case, I have workloads like this where it's possible
             | to verify the correctness of the result after inference, so
             | any success rate is better than 0 as it's possible to
             | identify the "good ones".
        
             | dist-epoch wrote:
              | Not OP, but for stuff like social network spam/manipulation
              | detection, a 7% error rate is fine
        
               | wavewrangler wrote:
               | Yeah, general propaganda and psyops are actually more
               | effective around 12% - 15%, we find it is more accurate
               | to the user base, thus is questioned less for standing
               | out more /s
        
             | 16bytes wrote:
             | There are tons of AI/ML use-cases where 7% is acceptable.
             | 
             | Historically speaking, if you had a 15% word error rate in
             | speech recognition, it would generally be considered
             | useful. 7% would be performing well, and <5% would be near
             | the top of the market.
             | 
             | Typically, your error rate just needs to be below the
             | usefulness threshold and in many cases the cost of errors
             | is pretty small.
        
         | 42lux wrote:
         | The API is free, and it's great for everyday tasks. So yes
         | there is no better bang for the buck.
        
           | drusepth wrote:
           | Wait, the API is free? I thought you had to use their web
           | interface for it to be free. How do you use the API for free?
        
             | mlboss wrote:
             | using aistudio.google.com
        
             | spruce_tips wrote:
              | Create an API key and don't set up billing. Pretty low rate
              | limits, and they use your data.
        
             | dcre wrote:
             | You can get an API key and they don't bill you. Free tier
             | rate limits for some models (even decent ones like Gemini
             | 2.0 Flash) are quite high.
             | 
             | https://ai.google.dev/gemini-api/docs/pricing
             | 
             | https://ai.google.dev/gemini-api/docs/rate-limits#free-tier
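              | 
              | A minimal sketch of calling that free tier from Python with
              | the google-genai SDK (key and prompt are placeholders):
              | 
              |     from google import genai
              | 
              |     # Key from https://aistudio.google.com, no billing set up
              |     client = genai.Client(api_key="YOUR_FREE_API_KEY")
              |     response = client.models.generate_content(
              |         model="gemini-2.0-flash",
              |         contents="Explain TTFT in one sentence.",
              |     )
              |     print(response.text)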
        
               | NoahZuniga wrote:
                | The rate limits I've encountered with free API keys have
                | been way lower than the limits advertised.
        
             | midasz wrote:
              | I use Gemini 2.5 Pro Experimental via OpenRouter in my
              | OpenWebUI for free. Was using Sonnet 3.7 but I don't notice
              | much difference, so I just default to the free thing now.
        
         | statements wrote:
          | Absolutely agree. Granted, it is task dependent. But when it
          | comes to classification and attribute extraction, I've been
          | using 2.0 Flash with huge success across massive datasets. It
          | would not even be viable cost-wise with other models.
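          | 
          | In case it helps anyone, this is the rough shape of such a call
          | with the google-genai Python SDK and a JSON schema for the
          | extracted attributes (schema, prompt, and model name are
          | illustrative placeholders, not my production setup):
          | 
          |     from google import genai
          |     from google.genai import types
          | 
          |     client = genai.Client(api_key="YOUR_API_KEY")  # placeholder
          | 
          |     schema = {  # hypothetical attribute schema
          |         "type": "object",
          |         "properties": {
          |             "category": {"type": "string"},
          |             "brand": {"type": "string"},
          |         },
          |         "required": ["category"],
          |     }
          | 
          |     response = client.models.generate_content(
          |         model="gemini-2.0-flash",
          |         contents="Classify this product listing: ...",
          |         config=types.GenerateContentConfig(
          |             response_mime_type="application/json",
          |             response_schema=schema,
          |         ),
          |     )
          |     print(response.text)  # JSON conforming to the schema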
        
           | sethkim wrote:
           | How "huge" are these datasets? Did you build your own tooling
           | to accomplish this?
        
         | xnx wrote:
         | Shhhh. You're going to give away the secret weapon!
        
         | gambiting wrote:
         | In my experience they are as dumb as a bag of bricks. The other
         | day I asked "can you edit a picture if I upload one"
         | 
         | And it replied "sure, here is a picture of a photo editing
         | prompt:"
         | 
         | https://g.co/gemini/share/5e298e7d7613
         | 
         | It's like "baby's first AI". The only good thing about it is
         | that it's free.
        
           | JFingleton wrote:
           | Prompt engineering is a thing.
           | 
           | Learning how to "speak llm" will give you great results.
           | There's loads of online resources that will teach you. Think
           | of it like learning a new API.
        
           | ghurtado wrote:
           | > in my experience they are as dumb as a bag of bricks
           | 
           | In my experience, anyone that describes LLMs using terms of
           | actual human intelligence is bound to struggle using the
           | tool.
           | 
           | Sometimes I wonder if these people enjoy feeling "smarter"
           | when the LLM fails to give them what they want.
        
             | mdp2021 wrote:
             | If those people are a subset of those who demand actual
             | intelligence, they will very often feel frustrated.
        
           | nowittyusername wrote:
            | It's because Google hasn't realized the value of training the
            | model on information about its own capabilities and metadata.
            | That's my biggest pet peeve about Google and the way they
            | train these models.
        
         | rvz wrote:
          | Google has been winning the AI race ever since DeepMind was
          | properly put to use developing their AI models, instead of the
          | team that built Bard (the Google AI team).
        
         | GaggiX wrote:
          | Flash models are really good even for an end user because of
          | how fast they are and how well they perform.
        
         | ghurtado wrote:
         | I know it's a single data point, but yesterday I showed it a
         | diagram of my fairly complex micropython program, (including
         | RP2 specific features, DMA and PIO) and it was able to describe
         | in detail not just the structure of the program, but also
          | exactly what it does and how it does it. This is before seeing
          | a single line of code, just going by boxes and arrows.
         | 
         | The other AIs I have shown the same diagram to, have all
         | struggled to make sense of it.
        
         | redbell wrote:
         | > Google is silently winning the AI race
         | 
         | Yep, I agree! This convinced me:
         | https://news.ycombinator.com/item?id=43661235
        
         | ramesh31 wrote:
         | >"Google is silently winning the AI race."
         | 
         | It's not surprising. What was surprising honestly was how they
         | were caught off guard by OpenAI. It feels like in 2022 just
         | about all the big players had a GPT-3 level system in the works
         | internally, but SamA and co. knew they had a winning hand at
         | the time, and just showed their cards first.
        
           | wkat4242 wrote:
           | True and their first mover advantage still works pretty well.
           | Despite "ChatGPT" being a really uncool name in terms of
           | marketing. People remember it because they were the first to
           | wow them.
        
         | russellbeattie wrote:
         | I have to say, I never doubted it would happen. They've been at
         | the forefront of AI and ML for well over a decade. Their
         | scientists were the authors of the "Attention is all you need"
         | paper, among thousands of others. A Google Scholar search
         | produces endless results. There just seemed to be a disconnect
         | between the research and product areas of the company. I think
         | they've got that worked out now.
         | 
         | They're getting their ass kicked in court though, which might
         | be making them much less aggressive than they would be
         | otherwise, or at least quieter about it.
        
         | Nihilartikel wrote:
         | 100% agree. I had Gemini flash 2 chew through thousands of
         | points of nasty unstructured client data and it did a 'better
         | than human intern' level conversion into clean structured
         | output for about $30 of API usage. I am sold. 2.5 pro
         | experimental is a different league though for coding. I'm
         | leveraging it for massive refactoring now and it is almost
         | magical.
        
           | jdthedisciple wrote:
           | > thousands of points of nasty unstructured client data
           | 
           | What I always wonder in these kinds of cases is: What makes
           | you confident the AI actually did a good job since presumably
           | you haven't looked at the thousands of client data yourself?
           | 
           | For all you know it made up 50% of the result.
        
         | no_wizard wrote:
         | I remember everyone saying its a two horse race between Google
         | and OpenAI, then DeepSeek happened.
         | 
        | Never count out the possibility of a dark horse competitor
        | ripping the sod right out from under them.
        
       | transformi wrote:
        | A bad day is going on at Google.
        | 
        | First the declaration of an illegal monopoly...
       | 
       | and now... Google's latest innovation: programmable overthinking.
       | 
       | With Gemini 2.5 Flash, you too can now set a thinking_budget--
       | because nothing says "state-of-the-art AI" like manually capping
       | how long it's allowed to reason. Truly the dream: debugging a
       | production outage at 2am wondering if your LLM didn't answer
       | correctly because you cheaped out on tokens. lol.
       | 
       | "Turn thinking off for better performance." That's not a model
       | config, that's a metaphor for Google's entire AI strategy lately.
       | 
       | At this point, Gemini isn't an AI product--it's a latency-cost-
       | quality compromise simulator with a text interface. Meanwhile,
       | OpenAI and Anthropic are out here just... cooking the benchmarks
        
         | danielbln wrote:
          | Google's Gemini 2.5 Pro model is incredibly strong; it's on par
          | with and at times better than Claude 3.7 in coding performance,
          | and being able to ingest entire videos into the context is
          | something I haven't seen elsewhere. Google AI products have
          | been anywhere between bad (Bard) and lackluster (Gemini 1.5),
          | but 2.5 is a contender in all dimensions. Google is also the
          | only player that owns the entire stack: research, software,
          | data, and compute hardware. I think they were slow to start but
          | they've closed the gap since.
        
         | bsmith wrote:
         | Using AI to debug code at 2am sounds like pure insanity.
        
           | mring33621 wrote:
           | the new normal
        
           | spiderice wrote:
           | They're suggesting you'll be up at 2am debugging code because
           | your AI code failed. Not that you'll be using AI to do the
           | debugging.
        
       | hmaxwell wrote:
       | I did some testing this morning:
       | 
       | Prompt: "can you find any mistakes on my codebase? I put one in
       | there on purpose" + 70,000 tokens of codebase where in one line I
       | have an include for a non-existent file.
       | 
        | Claude 3.7 Sonnet (Try 1 - new chat): Claimed it found a race
        | condition in the API of the admin interface that would be
       | triggered if two admins were to change the room order at the same
       | time. Claude suggested I group all sql queries in a single
       | transaction. I looked at the code and found that it already used
       | a transaction for all queries. I said: the order_update api is
       | already done with a transaction. Claude replied: "You're
       | absolutely right, and I apologize for my mistake. I was incorrect
       | to claim there was a race condition issue. The transaction
       | ensures atomicity and consistency of the updates, and the SQL
       | queries are properly structured for their intended purpose."
       | 
        | Claude 3.7 Sonnet (Try 2 - new chat): Found a comment in the admin
       | ui javascript code that suggested a potential alternative to
       | event handler cleanup that was not implemented because I decided
       | to go with a cleaner route. Then asked "Is this the issue you
       | intentionally included, or would you like me to look for other
       | potential problems?" I said: "The comment merely suggests an
       | alternative, right?" claude said: "Yes, you're absolutely right.
       | The comment is merely suggesting an alternative approach that
       | isn't being used in the code, rather than indicating a mistake.
       | So there's no actual bug or mistake in this part of the code -
       | just documentation of different possible approaches. I apologize
       | for misinterpreting this as an issue!"
       | 
        | Claude 3.7 Sonnet (Try 3 - new chat): When processing items out of
       | the database to generate QR codes in the admin interface, Claude
       | says that my code both attempts to generate QR codes with
       | undefined data AS WELL AS saying that my error handling skips
       | undefined data. Claude contradicts itself within 2 sentences.
       | When asking about clarification Claude replies: Looking at the
       | code more carefully, I see that the code actually has proper
       | error handling. I incorrectly stated that it "still attempts to
       | call generateQRCode()" in the first part of my analysis, which
       | was wrong. The code properly handles the case when there's no
       | data-room attribute.
       | 
        | Gemini Advanced 2.5 Pro (Try 1 - new chat): Found the intentional
       | error and said I should stop putting db creds/api keys into the
       | codebase.
       | 
        | Gemini Advanced 2.5 Pro (Try 2 - new chat): Found the intentional
       | error and said I should stop putting db creds/api keys into the
       | codebase.
       | 
        | Gemini Advanced 2.5 Pro (Try 3 - new chat): Found the intentional
       | error and said I should stop putting db creds/api keys into the
       | codebase.
       | 
       | o4-mini-high and o4-mini and o3 and 4.5 and 4o - "The message you
       | submitted was too long, please reload the conversation and submit
       | something shorter."
        
         | Tiberium wrote:
         | The thread is about 2.5 Flash though, not 2.5 Pro. Maybe you
         | can try again with 2.5 Flash specifically? Even though it's a
         | small model.
        
         | airstrike wrote:
         | Have you tried Claude Code?
        
         | danielbln wrote:
          | Those responses are very Claude, too. 3.7 has powered our
          | agentic workflows for weeks, but I've been using almost only
          | Gemini for the last week and feel the output is generally
          | better. It's gotten much better at agentic workflows (using
          | 2.0 in an agent setup was not working well at all) and I prefer
          | its tuning over Claude's: more to the point and less
          | meandering.
        
         | rendang wrote:
         | 3 different answers in 3 tries for Claude? Makes me curious how
         | many times you'd get the same answer if you asked 10/20/100
         | times
        
         | bambax wrote:
         | > _codebase where in one line I have an include for a non-
         | existent file_
         | 
         | Ok but you don't need AI for this; almost any IDE will issue a
         | warning for that kind of error...
        
       | Workaccount2 wrote:
       | OpenAI might win the college students but it looks like Google
       | will lock in enterprise.
        
         | xnx wrote:
         | ChatGPT seems to have a name recognition / first-mover
         | advantage with college students now, but is there any reason to
         | think that will stick when today's high school students are
         | using Gemini on their Chromebooks?
        
         | gundmc wrote:
         | Funny you should say that. Google just announced today that
         | they are giving all college students one year of free Gemini
         | advanced. I wonder how much that will actually move the needle
         | among the youth.
        
           | Workaccount2 wrote:
           | My guess is that they will use it and still call it
           | "ChatGPT"...
        
             | xnx wrote:
             | Chat Gemini Pretrained Transformer
        
             | tantalor wrote:
             | Pass the Kleenex. Can I get a Band-Aid? Here's a Sharpie. I
             | need a Chapstick. Let me Xerox that. Toss me that Frisbee.
        
               | drob518 wrote:
               | Exactly.
        
           | drob518 wrote:
           | And every professor just groaned at the thought of having to
           | read yet another AI-generated term paper.
        
             | jay_kyburz wrote:
             | They should just get AI to mark them. I genuinely think
             | this is one thing AI would do better than humans.
        
               | mdp2021 wrote:
               | Grading papers definitely requires intelligence.
        
               | jay_kyburz wrote:
                | My partner marked a PhD thesis yesterday and there was a
               | spelling mistake in the title.
               | 
                | There is some level of analysis and feedback that an LLM
               | could provide before a human reviews it. Even if it's
               | just a fancy spelling checker.
        
             | bufferoverflow wrote:
             | Take-home assignments are basically obsolete. Students who
             | want to cheat, can do so easily. Of course, in the end,
             | they cheat themselves, but that's not the point.
        
           | anovick wrote:
           | * Only in the U.S.
        
         | superfrank wrote:
         | Is there really lock in with AI models?
         | 
            | I built a product that uses an LLM and I got curious about the
         | quality of the output from different models. It took me a
         | weekend to go from just using OpenAI's API to having Gemini,
         | Claude, and DeepSeek all as options and a lot of that time was
         | research on what model from each provider that I wanted to use.
        
           | pydry wrote:
           | For enterprise practically any SaaS gets used as one more
           | thing to lock them into a platform they already have a
           | relationship with (either AWS, GCP or Azure).
           | 
           | It's actually pretty dangerous for the industry to have this
           | much vertical integration. Tech could end up like the car
           | industry.
        
             | superfrank wrote:
              | I'm aware of that. I'm an EM for a large tech company that
              | sells multiple enterprise SaaS products.
              | 
              | You're right that the lock in happens because of
              | relationships, but most big enterprise SaaS companies have
              | relationships with multiple vendors. My company has
              | relationships with AWS, Azure, and GCP and we're currently
              | using products from all of them in different products. Even
              | on my specific product we're using all three.
             | 
             | When you've already got those relationships, the lock in is
             | more about switching costs. The time it takes to switch,
             | the knowledge needed to train people internally on the
             | differences after the switch, and the actual cost of the
             | new service vs the old one.
             | 
             | With AI models the time to switch from OpenAI to Gemini is
             | negligible and there's little retraining needed. If the
             | Google models (now or in the future) are comparable in
             | price and do a better job than OpenAI models, I don't see
             | where the lock in is coming from.
        
           | drob518 wrote:
           | There isn't much of a lock-in, and that's part of the problem
           | the industry is going to face. Everyone is spending gobs of
           | money on training and if someone else creates a better one
           | next week, the users can just swap it right in. We're going
           | to have another tech crash for AI companies, similar to what
           | happened in 2001 for .coms. Some will be winners but they
           | won't all be.
        
         | ein0p wrote:
         | How will it lock in the enterprise if its market share of
         | enterprise customers is half that of Azure (Azure also sells
         | OpenAI inference, btw), and one third that of AWS?
        
           | kccqzy wrote:
           | The same reason why people enjoy BigQuery enough that their
           | only use of GCP is BigQuery while they put their general
           | compute spend on AWS.
           | 
           | In other words, I believe talking about cloud market share as
           | a whole is misleading. One cloud could have one product
           | that's so compelling that people use that one product even
           | when they use other clouds for more commoditized products.
        
         | asadm wrote:
         | funny thing about younglings, they will migrate to something
         | else as fast as they came to you.
        
           | drob518 wrote:
           | I read about that on Facebook.
        
         | Oras wrote:
         | Enterprise has already been won by Microsoft (Azure), which
         | runs on OpenAI.
        
           | r00fus wrote:
           | That isn't what I'm seeing with my clientele (lots of
           | startups and mature non-tech companies). Most are using Azure
           | but very few have started to engage AI outside the periphery.
        
         | edaemon wrote:
         | It seems more and more like AI is less of a product and more of
         | a feature. Most people aren't going to care or even know about
         | the model or the company who made it, they're just going to use
         | the AI features built into the products they already use.
        
       | statements wrote:
        | Interesting to note that this might be the only model with a
        | knowledge cutoff as recent as January 2025.
        
         | Tiberium wrote:
         | Gemini 2.5 Pro has the same knowledge cutoff specified, but in
         | reality on more niche topics it's still limited to ~middle of
         | 2024.
        
         | brightball wrote:
         | Isn't Grok 3 basically real time now?
        
           | Tiberium wrote:
           | That's the web version (which has tools like search plugged
           | in), other models in their official frontends (Gemini on
           | gemini.google.com, GPT/o models on chatgpt.com) are also
           | "real time". But when served over API, most of those models
           | are just static.
        
           | bearjaws wrote:
            | No LLM is real time, and in fact, even a 2025 cutoff isn't
            | entirely realistic. Without guidance pointing to, say, a new
            | version of a framework, it will frequently "reference"
            | documentation from old versions and use that.
           | 
           | It's somewhat real time when it searches the web, of course
           | that data is getting populated into context rather than in
           | training.
        
           | jiocrag wrote:
           | Not at all. The model weights and training data remain the
           | same, it's just RAG'ing real-time twitter data into its
           | context window when returning results. It's like a worse
           | version of Perplexity.
        
       | ein0p wrote:
       | Absolutely decimated on metrics by o4-mini, straight out of the
       | gate, and not even that much cheaper on output tokens (o4-mini's
       | thinking can't be turned off IIRC).
        
         | gundmc wrote:
          | It's good to see some actual competition in this price range! A
          | lot of Flash 2.5's edge will depend on how well the dynamic
          | reasoning works. It's also helpful to have _significantly_
          | lower input token cost for large-context use cases.
        
         | rfw300 wrote:
         | o4-mini does look to be a better model, but this is actually a
         | lot cheaper! It's ~7x cheaper for both input and output tokens.
        
           | ein0p wrote:
           | These small models only make sense with "thinking" enabled.
           | And once you enable that, much of the cost advantage
           | vanishes, for output tokens.
        
             | overfeed wrote:
             | > These small models only make sense with "thinking"
             | enabled
             | 
             | This entirely depends on your use-cases.
        
         | vessenes wrote:
         | o4-mini costs 8x as much as 2.5 flash. I believe its useful
         | context window is also shorter, although I haven't verified
         | this directly.
        
           | mccraveiro wrote:
           | 2.5 flash with reasoning is just 20% cheaper than o4-mini
        
             | vessenes wrote:
              | Good point: reasoning costs more. Also, it's impossible to
              | tell without tests how verbose the reasoning mode is.
        
         | mupuff1234 wrote:
         | Not sure "decimated" is a fitting word for "slightly higher
         | performance on some benchmarks".
        
           | fwip wrote:
           | Perhaps they were using the original meaning of "one-tenth
           | destroyed." :P
        
           | ein0p wrote:
           | 66.8% error rate reduction for o4-mini on AIME2025, and 21%
           | error rate reduction on MMMU isn't "slightly higher". It'll
           | be quite noticeable in practice.
        
         | kfajdsl wrote:
         | Anecdotally o4-mini doesn't perform as well on video
         | understanding tasks in our pipeline, and also in Cursor it
         | seems really not great.
         | 
         | During one session, it read the same file (same lines) several
         | times, ran 'python -c 'print("skip!")'' for no reason, and then
         | got into another file reading loop. Then after asking a
         | hypothetical about the potential performance implications of
         | different ffmpeg flags, it claimed that it ran a test and
         | determined conclusively that one particular set was faster,
         | even though it hadn't even attempted a tool call, let alone
         | have the results from a test that didn't exist.
        
       | xbmcuser wrote:
        | For a non-programmer like me, Google is becoming shockingly good.
        | It is giving working code the first time. I was playing around
        | with it and asked it to write code to scrape some data off a
        | website to analyse. I was expecting it to write something that would
       | scrape the data and later I would upload the data to it to
       | analyse. But it actually wrote code that scraped and analysed the
       | data. It was basic categorizing and counting of the data but I
       | was not expecting it to do that.
        
         | kccqzy wrote:
          | That's the opposite of my wife's experience; she's in tech but
          | also a non-programmer. She wanted to ask Gemini to write code
         | to do some basic data analysis things in a more automated way
         | than Excel. More than once, Gemini wrote a long bash script
         | where some sed invocations are just plain wrong. More than once
         | I've had to debug Gemini-written bash scripts. As a programmer
         | I knew how bash scripts aren't great for readability so I told
         | my wife to ask Gemini to write Python. It resulted in higher
         | code quality, but still contained bugs that are impossible for
         | a non programmer to fix. Sometimes asking a follow up about the
         | bugs would cause Gemini to fix it, but doing so repeatedly will
         | result in Gemini forgetting what's being asked or simply
         | throwing an internal error.
         | 
         | Currently IMO you have to be a programmer to use Gemini to
         | write programs effectively.
        
           | sbarre wrote:
           | I've found that good prompting isn't just about asking for
           | results but also giving hints/advice/direction on how to go
           | about the work.
           | 
           | I suspect that if Gemini is giving you bash scripts it's
            | because you're not giving it enough direction. As you
           | pointed out, telling it to use Python, or giving it more
           | expectations about how to go about the work or how the output
           | should be, will give better results.
           | 
           | When I am prompting for technical or data-driven work, I tend
           | to almost walk through what I imagine the process would be,
           | including steps, tools, etc...
        
           | xbmcuser wrote:
            | I had similar experiences a few months back; that is why I am
            | saying it is becoming shockingly good. 2.5 is a lot better
            | than the 2.0 version. Another thing I have realized: just
            | like Google search in the past, your query has a lot to do
            | with the results you get. So giving an example of what you
            | want works at getting better results.
        
             | ac29 wrote:
             | > I am saying it is becoming shockingly good the 2.5 is a
             | lot better than the 2.0 version
             | 
             | Are you specifically talking about 2.5 Flash? It only came
              | out an hour ago, I don't know how you would have enough
             | experience with it already to come to your conclusion.
             | 
             | (I am very impressed with 2.5 Pro, but that is a different
             | model that's been available for several weeks now)
        
               | xbmcuser wrote:
               | I am talking about 2.5 Pro
        
           | 999900000999 wrote:
           | Let's hope that's the case for a while.
           | 
            | I want to be able to just tell ChatGPT or whatever to create
           | a full project for me, but I know the moment it can do that
           | without any human intervention, I won't be able to find a
           | job.
        
           | drob518 wrote:
           | IMO, the only thing that's consistent about AIs is how
           | inconsistent they are. Sometimes, I ask them to write code
           | and I'm shocked at how well it works. Other times, I feel
           | like I'm trying to explain to a 5-year-old Alzheimer's
           | patient what I want and it just can't seem to do the simplest
           | stuff. And it's the same AI in both cases.
        
           | SweetSoftPillow wrote:
           | It must have something to do with the way your wife is
           | prompting. I've noticed this with my friends too. I usually
           | get working code from Gemini 2.5 Pro on the first try, and
           | with a couple of follow-up prompts, it often improves
           | significantly, while my friends seem to struggle
           | communicating their ideas to the AI and get worse results.
           | 
           | Good news: Prompting is a skill you can develop.
        
             | halfmatthalfcat wrote:
             | Or we can just learn to write it ourselves in the same
             | amount of time /shrug
        
               | viraptor wrote:
               | If you're going to need scripts like that every week -
               | sure. If you need it once a year on average... not
               | likely. There's a huge amount of things we could learn
               | but do them so infrequently that we outsource it to other
               | people.
        
             | gregorygoc wrote:
             | Is there a website with off the shelf prompts that work?
        
           | Workaccount2 wrote:
           | There is definitely an art to doing it, but the ability is
           | definitely there even if you don't know the language at all.
           | 
           | I have a few programs now that are written in Python (2 by
           | 3.7, one by 2.5) used for business daily, and I can tell you
           | I didn't, and frankly couldn't, check a single line of code.
           | One of them is ~500 LOC, the other two are 2200-2700 LOC.
        
         | ant6n wrote:
         | Last time I tried Gemini, it messed with my google photo data
         | plan and family sharing. I wish I could try the AI separate
         | from my Google account.
        
           | jsnell wrote:
           | > I wish I could try the AI separate from my Google account.
           | 
           | If that's a concern, just create another account. Doesn't
           | even require using a separate browser profile, you can be
           | logged into multiple accounts at once and use the account
           | picker in the top right of most their apps to switch.
        
         | ModernMech wrote:
         | I've been continually disappointed. I've been told it's getting
         | exponentially better and we won't be able to keep up with how
         | good they get, but I'm not convinced. I'm using them every
          | single day and I'm never shocked or awed by their competence,
          | but instead continually vexed that they're not living up to the
          | hype I keep reading.
         | 
         | Case in point: there was a post here recently about
         | implementing a JS algorithm that highlighted headings as you
         | scrolled (side note: can anyone remember what the title was? I
         | can't find it again), but I wanted to test the LLM for that
         | kind of task.
         | 
         | Pretty much no matter what I did, I couldn't get it to give me
         | a solution that would highlight all of the titles down to the
         | very last one.
         | 
         | I knew what the problem was, but even guiding the AI, it
         | couldn't fix the code. I tried multiple AIs, different
         | strategies. The best I could come up with was to guide it step
         | by step on how to fix the code. Even telling it _exactly_ what
          | the problem was, it couldn't fix it.
         | 
         | So this goes out to the "you're prompting it wrong" crowd...
         | Can you show me a prompt or a conversation that will get an AI
         | to spit out working code for this task: JavaScript that will
          | highlight headings as you scroll, down to the very last one. The
         | challenge is to prompt it to do this without telling it how to
         | implement it.
         | 
         | I figure this should be easy for the AI because this kind of
         | thing is very standard, but maybe I'm just holding it wrong?
        
           | jsnell wrote:
           | Even as a human programmer I don't actually understand your
           | description of the problem well enough to be confident I
           | could correctly guess your intent.
           | 
           | What do you mean by "highlight as you scroll"? I guess you
           | want a single heading highlighted at a time, and it should be
           | somehow depending on the viewport. But even that is
           | ambiguous. Do you want the topmost heading in the viewport?
           | The bottom most? Depending on scroll direction?
           | 
           | This is what I got one-shot from Gemini 2.5 Pro, with my best
           | guess at what you meant:
           | https://gemini.google.com/share/d81c90ab0b9f
           | 
           | It seems pretty good. Handles scrolling via all possible
           | ways, does the highlighting at load too so that the
           | highlighting is in effect for the initial viewport too.
           | 
           | The prompt was "write me some javascript that higlights the
           | topmost heading (h1, h2, etc) in the viewport as the document
           | is scrolled in any way".
           | 
           | So I'm thinking your actual requirements are very different
           | than what you actually wrote. That might explain why you did
           | not have much luck with any LLMs.
        
           | croemer wrote:
           | "Overengineered anchor links":
           | https://news.ycombinator.com/item?id=43570324
        
       | __alexs wrote:
       | Does billing for the API actually work properly yet?
        
       | alecco wrote:
       | Gemini models are very good but in my experience they tend to
       | overdo the problems. When I give it things for context and
       | something to rework, Gemini often reworks the problem.
       | 
       | For software it is barely useful because you want small commits
       | for specific fixes not a whole refactor/rewrite. I tried many
       | prompts but it's hard. Even when I give it function signatures of
       | the APIs the code I want to fix uses, Gemini rewrites the API
       | functions.
       | 
       | If anybody knows a prompt hack to avoid this, I'm all ears.
       | Meanwhile I'm staying with Claude Pro.
        
         | byearthithatius wrote:
         | Yes, it will add INSANE amounts of "robust error handling" to
         | quick scripts where I can be confident about assumptions. This
         | turns my clean 40 lines of Python where I KNOW the JSONL I am
         | parsing is valid into 200+ lines filled with ten new try except
         | statements. Even when I tell it not to do this, it loves to
         | "find and help" in other ways. Quite annoying. But overall it
         | is pretty dang good. It even spotted a bug I missed the other
         | day in a big 400+ line complex data processing file.
        
           | zhengyi13 wrote:
           | I wonder how much of that sort of thing is driven by having
           | trained their models on their own internal codebases? Because
           | if that's the case, careful and defensive being the default
           | would be unsurprising.
        
           | stavros wrote:
           | I didn't realize this was a bigger trend, I asked it to write
           | a simple testing script that POSTed a string to a local HTTP
           | server as JSON, and it wrote a 40 line script, handling any
           | possible error. I just wanted two lines.
        
       | ks2048 wrote:
       | If this announcement is targeting people not up-to-date on the
       | models available, I think they should say what "flash" means. Is
       | there a "Gemini (non-flash)"?
       | 
       | I see the 4 Google model names in the chart here. Are these 4 the
       | main "families" of models to choose from?
       | 
       | - Gemini-Pro-Preview
       | 
       | - Gemini-Flash-Preview
       | 
       | - Gemini-Flash
       | 
       | - Gemini-Flash-Lite
        
         | mwest217 wrote:
         | Gemini has had 4 families of models, in order of decreasing
         | size:
         | 
         | - Ultra
         | 
         | - Pro
         | 
         | - Flash
         | 
         | - Flash-Lite
         | 
         | Versions with `-Preview` at the end haven't had their "official
         | release" and are technically in some form of "early access"
         | (though I'm not totally clear on exactly what that means given
         | that they're fully available and as of 2.5 Pro Preview, have
         | pricing attached to them - earlier versions were free during
         | Preview but had pretty strict rate limiting but now it seems
         | that Preview models are more or less fully usable).
        
           | drob518 wrote:
           | Is GMail still in beta?
        
             | mring33621 wrote:
             | so Sigma...
        
           | jsnell wrote:
           | The free-with-small-rate-limits designator was
           | "experimental", not "preview".
           | 
           | I _think_ the distinction between preview and full release is
            | that the preview models have no guarantees on how long
            | they'll be available, while the full release comes with a
            | pre-set discontinuation date. So if you want the stability for a
           | production app, you wouldn't want to use a preview model.
        
       | AStonesThrow wrote:
       | I've been leveraging the services of 3 LLMs, mainly: Meta,
       | Gemini, and Copilot.
       | 
       | It depends on what I'm asking. If I'm looking for answers in the
       | realm of history or culture, religion, or I want something
       | creative such as a cute limerick, or a song or dramatic script,
       | I'll ask Copilot. Currently, Copilot has two modes: "Quick
       | Answer"; or "Think Deeply", if you want to wait about 30 seconds
       | for a good answer.
       | 
       | If I want info on a product, a business, an industry or a field
       | of employment, or on education, technology, etc., I'll inquire of
       | Gemini.
       | 
       | Both Copilot and Gemini have interactive voice conversation
       | modes. Thankfully, they will also write a transcript of what we
       | said. They also eagerly attempt to engage the user with further
       | questions and followups, with open questions such as "so what's
       | on your mind tonight?"
       | 
       | And if I want to know about pop stars, film actors, the social
       | world or something related to tourism or recreation in general, I
       | can ask Meta's AI through [Facebook] Messenger.
       | 
       | One thing I found to be extremely helpful and accurate was
       | Gemini's tax advice. I mean, it was way better than human beings
       | at the entry/poverty level. Commercial tax advisors, even when
       | I'd paid for the Premium Deluxe Tax Software from the Biggest
       | Name, they just went to Google stuff for me. I mean, they didn't
       | even seem to know where stuff was on irs.gov. When I asked for a
       | virtual or phone appointment, they were no-shows, with a litany
       | of excuses. I visited 3 offices in person; the first two were
       | closed, and the third one basically served Navajos living off the
       | reservation.
       | 
       | So when I asked Gemini about tax information -- simple stuff like
       | the terminology, definitions, categories of income, and things
       | like that -- Gemini was perfectly capable of giving lucid
       | answers. And citing its sources, so I could immediately go find
       | the IRS.GOV publication and read it "from the horse's mouth".
       | 
       | Oftentimes I'll ask an LLM just to jog my memory or inform me of
       | what specific terminology I should use. Like "Hey Gemini, what's
       | the PDU for Ethernet called?" and when Gemini says it's a "frame"
       | then I have that search term I can plug into Wikipedia for
       | further research. Or, for an introduction or overview to topics
       | I'm unfamiliar with.
       | 
       | LLMs are an important evolutionary step in the general-purpose
       | "search engine" industry. One problem was, you see, that it was
       | dangerous, annoying, or risky to go Googling around and click on
       | all those tempting sites. Google knew this: the dot-com sites and
       | all the SEO sites that surfaced to the top were traps, they were
       | bait, they were sometimes legitimate scams. So the LLM providers
       | are showing us that we can stay safe in a sandbox, without
       | clicking external links, without coughing up information about
       | our interests and setting cookies and revealing our IPv6
       | addresses: we can safely ask a local LLM, or an LLM in a trusted
       | service provider, about whatever piques our fancy. And I am glad
       | for this. I saw y'all complaining about how every search engine
       | was worthless, and the Internet was clogged with blogspam, and
       | there was no real information anymore. Well, perhaps LLMs, for
       | now, are a safe space, a sandbox to play in, where I don't need
       | to worry about drive-by-zero-click malware, or being inundated
       | with Joomla ads, or popups. For now.
        
       | cynicalpeace wrote:
       | 1. The main transformative aspect of LLMs has been in writing
       | code.
       | 
       | 2. LLMs have had less transformative aspects in 2025 than we
       | anticipated back in late 2022.
       | 
       | 3. LLMs are unlikely to be very transformative to society, even
       | as their intelligence increases, because intelligence is a minor
       | changemaker in society. Bigger changemakers are motivation,
       | courage, desire, taste, power, sex and hunger.
       | 
       | 4. LLMs are unlikely to develop these more important traits
       | because they are trained on text, not evolved in a rigamarole of
       | ecological challenges.
        
       | charcircuit wrote:
       | 500 RPD for the free tier is good enough for my coding needs.
       | Nice.
        
       | AbuAssar wrote:
        | I noticed that OpenAI doesn't compare its models to third-party
        | models in its announcement posts, unlike Google, Meta, and the
        | others.
        
         | jskherman wrote:
          | They're doing the Apple strategy: less spotlight for other
          | third parties, and less awareness of how they're lagging
          | behind, so that those already ignorantly locked into OpenAI
          | would not switch. But at this point, why would anyone do that
          | when switching costs are low?
        
       | mmaunder wrote:
       | More great innovation from Google. OpenAI have two major
       | problems.
       | 
       | The first is Google's vertically integrated chip pipeline and
       | deep supply chain and operational knowledge when it comes to
       | creating AI chips and putting them into production. They have a
       | massive cost advantage at every step. This translates into more
       | free services, cheaper paid services, more capabilities due to
       | more affordable compute, and far more growth.
       | 
       | Second problem is data starvation and the unfair advantage that
       | social media has when it comes to a source of continually
       | refreshed knowledge. Now that the foundational model providers
       | have churned through the common crawl and are competing to
       | consume things like video and whatever is left, new data is
       | becoming increasingly valuable as a differentiator, and more
       | importantly, as a provider of sustained value for years to come.
       | 
       | SamA has signaled both of these problems when he made noises
       | about building a fab a while back and is more recently making
       | noises about launching a social media platform off OpenAI. The
       | smart money among his investors know these issues to be
       | fundamental in deciding if OAI will succeed or not, and are
       | asking the hard questions.
       | 
       | If the only answer for both is "we'll build it from scratch",
       | OpenAI is in very big trouble. And it seems that that is the best
       | answer that SamA can come up with. I continue to believe that
       | OpenAI will be the Netscape of the AI revolution.
       | 
       | The win is Google's for the taking, if they can get out of their
       | own way.
        
         | jbverschoor wrote:
         | Except that they train their model even when you pay. So yeah..
         | I'd rather not use their "evil"
        
           | dayvigo wrote:
           | Source?
        
         | Keyframe wrote:
         | Google has the data and the hardware, not to mention software
         | and infrastructure talent. Once this Bismarck turns around,
         | and it looks like it is turning, who can parry it for real?
         | They have internet.zip and all the previous versions as well;
         | they have YouTube, email, search, books, traffic, maps and the
         | businesses on them, phones and the habits around them, even
         | the OG social network, Usenet. It's a sleeping giant starting
         | to wake up, and it's already causing a commotion; let's see
         | what it does once it drinks its morning coffee.
        
           | kriro wrote:
           | Agreed. One of Google's big advantages is the data access and
           | integrations. They are also positioned really well for the
           | "AI as entertainment" sector with youtube which will be huge
            | (imo). They also have the knowledge in adtech, and injecting
            | ads into AI is an obvious play. As is harvesting AI chat
            | data.
           | 
           | Meta and Google are the long term players to watch as Meta
           | also has similar access (Insta, FB, WhatsApp).
        
             | whoisthemachine wrote:
             | On-demand GenAI could definitely change the meaning of
             | "You" in "Youtube".
        
           | eastbound wrote:
           | They have the Excel spreadsheets of all startups and
           | businesses of the world (well 50/50 with Microsoft).
           | 
           | And Atlassian has all the project data.
        
             | Keyframe wrote:
             | I still can't understand how google missed on github,
             | especially since they were in the same space before with
             | google code. I do understand how they couldn't make a
             | github though.
        
         | whyenot wrote:
         | Another advantage that Google has is the deep integration of
         | Gemini into Google Office products and Gmail. I was part of a
         | pilot group and got to use a pre-release version and it's
         | really powerful and not something that will be easy for OpenAI
         | to match.
        
           | mmaunder wrote:
           | Agreed. Once they dial in the training for sheets it's going
           | to be incredible. I'm already using notebooklm to upload
           | finance PDFs, then having it generate tabular data and
           | copypasta into sheets, but it's a garage solution compared to
           | just telling it to create or update a sheet with parsed data
           | from other sheets, PDFs, docs, etc.
           | 
           | And as far as gmail goes, I periodically try to ask it to
           | unsubscribe from everything marketing related, and not from
           | my own company, but it's not even close to being there. I
           | think there will continue to be a gap in the market for more
           | aggressive email integration with AI, given how useless email
           | has become. I know A16Z has invested in a startup working on
            | this. I doubt Gmail will integrate as deeply as is possible, so
           | the opportunity will remain.
        
           | Workaccount2 wrote:
            | I frankly doubt the future of office products. In the last
           | month I have ditched two separate excel productivity
           | templates in favor of bespoke wrappers on sqlite databases,
           | written by Claude and Gemini. Easier to use and probably 10x
           | as fast.
           | 
           | You don't need a 50 function swiss army knife when your
           | pocket can just generate the exact tool you need.
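            |
            | A toy illustration of the idea, using Python's built-in
            | sqlite3 (not the actual wrappers mentioned above; the table
            | and columns here are made up):
            |
            |     import sqlite3
            |
            |     con = sqlite3.connect("tracker.db")
            |     con.execute("CREATE TABLE IF NOT EXISTS expenses("
            |                 "day TEXT, category TEXT, amount REAL)")
            |     con.execute("INSERT INTO expenses VALUES (?, ?, ?)",
            |                 ("2025-04-17", "tools", 19.99))
            |     con.commit()
            |     # the kind of summary a spreadsheet template used to provide
            |     for category, total in con.execute(
            |             "SELECT category, SUM(amount) FROM expenses "
            |             "GROUP BY category"):
            |         print(category, total)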
        
           | jdgoesmarching wrote:
           | You say deep integration, yet there is still no way to send a
           | Gemini Canvas to Docs without a lot of tedious copy-pasting
           | and formatting because Docs still doesn't actually support
           | markdown. Gemini in Google Office in general has been a
           | massive disappointment for all but the most simplistic of
           | writing tasks.
           | 
           | They can have the most advanced infrastructure in the world,
           | but it doesn't mean much if Google continues its infamous
           | floundering approach to product. But hey, 2.5 pro with Cline
           | is pretty nice.
        
             | whyenot wrote:
             | Maybe I'm misunderstanding, but there is literally a Share
             | button in Canvas right below each response with the option
             | to export to Docs. Within Docs, you can also click on the
             | Gemini "star" at the upper right to get a prompt and then
              | also export into the open document. Note that this is with
              | the "experimental" Gemini 2.5 Pro.
        
           | chucky_z wrote:
           | I have access to this now and I want it to work so bad and
           | it's just proper shit. Absolute rubbish.
           | 
           | They really, truly need to fix this integration. Gemini in
           | Google Docs is barely acceptable, it doesn't work at all (for
           | me) in Gmail, and I've not yet had it do _anything_ other
           | than error in Google Sheets.
        
         | zoogeny wrote:
         | If the battle was between Altman and Pichai I'd have my doubts.
         | 
         | But the battle is between Altman and Hassabis.
         | 
         | I recall some advice on investment from Buffett regarding how
         | he invests in the management team.
        
           | mdp2021 wrote:
           | Could you please expand, on both your points?
        
             | zoogeny wrote:
             | It is more gut feel than a rational or carefully reasoned
             | argument.
             | 
             | I think Pichai has been an exceptional revenue maximizer
             | but he lacks vision. I think he is probably capable of
             | squeezing tremendous revenue out of AI once it has been
             | achieved.
             | 
             | I like Hassabis in a "good vibe" way when I hear him speak.
             | He reminds me of engineers that I have worked with
             | personally and have gained my respect. He feels less like a
             | product focused leader and more of a research focused
             | leader (AlphaZero/AlphaFold) which I think will be critical
             | to continue the advances necessary to push the envelope. I
             | like his focus on games and his background in RL.
             | 
             | Google's war chest of Ad money gives Hassabis the
             | flexibility to invest in non-revenue generating directions
             | in a way that Altman is unlikely to be able to do. Altman
             | made a decision to pivot the company towards product which
             | led to the exodus of early research talent.
        
             | mmaunder wrote:
              | Not sure why their comment was downvoted. Google the
              | names. Hassabis runs DeepMind at Google which makes Gemini
              | and he's quite brilliant and has an unbelievable track
              | record. Buffett investing in teams shows that there are
              | smart people out there who think good leadership is a good
              | predictor of future success.
        
               | zoogeny wrote:
               | It may not be relevant to everyone, but it is worth
                | noting that his contribution to AlphaFold won Hassabis a
                | Nobel Prize in Chemistry.
        
               | mdp2021 wrote:
               | Zoogeny got downvoted? I did not do that. His comments
               | deserved more details anyway (at the level of those
               | kindly provided).
               | 
               | > _Google the names_
               | 
               | Was that a wink about the submission (a milestone from
                | Google)? Read Zoogeny's delightful reply and see whether
                | a search engine result can compare (not to mention that
                | I asked for Zoogeny's insight, not for trivia). And as a
                | listener to Buffett and Munger, I can surely say that
                | they rarely indulge in tautologies.
        
               | zoogeny wrote:
               | I wouldn't worry about downvotes, it isn't possible on HN
               | to downvote direct replies to your message (unlike
               | reddit), so you cannot be accused of downvoting me unless
               | you did so using an alt.
               | 
               | Some people see tech like they see sports teams and they
               | vote for their tribe without considering any other
               | reason. I'm not shy stating my opinion even when it may
               | invite these kinds of responses.
               | 
               | I do think it is important for people to "do their own
               | research" and not take one man's opinion as fact. I
               | recommend people watch a few videos of Hassabis, there
               | are many, and judge his character and intelligence for
               | themselves. They may find they don't vibe with him and
               | genuinely prefer Altman.
        
         | throwup238 wrote:
         | Nobody has really talked about what I think is an advantage
         | just as powerful as the custom chips: Google Books. They
         | already won a landmark fair use lawsuit against book
         | publishers, digitized more books than anyone on earth, and used
         | their Captcha service to crowdsource its OCR. They've got the
         | best* legal cover and all of the best sources of human
         | knowledge already there. Then Youtube for video.
         | 
         | The chips of course push them over the top. I don't know how
         | much Deep Research is costing them but it's by far the best
         | experience with AI I've had so far with a generous 20/day rate
         | limit. At this point I must be using up at least 5-10 compute
         | hours a _day_. Until about a week ago I had almost completely
         | written off Google.
         | 
         | * For what it's worth, I don't know. IANAL
        
           | dynm wrote:
            | The amount of text in books is surprisingly finite. My best
            | estimate was that there are ~10^13 tokens available in all
            | books (https://dynomight.net/scaling/#scaling-data), which is
            | less than frontier models are already being trained on. On
            | the other hand, book tokens are probably much "better" than
            | random internet tokens. Wikipedia for example seems to get
            | much higher weight than other sources, and it's only
            | ~3x10^10 tokens.
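            |
            | A back-of-the-envelope version of that estimate (the book
            | count and tokens-per-book below are rough assumptions, not
            | figures from the linked post):
            |
            |     # order-of-magnitude check; both inputs are guesses
            |     distinct_books = 1.3e8   # ~130M books ever published (assumed)
            |     tokens_per_book = 1e5    # ~75k words * ~1.3 tokens/word (assumed)
            |     total = distinct_books * tokens_per_book
            |     print(f"{total:.1e} tokens")          # ~1.3e+13
            |     wikipedia_tokens = 3e10               # the ~3x10^10 figure above
            |     print(f"~{total / wikipedia_tokens:.0f}x Wikipedia")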
        
             | dr_dshiv wrote:
             | We need more books! On it...
        
           | paxys wrote:
           | LibGen already exists, and all the top LLM publishers use it.
           | I don't know if Google's own book index provides a big
           | technical or legal advantage.
        
         | peterjliu wrote:
         | another advantage is people want the Google bot to crawl their
         | pages, unlike most AI companies
        
           | mmaunder wrote:
           | This is an underrated comment. Yes it's a big advantage and
           | probably a measurable pain point for Anthropic and OpenAI. In
           | fact you could just do a 1% survey of robots.txt out there
           | and get a reasonable picture. Maybe a fun project for an
           | HN'er.
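            |
            | A minimal sketch of such a survey (the domain list and bot
            | names are illustrative; a real 1% sample would need a proper
            | crawl list, timeouts and error handling):
            |
            |     from urllib.robotparser import RobotFileParser
            |
            |     BOTS = ["Googlebot", "Google-Extended", "GPTBot", "ClaudeBot"]
            |     domains = ["example.com", "news.ycombinator.com"]  # stand-in sample
            |
            |     for domain in domains:
            |         rp = RobotFileParser(f"https://{domain}/robots.txt")
            |         rp.read()
            |         blocked = [b for b in BOTS
            |                    if not rp.can_fetch(b, f"https://{domain}/")]
            |         print(domain, "blocks:", ", ".join(blocked) or "none")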
        
           | jiocrag wrote:
           | Excellent point. If they can figure out how to either
           | remunerate or drive traffic to third parties in conjunction
           | with this, it would be huge.
        
         | stefan_ wrote:
         | I don't know man, for months now people keep telling me on HN
         | how "Google is winning", yet no normal person I ever asked
         | knows what the fuck "Gemini" is. I don't know what they are
         | winning, it might be internet points for all I know.
         | 
         | Actually, some of the people polled recalled the Google AI
         | efforts by their expert system recommending glue on pizza and
         | smoking in pregnancy. It's a big joke.
        
           | mmaunder wrote:
           | Try uploading a bunch of PDF bank statements to notebooklm
           | and ask it questions. Or the results of blood work. It's jaw
           | dropping. e.g. uploaded 7 brokerage account statements as
           | PDFs in a mess of formats and asked it to generate table
           | summary data which it nailed, and then asked it to generate
           | actual trades to go from current position to a new position
           | in shortest path, and it nailed that too.
           | 
           | Biggest issue we have when using notebooklm is a lack of
           | ambition when it comes to the questions we're asking. And the
            | pro version supports up to 300 documents.
           | 
           | Hell, we uploaded the entire Euro Cyber Resilience Act and
           | asked the same questions we were going to ask our big name
           | legal firm, and it nailed every one.
           | 
           | But you actually make a fair point, which I'm seeing too and
           | I find quite exciting. And it's that even among my early
           | adopter and technology minded friends, adoption of the most
           | powerful AI tools is very low. e.g. many of them don't even
           | know that notebookLM exists. My interpretation on this is
           | that it's VERY early days, which is suuuuuper exciting for us
           | builders and innovators here on HN.
        
           | kube-system wrote:
           | While there are some first-party B2C applications like chat
           | front-ends built using LLMs, once mature, the end game is
           | almost certainly that these are going to be B2B products
           | integrated into other things. The future here goes a lot
           | further than ChatGPT.
        
           | shmoogy wrote:
           | That was ages ago.
           | 
            | Their new models (Gemini 2.5 Pro, and Flash experimental
            | with image generation) excel at many things. Image editing,
            | parsing PDFs, and coding are what I use them for, and they're
            | significantly cheaper than the closest competing models.
           | 
           | Highly recommend testing against openai and anthropic models
           | - you'll likely be pleasantly surprised.
        
         | labrador wrote:
         | > If the only answer for both is "we'll build it from scratch",
         | OpenAI is in very big trouble
         | 
         | They could buy Google+ code from Google and resurrect it with
          | OpenAI branding. Alternatively, they could partner with Bluesky.
        
           | parsimo2010 wrote:
           | I don't think the issue is solving the technical
           | implementation of a new social media platform. The issue is
           | whether a new social media platform from OpenAI will deliver
           | the kind of value that existing platforms deliver. If they
           | promise investors that they'll get TikTok/Meta/YouTube levels
           | of content+interaction (and all the data that comes with it),
           | but deliver Mastodon levels, then they are in trouble.
        
       | mark_l_watson wrote:
       | Nice! Low price, even with reasoning enabled. I have been working
       | on a short new book titled "Practical AI with Google: A Solo
       | Knowledge Worker's Guide to Gemini, AI Studio, and LLM APIs" but
       | with all of Google's recent announcements it might not be a short
       | book.
        
       | serjester wrote:
       | Just ran it on one of our internal PDF (3 pages, medium
       | difficulty) to json benchmarks:
       | 
       | gemini-flash-2.0: 60 ish% accuracy 6,250 pages per dollar
       | 
       | gemini-2.5-flash-preview (no thinking): 80 ish% accuracy 1,700
       | pages per dollar
       | 
       | gemini-2.5-flash-preview (with thinking): 80 ish% accuracy (not
       | sure what's going on here) 350 pages per dollar
       | 
       | gemini-flash-2.5: 90 ish% accuracy 150 pages per dollar
       | 
       | I do wish they separated the thinking variant from the regular
       | one - it's incredibly confusing when a model parameter
       | dramatically impacts pricing.
        
         | ValveFan6969 wrote:
         | I have been having similar performance issues. I believe they
         | intentionally made a worse model (Gemini 2.5) to get more money
         | out of you. However, there is a way you can make money off of
         | Gemini 2.5.
         |
         | If you set the thinking parameter lower and lower, you can make
         | the model spew absolute nonsense for the first response. It
         | costs 10 cents per input / output, and sometimes you get a
         | response that is just so bad your clients will ask for more
         | and more corrections.
        
       | zoogeny wrote:
       | Google making Gemini 2.5 Pro (Experimental) free was a big deal.
       | I haven't tried the more expensive OpenAI models, so I can only
       | compare it to the free models of theirs I have used in the
       | past.
       | 
       | Gemini 2.5 Pro is so much of a step up (IME) that I've become
       | sold on Google's models in general. It not only is smarter than
       | me on most of the subjects I engage with it, it also isn't
       | completely obsequious. The model pushes back on me rather than
       | contorting itself to find a way to agree.
       | 
       | 100% of my casual AI usage is now in Gemini and I look forward to
       | asking it questions on deep topics because it consistently
       | provides me with insight. I am building new tools with a mind
       | to optimizing my usage and increasing its value to me.
        
         | PerusingAround wrote:
         | This comment is exactly my experience; I feel as if I had
         | written it myself.
        
         | cjohnson318 wrote:
         | Yeah, my wife pays for ChatGPT, but Gemini is fine enough for
         | me.
        
           | qwertox wrote:
           | Just be aware that if you don't add a key (and set up
            | billing) you're granting Google the right to train on your
            | data, including having people read it and decide how to use
            | it for training.
        
         | dr_kiszonka wrote:
         | I was a big fan of that model but it has been replaced in AI
         | Studio by its preview version, which, by comparison, is pretty
         | bad. I hope Google makes the release version much closer to the
         | experimental one.
        
           | zoogeny wrote:
           | I can confirm the model name in Run Settings has been updated
           | to "Gemini 2.5 Pro Preview ..." when it used to be "Gemini
           | 2.5 Pro (Experimental) ...".
           | 
           | I cannot confirm if the quality is downgraded since I haven't
           | had enough time with it. But if what you are saying is
           | correct, I would be very sad. My big fear is the full-fat
           | Gemini 2.5 Pro will be prohibitively expensive, but a dumbed
           | down model (for the sake of cost) would also be saddening.
        
         | jeeeb wrote:
         | After comparing Gemini Pro and Claude Sonnet 3.7 coding answers
         | side by side a few times, I decided to cancel my Anthropic
         | subscription and just stick to Gemini.
        
           | wcarss wrote:
           | Google has killed so many amazing businesses -- entire
           | industries, even, by giving people something expensive for
           | free until the competition dies, and then they enshittify
           | hard.
           | 
           | It's cool to have access to it, but please be careful not to
           | mistake corporate loss leaders for authentic products.
        
             | JPKab wrote:
             | True. They are ONLY good when they have competition. The
             | sense of complacency that creeps in is so obvious as a
             | customer.
             | 
             | To this day, the Google Home (or is it called Nest now?)
              | speaker is the only physical product I've ever owned that
              | lost features over time. I used to be able to play the
             | audio of a Youtube video (like a podcast) through it, but
             | then Google decided that it was very very important that I
             | only be able to play a Youtube video through a device with
             | a screen, because it is imperative that I see a still image
             | when I play a longform history podcast.
             | 
             | Obviously, this is a silly and highly specific example, but
             | it is emblematic of how they neglect or enshittify massive
             | swathes of their products as soon as the executive team
             | loses interest and puts their A team on some shiny new
             | object.
        
             | mark_l_watson wrote:
             | In this case, Google is a large investor in Anthropic.
             | 
             | I agree that giving away access to expensive models long
             | term is not a good idea on several fronts. Personally, I
             | subscribe to Gemini Advanced and I pay for using the Gemini
             | APIs.
        
         | fsndz wrote:
         | More and more people are coming to the realisation that Google
         | is actually winning at the model level right now.
        
       | minimaxir wrote:
       | One hidden note from Gemini 2.5 Flash when diving deep into the
       | documentation: for image inputs, not only can the model be
       | instructed to generate 2D bounding boxes of relevant subjects,
       | but it can also create segmentation masks!
       | https://ai.google.dev/gemini-api/docs/image-understanding#se...
       | 
       | At this price point with the Flash model, creating segmentation
       | masks is pretty nifty.
       | 
       | The segmentation masks are a bit of a galaxy brain implementation
       | by generating a b64 string representing the mask:
       | https://colab.research.google.com/github/google-gemini/cookb...
       | 
       | I am trying to test it in AI Studio but it sometimes errors out,
       | likely because it tries to decode the b64 lol.
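       |
       | For anyone who wants to poke at this outside AI Studio, here's a
       | rough sketch using the google-genai SDK. The model name, prompt
       | wording and JSON keys (box_2d / mask / label) follow the linked
       | docs and cookbook, so treat them as assumptions that may change:
       |
       |     import base64, json
       |     from google import genai
       |     from google.genai import types
       |
       |     client = genai.Client(api_key="...")
       |     image = types.Part.from_bytes(
       |         data=open("photo.jpg", "rb").read(), mime_type="image/jpeg")
       |     prompt = ("Give segmentation masks for the prominent objects. "
       |               "Output a JSON list where each entry has 'box_2d', "
       |               "'mask' (base64 PNG), and 'label'.")
       |     resp = client.models.generate_content(
       |         model="gemini-2.5-flash-preview-04-17",  # assumed preview name
       |         contents=[image, prompt])
       |     raw = resp.text.strip().removeprefix("```json").removesuffix("```")
       |     for item in json.loads(raw):
       |         png = base64.b64decode(item["mask"].split(",")[-1])  # drop data: prefix
       |         with open(f"{item['label']}.png", "wb") as f:
       |             f.write(png)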
        
         | behnamoh wrote:
         | Wait, did they just kill YOLO, at least for time-insensitive
         | tasks?
        
           | minimaxir wrote:
           | YOLO is probably still cheaper if bounding boxes are your
           | main goal. Good segmentation models that work for arbitrary
           | labels, however, are much more expensive to set up and run,
           | so this type of approach could be an interesting alternative
           | depending on performance.
        
           | daemonologist wrote:
           | No, the speed of YOLO/DETR inference makes it cheap as well -
           | probably at least five or six orders of magnitude cheaper.
           | Edit: After some experimentation, Gemini also seems to not
           | perform nearly as well as a purpose-tuned detection model.
           | 
           | It'll be interesting to test this capability and see how it
            | evolves though. At some point you might be able to use it as a
           | "teacher" to generate training data for new tasks.
        
         | daemonologist wrote:
         | Interestingly if you run this in Gemini (instead of AI Studio)
         | you get:
         |
         |     I am sorry, but I was unable to generate the segmentation
         |     masks for _ in the image due to an internal error with the
         |     tool required for this task.
         |
         | (Not sure if that's a real or hallucinated error.)
        
         | ipsum2 wrote:
         | The performance is basically so bad it's unusable though;
         | dedicated segmentation models and object detection models are
         | still the best, for now.
        
         | msp26 wrote:
         | I've had mixed results with the bounding boxes even on 2.5 pro.
         | On complex images where a lot of boxes need to be drawn they're
         | in the general region but miss the exact location of objects.
        
       | simonw wrote:
       | I spotted something interesting in the Python API library code:
       | 
       | https://github.com/googleapis/python-genai/blob/473bf4b6b5a6...
       |
       |     class ThinkingConfig(_common.BaseModel):
       |         """The thinking features configuration."""
       |
       |         include_thoughts: Optional[bool] = Field(
       |             default=None,
       |             description="""Indicates whether to include thoughts in
       |             the response. If true, thoughts are returned only if the
       |             model supports thought and thoughts are available.""",
       |         )
       |         thinking_budget: Optional[int] = Field(
       |             default=None,
       |             description="""Indicates the thinking budget in tokens.""",
       |         )
       | 
       | That thinking_budget thing is documented, but what's the deal
       | with include_thoughts? It sounds like it's an option to have the
       | API return the thought summary... but I can't figure out how to
       | get it to work, and I've not found documentation or example code
       | that uses it.
       | 
       | Anyone managed to get Gemini to spit out thought summaries in its
       | API using this option?
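       |
       | For reference, this is roughly how I'd expect it to be wired up
       | through GenerateContentConfig. Whether the API actually honours
       | include_thoughts, and whether thought parts come back flagged via
       | part.thought, is exactly what I couldn't confirm:
       |
       |     from google import genai
       |     from google.genai import types
       |
       |     client = genai.Client(api_key="...")
       |     resp = client.models.generate_content(
       |         model="gemini-2.5-flash-preview-04-17",  # assumed preview name
       |         contents="Why is the sky blue?",
       |         config=types.GenerateContentConfig(
       |             thinking_config=types.ThinkingConfig(
       |                 thinking_budget=1024, include_thoughts=True)))
       |     for part in resp.candidates[0].content.parts:
       |         label = "THOUGHT:" if getattr(part, "thought", None) else "ANSWER:"
       |         print(label, part.text)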
        
         | phillypham wrote:
         | They removed the docs and support for it
         | https://github.com/googleapis/python-
         | genai/commit/af3b339a9d....
         | 
         | You can see the thoughts in AI Studio UI as per
         | https://ai.google.dev/gemini-api/docs/thinking#debugging-
         | and....
        
         | lemming wrote:
         | I maintain an alternative client which I build from the API
         | definitions at https://github.com/googleapis/googleapis, which
         | according to https://github.com/googleapis/python-
         | genai/issues/345 should be the right place. But neither the AI
         | Studio nor the Vertex definitions even have ThinkingConfig yet
         | - very frustrating. In general it's amazing how much API
         | munging is required to get a working client from the public API
         | definitions.
        
         | qwertox wrote:
         | In AI Studio the flash model has two toggles: Enable thinking
         | and Set thinking budget. If thinking budget is enabled, you can
         | set the max number of tokens it can use to think, else it's
         | Auto.
        
         | Deathmax wrote:
         | It is gated behind the GOOGLE_INTERNAL visibility flag, which
         | only internal Google projects and Cursor have at the moment as
         | far as I know.
        
       | deanmoriarty wrote:
       | Genuine naive question: HN generally has a negative view of
       | Google (pick any random story on Chrome, ads, search, the web,
       | working at FAANG, etc. and this should be obvious from the
       | comments), yet when it comes to AI there is a somewhat notable
       | "cheering effect" for Google to win the AI race that goes beyond
       | a conventional appreciation of a healthy competitive landscape,
       | which may come across as a bit of a double standard.
       | 
       | Why is this? Is it because OpenAI is seen as such a negative
       | player in this ecosystem that Google "gets a pass on this one"?
       | 
       | And bonus question: what do people think will happen to OpenAI if
       | Google wins the race? Do you think they'll literally just go
       | bust?
        
         | antirez wrote:
         | Maybe because Google is largely responsible for, and paid for
         | the research behind, most of the results we are seeing now. I'm
         | not a Google fan on the web side, or of their idea of what
         | software engineering is, but they deserve to win the AI race,
         | because right now all the other players have contributed a lot
         | less public research than Google did. Also, with Gemini 2.5 Pro
         | there was a big hype moment, because the model is of an ability
         | not seen before.
        
         | 01100011 wrote:
         | Didn't Google invent the transformer?
         | 
         | I think a lot of us see Google as both an evil advertiser and
         | as an innovator. Google winning AI is sort of nostalgic for
         | those of us who once cheered the "Do No Evil" (now mostly "Do
         | Know Evil") company.
         | 
         | I also like how Google is making quiet progress while other
         | companies take their latest incremental improvement and promote
         | it as hard as they can.
        
         | pkaye wrote:
       | I think for a while some people felt the Google AI models were
       | worse, but now they're getting much better. On the other hand,
       | Google has their own hardware, so they can drive down the costs
       | of using the models, which keeps pressure on OpenAI to remain
       | cost competitive. Then you have Anthropic, which has very good
       | models but is very expensive. But I've heard they are working
       | with Amazon to build a data center with Amazon's custom AI chips,
       | so maybe they can bring down their costs. In the end all these
       | companies will need a good model and lower cost hardware to
       | succeed.
        
       | krembo wrote:
       | How is this sustainable for Google from a business POV? It feels
       | like Google is shooting itself in the foot while "winning" the AI
       | race. In my experience, Google has lost 99% of the ads it used to
       | show me in the search engine.
        
         | tomr75 wrote:
         | someone else will do it if they don't
        
       | jdthedisciple wrote:
       | Very excited to try it, but it _is_ noteworthy that o4-mini is
       | _strictly better_ according to the very benchmarks shown by
       | Google here.
       | 
       | Of course it's about 4x as expensive too (I believe), but still,
       | given the release of openai/codex as well, o4-mini will remain a
       | strong competitor for now.
        
       ___________________________________________________________________
       (page generated 2025-04-17 23:00 UTC)