[HN Gopher] Gemini 2.5 Flash
___________________________________________________________________
Gemini 2.5 Flash
Author : meetpateltech
Score : 407 points
Date : 2025-04-17 19:03 UTC (3 hours ago)
(HTM) web link (developers.googleblog.com)
(TXT) w3m dump (developers.googleblog.com)
| xnx wrote:
| 50% price increase from Gemini 2.0 Flash. That sounds like a lot,
| but Flash is still so cheap when compared to other models of this
| (or lesser) quality. https://developers.googleblog.com/en/start-
| building-with-gem...
| akudha wrote:
| Is this cheaper than DeepSeek? Am I reading this right?
| Tiberium wrote:
| del
| Havoc wrote:
| You may want to consult Gemini on those percentage calcs;
| .10 to .15 is not 25%
| swyx wrote:
| done pretty much in line with the price/ELO pareto frontier
| https://x.com/swyx/status/1912959140743586206/photo/1
| xnx wrote:
| Love that chart! Am I imagining it, or did I see a version of
| that somewhere that even showed how the boundary has moved out
| over time?
| swyx wrote:
| https://x.com/swyx/status/1882933368444309723
|
| https://x.com/swyx/status/1830866865884991999 (scroll up)
| byefruit wrote:
| It's interesting that there's nearly a 6x price difference
| between reasoning and no reasoning.
|
| This implies it's not a hybrid model that can just skip reasoning
| steps if requested.
|
| Anyone know what else they might be doing?
|
| Reasoning means contexts will be longer (for thinking tokens) and
| there's an increase in cost to inference with a longer context
| but it's not going to be 6x.
|
| Or is it just market pricing?
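| (For concreteness, a back-of-the-envelope check of that ratio,
| using the preview rates as I read them; treat the numbers as
| assumptions:)

```python
# Back-of-the-envelope check of the reasoning price multiplier,
# assuming the announced preview rates: $0.60 per 1M output tokens
# with reasoning off, $3.50 per 1M with reasoning on.
PRICE_OUT_PLAIN = 0.60   # $/M output tokens, reasoning off (assumed)
PRICE_OUT_THINK = 3.50   # $/M output tokens, reasoning on (assumed)

ratio = PRICE_OUT_THINK / PRICE_OUT_PLAIN
print(f"output-token multiplier: {ratio:.2f}x")  # prints 5.83x
```

| So the "nearly 6x" is entirely on the output side; input tokens
| cost the same either way.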
| vineyardmike wrote:
| Based on their graph, it does look explicitly priced along
| their "Pareto Frontier" curve. I'm guessing that is guiding the
| price more than their underlying costs.
|
| It's smart because it gives them room to drop prices later and
| compete once other companies actually get to a similar quality.
| jsnell wrote:
| > This implies it's not a hybrid model that can just skip
| reasoning steps if requested.
|
| It clearly is, since most of the post is dedicated to the
| tunability (both manual and automatic) of the reasoning budget.
|
| I don't know what they're doing with this pricing, and the blog
| post does not do a good job explaining.
|
| Could it be that they're not counting thinking tokens as output
| tokens (since you don't get access to the full thinking trace
| anyway), and this is basically amortizing the thinking token
| spend over the actual output tokens? That doesn't make sense
| either, because then the user has no incentive to use anything
| except 0/max thinking budgets.
| RobinL wrote:
| Does anyone know how this pricing works? Supposing I have a
| classification prompt where I need the response to be a binary
| yes/no. I need one token of output, but reasoning will
| obviously add far more than 6 additional tokens. Is it still a
| 6x price multiplier? That doesn't seem to make sense, but nor
| does paying 6x more for every token, including the reasoning ones
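| My guess at how it works, sketched under the assumption that
| the hidden thinking tokens are themselves billed at the
| reasoning output rate (rather than a flat 6x on the one visible
| token); the rates and token counts below are illustrative:

```python
# Hypothetical cost of a one-token yes/no classification, assuming
# thinking tokens are billed as output at the reasoning rate.
# All rates are $ per 1M tokens and are assumptions, not confirmed.
def request_cost(input_tokens, output_tokens, thinking_tokens,
                 in_rate=0.15, out_rate=3.50):
    return (input_tokens * in_rate
            + (output_tokens + thinking_tokens) * out_rate) / 1_000_000

# 500-token prompt, 1 visible yes/no token, ~200 thinking tokens:
cost = request_cost(500, 1, 200)
print(f"${cost:.6f} per classification")
```

| Under that reading, what you actually pay depends on how many
| thinking tokens the model burns, not on a fixed 6x per visible
| token.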
| punkpeye wrote:
| This is cool, but rate limits on all of these preview models are
| PITA
| Layvier wrote:
| Agreed, it's not even possible to run an eval dataset. If
| someone from Google sees this, please at least increase the
| burst rate limit
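| In the meantime, a client-side workaround sketch: retry each
| eval call with exponential backoff on rate-limit errors (how
| your SDK surfaces a 429 is an assumption here):

```python
import random
import time

def with_backoff(call, max_tries=5, base=1.0):
    """Retry `call` on rate-limit errors with exponential backoff
    plus jitter. RuntimeError stands in for the SDK's 429 error."""
    for attempt in range(max_tries):
        try:
            return call()
        except RuntimeError:
            if attempt == max_tries - 1:
                raise  # retry budget exhausted, surface the error
            time.sleep(base * 2 ** attempt + random.random() * 0.1)
```

| Wrapping each request this way at least lets an eval run crawl
| through the burst limit instead of dying on the first 429.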
| punkpeye wrote:
| It is not without rate limits, but we do have elevated limits
| for our accounts through:
|
| https://glama.ai/models/gemini-2.5-flash-preview-04-17
|
| So if you just want to run evals, that should do it.
|
| Though the first couple of days after a model comes out are
| usually pretty rough because everyone tries to run their evals.
| punkpeye wrote:
| What I am noticing with every new Gemini model that comes
| out is that the time to first token (TTFT) is not great. I
| guess it is because they gradually transfer compute power
| from old models to new models as the demand increases.
| Filligree wrote:
| If you're imagining that 2.5Pro gets dynamically loaded
| during the time to first token, then you're vastly
| overestimating what's physically possible.
|
| It's more likely a latency-throughput tradeoff. Your
| query might get put inside a large batch, for example.
| Layvier wrote:
| That's very interesting, thanks for sharing!
| arnaudsm wrote:
| Gemini Flash models get the least hype, but in my experience
| they have the best bang for the buck and the best multimodal
| tooling in production.
|
| Google is silently winning the AI race.
| belter wrote:
| > Google is silently winning the AI race.
|
| That is what we keep hearing here... I cancelled my account
| after the last Gemini, and can't help noticing they are now
| offering the new one for free...
| arnaudsm wrote:
| Sorry I was talking of B2B APIs for my YC startup. Gemini is
| still far behind for consumers indeed.
| JeremyNT wrote:
| I use Gemini almost exclusively as a normal user. What am I
| missing out on that they are far behind on?
|
| It seems shockingly good and I've watched it get much
| better up to 2.5 Pro.
| arnaudsm wrote:
| Mostly brand recognition and the earlier Geminis had more
| refusals.
|
| As a consumer, I also really miss the Advanced voice mode
| of ChatGPT, which is the most transformative tech in my
| daily life. It's the only frontier model with true audio-
| to-audio.
| wavewrangler wrote:
| What do you mean miss? You don't have the budget to keep
| something you truly miss for $20? What am I missing here?
| I don't mean to criticize, I am just curious is all. I
| would reword this but I have to go
| Layvier wrote:
| Absolutely. So many use cases for it, and it's so
| cheap/fast/reliable
| danielbln wrote:
| I want to use these almost too cheap to meter models like
| Flash more, what are some interesting use cases for those?
| SparkyMcUnicorn wrote:
| And stellar OCR performance. Flash 2.0 is cheaper and more
| accurate than AWS Textract, Google Document AI, etc.
|
| Not only in benchmarks[0], but in my own production usage.
|
| [0] https://getomni.ai/ocr-benchmark
| Fairburn wrote:
| Sorry, but no. Gemini isn't the fastest horse yet. And its use
| within their ecosystem means it isn't geared to the masses
| outside of their bubble. They are not leading the race, but
| they are a contender.
| spruce_tips wrote:
| i have a high volume task i wrote an eval for and was
| pleasantly surprised at 2.0 flash's cost to value ratio
| especially compared to gpt4.1-mini/nano
|
| model | accuracy | input price ($/M) | output price ($/M)
| Gemini Flash 2.0 Lite | 67% | $0.075 | $0.30
| Gemini Flash 2.0 | 93% | $0.10 | $0.40
| GPT-4.1-mini | 93% | $0.40 | $1.60
| GPT-4.1-nano | 43% | $0.10 | $0.40
|
| excited to try out 2.5 flash
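| (the harness behind numbers like these can be tiny - a sketch,
| where `classify` is a stand-in for whatever provider call is
| being compared:)

```python
# Minimal accuracy eval: run labeled examples through a model call
# and tally exact matches. `classify` is a placeholder for the
# provider API under test.
def accuracy(examples, classify):
    """examples: list of (text, expected_label) pairs."""
    hits = sum(1 for text, expected in examples
               if classify(text) == expected)
    return hits / len(examples)
```

| Swap in each provider's call as `classify` and run the same
| examples to get a comparable column.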
| jay_kyburz wrote:
| Can I ask a serious question: what task are you doing where
| it's ok to get a 7% error rate? I can't get my head around how
| this can be used.
| spruce_tips wrote:
| low stakes text classification, but it's something that needs
| to be done and couldn't be done in reasonable time frames or
| at reasonable price points by humans
| omneity wrote:
| In my case, I have workloads like this where it's possible
| to verify the correctness of the result after inference, so
| any success rate is better than 0 as it's possible to
| identify the "good ones".
| dist-epoch wrote:
| Not OP, but for stuff like social networks
| spam/manipulation 7% error rate is fine
| wavewrangler wrote:
| Yeah, general propaganda and psyops are actually more
| effective around 12%-15%; we find it reads more like the
| user base and thus is questioned less for standing out /s
| 16bytes wrote:
| There are tons of AI/ML use-cases where 7% is acceptable.
|
| Historically speaking, if you had a 15% word error rate in
| speech recognition, it would generally be considered
| useful. 7% would be performing well, and <5% would be near
| the top of the market.
|
| Typically, your error rate just needs to be below the
| usefulness threshold and in many cases the cost of errors
| is pretty small.
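| For reference, the word error rate behind those numbers is just
| word-level edit distance divided by reference length - a sketch:

```python
# Word error rate: word-level edit distance (substitutions +
# insertions + deletions) divided by the reference word count.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[-1][-1] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on a mat"))
```

| One substitution in six reference words, so ~16.7% WER for that
| pair.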
| 42lux wrote:
| The API is free, and it's great for everyday tasks. So yes
| there is no better bang for the buck.
| drusepth wrote:
| Wait, the API is free? I thought you had to use their web
| interface for it to be free. How do you use the API for free?
| mlboss wrote:
| using aistudio.google.com
| spruce_tips wrote:
| create an api key and dont set up billing. pretty low rate
| limits and they use your data
| dcre wrote:
| You can get an API key and they don't bill you. Free tier
| rate limits for some models (even decent ones like Gemini
| 2.0 Flash) are quite high.
|
| https://ai.google.dev/gemini-api/docs/pricing
|
| https://ai.google.dev/gemini-api/docs/rate-limits#free-tier
| NoahZuniga wrote:
| The rate limits I've encountered with free API keys have
| been way lower than the limits advertised.
| midasz wrote:
| I use Gemini 2.5 pro experimental via openrouter in my
| openwebui for free. Was using sonnet 3.7 but I don't notice
| much difference so just default to the free thing now.
| statements wrote:
| Absolutely agree. Granted, it is task dependent. But when it
| comes to classification and attribute extraction, I've been
| using 2.0 Flash heavily across massive datasets. It would not
| even be viable cost-wise with other models.
| sethkim wrote:
| How "huge" are these datasets? Did you build your own tooling
| to accomplish this?
| xnx wrote:
| Shhhh. You're going to give away the secret weapon!
| gambiting wrote:
| In my experience they are as dumb as a bag of bricks. The other
| day I asked "can you edit a picture if I upload one"
|
| And it replied "sure, here is a picture of a photo editing
| prompt:"
|
| https://g.co/gemini/share/5e298e7d7613
|
| It's like "baby's first AI". The only good thing about it is
| that it's free.
| JFingleton wrote:
| Prompt engineering is a thing.
|
| Learning how to "speak llm" will give you great results.
| There's loads of online resources that will teach you. Think
| of it like learning a new API.
| ghurtado wrote:
| > in my experience they are as dumb as a bag of bricks
|
| In my experience, anyone that describes LLMs using terms of
| actual human intelligence is bound to struggle using the
| tool.
|
| Sometimes I wonder if these people enjoy feeling "smarter"
| when the LLM fails to give them what they want.
| mdp2021 wrote:
| If those people are a subset of those who demand actual
| intelligence, they will very often feel frustrated.
| nowittyusername wrote:
| It's because Google hasn't realized the value of training the
| model on information about its own capabilities and metadata.
| It's my biggest pet peeve about Google and the way they train
| these models.
| rvz wrote:
| Google has been winning the AI race ever since DeepMind was
| properly put to use developing their AI models, instead of
| the Google AI team that built Bard.
| GaggiX wrote:
| Flash models are really good even for an end user, because of
| how fast they are and how well they perform.
| ghurtado wrote:
| I know it's a single data point, but yesterday I showed it a
| diagram of my fairly complex micropython program, (including
| RP2 specific features, DMA and PIO) and it was able to describe
| in detail not just the structure of the program, but also
| exactly what it does and how it does it. This is before seeing
| a single line of code, just going by boxes and arrows.
|
| The other AIs I have shown the same diagram to, have all
| struggled to make sense of it.
| redbell wrote:
| > Google is silently winning the AI race
|
| Yep, I agree! This convinced me:
| https://news.ycombinator.com/item?id=43661235
| ramesh31 wrote:
| >"Google is silently winning the AI race."
|
| It's not surprising. What was surprising honestly was how they
| were caught off guard by OpenAI. It feels like in 2022 just
| about all the big players had a GPT-3 level system in the works
| internally, but SamA and co. knew they had a winning hand at
| the time, and just showed their cards first.
| wkat4242 wrote:
| True and their first mover advantage still works pretty well.
| Despite "ChatGPT" being a really uncool name in terms of
| marketing. People remember it because they were the first to
| wow them.
| russellbeattie wrote:
| I have to say, I never doubted it would happen. They've been at
| the forefront of AI and ML for well over a decade. Their
| scientists were the authors of the "Attention is all you need"
| paper, among thousands of others. A Google Scholar search
| produces endless results. There just seemed to be a disconnect
| between the research and product areas of the company. I think
| they've got that worked out now.
|
| They're getting their ass kicked in court though, which might
| be making them much less aggressive than they would be
| otherwise, or at least quieter about it.
| Nihilartikel wrote:
| 100% agree. I had Gemini flash 2 chew through thousands of
| points of nasty unstructured client data and it did a 'better
| than human intern' level conversion into clean structured
| output for about $30 of API usage. I am sold. 2.5 pro
| experimental is a different league though for coding. I'm
| leveraging it for massive refactoring now and it is almost
| magical.
| jdthedisciple wrote:
| > thousands of points of nasty unstructured client data
|
| What I always wonder in these kinds of cases is: What makes
| you confident the AI actually did a good job since presumably
| you haven't looked at the thousands of client data yourself?
|
| For all you know it made up 50% of the result.
| no_wizard wrote:
| I remember everyone saying its a two horse race between Google
| and OpenAI, then DeepSeek happened.
|
| Never count out the possibility of a dark horse competitor
| ripping the sod right out from under them
| transformi wrote:
| Bad day at Google.
|
| First the declaration of an illegal monopoly...
|
| and now... Google's latest innovation: programmable overthinking.
|
| With Gemini 2.5 Flash, you too can now set a thinking_budget--
| because nothing says "state-of-the-art AI" like manually capping
| how long it's allowed to reason. Truly the dream: debugging a
| production outage at 2am wondering if your LLM didn't answer
| correctly because you cheaped out on tokens. lol.
|
| "Turn thinking off for better performance." That's not a model
| config, that's a metaphor for Google's entire AI strategy lately.
|
| At this point, Gemini isn't an AI product--it's a latency-cost-
| quality compromise simulator with a text interface. Meanwhile,
| OpenAI and Anthropic are out here just... cooking the benchmarks
| danielbln wrote:
| Google's Gemini 2.5 Pro model is incredibly strong; it's on par
| with, and at times better than, Claude 3.7 in coding
| performance, and being able to ingest entire videos into the
| context is something I haven't seen elsewhere either. Google AI
| products have been anywhere between bad (Bard) and lackluster
| (Gemini 1.5), but 2.5 is a contender in all dimensions. Google
| is also the only player that owns the entire stack: research,
| software, data, and compute hardware. I think they were slow to
| start but they've closed the gap since.
| bsmith wrote:
| Using AI to debug code at 2am sounds like pure insanity.
| mring33621 wrote:
| the new normal
| spiderice wrote:
| They're suggesting you'll be up at 2am debugging code because
| your AI code failed. Not that you'll be using AI to do the
| debugging.
| hmaxwell wrote:
| I did some testing this morning:
|
| Prompt: "can you find any mistakes on my codebase? I put one in
| there on purpose" + 70,000 tokens of codebase where in one line I
| have an include for a non-existent file.
|
| Claude 3.7 Sonnet (Try 1 - new chat) Claimed it found a race
| condition in the api of the admin interface that would be
| triggered if two admins were to change the room order at the same
| time. Claude suggested I group all sql queries in a single
| transaction. I looked at the code and found that it already used
| a transaction for all queries. I said: the order_update api is
| already done with a transaction. Claude replied: "You're
| absolutely right, and I apologize for my mistake. I was incorrect
| to claim there was a race condition issue. The transaction
| ensures atomicity and consistency of the updates, and the SQL
| queries are properly structured for their intended purpose."
|
| Claude 3.7 Sonnet (Try 2 - new chat) Found a comment in the admin
| ui javascript code that suggested a potential alternative to
| event handler cleanup that was not implemented because I decided
| to go with a cleaner route. Then asked "Is this the issue you
| intentionally included, or would you like me to look for other
| potential problems?" I said: "The comment merely suggests an
| alternative, right?" claude said: "Yes, you're absolutely right.
| The comment is merely suggesting an alternative approach that
| isn't being used in the code, rather than indicating a mistake.
| So there's no actual bug or mistake in this part of the code -
| just documentation of different possible approaches. I apologize
| for misinterpreting this as an issue!"
|
| Claude 3.7 Sonnet (Try 3 - new chat) When processing items out of
| the database to generate QR codes in the admin interface, Claude
| says that my code both attempts to generate QR codes with
| undefined data AS WELL AS saying that my error handling skips
| undefined data. Claude contradicts itself within 2 sentences.
| When asking about clarification Claude replies: Looking at the
| code more carefully, I see that the code actually has proper
| error handling. I incorrectly stated that it "still attempts to
| call generateQRCode()" in the first part of my analysis, which
| was wrong. The code properly handles the case when there's no
| data-room attribute.
|
| Gemini Advanced 2.5 Pro (Try 1 - new chat) Found the intentional
| error and said I should stop putting db creds/api keys into the
| codebase.
|
| Gemini Advanced 2.5 Pro (Try 2 - new chat) Found the intentional
| error and said I should stop putting db creds/api keys into the
| codebase.
|
| Gemini Advanced 2.5 Pro (Try 3 - new chat) Found the intentional
| error and said I should stop putting db creds/api keys into the
| codebase.
|
| o4-mini-high and o4-mini and o3 and 4.5 and 4o - "The message you
| submitted was too long, please reload the conversation and submit
| something shorter."
| Tiberium wrote:
| The thread is about 2.5 Flash though, not 2.5 Pro. Maybe you
| can try again with 2.5 Flash specifically? Even though it's a
| small model.
| airstrike wrote:
| Have you tried Claude Code?
| danielbln wrote:
| Those responses are very Claude, too. 3.7 has powered our
| agentic workflows for weeks, but I've been using almost only
| Gemini for the last week and feel the output is generally
| better. It's gotten much better at agentic workflows (using
| 2.0 in an agent setup was not working well at all) and I
| prefer its tuning over Claude's: more to the point and less
| meandering.
| rendang wrote:
| 3 different answers in 3 tries for Claude? Makes me curious how
| many times you'd get the same answer if you asked 10/20/100
| times
| bambax wrote:
| > _codebase where in one line I have an include for a non-
| existent file_
|
| Ok but you don't need AI for this; almost any IDE will issue a
| warning for that kind of error...
| Workaccount2 wrote:
| OpenAI might win the college students but it looks like Google
| will lock in enterprise.
| xnx wrote:
| ChatGPT seems to have a name recognition / first-mover
| advantage with college students now, but is there any reason to
| think that will stick when today's high school students are
| using Gemini on their Chromebooks?
| gundmc wrote:
| Funny you should say that. Google just announced today that
| they are giving all college students one year of free Gemini
| advanced. I wonder how much that will actually move the needle
| among the youth.
| Workaccount2 wrote:
| My guess is that they will use it and still call it
| "ChatGPT"...
| xnx wrote:
| Chat Gemini Pretrained Transformer
| tantalor wrote:
| Pass the Kleenex. Can I get a Band-Aid? Here's a Sharpie. I
| need a Chapstick. Let me Xerox that. Toss me that Frisbee.
| drob518 wrote:
| Exactly.
| drob518 wrote:
| And every professor just groaned at the thought of having to
| read yet another AI-generated term paper.
| jay_kyburz wrote:
| They should just get AI to mark them. I genuinely think
| this is one thing AI would do better than humans.
| mdp2021 wrote:
| Grading papers definitely requires intelligence.
| jay_kyburz wrote:
| My partner marked a PhD thesis yesterday and there was a
| spelling mistake in the title.
|
| There is some level of analysis and feedback that an LLM
| could provide before a human reviews it. Even if it's
| just a fancy spell checker.
| bufferoverflow wrote:
| Take-home assignments are basically obsolete. Students who
| want to cheat, can do so easily. Of course, in the end,
| they cheat themselves, but that's not the point.
| anovick wrote:
| * Only in the U.S.
| superfrank wrote:
| Is there really lock in with AI models?
|
| I built a product that uses an LLM and I got curious about the
| quality of the output from different models. It took me a
| weekend to go from just using OpenAI's API to having Gemini,
| Claude, and DeepSeek all as options and a lot of that time was
| research on what model from each provider that I wanted to use.
| pydry wrote:
| For enterprise practically any SaaS gets used as one more
| thing to lock them into a platform they already have a
| relationship with (either AWS, GCP or Azure).
|
| It's actually pretty dangerous for the industry to have this
| much vertical integration. Tech could end up like the car
| industry.
| superfrank wrote:
| I'm aware of that. I'm an EM for a large tech company that
| sells multiple enterprise SaaS products.
|
| You're right that the lock in happens because of
| relationships, but most big enterprise SaaS companies have
| relationships with multiple vendors. My company has
| relationships with AWS, Azure, and GCP, and we're currently
| using products from all of them in different products. Even
| on my specific product we're using all three.
|
| When you've already got those relationships, the lock in is
| more about switching costs. The time it takes to switch,
| the knowledge needed to train people internally on the
| differences after the switch, and the actual cost of the
| new service vs the old one.
|
| With AI models the time to switch from OpenAI to Gemini is
| negligible and there's little retraining needed. If the
| Google models (now or in the future) are comparable in
| price and do a better job than OpenAI models, I don't see
| where the lock in is coming from.
| drob518 wrote:
| There isn't much of a lock-in, and that's part of the problem
| the industry is going to face. Everyone is spending gobs of
| money on training and if someone else creates a better one
| next week, the users can just swap it right in. We're going
| to have another tech crash for AI companies, similar to what
| happened in 2001 for .coms. Some will be winners but they
| won't all be.
| ein0p wrote:
| How will it lock in the enterprise if its market share of
| enterprise customers is half that of Azure (Azure also sells
| OpenAI inference, btw), and one third that of AWS?
| kccqzy wrote:
| The same reason why people enjoy BigQuery enough that their
| only use of GCP is BigQuery while they put their general
| compute spend on AWS.
|
| In other words, I believe talking about cloud market share as
| a whole is misleading. One cloud could have one product
| that's so compelling that people use that one product even
| when they use other clouds for more commoditized products.
| asadm wrote:
| funny thing about younglings: they will migrate to something
| else as fast as they came to you.
| drob518 wrote:
| I read about that on Facebook.
| Oras wrote:
| Enterprise has already been won by Microsoft (Azure), which
| hosts OpenAI's models.
| r00fus wrote:
| That isn't what I'm seeing with my clientele (lots of
| startups and mature non-tech companies). Most are using Azure
| but very few have started to engage AI outside the periphery.
| edaemon wrote:
| It seems more and more like AI is less of a product and more of
| a feature. Most people aren't going to care or even know about
| the model or the company who made it, they're just going to use
| the AI features built into the products they already use.
| statements wrote:
| Interesting to note that this might be the only model with a
| knowledge cutoff as recent as January 2025
| Tiberium wrote:
| Gemini 2.5 Pro has the same knowledge cutoff specified, but in
| reality on more niche topics it's still limited to ~middle of
| 2024.
| brightball wrote:
| Isn't Grok 3 basically real time now?
| Tiberium wrote:
| That's the web version (which has tools like search plugged
| in), other models in their official frontends (Gemini on
| gemini.google.com, GPT/o models on chatgpt.com) are also
| "real time". But when served over API, most of those models
| are just static.
| bearjaws wrote:
| No LLM is real time, and in fact even a 2025 cutoff isn't
| entirely realistic. Without guidance about, say, a new
| version of a framework, it will frequently "reference"
| documentation from old versions and use that.
|
| It's somewhat real time when it searches the web; of course
| that data is getting populated into context rather than into
| training.
| jiocrag wrote:
| Not at all. The model weights and training data remain the
| same, it's just RAG'ing real-time twitter data into its
| context window when returning results. It's like a worse
| version of Perplexity.
| ein0p wrote:
| Absolutely decimated on metrics by o4-mini, straight out of the
| gate, and not even that much cheaper on output tokens (o4-mini's
| thinking can't be turned off IIRC).
| gundmc wrote:
| It's good to see some actual competition in this price range! A
| lot of Flash 2.5's edge will depend on how well the dynamic
| reasoning works. It's also helpful to have _significantly_
| lower input token cost for large-context use cases.
| rfw300 wrote:
| o4-mini does look to be a better model, but this is actually a
| lot cheaper! It's ~7x cheaper for both input and output tokens.
| ein0p wrote:
| These small models only make sense with "thinking" enabled.
| And once you enable that, much of the cost advantage
| vanishes, for output tokens.
| overfeed wrote:
| > These small models only make sense with "thinking"
| enabled
|
| This entirely depends on your use-cases.
| vessenes wrote:
| o4-mini costs 8x as much as 2.5 flash. I believe its useful
| context window is also shorter, although I haven't verified
| this directly.
| mccraveiro wrote:
| 2.5 flash with reasoning is just 20% cheaper than o4-mini
| vessenes wrote:
| Good point: reasoning costs more. Also impossible to tell
| without tests is how verbose the reasoning mode is
| mupuff1234 wrote:
| Not sure "decimated" is a fitting word for "slightly higher
| performance on some benchmarks".
| fwip wrote:
| Perhaps they were using the original meaning of "one-tenth
| destroyed." :P
| ein0p wrote:
| 66.8% error rate reduction for o4-mini on AIME2025, and 21%
| error rate reduction on MMMU isn't "slightly higher". It'll
| be quite noticeable in practice.
| kfajdsl wrote:
| Anecdotally o4-mini doesn't perform as well on video
| understanding tasks in our pipeline, and also in Cursor it
| seems really not great.
|
| During one session, it read the same file (same lines) several
| times, ran 'python -c 'print("skip!")'' for no reason, and then
| got into another file reading loop. Then after asking a
| hypothetical about the potential performance implications of
| different ffmpeg flags, it claimed that it ran a test and
| determined conclusively that one particular set was faster,
| even though it hadn't even attempted a tool call, let alone
| have the results from a test that didn't exist.
| xbmcuser wrote:
| For a non-programmer like me, Google is becoming shockingly
| good. It is giving me working code on the first try. I was
| playing around with it and asked it to write code to scrape
| some data off a website to analyse. I was expecting it to
| write something that would scrape the data, after which I
| would upload the data to it to analyse. But it actually wrote
| code that scraped and analysed the data. It was basic
| categorizing and counting of the data, but I was not expecting
| it to do that.
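| For the curious, the core of a scrape-and-count script like
| that is small; the HTML and the data-category attribute below
| are invented for illustration, and a real version would fetch
| the page over HTTP first:

```python
# Count items per category in a chunk of HTML. Parses a local
# string here; the markup shape (li tags with a data-category
# attribute) is a made-up example.
from collections import Counter
from html.parser import HTMLParser

class CategoryCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and "data-category" in attrs:
            self.counts[attrs["data-category"]] += 1

html = """<ul>
  <li data-category="fruit">apple</li>
  <li data-category="fruit">pear</li>
  <li data-category="veg">leek</li>
</ul>"""
parser = CategoryCounter()
parser.feed(html)
print(parser.counts)  # Counter({'fruit': 2, 'veg': 1})
```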
| kccqzy wrote:
| That's the opposite of my wife's experience; she's in tech but
| also a non-programmer. She wanted to ask Gemini to write code
| to do some basic data analysis things in a more automated way
| than Excel. More than once, Gemini wrote a long bash script
| where some sed invocations are just plain wrong. More than once
| I've had to debug Gemini-written bash scripts. As a programmer
| I knew bash scripts aren't great for readability, so I told
| my wife to ask Gemini to write Python. It resulted in higher
| code quality, but still contained bugs that are impossible for
| a non programmer to fix. Sometimes asking a follow up about the
| bugs would cause Gemini to fix it, but doing so repeatedly will
| result in Gemini forgetting what's being asked or simply
| throwing an internal error.
|
| Currently IMO you have to be a programmer to use Gemini to
| write programs effectively.
| sbarre wrote:
| I've found that good prompting isn't just about asking for
| results but also giving hints/advice/direction on how to go
| about the work.
|
| I suspect that if Gemini is giving you bash scripts, it's
| because you're not giving it enough direction. As you
| pointed out, telling it to use Python, or giving it more
| expectations about how to go about the work or how the output
| should be, will give better results.
|
| When I am prompting for technical or data-driven work, I tend
| to almost walk through what I imagine the process would be,
| including steps, tools, etc...
| xbmcuser wrote:
| I had similar experiences a few months back; that is why I am
| saying it is becoming shockingly good. The 2.5 is a lot better
| than the 2.0 version. Another thing I have realized: just like
| Google search in the past, your query has a lot to do with the
| results you get. So giving an example of what you want works
| well at getting better results
| ac29 wrote:
| > I am saying it is becoming shockingly good the 2.5 is a
| lot better than the 2.0 version
|
| Are you specifically talking about 2.5 Flash? It only came
| out an hour ago, I dont know how you would have enough
| experience with it already to come to your conclusion.
|
| (I am very impressed with 2.5 Pro, but that is a different
| model that's been available for several weeks now)
| xbmcuser wrote:
| I am talking about 2.5 Pro
| 999900000999 wrote:
| Let's hope that's the case for a while.
|
| I want to be able to just tell ChatGPT or whatever to create
| a full project for me, but I know that the moment it can do
| that without any human intervention, I won't be able to find
| a job.
| drob518 wrote:
| IMO, the only thing that's consistent about AIs is how
| inconsistent they are. Sometimes, I ask them to write code
| and I'm shocked at how well it works. Other times, I feel
| like I'm trying to explain to a 5-year-old Alzheimer's
| patient what I want and it just can't seem to do the simplest
| stuff. And it's the same AI in both cases.
| SweetSoftPillow wrote:
| It must have something to do with the way your wife is
| prompting. I've noticed this with my friends too. I usually
| get working code from Gemini 2.5 Pro on the first try, and
| with a couple of follow-up prompts, it often improves
| significantly, while my friends seem to struggle
| communicating their ideas to the AI and get worse results.
|
| Good news: Prompting is a skill you can develop.
| halfmatthalfcat wrote:
| Or we can just learn to write it ourselves in the same
| amount of time /shrug
| viraptor wrote:
| If you're going to need scripts like that every week -
| sure. If you need it once a year on average... not
| likely. There's a huge amount of things we could learn
| but do them so infrequently that we outsource it to other
| people.
| gregorygoc wrote:
| Is there a website with off the shelf prompts that work?
| Workaccount2 wrote:
| There is definitely an art to doing it, but the ability is
| definitely there even if you don't know the language at all.
|
| I have a few programs now that are written in Python (2 by
| 3.7, one by 2.5) used for business daily, and I can tell you
| I didn't, and frankly couldn't, check a single line of code.
| One of them is ~500 LOC, the other two are 2200-2700 LOC.
| ant6n wrote:
| Last time I tried Gemini, it messed with my google photo data
| plan and family sharing. I wish I could try the AI separate
| from my Google account.
| jsnell wrote:
| > I wish I could try the AI separate from my Google account.
|
| If that's a concern, just create another account. Doesn't
| even require using a separate browser profile, you can be
| logged into multiple accounts at once and use the account
| picker in the top right of most their apps to switch.
| ModernMech wrote:
| I've been continually disappointed. I've been told these models
| are getting exponentially better and we won't be able to keep
| up with how good they get, but I'm not convinced. I'm using
| them every single day, and I'm never shocked or awed by their
| competence; instead I'm continually vexed that they're not
| living up to the hype I keep reading.
|
| Case in point: there was a post here recently about
| implementing a JS algorithm that highlighted headings as you
| scrolled (side note: can anyone remember what the title was? I
| can't find it again), but I wanted to test the LLM for that
| kind of task.
|
| Pretty much no matter what I did, I couldn't get it to give me
| a solution that would highlight all of the titles down to the
| very last one.
|
| I knew what the problem was, but even guiding the AI, it
| couldn't fix the code. I tried multiple AIs, different
| strategies. The best I could come up with was to guide it step
| by step on how to fix the code. Even telling it _exactly_ what
| the problem was, it couldn't fix it.
|
| So this goes out to the "you're prompting it wrong" crowd...
| Can you show me a prompt or a conversation that will get an AI
| to spit out working code for this task: JavaScript that will
| highlight headings as you scroll, down to the very last one. The
| challenge is to prompt it to do this without telling it how to
| implement it.
|
| I figure this should be easy for the AI because this kind of
| thing is very standard, but maybe I'm just holding it wrong?
| jsnell wrote:
| Even as a human programmer I don't actually understand your
| description of the problem well enough to be confident I
| could correctly guess your intent.
|
| What do you mean by "highlight as you scroll"? I guess you
| want a single heading highlighted at a time, and it should be
| somehow depending on the viewport. But even that is
| ambiguous. Do you want the topmost heading in the viewport?
| The bottom most? Depending on scroll direction?
|
| This is what I got one-shot from Gemini 2.5 Pro, with my best
| guess at what you meant:
| https://gemini.google.com/share/d81c90ab0b9f
|
| It seems pretty good. Handles scrolling via all possible
| ways, does the highlighting at load too so that the
| highlighting is in effect for the initial viewport too.
|
| The prompt was "write me some javascript that higlights the
| topmost heading (h1, h2, etc) in the viewport as the document
| is scrolled in any way".
|
| So I'm thinking your actual requirements are very different
| than what you actually wrote. That might explain why you did
| not have much luck with any LLMs.
| croemer wrote:
| "Overengineered anchor links":
| https://news.ycombinator.com/item?id=43570324
| __alexs wrote:
| Does billing for the API actually work properly yet?
| alecco wrote:
| Gemini models are very good, but in my experience they tend to
| overdo things. When I give it material for context and one
| specific thing to rework, Gemini often reworks the whole problem.
|
| For software it is barely useful because you want small commits
| for specific fixes not a whole refactor/rewrite. I tried many
| prompts but it's hard. Even when I give it function signatures of
| the APIs the code I want to fix uses, Gemini rewrites the API
| functions.
|
| If anybody knows a prompt hack to avoid this, I'm all ears.
| Meanwhile I'm staying with Claude Pro.
| byearthithatius wrote:
| Yes, it will add INSANE amounts of "robust error handling" to
| quick scripts where I can be confident about assumptions. This
| turns my clean 40 lines of Python where I KNOW the JSONL I am
| parsing is valid into 200+ lines filled with ten new try except
| statements. Even when I tell it not to do this, it loves to
| "find and help" in other ways. Quite annoying. But overall it
| is pretty dang good. It even spotted a bug I missed the other
| day in a big 400+ line complex data processing file.
| zhengyi13 wrote:
| I wonder how much of that sort of thing is driven by having
| trained their models on their own internal codebases? Because
| if that's the case, careful and defensive being the default
| would be unsurprising.
| stavros wrote:
| I didn't realize this was a bigger trend, I asked it to write
| a simple testing script that POSTed a string to a local HTTP
| server as JSON, and it wrote a 40 line script, handling any
| possible error. I just wanted two lines.
| ks2048 wrote:
| If this announcement is targeting people not up-to-date on the
| models available, I think they should say what "flash" means. Is
| there a "Gemini (non-flash)"?
|
| I see the 4 Google model names in the chart here. Are these 4 the
| main "families" of models to choose from?
|
| - Gemini-Pro-Preview
|
| - Gemini-Flash-Preview
|
| - Gemini-Flash
|
| - Gemini-Flash-Lite
| mwest217 wrote:
| Gemini has had 4 families of models, in order of decreasing
| size:
|
| - Ultra
|
| - Pro
|
| - Flash
|
| - Flash-Lite
|
| Versions with `-Preview` at the end haven't had their "official
| release" and are technically in some form of "early access"
| (though I'm not totally clear on exactly what that means, given
| that they're fully available and, as of 2.5 Pro Preview, have
| pricing attached - earlier Preview versions were free but
| pretty strictly rate-limited, whereas now Preview models seem
| more or less fully usable).
| drob518 wrote:
| Is GMail still in beta?
| mring33621 wrote:
| so Sigma...
| jsnell wrote:
| The free-with-small-rate-limits designator was
| "experimental", not "preview".
|
| I _think_ the distinction between preview and full release is
| that the preview models have no guarantees on how long they'll
| be available, while the full release comes with a pre-set
| discontinuation date. So if you want the stability for a
| production app, you wouldn't want to use a preview model.
| AStonesThrow wrote:
| I've been leveraging the services of 3 LLMs, mainly: Meta,
| Gemini, and Copilot.
|
| It depends on what I'm asking. If I'm looking for answers in the
| realm of history or culture, religion, or I want something
| creative such as a cute limerick, or a song or dramatic script,
| I'll ask Copilot. Currently, Copilot has two modes: "Quick
| Answer"; or "Think Deeply", if you want to wait about 30 seconds
| for a good answer.
|
| If I want info on a product, a business, an industry or a field
| of employment, or on education, technology, etc., I'll inquire of
| Gemini.
|
| Both Copilot and Gemini have interactive voice conversation
| modes. Thankfully, they will also write a transcript of what we
| said. They also eagerly attempt to engage the user with further
| questions and followups, with open questions such as "so what's
| on your mind tonight?"
|
| And if I want to know about pop stars, film actors, the social
| world or something related to tourism or recreation in general, I
| can ask Meta's AI through [Facebook] Messenger.
|
| One thing I found to be extremely helpful and accurate was
| Gemini's tax advice. I mean, it was way better than human beings
| at the entry/poverty level. Commercial tax advisors, even when
| I'd paid for the Premium Deluxe Tax Software from the Biggest
| Name, they just went to Google stuff for me. I mean, they didn't
| even seem to know where stuff was on irs.gov. When I asked for a
| virtual or phone appointment, they were no-shows, with a litany
| of excuses. I visited 3 offices in person; the first two were
| closed, and the third one basically served Navajos living off the
| reservation.
|
| So when I asked Gemini about tax information -- simple stuff like
| the terminology, definitions, categories of income, and things
| like that -- Gemini was perfectly capable of giving lucid
| answers. And citing its sources, so I could immediately go find
| the IRS.GOV publication and read it "from the horse's mouth".
|
| Oftentimes I'll ask an LLM just to jog my memory or inform me of
| what specific terminology I should use. Like "Hey Gemini, what's
| the PDU for Ethernet called?" and when Gemini says it's a "frame"
| then I have that search term I can plug into Wikipedia for
| further research. Or, for an introduction or overview to topics
| I'm unfamiliar with.
|
| LLMs are an important evolutionary step in the general-purpose
| "search engine" industry. One problem was, you see, that it was
| dangerous, annoying, or risky to go Googling around and click on
| all those tempting sites. Google knew this: the dot-com sites and
| all the SEO sites that surfaced to the top were traps, they were
| bait, they were sometimes legitimate scams. So the LLM providers
| are showing us that we can stay safe in a sandbox, without
| clicking external links, without coughing up information about
| our interests and setting cookies and revealing our IPv6
| addresses: we can safely ask a local LLM, or an LLM in a trusted
| service provider, about whatever piques our fancy. And I am glad
| for this. I saw y'all complaining about how every search engine
| was worthless, and the Internet was clogged with blogspam, and
| there was no real information anymore. Well, perhaps LLMs, for
| now, are a safe space, a sandbox to play in, where I don't need
| to worry about drive-by-zero-click malware, or being inundated
| with Joomla ads, or popups. For now.
| cynicalpeace wrote:
| 1. The main transformative aspect of LLMs has been in writing
| code.
|
| 2. LLMs have had less transformative aspects in 2025 than we
| anticipated back in late 2022.
|
| 3. LLMs are unlikely to be very transformative to society, even
| as their intelligence increases, because intelligence is a minor
| changemaker in society. Bigger changemakers are motivation,
| courage, desire, taste, power, sex and hunger.
|
| 4. LLMs are unlikely to develop these more important traits
| because they are trained on text, not evolved in a rigamarole of
| ecological challenges.
| charcircuit wrote:
| 500 RPD for the free tier is good enough for my coding needs.
| Nice.
| AbuAssar wrote:
| I noticed that OpenAI don't compare their models to third party
| models in their announcement posts, unlike google, meta and the
| others.
| jskherman wrote:
| They're doing the Apple strategy. Less spotlight for other
| third parties, and less awareness of how they're lagging behind, so
| that those already ignorantly locked into OpenAI would not
| switch. But at this point why would anyone do that when
| switching costs are low?
| mmaunder wrote:
| More great innovation from Google. OpenAI have two major
| problems.
|
| The first is Google's vertically integrated chip pipeline and
| deep supply chain and operational knowledge when it comes to
| creating AI chips and putting them into production. They have a
| massive cost advantage at every step. This translates into more
| free services, cheaper paid services, more capabilities due to
| more affordable compute, and far more growth.
|
| Second problem is data starvation and the unfair advantage that
| social media has when it comes to a source of continually
| refreshed knowledge. Now that the foundational model providers
| have churned through the common crawl and are competing to
| consume things like video and whatever is left, new data is
| becoming increasingly valuable as a differentiator, and more
| importantly, as a provider of sustained value for years to come.
|
| SamA has signaled both of these problems when he made noises
| about building a fab a while back and is more recently making
| noises about launching a social media platform off OpenAI. The
| smart money among his investors know these issues to be
| fundamental in deciding if OAI will succeed or not, and are
| asking the hard questions.
|
| If the only answer for both is "we'll build it from scratch",
| OpenAI is in very big trouble. And it seems that that is the best
| answer that SamA can come up with. I continue to believe that
| OpenAI will be the Netscape of the AI revolution.
|
| The win is Google's for the taking, if they can get out of their
| own way.
| jbverschoor wrote:
| Except that they train their model even when you pay. So yeah..
| I'd rather not use their "evil"
| dayvigo wrote:
| Source?
| Keyframe wrote:
| Google has the data and has the hardware, not to mention
| software and infrastructure talent. Once this Bismarck turns
| around, and it looks like it is turning, who can parry it for
| real? They
| have internet.zip and all the previous versions as well, they
| have youtube, email, search, books, traffic, maps and business
| on it, phones and habits around it, even the OG social network,
| the usenet. It's a sleeping giant starting to wake up and it's
| already causing commotion, let's see what it does when it
| drinks morning coffee.
| kriro wrote:
| Agreed. One of Google's big advantages is the data access and
| integrations. They are also positioned really well for the
| "AI as entertainment" sector with youtube which will be huge
| (imo). They also have the knowledge in adtech and well
| injecting ads into AI is an obvious play. As is harvesting
| AI chat data.
|
| Meta and Google are the long term players to watch as Meta
| also has similar access (Insta, FB, WhatsApp).
| whoisthemachine wrote:
| On-demand GenAI could definitely change the meaning of
| "You" in "Youtube".
| eastbound wrote:
| They have the Excel spreadsheets of all startups and
| businesses of the world (well 50/50 with Microsoft).
|
| And Atlassian has all the project data.
| Keyframe wrote:
| I still can't understand how google missed on github,
| especially since they were in the same space before with
| google code. I do understand how they couldn't make a
| github though.
| whyenot wrote:
| Another advantage that Google has is the deep integration of
| Gemini into Google Office products and Gmail. I was part of a
| pilot group and got to use a pre-release version and it's
| really powerful and not something that will be easy for OpenAI
| to match.
| mmaunder wrote:
| Agreed. Once they dial in the training for sheets it's going
| to be incredible. I'm already using notebooklm to upload
| finance PDFs, then having it generate tabular data and
| copypasta into sheets, but it's a garage solution compared to
| just telling it to create or update a sheet with parsed data
| from other sheets, PDFs, docs, etc.
|
| And as far as gmail goes, I periodically try to ask it to
| unsubscribe from everything marketing related, and not from
| my own company, but it's not even close to being there. I
| think there will continue to be a gap in the market for more
| aggressive email integration with AI, given how useless email
| has become. I know A16Z has invested in a startup working on
| this. I doubt Gmail will integrate as deep as is possible, so
| the opportunity will remain.
| Workaccount2 wrote:
| I frankly am in doubt of future office products. In the last
| month I have ditched two separate excel productivity
| templates in favor of bespoke wrappers on sqlite databases,
| written by Claude and Gemini. Easier to use and probably 10x
| as fast.
|
| You don't need a 50 function swiss army knife when your
| pocket can just generate the exact tool you need.
| jdgoesmarching wrote:
| You say deep integration, yet there is still no way to send a
| Gemini Canvas to Docs without a lot of tedious copy-pasting
| and formatting because Docs still doesn't actually support
| markdown. Gemini in Google Office in general has been a
| massive disappointment for all but the most simplistic of
| writing tasks.
|
| They can have the most advanced infrastructure in the world,
| but it doesn't mean much if Google continues its infamous
| floundering approach to product. But hey, 2.5 pro with Cline
| is pretty nice.
| whyenot wrote:
| Maybe I'm misunderstanding, but there is literally a Share
| button in Canvas right below each response with the option
| to export to Docs. Within Docs, you can also click on the
| Gemini "star" at the upper right to get a prompt and then
| also export into the open document. Note that this is a
| with "experimental" Gemini 2.5 Pro.
| chucky_z wrote:
| I have access to this now and I want it to work so bad and
| it's just proper shit. Absolute rubbish.
|
| They really, truly need to fix this integration. Gemini in
| Google Docs is barely acceptable, it doesn't work at all (for
| me) in Gmail, and I've not yet had it do _anything_ other
| than error in Google Sheets.
| zoogeny wrote:
| If the battle was between Altman and Pichai I'd have my doubts.
|
| But the battle is between Altman and Hassabis.
|
| I recall some advice on investment from Buffett regarding how
| he invests in the management team.
| mdp2021 wrote:
| Could you please expand, on both your points?
| zoogeny wrote:
| It is more gut feel than a rational or carefully reasoned
| argument.
|
| I think Pichai has been an exceptional revenue maximizer
| but he lacks vision. I think he is probably capable of
| squeezing tremendous revenue out of AI once it has been
| achieved.
|
| I like Hassabis in a "good vibe" way when I hear him speak.
| He reminds me of engineers that I have worked with
| personally and have gained my respect. He feels less like a
| product focused leader and more of a research focused
| leader (AlphaZero/AlphaFold) which I think will be critical
| to continue the advances necessary to push the envelope. I
| like his focus on games and his background in RL.
|
| Google's war chest of Ad money gives Hassabis the
| flexibility to invest in non-revenue generating directions
| in a way that Altman is unlikely to be able to do. Altman
| made a decision to pivot the company towards product which
| led to the exodus of early research talent.
| mmaunder wrote:
| Not sure why their comment was downvoted. Google the
| names. Hassabis runs DeepMind at Google, which makes Gemini,
| and he's quite brilliant and has an unbelievable track
| record. Buffett investing in teams points out that there are
| smart people out there who think good leadership is a good
| predictor of future success.
| zoogeny wrote:
| It may not be relevant to everyone, but it is worth
| noting that his contribution to AlphaFold won Hassabis a
| Nobel prize in chemistry.
| mdp2021 wrote:
| Zoogeny got downvoted? I did not do that. His comments
| deserved more details anyway (at the level of those
| kindly provided).
|
| > _Google the names_
|
| Was that a wink about the submission (a milestone from
| Google)? Read Zoogeny's delightful reply and see whether
| it can compare a search engine result (not to mention
| that I asked for Zoogeny's insight, not for trivia). And
| as a listener to Buffett and Munger, I can surely say that
| they rarely indulge in tautologies.
| zoogeny wrote:
| I wouldn't worry about downvotes, it isn't possible on HN
| to downvote direct replies to your message (unlike
| reddit), so you cannot be accused of downvoting me unless
| you did so using an alt.
|
| Some people see tech like they see sports teams and they
| vote for their tribe without considering any other
| reason. I'm not shy stating my opinion even when it may
| invite these kinds of responses.
|
| I do think it is important for people to "do their own
| research" and not take one man's opinion as fact. I
| recommend people watch a few videos of Hassabis, there
| are many, and judge his character and intelligence for
| themselves. They may find they don't vibe with him and
| genuinely prefer Altman.
| throwup238 wrote:
| Nobody has really talked about what I think is an advantage
| just as powerful as the custom chips: Google Books. They
| already won a landmark fair use lawsuit against book
| publishers, digitized more books than anyone on earth, and used
| their Captcha service to crowdsource its OCR. They've got the
| best* legal cover and all of the best sources of human
| knowledge already there. Then Youtube for video.
|
| The chips of course push them over the top. I don't know how
| much Deep Research is costing them but it's by far the best
| experience with AI I've had so far with a generous 20/day rate
| limit. At this point I must be using up at least 5-10 compute
| hours a _day_. Until about a week ago I had almost completely
| written off Google.
|
| * For what it's worth, I don't know. IANAL
| dynm wrote:
| The amount of text in books is surprisingly finite. My best
| estimate was that there are ~10^13 tokens available in all
| books (https://dynomight.net/scaling/#scaling-data), which is
| less than frontier models are already being trained on. On
| the other hand, book tokens are probably much "better" than
| random internet tokens. Wikipedia for example seems to get
| much higher weight than other sources, and it's only ~3x10^10
| tokens.
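| To sanity-check that order of magnitude, a quick sketch (the
| title count, words per book, and tokens per word below are my
| own rough assumptions, not figures from the linked post):

```python
# Back-of-envelope check on the "~10^13 tokens in all books" figure.
# All inputs are illustrative assumptions, not measured values:
titles = 100e6          # roughly 100M distinct book titles ever published
words_per_book = 60_000  # a typical full-length book
tokens_per_word = 1.3    # common rule of thumb for English tokenization

book_tokens = titles * words_per_book * tokens_per_word
print(f"~{book_tokens:.0e} tokens")  # lands on the order of 10^13
```

| Even doubling every assumption keeps the total within roughly
| one order of magnitude of what frontier models already train on.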
| dr_dshiv wrote:
| We need more books! On it...
| paxys wrote:
| LibGen already exists, and all the top LLM publishers use it.
| I don't know if Google's own book index provides a big
| technical or legal advantage.
| peterjliu wrote:
| another advantage is people want the Google bot to crawl their
| pages, unlike most AI companies
| mmaunder wrote:
| This is an underrated comment. Yes it's a big advantage and
| probably a measurable pain point for Anthropic and OpenAI. In
| fact you could just do a 1% survey of robots.txt out there
| and get a reasonable picture. Maybe a fun project for an
| HN'er.
| jiocrag wrote:
| Excellent point. If they can figure out how to either
| remunerate or drive traffic to third parties in conjunction
| with this, it would be huge.
| stefan_ wrote:
| I don't know man, for months now people keep telling me on HN
| how "Google is winning", yet no normal person I ever asked
| knows what the fuck "Gemini" is. I don't know what they are
| winning, it might be internet points for all I know.
|
| Actually, some of the people polled recalled the Google AI
| efforts by their expert system recommending glue on pizza and
| smoking in pregnancy. It's a big joke.
| mmaunder wrote:
| Try uploading a bunch of PDF bank statements to notebooklm
| and ask it questions. Or the results of blood work. It's jaw
| dropping. e.g. uploaded 7 brokerage account statements as
| PDFs in a mess of formats and asked it to generate table
| summary data which it nailed, and then asked it to generate
| actual trades to go from current position to a new position
| in shortest path, and it nailed that too.
|
| Biggest issue we have when using notebooklm is a lack of
| ambition when it comes to the questions we're asking. And the
| pro version supports up to 300 documents.
|
| Hell, we uploaded the entire Euro Cyber Resilience Act and
| asked the same questions we were going to ask our big name
| legal firm, and it nailed every one.
|
| But you actually make a fair point, which I'm seeing too and
| I find quite exciting. And it's that even among my early
| adopter and technology minded friends, adoption of the most
| powerful AI tools is very low. e.g. many of them don't even
| know that notebookLM exists. My interpretation on this is
| that it's VERY early days, which is suuuuuper exciting for us
| builders and innovators here on HN.
| kube-system wrote:
| While there are some first-party B2C applications like chat
| front-ends built using LLMs, once mature, the end game is
| almost certainly that these are going to be B2B products
| integrated into other things. The future here goes a lot
| further than ChatGPT.
| shmoogy wrote:
| That was ages ago.
|
| Their new models excel at many things. Image editing, parsing
| PDFs, and coding are what I use it for. It's significantly
| cheaper than the closest competing models (Gemini 2.5 pro,
| and flash experimental with image generation).
|
| Highly recommend testing against openai and anthropic models
| - you'll likely be pleasantly surprised.
| labrador wrote:
| > If the only answer for both is "we'll build it from scratch",
| OpenAI is in very big trouble
|
| They could buy Google+ code from Google and resurrect it with
| OpenAI branding. Alternatively, they could partner with Bluesky.
| parsimo2010 wrote:
| I don't think the issue is solving the technical
| implementation of a new social media platform. The issue is
| whether a new social media platform from OpenAI will deliver
| the kind of value that existing platforms deliver. If they
| promise investors that they'll get TikTok/Meta/YouTube levels
| of content+interaction (and all the data that comes with it),
| but deliver Mastodon levels, then they are in trouble.
| mark_l_watson wrote:
| Nice! Low price, even with reasoning enabled. I have been working
| on a short new book titled "Practical AI with Google: A Solo
| Knowledge Worker's Guide to Gemini, AI Studio, and LLM APIs" but
| with all of Google's recent announcements it might not be a short
| book.
| serjester wrote:
| Just ran it on one of our internal PDF (3 pages, medium
| difficulty) to json benchmarks:
|
| gemini-flash-2.0: 60 ish% accuracy 6,250 pages per dollar
|
| gemini-2.5-flash-preview (no thinking): 80 ish% accuracy 1,700
| pages per dollar
|
| gemini-2.5-flash-preview (with thinking): 80 ish% accuracy (not
| sure what's going on here) 350 pages per dollar
|
| gemini-flash-2.5: 90 ish% accuracy 150 pages per dollar
|
| I do wish they separated the thinking variant from the regular
| one - it's incredibly confusing when a model parameter
| dramatically impacts pricing.
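| For context on how pages-per-dollar figures like these fall out
| of per-token pricing, a rough sketch (the tokens-per-page and
| price numbers are illustrative placeholders, not the benchmark's
| actual values):

```python
def pages_per_dollar(tokens_per_page: float, price_per_mtok: float) -> float:
    """How many pages one dollar buys at a given price per million tokens."""
    cost_per_page = tokens_per_page / 1e6 * price_per_mtok
    return 1.0 / cost_per_page

# Illustration: ~1,000 billable tokens per page at $0.60 per 1M tokens.
print(round(pages_per_dollar(1_000, 0.60)))  # 1667 pages per dollar
```

| Thinking tokens inflate the billable tokens per page, which is
| why the same accuracy can come at several times the cost.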
| ValveFan6969 wrote:
| I have been having similar performance issues, I believe they
| intentionally made a worse model (Gemini 2.5) to get more money
| out of you. However, there is a way where you can make money
| off of Gemini 2.5.
|
| If you set the thinking parameter lower and lower, you can make
| the model spew absolute nonsense for the first response. It
| costs 10 cents per input / output, and sometimes you get a
| response that was just so bad your clients will ask for more
| and more corrections.
| zoogeny wrote:
| Google making Gemini 2.5 Pro (Experimental) free was a big deal.
| I haven't tried the more expensive OpenAI models so I can't even
| compare, only to the free models I have used of theirs in the
| past.
|
| Gemini 2.5 Pro is so much of a step up (IME) that I've become
| sold on Google's models in general. It not only is smarter than
| me on most of the subjects I engage with it, it also isn't
| completely obsequious. The model pushes back on me rather than
| contorting itself to find a way to agree.
|
| 100% of my casual AI usage is now in Gemini and I look forward to
| asking it questions on deep topics because it consistently
| provides me with insight. I am building new tools with the mind
| to optimize my usage to increase its value to me.
| PerusingAround wrote:
| This comment is exactly my experience; I feel as if I had
| written it myself.
| cjohnson318 wrote:
| Yeah, my wife pays for ChatGPT, but Gemini is fine enough for
| me.
| qwertox wrote:
| Just be aware that if you don't add a key (and set up
| billing) you're granting Google the right to train on your
| data. To have persons read them and decide how to use them
| for training.
| dr_kiszonka wrote:
| I was a big fan of that model but it has been replaced in AI
| Studio by its preview version, which, by comparison, is pretty
| bad. I hope Google makes the release version much closer to the
| experimental one.
| zoogeny wrote:
| I can confirm the model name in Run Settings has been updated
| to "Gemini 2.5 Pro Preview ..." when it used to be "Gemini
| 2.5 Pro (Experimental) ...".
|
| I cannot confirm if the quality is downgraded since I haven't
| had enough time with it. But if what you are saying is
| correct, I would be very sad. My big fear is the full-fat
| Gemini 2.5 Pro will be prohibitively expensive, but a dumbed
| down model (for the sake of cost) would also be saddening.
| jeeeb wrote:
| After comparing Gemini Pro and Claude Sonnet 3.7 coding answers
| side by side a few times, I decided to cancel my Anthropic
| subscription and just stick to Gemini.
| wcarss wrote:
| Google has killed so many amazing businesses -- entire
| industries, even, by giving people something expensive for
| free until the competition dies, and then they enshittify
| hard.
|
| It's cool to have access to it, but please be careful not to
| mistake corporate loss leaders for authentic products.
| JPKab wrote:
| True. They are ONLY good when they have competition. The
| sense of complacency that creeps in is so obvious as a
| customer.
|
| To this day, the Google Home (or is it called Nest now?)
| speaker is the only physical product I've ever owned that
| lost features over time. I used to be able to play the
| audio of a Youtube video (like a podcast) through it, but
| then Google decided that it was very very important that I
| only be able to play a Youtube video through a device with
| a screen, because it is imperative that I see a still image
| when I play a longform history podcast.
|
| Obviously, this is a silly and highly specific example, but
| it is emblematic of how they neglect or enshittify massive
| swathes of their products as soon as the executive team
| loses interest and puts their A team on some shiny new
| object.
| mark_l_watson wrote:
| In this case, Google is a large investor in Anthropic.
|
| I agree that giving away access to expensive models long
| term is not a good idea on several fronts. Personally, I
| subscribe to Gemini Advanced and I pay for using the Gemini
| APIs.
| bredren wrote:
| [delayed]
| fsndz wrote:
| More and more people are coming to the realisation that Google
| is actually winning at the model level right now.
| minimaxir wrote:
| One hidden note from Gemini 2.5 Flash when diving deep into the
| documentation: for image inputs, not only can the model be
| instructed to generated 2D bounding boxes of relevant subjects,
| but it can also create segmentation masks!
| https://ai.google.dev/gemini-api/docs/image-understanding#se...
|
| At this price point with the Flash model, creating segmentation
| masks is pretty nifty.
|
| The segmentation masks are a bit of a galaxy brain implementation
| by generating a b64 string representing the mask:
| https://colab.research.google.com/github/google-gemini/cookb...
|
| I am trying to test it in AI Studio but it sometimes errors out,
| likely because it tries to decode the b64 lol.
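| Per the linked cookbook, each mask comes back as a
| base64-encoded PNG string inside the model's JSON output, so
| decoding it is mostly bookkeeping. A minimal sketch; the
| data-URI prefix handling is an assumption about how the string
| may be wrapped:

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the fixed 8-byte header of every PNG


def decode_mask(b64_png: str) -> bytes:
    """Decode a base64-encoded PNG segmentation mask into raw image bytes."""
    if "," in b64_png:  # tolerate a "data:image/png;base64,..." wrapper
        b64_png = b64_png.split(",", 1)[1]
    raw = base64.b64decode(b64_png)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded payload is not a PNG")
    return raw
```

| From there, any PNG decoder (PIL, for instance) turns the bytes
| into the actual mask array.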
| behnamoh wrote:
| Wait, did they just kill YOLO, at least for time-insensitive
| tasks?
| minimaxir wrote:
| YOLO is probably still cheaper if bounding boxes are your
| main goal. Good segmentation models that work for arbitrary
| labels, however, are much more expensive to set up and run,
| so this type of approach could be an interesting alternative
| depending on performance.
| daemonologist wrote:
| No, the speed of YOLO/DETR inference makes it cheap as well -
| probably at least five or six orders of magnitude cheaper.
| Edit: After some experimentation, Gemini also seems to not
| perform nearly as well as a purpose-tuned detection model.
|
| It'll be interesting to test this capability and see how it
| evolves though. At some point you might be able use it as a
| "teacher" to generate training data for new tasks.
| daemonologist wrote:
| Interestingly if you run this in Gemini (instead of AI Studio)
you get: "I am sorry, but I was unable to generate the
segmentation masks for _ in the image due to an internal error
with the tool required for this task."
|
| (Not sure if that's a real or hallucinated error.)
| ipsum2 wrote:
The performance is so bad it's basically unusable though;
segmentation models and object detection models are still the
best, for now.
| msp26 wrote:
| I've had mixed results with the bounding boxes even on 2.5 pro.
| On complex images where a lot of boxes need to be drawn they're
| in the general region but miss the exact location of objects.
| simonw wrote:
| I spotted something interesting in the Python API library code:
|
| https://github.com/googleapis/python-genai/blob/473bf4b6b5a6...
      class ThinkingConfig(_common.BaseModel):
          """The thinking features configuration."""

          include_thoughts: Optional[bool] = Field(
              default=None,
              description="""Indicates whether to include thoughts in the
              response. If true, thoughts are returned only if the model
              supports thought and thoughts are available.""",
          )
          thinking_budget: Optional[int] = Field(
              default=None,
              description="""Indicates the thinking budget in tokens.""",
          )
|
| That thinking_budget thing is documented, but what's the deal
| with include_thoughts? It sounds like it's an option to have the
| API return the thought summary... but I can't figure out how to
| get it to work, and I've not found documentation or example code
| that uses it.
|
| Anyone managed to get Gemini to spit out thought summaries in its
| API using this option?
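For reference, a minimal sketch of how those two fields would be passed through the google-genai Python SDK, which accepts plain dicts in place of its Pydantic config models. The model name and the `part.thought` attribute on response parts are assumptions, and the actual call is commented out since it requires an API key:

```python
# Sketch only: a request config enabling both ThinkingConfig fields.
config = {
    "thinking_config": {
        "thinking_budget": 1024,   # cap on thinking tokens
        "include_thoughts": True,  # ask for thought summaries, per the field doc
    }
}

# Requires an API key, so left commented out; model name is a guess:
# from google import genai
# client = genai.Client(api_key="...")
# resp = client.models.generate_content(
#     model="gemini-2.5-flash-preview-04-17",
#     contents="Why is the sky blue?",
#     config=config,
# )
# for part in resp.candidates[0].content.parts:
#     if getattr(part, "thought", False):  # assumed marker for thought parts
#         print("THOUGHT:", part.text)

print(config["thinking_config"]["thinking_budget"])  # 1024
```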
| phillypham wrote:
| They removed the docs and support for it
| https://github.com/googleapis/python-
| genai/commit/af3b339a9d....
|
| You can see the thoughts in AI Studio UI as per
| https://ai.google.dev/gemini-api/docs/thinking#debugging-
| and....
| lemming wrote:
| I maintain an alternative client which I build from the API
| definitions at https://github.com/googleapis/googleapis, which
| according to https://github.com/googleapis/python-
| genai/issues/345 should be the right place. But neither the AI
| Studio nor the Vertex definitions even have ThinkingConfig yet
| - very frustrating. In general it's amazing how much API
| munging is required to get a working client from the public API
| definitions.
| qwertox wrote:
In AI Studio the Flash model has two toggles: Enable thinking
and Set thinking budget. If the thinking budget is enabled, you
can set the max number of tokens it can use to think; otherwise
it's Auto.
| Deathmax wrote:
| It is gated behind the GOOGLE_INTERNAL visibility flag, which
| only internal Google projects and Cursor have at the moment as
| far as I know.
| deanmoriarty wrote:
Genuine naive question: HN generally has a negative view of
Google (pick any random story on Chrome, ads, search, the web,
working at FAANG, etc. and this should be obvious from the
comments), yet when it comes to AI there is a notable "cheering
effect" for Google to win the AI race that goes beyond a
conventional appreciation of a healthy competitive landscape,
which may appear as a bit of a double standard.
|
| Why is this? Is it because OpenAI is seen as such a negative
| player in this ecosystem that Google "gets a pass on this one"?
|
| And bonus question: what do people think will happen to OpenAI if
| Google wins the race? Do you think they'll literally just go
| bust?
| antirez wrote:
Maybe because Google is largely responsible for, and paid for,
the research behind most of the results we are seeing now. I'm
not a Google fan on the web side, nor of their idea of what
software engineering is, but they deserve to win the AI race:
right now all the other players have contributed far less public
research than Google has. Also, Gemini 2.5 Pro created a big
hype moment, because the model is of unprecedented ability.
| 01100011 wrote:
| Didn't Google invent the transformer?
|
| I think a lot of us see Google as both an evil advertiser and
| as an innovator. Google winning AI is sort of nostalgic for
those of us who once cheered the "Do No Evil" (now mostly "Do
| Know Evil") company.
|
| I also like how Google is making quiet progress while other
| companies take their latest incremental improvement and promote
| it as hard as they can.
| pkaye wrote:
I think for a while some people felt the Google AI models were
worse, but now they're getting much better. On the other hand,
Google has its own hardware, so it can drive down the cost of
using its models, which keeps pressure on OpenAI to remain cost
competitive. Then you have Anthropic, which has very good models
but is very expensive. I've heard they are working with Amazon
to build a data center with Amazon's custom AI chips, so maybe
they can bring down their costs too. In the end, all these
companies will need both a good model and lower-cost hardware to
succeed.
| krembo wrote:
How is this sustainable for Google from a business POV? It feels
| like Google is shooting itself in the foot while "winning" the AI
| race.. From my experience I think Google lost 99% of the ads it
| used to show me before in the search engine.
| tomr75 wrote:
| someone else will do it if they don't
| jdthedisciple wrote:
| Very excited to try it, but it _is_ noteworthy that o4-mini is
| _strictly better_ according to the very benchmarks shown by
| Google here.
|
| Of course it's about 4x as expensive too (I believe), but still,
| given the release of openai/codex as well, o4-mini will remain a
| strong competitor for now.
___________________________________________________________________
(page generated 2025-04-17 23:00 UTC)