[HN Gopher] Gemini 2.5 Flash
       ___________________________________________________________________
        
       Gemini 2.5 Flash
        
       Author : meetpateltech
       Score  : 995 points
       Date   : 2025-04-17 19:03 UTC (1 day ago)
        
 (HTM) web link (developers.googleblog.com)
 (TXT) w3m dump (developers.googleblog.com)
        
       | xnx wrote:
       | 50% price increase from Gemini 2.0 Flash. That sounds like a lot,
       | but Flash is still so cheap when compared to other models of this
       | (or lesser) quality. https://developers.googleblog.com/en/start-
       | building-with-gem...
        
         | akudha wrote:
         | Is this cheaper than DeepSeek? Am I reading this right?
        
           | vdfs wrote:
           | Only if you don't use reasoning
        
         | Tiberium wrote:
         | del
        
           | Havoc wrote:
            | You may want to consult Gemini on those percentage calcs:
            | going from .10 to .15 is a 50% increase
            | ((0.15 - 0.10) / 0.10 = 0.5), not 25%.
        
         | swyx wrote:
          | done pretty much in line with the price-elo pareto frontier
         | https://x.com/swyx/status/1912959140743586206/photo/1
        
           | xnx wrote:
           | Love that chart! Am I imagining that I saw a version of that
           | somewhere that even showed how the boundary has moved out
           | over time?
        
             | swyx wrote:
             | https://x.com/swyx/status/1882933368444309723
             | 
             | https://x.com/swyx/status/1830866865884991999 (scroll up)
        
           | oezi wrote:
           | So if I see it right flash 2.5 doesn't push the pareto front
           | forward, right? It just sits between 2.5 pro and 2.0 flash.
           | 
           | https://storage.googleapis.com/gweb-developer-goog-blog-
           | asse...
        
             | swyx wrote:
              | yeah but 1) it's useful to have the point there on the curve
             | if you need it, 2) intelligence is multidimensional, maybe
             | in 2.5 flash you get qualitatively a better set of
             | capabilities for your needs than 2.5 pro
        
         | onlyrealcuzzo wrote:
         | Why isn't Phi-3, Llama 3, or Mistral in the comparison?
         | 
         | Aren't there a lot of hosted options? How do they compare in
         | terms of cost?
        
       | byefruit wrote:
        | It's interesting that there's nearly a 6x price difference
        | between reasoning and no reasoning.
       | 
       | This implies it's not a hybrid model that can just skip reasoning
       | steps if requested.
       | 
       | Anyone know what else they might be doing?
       | 
        | Reasoning means contexts will be longer (for thinking tokens)
        | and longer contexts cost more to inference, but that's not
        | going to be 6x.
       | 
       | Or is it just market pricing?
        
         | vineyardmike wrote:
         | Based on their graph, it does look explicitly priced along
         | their "Pareto Frontier" curve. I'm guessing that is guiding the
         | price more than their underlying costs.
         | 
          | It's smart because it gives them room to drop prices later
          | and compete once other companies actually get to a similar
          | quality.
        
         | jsnell wrote:
         | > This implies it's not a hybrid model that can just skip
         | reasoning steps if requested.
         | 
         | It clearly is, since most of the post is dedicated to the
         | tunability (both manual and automatic) of the reasoning budget.
         | 
         | I don't know what they're doing with this pricing, and the blog
         | post does not do a good job explaining.
         | 
          | Could it be that they're not counting thinking tokens as
          | output tokens (since you don't get access to the full
          | thinking trace anyway), and this is basically amortizing the
          | thinking-token spend over the actual output tokens? That
          | doesn't make sense either, because then the user has no
          | incentive to use anything except 0/max thinking budgets.
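          | 
          | For reference, the budget knob looks something like this in
          | the google-genai Python SDK; a minimal sketch, assuming the
          | preview model name and a budget of 0 (both my guesses, not
          | from the post):
          | 
          |     # Sketch: cap or disable the thinking budget.
          |     from google import genai
          |     from google.genai import types
          | 
          |     client = genai.Client(api_key="...")
          |     resp = client.models.generate_content(
          |         model="gemini-2.5-flash-preview-04-17",
          |         contents="Is 9.11 > 9.9? Yes or no.",
          |         config=types.GenerateContentConfig(
          |             thinking_config=types.ThinkingConfig(
          |                 thinking_budget=0,  # 0 = thinking off
          |             ),
          |         ),
          |     )
          |     print(resp.text)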
        
         | RobinL wrote:
         | Does anyone know how this pricing works? Supposing I have a
         | classification prompt where I need the response to be a binary
         | yes/no. I need one token of output, but reasoning will
         | obviously add far more than 6 additional tokens. Is it still a
          | 6x price multiplier? That doesn't seem to make sense, but
          | nor does paying 6x more for every token, including the
          | reasoning ones.
        
           | coder543 wrote:
           | "When you have thinking turned on, all output tokens
           | (including thoughts) are charged at the $3.50 / 1M rate"[0]
           | 
           | [0]: https://x.com/OfficialLoganK/status/1912981986085323231
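            | 
            | So a back-of-envelope for the binary classifier case
            | above, using the preview prices quoted in this thread (the
            | token counts are made up):
            | 
            |     IN, OUT = 0.15, 3.50  # $ per 1M tokens
            |     calls, p_in, think, ans = 1000, 500, 300, 1
            |     cost = calls * (p_in * IN
            |                     + (think + ans) * OUT) / 1e6
            |     print(f"${cost:.2f}")  # ~$1.13 per 1k calls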
        
       | punkpeye wrote:
        | This is cool, but the rate limits on all of these preview
        | models are a PITA
        
         | Layvier wrote:
          | Agreed, it's not even possible to run an eval dataset. If
          | someone from Google sees this, please at least increase the
          | burst rate limit.
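          | 
          | In the meantime, client-side backoff is the least-bad
          | workaround. A minimal sketch (the 429 string match is a
          | simplification, not the SDK's own retry helper):
          | 
          |     import random, time
          | 
          |     def with_backoff(call, tries=6):
          |         # Retry rate-limit errors with jittered
          |         # exponential backoff; re-raise the rest.
          |         for i in range(tries):
          |             try:
          |                 return call()
          |             except Exception as e:
          |                 if "429" not in str(e) or i == tries - 1:
          |                     raise
          |                 time.sleep(2 ** i + random.random())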
        
           | punkpeye wrote:
           | It is not without rate limits, but we do have elevated limits
           | for our accounts through:
           | 
           | https://glama.ai/models/gemini-2.5-flash-preview-04-17
           | 
           | So if you just want to run evals, that should do it.
           | 
            | Though the first couple of days after a model comes out
            | are usually pretty rough because everyone tries to run
            | their evals.
        
             | punkpeye wrote:
              | What I am noticing with every new Gemini model that
              | comes out is that the time to first token (TTFT) is not
              | great. I guess it is because they gradually shift
              | compute from old models to new models as demand
              | increases.
        
               | Filligree wrote:
               | If you're imagining that 2.5Pro gets dynamically loaded
               | during the time to first token, then you're vastly
               | overestimating what's physically possible.
               | 
               | It's more likely a latency-throughput tradeoff. Your
               | query might get put inside a large batch, for example.
        
             | Layvier wrote:
             | That's very interesting, thanks for sharing!
        
       | arnaudsm wrote:
        | Gemini Flash models get the least hype, but in my production
        | experience they have the best bang for the buck and the best
        | multimodal tooling.
       | 
       | Google is silently winning the AI race.
        
         | belter wrote:
         | > Google is silently winning the AI race.
         | 
          | That is what we keep hearing here... I cancelled my account
          | for the last Gemini, and can't help noticing they are
          | offering the new one for free...
        
           | arnaudsm wrote:
           | Sorry I was talking of B2B APIs for my YC startup. Gemini is
           | still far behind for consumers indeed.
        
             | JeremyNT wrote:
             | I use Gemini almost exclusively as a normal user. What am I
             | missing out on that they are far behind on?
             | 
             | It seems shockingly good and I've watched it get much
             | better up to 2.5 Pro.
        
               | arnaudsm wrote:
               | Mostly brand recognition and the earlier Geminis had more
               | refusals.
               | 
               | As a consumer, I also really miss the Advanced voice mode
               | of ChatGPT, which is the most transformative tech in my
               | daily life. It's the only frontier model with true audio-
               | to-audio.
        
               | wavewrangler wrote:
               | What do you mean miss? You don't have the budget to keep
                | something you truly miss for $20? What am I missing
                | here? I don't mean to criticize, I am just curious is
                | all. I would reword but I have to go
        
               | what_ever wrote:
               | What is true audio-to-audio in this case?
        
               | jorvi wrote:
               | > and the earlier Geminis had more refusals.
               | 
                | It's more that almost every company is running a
                | classifier on their web chat's output.
                | 
                | It isn't actually the model refusing; rather, if the
                | classifier hits a threshold, it'll swap the model's
                | output with "Sorry, let's talk about something else."
               | 
               | This is most apparent with DeepSeek. If you use their web
               | chat with V3 and then jailbreak it, you'll get uncensored
               | output but it is then swapped with "Let's talk about
               | something else" halfway through the output. And if you
               | ask the model, it has no idea its previous output got
                | swapped, and you can even ask it to build on its previous
               | answer. But if you use the API, you can push it pretty
               | far with a simple jailbreak.
               | 
                | These classifiers are virtually always run on a
                | separate track, meaning you cannot jailbreak them.
               | 
               | If you use an API, you only have to deal with the
               | inherent training data bias, neutering by tuning and
               | neutering by pre-prompt. The last two are, depending on
               | the model, fairly trivial to overcome.
               | 
               | I still think the first big AI company that has the guts
               | to say "our LLM is like a pen and brush, what you write
               | or draw with it is on you" and publishes a completely
               | unneutered model will be the one to take a huge slice of
               | marketshare. If I had to bet on anyone doing that, it
               | would be xAI with Grok. And by not neutering it, the
               | model will perform better in SFW tasks too.
        
               | whistle650 wrote:
               | Have you tried the Gemini Live audio-to-audio in the free
               | Gemini iOS app? I find it feels far more natural than
               | ChatGPT Advanced Voice Mode.
        
               | Jensson wrote:
               | > and the earlier Geminis had more refusals.
               | 
                | You can turn those off; Google lets you decide how
                | much it censors, and you can turn it off completely.
                | 
                | It has separate sliders for sexually explicit, hate
                | speech, dangerous content, and harassment. It is by
                | far the best at this, since sometimes you want those
                | refusals/filters.
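                | 
                | The API exposes the same knobs. A sketch with the
                | google-generativeai SDK (the shorthand category
                | names are from memory, so treat them as
                | approximate):
                | 
                |     import google.generativeai as genai
                | 
                |     genai.configure(api_key="...")
                |     model = genai.GenerativeModel(
                |         "gemini-2.0-flash",
                |         safety_settings={
                |             "HARASSMENT": "BLOCK_NONE",
                |             "HATE": "BLOCK_NONE",
                |             "SEXUAL": "BLOCK_NONE",
                |             "DANGEROUS": "BLOCK_NONE",
                |         },
                |     )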
        
             | int_19h wrote:
             | They used to be, but not anymore, not since Gemini Pro 2.5.
             | Their "deep research" offering is the best available on the
             | market right now, IMO - better than both ChatGPT and
             | Claude.
        
         | Layvier wrote:
         | Absolutely. So many use cases for it, and it's so
         | cheap/fast/reliable
        
           | danielbln wrote:
           | I want to use these almost too cheap to meter models like
           | Flash more, what are some interesting use cases for those?
        
           | SparkyMcUnicorn wrote:
           | And stellar OCR performance. Flash 2.0 is cheaper and more
           | accurate than AWS Textract, Google Document AI, etc.
           | 
           | Not only in benchmarks[0], but in my own production usage.
           | 
           | [0] https://getomni.ai/ocr-benchmark
        
         | Fairburn wrote:
          | Sorry, but no. Gemini isn't the fastest horse, yet. And its
          | use within their ecosystem means it isn't geared to the
          | masses outside of their bubble. They are not leading the
          | race, but they are a contender.
        
         | spruce_tips wrote:
         | i have a high volume task i wrote an eval for and was
         | pleasantly surprised at 2.0 flash's cost to value ratio
         | especially compared to gpt4.1-mini/nano
         | 
          | accuracy / input price / output price (per 1M tokens):
          | 
          |     Gemini Flash 2.0 Lite: 67% | $0.075 | $0.30
          |     Gemini Flash 2.0:      93% | $0.10  | $0.40
          |     GPT-4.1-mini:          93% | $0.40  | $1.60
          |     GPT-4.1-nano:          43% | $0.10  | $0.40
         | 
          | excited to try out 2.5 flash
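          | 
          | fwiw an eval like this is basically just the following shape
          | (the labels and the model call are placeholders):
          | 
          |     # tiny accuracy eval over labeled examples
          |     def accuracy(classify, examples):
          |         hits = sum(classify(text) == label
          |                    for text, label in examples)
          |         return hits / len(examples)
          | 
          |     # examples = [("some text", "spam"), ...]
          |     # print(accuracy(call_model, examples))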
        
           | jay_kyburz wrote:
            | Can I ask a serious question: what task are you working on
            | where it's OK to get a 7% error rate? I can't get my head
            | around how this can be used.
        
             | spruce_tips wrote:
              | low stakes text classification, but it's something that
              | needs to be done and couldn't be done in reasonable time
              | frames or at reasonable price points by humans
        
             | omneity wrote:
             | In my case, I have workloads like this where it's possible
             | to verify the correctness of the result after inference, so
             | any success rate is better than 0 as it's possible to
             | identify the "good ones".
        
               | nonethewiser wrote:
                | Aren't you basically just saying you are able to
                | measure the error rate? I mean that's good, but it's
                | already a given in this scenario where he's reporting
                | the 7% error rate.
        
               | jsnell wrote:
               | No. If you're able to verify correctness of individual
               | items of work, you can accept the 93% of verified items
               | as-is and send the remaining 7% to some more expensive
               | slow path.
               | 
               | That's very different from just knowing the aggregate
               | error rate.
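                | 
                | As a sketch (all three callables are stand-ins):
                | 
                |     def solve(task, cheap, costly, verify):
                |         # Cheap model first; escalate only
                |         # the items that fail verification.
                |         ans = cheap(task)
                |         if verify(task, ans):
                |             return ans
                |         return costly(task)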
        
               | yjftsjthsd-h wrote:
               | No, it's anything that's harder to write than verify. A
               | simple example is a logic puzzle; it's hard to come up
               | with a solution, but once you have a possible answer it's
               | really easy to check it. In fact, it can be easier to vet
               | _multiple_ answers and tell the machine to try again than
               | solve it once manually.
        
             | 16bytes wrote:
             | There are tons of AI/ML use-cases where 7% is acceptable.
             | 
             | Historically speaking, if you had a 15% word error rate in
             | speech recognition, it would generally be considered
             | useful. 7% would be performing well, and <5% would be near
             | the top of the market.
             | 
             | Typically, your error rate just needs to be below the
             | usefulness threshold and in many cases the cost of errors
             | is pretty small.
        
             | muzani wrote:
             | I expect some manual correction after the work is done. I
             | actually mentally counted all the times I pressed backspace
             | while writing this paragraph, and it comes down to 45. I'm
             | not counting the next paragraph or changing the number.
             | 
             | Humans make a ton of errors as well. I didn't even notice
             | how many I was making here until I started counting it. AI
              | is super useful for just getting a first draft out, not
              | for the final work.
        
             | sroussey wrote:
             | You could be OCRing a page that includes a summation line,
             | then add up all the numbers and check against the sum.
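              | 
              | e.g. something like this (the tolerance and column
              | layout are assumptions):
              | 
              |     def totals_match(rows, stated_total):
              |         # rows: OCR'd line amounts; compare
              |         # against the page's own total line.
              |         return abs(sum(rows) - stated_total) < 0.005
              | 
              |     print(totals_match([12.50, 3.99], 16.49))  # True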
        
         | 42lux wrote:
         | The API is free, and it's great for everyday tasks. So yes
         | there is no better bang for the buck.
        
           | drusepth wrote:
           | Wait, the API is free? I thought you had to use their web
           | interface for it to be free. How do you use the API for free?
        
             | mlboss wrote:
             | using aistudio.google.com
        
             | spruce_tips wrote:
              | create an api key and don't set up billing. pretty low
              | rate limits and they use your data
        
             | dcre wrote:
             | You can get an API key and they don't bill you. Free tier
             | rate limits for some models (even decent ones like Gemini
             | 2.0 Flash) are quite high.
             | 
             | https://ai.google.dev/gemini-api/docs/pricing
             | 
             | https://ai.google.dev/gemini-api/docs/rate-limits#free-tier
        
               | NoahZuniga wrote:
                | The rate limits I've encountered with free api keys
                | have been way lower than the limits advertised.
        
               | jmacd wrote:
               | I agree. I found it unusable for anything but casual
               | usage due to the rate limiting. I wonder if I am just
               | missing something?
        
               | tempthrow wrote:
               | I think it's the small TPM limits. I'll be way under the
               | 10-30 requests per minute while using Cline, but it
               | appears that the input tokens count towards the rate
               | limit so I'll find myself limited to one message a minute
               | if I let the conversation go on for too long, ironically
               | due to Gemini's long context window. AFAIK Cline doesn't
               | currently offer an option to limit the context explosion
               | to lower than model capacity.
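                | 
                | If you're rolling your own calls, trimming history
                | client-side helps; a rough sketch (~4 chars per
                | token is a crude guess):
                | 
                |     def trim(history, max_tok=100_000):
                |         # Keep the newest messages that
                |         # fit under a crude token estimate.
                |         kept, used = [], 0
                |         for msg in reversed(history):
                |             used += len(msg) // 4 + 1
                |             if used > max_tok:
                |                 break
                |             kept.append(msg)
                |         return kept[::-1]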
        
               | nolok wrote:
                | I'm pretty sure that's a Google Maps level of free,
                | where once they're in control they will bill it
                | massively
        
               | dcre wrote:
               | There is no reason to expect the other entrants in the
               | market to drop out and give them monopoly power. The paid
               | tier is also among the cheapest. People say it's because
                | they built their own inference hardware and are
               | genuinely able to serve it cheaper.
        
             | midasz wrote:
             | I use Gemini 2.5 pro experimental via openrouter in my
             | openwebui for free. Was using sonnet 3.7 but I don't notice
             | much difference so just default to the free thing now.
        
         | statements wrote:
          | Absolutely agree. Granted, it is task dependent. But when it
          | comes to classification and attribute extraction, I've been
          | using 2.0 Flash with huge success across massive datasets.
          | It would not even be viable cost-wise with other models.
        
           | sethkim wrote:
           | How "huge" are these datasets? Did you build your own tooling
           | to accomplish this?
        
         | xnx wrote:
         | Shhhh. You're going to give away the secret weapon!
        
         | gambiting wrote:
         | In my experience they are as dumb as a bag of bricks. The other
         | day I asked "can you edit a picture if I upload one"
         | 
         | And it replied "sure, here is a picture of a photo editing
         | prompt:"
         | 
         | https://g.co/gemini/share/5e298e7d7613
         | 
         | It's like "baby's first AI". The only good thing about it is
         | that it's free.
        
           | JFingleton wrote:
           | Prompt engineering is a thing.
           | 
           | Learning how to "speak llm" will give you great results.
           | There's loads of online resources that will teach you. Think
           | of it like learning a new API.
        
             | abletonlive wrote:
             | for now. one would hope that this is a transitory moment in
             | llms and that we can just use intuition in the future.
        
             | asadotzler wrote:
             | LLM's whole thing is language. They make great translators
             | and perform all kinds of other language tasks well, but
             | somehow they can't interpret my English language prompts
             | unless I go to school to learn how to speak LLM-flavored
             | English?
             | 
             | WTF?
        
               | pplante wrote:
               | I like to think of my interactions with an LLM like I'm
               | explaining a request to a junior engineer or non
               | engineering person. You have to be more verbose to
               | someone who has zero context in order for them to execute
               | a task correctly. The LLM only has the context you
               | provided so they fail hard like a junior engineer would
               | at a complicated task with no experience.
        
               | JFingleton wrote:
               | They are not humans - so yeah I can totally see having to
               | "go to school" to learn how to interact with them.
        
               | int_19h wrote:
               | It's a natural language processor, yes. It's not AGI. It
               | has numerous limitations that have to be recognized and
               | worked around to make use of it. Doesn't mean that it's
               | not useful, though.
        
               | th0ma5 wrote:
               | You have the right perspective. All of these people hand
               | waving away the core issue here don't realize their own
               | biases. Some of the best these things tout as much as 97%
               | accuracy on tasks but if a person was completely randomly
               | wrong at 3% of what they say you'd call an ambulance and
               | no doctor would be able to diagnose their condition (the
               | kinds of errors that people make with brain injuries are
               | a major diagnostic tool and the kinds of errors are known
               | for major types of common injuries ... Conversely there
               | is no way to tell within an LLM system if any specific
               | token is actually correct or not and its incorrectness is
               | not even categorizable.)
        
             | gambiting wrote:
             | This was using Gemini on my phone - which both Samsung and
             | Google advertise as "just talk to it".
        
           | ghurtado wrote:
           | > in my experience they are as dumb as a bag of bricks
           | 
           | In my experience, anyone that describes LLMs using terms of
           | actual human intelligence is bound to struggle using the
           | tool.
           | 
           | Sometimes I wonder if these people enjoy feeling "smarter"
           | when the LLM fails to give them what they want.
        
             | mdp2021 wrote:
             | If those people are a subset of those who demand actual
             | intelligence, they will very often feel frustrated.
        
           | nowittyusername wrote:
            | It's because Google hasn't realized the value of training
            | the model on information about its own capabilities and
            | metadata. My biggest pet peeve about Google and the way
            | they train these models.
        
         | rvz wrote:
          | Google has been winning the AI race ever since DeepMind was
          | properly put to use developing their AI models, instead of
          | the team that built Bard (the Google AI team).
        
         | GaggiX wrote:
          | Flash models are really good even for an end user, because
          | of how fast they are and how well they perform.
        
         | ghurtado wrote:
         | I know it's a single data point, but yesterday I showed it a
         | diagram of my fairly complex micropython program, (including
         | RP2 specific features, DMA and PIO) and it was able to describe
         | in detail not just the structure of the program, but also
         | exactly what it does and how it does it. This is before seeing
          | a single line of code, just going by boxes and arrows.
         | 
         | The other AIs I have shown the same diagram to, have all
         | struggled to make sense of it.
        
         | redbell wrote:
         | > Google is silently winning the AI race
         | 
         | Yep, I agree! This convinced me:
         | https://news.ycombinator.com/item?id=43661235
        
         | ramesh31 wrote:
         | >"Google is silently winning the AI race."
         | 
         | It's not surprising. What was surprising honestly was how they
         | were caught off guard by OpenAI. It feels like in 2022 just
         | about all the big players had a GPT-3 level system in the works
         | internally, but SamA and co. knew they had a winning hand at
         | the time, and just showed their cards first.
        
           | wkat4242 wrote:
           | True and their first mover advantage still works pretty well.
           | Despite "ChatGPT" being a really uncool name in terms of
           | marketing. People remember it because they were the first to
           | wow them.
        
             | golergka wrote:
             | It feels more authentically engineer-coded.
        
             | kaoD wrote:
             | How is ChatGPT bad in terms of marketing? It's recognizable
             | and rolls off the tongue in many many many languages.
             | 
             | Gemini is what sucks from a marketing perspective. Generic-
             | ass name.
        
               | simonw wrote:
               | Generative Pre-trained Transformer is a horrible term to
               | have an acronym for.
        
               | kaoD wrote:
               | Do you think the mass market thinks GPT is an acronym?
               | It's just a name. Currently synonymous with AI.
               | 
               | Ask anyone outside the tech bubble about "Gemini" though.
               | You'll get astrology.
        
               | wkat4242 wrote:
               | True I guess they treat it just like SMS.
               | 
               | I still think they'd have taken off more if they'd given
               | it a catchy name from the start and made the interface a
               | bit more consumer friendly.
        
         | russellbeattie wrote:
         | I have to say, I never doubted it would happen. They've been at
         | the forefront of AI and ML for well over a decade. Their
         | scientists were the authors of the "Attention is all you need"
         | paper, among thousands of others. A Google Scholar search
         | produces endless results. There just seemed to be a disconnect
         | between the research and product areas of the company. I think
         | they've got that worked out now.
         | 
         | They're getting their ass kicked in court though, which might
         | be making them much less aggressive than they would be
         | otherwise, or at least quieter about it.
        
         | Nihilartikel wrote:
         | 100% agree. I had Gemini flash 2 chew through thousands of
         | points of nasty unstructured client data and it did a 'better
         | than human intern' level conversion into clean structured
         | output for about $30 of API usage. I am sold. 2.5 pro
         | experimental is a different league though for coding. I'm
         | leveraging it for massive refactoring now and it is almost
         | magical.
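          | 
          | The bulk pass was roughly this shape; a sketch with the
          | google-generativeai SDK (the schema and model name are
          | illustrative, not my exact pipeline):
          | 
          |     import google.generativeai as genai
          | 
          |     genai.configure(api_key="...")
          |     model = genai.GenerativeModel(
          |         "gemini-2.0-flash",
          |         generation_config={
          |             "response_mime_type": "application/json"
          |         },
          |     )
          |     resp = model.generate_content(
          |         "Extract {name, email, company} as JSON:\n"
          |         "Jane D <jane@x.com>, Acme Corp"
          |     )
          |     print(resp.text)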
        
           | jdthedisciple wrote:
           | > thousands of points of nasty unstructured client data
           | 
           | What I always wonder in these kinds of cases is: What makes
           | you confident the AI actually did a good job since presumably
           | you haven't looked at the thousands of client data yourself?
           | 
           | For all you know it made up 50% of the result.
        
             | golergka wrote:
             | Many types of data have very easily checkable aggregates.
             | Think accounting books.
        
             | pamplemoose wrote:
             | You take a sample and check
        
             | tominous wrote:
             | In my case I had hundreds of invoices in a not-very-
             | consistent PDF format which I had contemporaneously tracked
             | in spreadsheets. After data extraction (pdftotext + OpenAI
             | API), I cross-checked against the spreadsheets, and for any
             | discrepancies I reviewed the original PDFs and old bank
             | statements.
             | 
             | The main issue I had was it was surprisingly hard to get
             | the model to consistently strip commas from dollar values,
             | which broke the csv output I asked for. I gave up on prompt
             | engineering it to perfection, and just looped around it
             | with a regex check.
             | 
             | Otherwise, accuracy was extremely good and it surfaced a
             | few errors in my spreadsheets over the years.
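              | 
              | A sketch of that check (regex simplified, not the
              | exact one I used):
              | 
              |     import re
              | 
              |     def fix_amounts(line):
              |         # Strip thousands separators the model
              |         # kept leaving in dollar values.
              |         return re.sub(
              |             r"\$(\d{1,3}(?:,\d{3})+)",
              |             lambda m: "$" + m[1].replace(",", ""),
              |             line,
              |         )
              | 
              |     print(fix_amounts('a,"$1,234.56",b'))
              |     # -> a,"$1234.56",b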
        
               | jofzar wrote:
                | I hope there is a future where CSV commas don't screw
                | up data. I know it will never happen but it's a
                | nightmare.
                | 
                | Everyone has a story of a CSV formatting nightmare
        
             | summerlight wrote:
              | Though the same logic can be applied everywhere, right?
              | Even if it's done by human interns, you need to audit
              | everything to be 100% confident, or just have some trust
              | in them.
        
               | andrei_says_ wrote:
               | Not the same logic because interns can make meaning out
               | of the data - that's built-in error correction.
               | 
               | They also remember what they did - if you spot one
               | misunderstanding, there's a chance they'll be able to
               | check all similar scenarios.
               | 
               | Comparing the mechanics of an LLM to human intelligence
               | shows deep misunderstanding of one, the other, or both -
               | if done in good faith of course.
        
               | summerlight wrote:
               | Not sure why you're trying to conflate intellectual
               | capability problems into this and complicate the
               | argument? The problem layout is the same. You delegate
               | the works to someone so you cannot understand all the
               | details. This makes a fundamental tension between trust
               | and confidence. Their parameters might be different due
               | to intellectual capability, but whoever you're going to
               | delegate, you cannot evade this trade-off.
               | 
                | BTW, not sure if you have experience delegating work
                | to human interns or new grads and being rewarded with
                | disastrous results? I've done that multiple times and
                | don't trust anyone too much. This is why we typically
                | develop review processes, guardrails, etc.
        
             | Nihilartikel wrote:
             | For what it's worth, I did check over many hundreds of
             | them. Formatted things for side by side comparison and
             | ordered by some heuristics of data nastiness.
             | 
             | It wasn't a one shot deal at all. I found the ambiguous
             | modalities in the data and hand corrected examples to
             | include in the prompt. After about 10 corrections and some
              | exposition about the cases it seemed to misunderstand, it
             | got really good. Edit: not too different from a feedback
             | loop with an intern ;)
        
             | jofzar wrote:
              | It also depends on what you are using the data for; if
              | it's for non-precise, data-based decisions then it's
              | fine. Especially if you're looking for "vibe" based
              | decisions before dedicating time to "actually" process
              | the data for confirmation.
              | 
              | $30 to get a view into data that would take at least x
              | many hours of someone's time is actually super cheap,
              | especially if the decision from that result is whether
              | or not to invest the x many hours to confirm it.
        
             | mediaman wrote:
             | This was solved a hundred years ago.
             | 
             | It's the same problem factories have: they produce a lot of
             | parts, and it's very expensive to put a full operator or
             | more on a machine to do 100% part inspection. And the
             | machines aren't perfect, so we can't just trust that they
             | work.
             | 
              | So starting in the 1920s, Walter Shewhart and W. Edwards
              | Deming came up with Statistical Process Control. We
              | accept the
             | quality of the product produced based on the variance we
             | see of samples, and how they measure against upper and
             | lower control limits.
             | 
             | Based on that, we can estimate a "good parts rate" (which
             | later got used in ideas like Six Sigma to describe the
             | probability of bad parts being passed).
             | 
             | The software industry was built on determinism, but now
             | software engineers will need to learn the statistical
             | methods created by engineers who have forever lived in the
             | stochastic world of making physical products.
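              | 
              | Applied to LLM output QA, that looks roughly like this
              | (3-sigma limits; the sampled error rates are made up):
              | 
              |     from statistics import mean, stdev
              | 
              |     # Daily error rates from sampled audits.
              |     rates = [0.06, 0.08, 0.07, 0.05, 0.09]
              |     m, s = mean(rates), stdev(rates)
              |     ucl, lcl = m + 3 * s, max(0.0, m - 3 * s)
              |     for r in rates:
              |         if not lcl <= r <= ucl:
              |             print("out of control:", r)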
        
               | thawawaycold wrote:
               | I hope you're being sarcastic. SPC is necessary because
               | mechanical parts have physical tolerances and
               | manufacturing processes are affected by unavoidable
               | statistical variations; it is beyond idiotic to be
               | provided with a machine that can execute deterministic,
               | repeatable processes and then throw that all into the
               | gutter for mere convenience, justifying that simply
               | because "the time is ripe for SWE to learn statistics"
        
               | int_19h wrote:
               | We don't know how to implement a "deterministic,
               | repeatable process" that can look at a bug in a repo and
               | implement a fix end-to-end.
        
               | thawawaycold wrote:
               | that is not what OP was talking about though.
        
               | rorytbyrne wrote:
               | LLMs are literally stochastic, so the point is the same
               | no matter what the example application is.
        
               | warkdarrior wrote:
               | Humans are literally stochastic, so the point is the same
               | no matter what the example application is.
        
               | perching_aix wrote:
               | The deterministic, repeatable process of human (and now
               | machine) judgement and semantic processing?
        
             | visarga wrote:
             | In my professional opinion they can extract data at 85-95%
             | accuracy.
        
             | FooBarWidget wrote:
             | You can use AI to verify its own work. Last time I split a
             | C++ header file into header + implementation file. I
             | noticed some code got rewritten in a wrong manner, so I
             | asked it to compare the new implementation file against the
             | original header file, but to do so one method at a time.
             | For each method, say whether the code is exactly the same
             | and has the same behavior, ignoring superficial syntax
             | changes and renames. Took me a few times to get the prompt
             | right, though.
        
           | cdelsolar wrote:
           | what tool are you using 2.5-pro-exp through? Cline? Or the
           | browser directly?
        
             | Nihilartikel wrote:
             | For 2.5 pro exp I've been attaching files into AIStudio in
             | the browser in some cases. In others, I have been using
             | vscode's Gemini Code Assist which I believe recently
             | started using 2.5 Pro. Though at one point I noticed that
             | it was acting noticeably dumber, and over in the corner,
             | sure enough it warned that it had reverted to 2.0 due to
             | heavy traffic.
             | 
             | For the bulk data processing I just used the python API and
             | Jupyter notebooks to build things out, since it was a one-
             | time effort.
        
             | manmal wrote:
             | Copilot experimental (need VSCode Insiders) has it. I've
             | thought about trying aider ---watch-files though, also
             | works with multiple files.
        
           | roygbiv2 wrote:
           | Isn't it better to get gemini to create a tool to format the
           | data? Or was it in such a state that that would have been
           | impossible?
        
           | tcgv wrote:
           | > I'm leveraging it for massive refactoring now and it is
           | almost magical.
           | 
           | Can you share more about your strategy for "massive
           | refactoring" with Gemini?
           | 
           | Like the steps in general for processing your codebase, and
           | even your main goals for the refactoring.
        
         | no_wizard wrote:
         | I remember everyone saying its a two horse race between Google
         | and OpenAI, then DeepSeek happened.
         | 
         | Never count out the possibility of a dark horse competitor
         | ripping the sod right out from under
        
           | nonethewiser wrote:
            | How is DeepSeek doing though? It seemed like they probably
           | just ingested ChatGPT. https://www.forbes.com/sites/torconsta
           | ntino/2025/03/03/deeps...
           | 
           | Still impressive but would really put a cap on expectations
           | for them.
        
             | gs17 wrote:
             | They supposedly have a new R2 model coming within a month.
        
             | FooBarWidget wrote:
             | Everybody else also trains on ChatGPT data, have you never
             | heard of public ChatGPT conversation data sets? Yes they
             | trained on ChatGPT data. No it's not "just".
        
         | bhl wrote:
         | It's cheap but also lazy. It sometimes generates empty strings
         | or empty arrays for tool calls, and then I just re-route the
         | request to a stronger model for the tool call.
         | 
         | I've spent a lot of time on prompts and tool-calls to get Flash
         | models to reason and execute well. When I give the same context
         | to stronger models like 4o or Gemini 2.5 Pro, it's able to get
          | to the same answers in fewer steps but at higher token cost.
         | 
         | Which is to be expected: more guardrails for smaller, weaker
         | models. But then it's a tradeoff; no easy way to pick which
         | models to use.
         | 
         | Instead of SQL optimization, it's now model optimization.
        
         | paulcole wrote:
         | > Google is silently winning the AI race.
         | 
         | It's not clear to me what either the "race" or "winning" is.
         | 
         | I use ChatGPT for 99% of my personal and professional use. I've
         | just gotten used to the interface and quirks. It's a good
         | consumer product that I like to pay $20/month for and use. My
         | work doesn't require much in the way of monthly tokens but I
         | just pay for the OpenAI API and use that.
         | 
         | Is that winning? Becoming the de facto "AI" tool for consumers?
         | 
         | Or is the race to become what's used by developers inside of
         | apps and software?
         | 
         | The race isn't to have the best model (I don't think) because
         | it seems like the 3rd best model is very very good for many
         | people's uses.
        
       | transformi wrote:
        | Bad day going on at Google.
       | 
        | First the declaration of illegal monopoly...
       | 
       | and now... Google's latest innovation: programmable overthinking.
       | 
       | With Gemini 2.5 Flash, you too can now set a thinking_budget--
       | because nothing says "state-of-the-art AI" like manually capping
       | how long it's allowed to reason. Truly the dream: debugging a
       | production outage at 2am wondering if your LLM didn't answer
       | correctly because you cheaped out on tokens. lol.
       | 
       | "Turn thinking off for better performance." That's not a model
       | config, that's a metaphor for Google's entire AI strategy lately.
       | 
       | At this point, Gemini isn't an AI product--it's a latency-cost-
       | quality compromise simulator with a text interface. Meanwhile,
       | OpenAI and Anthropic are out here just... cooking the benchmarks
        
         | danielbln wrote:
          | Google's Gemini 2.5 Pro model is incredibly strong; it's on
          | par with, and at times better than, Claude 3.7 in coding
          | performance. Being able to ingest entire videos into the
          | context is something I haven't seen elsewhere either. Google
          | AI products have been anywhere between bad (Bard) and
          | lackluster (Gemini 1.5), but 2.5 is a contender in all
          | dimensions. Google is also the only player that owns the
          | entire stack: research, software, data, and compute
          | hardware. I think they were slow to start, but they've
          | closed the gap since.
        
         | bsmith wrote:
         | Using AI to debug code at 2am sounds like pure insanity.
        
           | mring33621 wrote:
           | the new normal
        
           | spiderice wrote:
           | They're suggesting you'll be up at 2am debugging code because
           | your AI code failed. Not that you'll be using AI to do the
           | debugging.
        
       | hmaxwell wrote:
       | I did some testing this morning:
       | 
       | Prompt: "can you find any mistakes on my codebase? I put one in
       | there on purpose" + 70,000 tokens of codebase where in one line I
       | have an include for a non-existent file.
       | 
       | Claude 3.7 Sonnet (Try 1 - new chat) Claimed it Found a race
       | condition in the api of the admin interface that would be
       | triggered if two admins were to change the room order at the same
       | time. Claude suggested I group all sql queries in a single
       | transaction. I looked at the code and found that it already used
       | a transaction for all queries. I said: the order_update api is
       | already done with a transaction. Claude replied: "You're
       | absolutely right, and I apologize for my mistake. I was incorrect
       | to claim there was a race condition issue. The transaction
       | ensures atomicity and consistency of the updates, and the SQL
       | queries are properly structured for their intended purpose."
       | 
       | Claude 3.7 Sonnet (Try 2 - new chat) Found a comment in the admin
       | ui javascript code that suggested a potential alternative to
       | event handler cleanup that was not implemented because I decided
       | to go with a cleaner route. Then asked "Is this the issue you
       | intentionally included, or would you like me to look for other
       | potential problems?" I said: "The comment merely suggests an
       | alternative, right?" claude said: "Yes, you're absolutely right.
       | The comment is merely suggesting an alternative approach that
       | isn't being used in the code, rather than indicating a mistake.
       | So there's no actual bug or mistake in this part of the code -
       | just documentation of different possible approaches. I apologize
       | for misinterpreting this as an issue!"
       | 
       | Claude 3.7 Sonnet (Try 3 - new chat) When processing items out of
       | the database to generate QR codes in the admin interface, Claude
       | says that my code both attempts to generate QR codes with
       | undefined data AS WELL AS saying that my error handling skips
       | undefined data. Claude contradicts itself within 2 sentences.
       | When asking about clarification Claude replies: Looking at the
       | code more carefully, I see that the code actually has proper
       | error handling. I incorrectly stated that it "still attempts to
       | call generateQRCode()" in the first part of my analysis, which
       | was wrong. The code properly handles the case when there's no
       | data-room attribute.
       | 
        | Gemini Advanced 2.5 Pro (Try 1 - new chat) Found the intentional
       | error and said I should stop putting db creds/api keys into the
       | codebase.
       | 
        | Gemini Advanced 2.5 Pro (Try 2 - new chat) Found the intentional
       | error and said I should stop putting db creds/api keys into the
       | codebase.
       | 
        | Gemini Advanced 2.5 Pro (Try 3 - new chat) Found the intentional
       | error and said I should stop putting db creds/api keys into the
       | codebase.
       | 
       | o4-mini-high and o4-mini and o3 and 4.5 and 4o - "The message you
       | submitted was too long, please reload the conversation and submit
       | something shorter."
        
         | Tiberium wrote:
         | The thread is about 2.5 Flash though, not 2.5 Pro. Maybe you
         | can try again with 2.5 Flash specifically? Even though it's a
         | small model.
        
           | dyauspitr wrote:
           | I don't particularly care about the non frontier models
           | though, I found the comment very useful.
        
         | airstrike wrote:
         | Have you tried Claude Code?
        
         | danielbln wrote:
          | Those responses are very Claude, too. 3.7 has powered our
          | agentic workflows for weeks, but I've been using almost only
          | Gemini for the last week and feel the output is generally
          | better. It's gotten much better at agentic workflows (using
          | 2.0 in an agent setup was not working well at all), and I
          | prefer its tuning over Claude's: more to the point and less
          | meandering.
        
         | rendang wrote:
         | 3 different answers in 3 tries for Claude? Makes me curious how
         | many times you'd get the same answer if you asked 10/20/100
         | times
        
         | bambax wrote:
         | > _codebase where in one line I have an include for a non-
         | existent file_
         | 
         | Ok but you don't need AI for this; almost any IDE will issue a
         | warning for that kind of error...
        
         | fandorin wrote:
         | how did you put your whole codebase in a prompt for gemini?
        
       | Workaccount2 wrote:
       | OpenAI might win the college students but it looks like Google
       | will lock in enterprise.
        
         | xnx wrote:
         | ChatGPT seems to have a name recognition / first-mover
         | advantage with college students now, but is there any reason to
         | think that will stick when today's high school students are
         | using Gemini on their Chromebooks?
        
         | gundmc wrote:
         | Funny you should say that. Google just announced today that
         | they are giving all college students one year of free Gemini
         | advanced. I wonder how much that will actually move the needle
         | among the youth.
        
           | Workaccount2 wrote:
           | My guess is that they will use it and still call it
           | "ChatGPT"...
        
             | xnx wrote:
             | Chat Gemini Pretrained Transformer
        
             | tantalor wrote:
             | Pass the Kleenex. Can I get a Band-Aid? Here's a Sharpie. I
             | need a Chapstick. Let me Xerox that. Toss me that Frisbee.
        
               | drob518 wrote:
               | Exactly.
        
               | esafak wrote:
               | Do you prefer those brands or just use their names? I
               | google stuff on Kagi...
        
           | drob518 wrote:
           | And every professor just groaned at the thought of having to
           | read yet another AI-generated term paper.
        
             | jay_kyburz wrote:
             | They should just get AI to mark them. I genuinely think
             | this is one thing AI would do better than humans.
        
               | mdp2021 wrote:
               | Grading papers definitely requires intelligence.
        
               | jay_kyburz wrote:
                | My partner marked a PhD thesis yesterday and there was
                | a spelling mistake in the title.
                | 
                | There is some level of analysis and feedback that an
                | LLM could provide before a human reviews it. Even if
                | it's just a fancy spelling checker.
        
               | mdp2021 wrote:
                | I'd like to burst into a post with a number of the
                | unbelievably akin mishandlings of academic tasks that
                | were reported to me, but. I do have a number of prize-
                | worthy anecdotes that compete with yours. Nonetheless.
                | Let us fight farce with rigour.
               | 
               | Even when the tasks are not in-depth, but easier to
               | assess, you still require a /reliable evaluator/. LLMs
               | are not. Could they be at least employed as a virtual
               | assistant, "parse and suggest, then I'll check"? If so,
               | not randomly ("pick a bot"), but in full awareness of the
               | specific instrument. That stage is not here.
        
             | bufferoverflow wrote:
             | Take-home assignments are basically obsolete. Students who
             | want to cheat, can do so easily. Of course, in the end,
             | they cheat themselves, but that's not the point.
        
           | anovick wrote:
           | * Only in the U.S.
        
         | superfrank wrote:
         | Is there really lock in with AI models?
         | 
          | I built a product that uses an LLM and I got curious about the
         | quality of the output from different models. It took me a
         | weekend to go from just using OpenAI's API to having Gemini,
          | Claude, and DeepSeek all as options, and a lot of that time
          | was research on which model from each provider I wanted to
          | use.
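          | 
          | Everything ended up behind one function, roughly like this
          | (the two stubs stand in for the real SDK calls):
          | 
          |     def call_openai(prompt): return "..."
          |     def call_gemini(prompt): return "..."
          | 
          |     PROVIDERS = {
          |         "openai": call_openai,
          |         "gemini": call_gemini,
          |     }
          | 
          |     def complete(prompt, provider="gemini"):
          |         # Swapping vendors is now a config change.
          |         return PROVIDERS[provider](prompt)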
        
           | pydry wrote:
           | For enterprise practically any SaaS gets used as one more
           | thing to lock them into a platform they already have a
           | relationship with (either AWS, GCP or Azure).
           | 
           | It's actually pretty dangerous for the industry to have this
           | much vertical integration. Tech could end up like the car
           | industry.
        
             | superfrank wrote:
              | I'm aware of that. I'm an EM for a large tech company
              | that sells multiple enterprise SaaS products.
             | 
             | You're right that the lock in happens because of
             | relationships, but most big enterprise SaaS companies have
             | relationships with multiple vendors. My company
             | relationships with AWS, Azure, and GCP and we're currently
             | using products from all of them in different products. Even
             | on my specific product we're using all three.
             | 
             | When you've already got those relationships, the lock in is
             | more about switching costs. The time it takes to switch,
             | the knowledge needed to train people internally on the
             | differences after the switch, and the actual cost of the
             | new service vs the old one.
             | 
             | With AI models the time to switch from OpenAI to Gemini is
             | negligible and there's little retraining needed. If the
             | Google models (now or in the future) are comparable in
             | price and do a better job than OpenAI models, I don't see
             | where the lock in is coming from.
        
           | drob518 wrote:
           | There isn't much of a lock-in, and that's part of the problem
           | the industry is going to face. Everyone is spending gobs of
           | money on training and if someone else creates a better one
           | next week, the users can just swap it right in. We're going
           | to have another tech crash for AI companies, similar to what
           | happened in 2001 for .coms. Some will be winners but they
           | won't all be.
        
         | ein0p wrote:
         | How will it lock in the enterprise if its market share of
         | enterprise customers is half that of Azure (Azure also sells
         | OpenAI inference, btw), and one third that of AWS?
        
           | kccqzy wrote:
           | The same reason why people enjoy BigQuery enough that their
           | only use of GCP is BigQuery while they put their general
           | compute spend on AWS.
           | 
           | In other words, I believe talking about cloud market share as
           | a whole is misleading. One cloud could have one product
           | that's so compelling that people use that one product even
           | when they use other clouds for more commoditized products.
        
         | asadm wrote:
         | funny thing about younglings, they will migrate to something
         | else as fast as they came to you.
        
           | drob518 wrote:
           | I read about that on Facebook.
        
         | Oras wrote:
         | Enterprise has already been won by Microsoft (Azure), which
         | runs on OpenAI.
        
           | r00fus wrote:
           | That isn't what I'm seeing with my clientele (lots of
           | startups and mature non-tech companies). Most are using Azure
           | but very few have started to engage AI outside the periphery.
        
           | jimbob45 wrote:
           | Came to say this. No respectable CTO would ever push a Google
           | product to their superiors knowing Google will kill it in 1-3
           | years and they'll look foolish for having pushed it.
        
         | edaemon wrote:
         | It seems more and more like AI is less of a product and more of
         | a feature. Most people aren't going to care or even know about
         | the model or the company who made it, they're just going to use
         | the AI features built into the products they already use.
        
           | esafak wrote:
           | That's going to be true until we reach AGI, when there will
           | be a qualitative difference and we will lose our ability to
           | discern which is better since they're too far ahead of us.
        
       | statements wrote:
        | Interesting to note that this might be the only model with a
        | knowledge cutoff as recent as January 2025
        
         | Tiberium wrote:
         | Gemini 2.5 Pro has the same knowledge cutoff specified, but in
         | reality on more niche topics it's still limited to ~middle of
         | 2024.
        
         | brightball wrote:
         | Isn't Grok 3 basically real time now?
        
           | Tiberium wrote:
            | That's the web version (which has tools like search plugged
            | in); other models in their official frontends (Gemini on
            | gemini.google.com, GPT/o models on chatgpt.com) are also
            | "real time". But when served over an API, most of those
            | models are just static.
        
           | bearjaws wrote:
            | No LLM is real time, and in fact even a 2025 cutoff isn't
            | entirely realistic. Without guidance pointing it to, say, a
            | new version of a framework, it will frequently "reference"
            | documentation from old versions and use that.
            | 
            | It's somewhat real time when it searches the web, though of
            | course that data is populated into the context rather than
            | into training.
        
           | jiocrag wrote:
           | Not at all. The model weights and training data remain the
           | same, it's just RAG'ing real-time twitter data into its
           | context window when returning results. It's like a worse
           | version of Perplexity.
        
             | flashblaze wrote:
             | Why worse? Doesn't Grok also search the web along with
             | Twitter?
        
       | ein0p wrote:
       | Absolutely decimated on metrics by o4-mini, straight out of the
       | gate, and not even that much cheaper on output tokens (o4-mini's
       | thinking can't be turned off IIRC).
        
         | gundmc wrote:
          | It's good to see some actual competition in this price range!
          | A lot of Flash 2.5's edge will depend on how well the dynamic
          | reasoning works. It's also helpful to have _significantly_
          | lower input token cost for large-context use cases.
        
         | rfw300 wrote:
         | o4-mini does look to be a better model, but this is actually a
         | lot cheaper! It's ~7x cheaper for both input and output tokens.
        
           | ein0p wrote:
           | These small models only make sense with "thinking" enabled.
           | And once you enable that, much of the cost advantage
           | vanishes, for output tokens.
        
             | overfeed wrote:
             | > These small models only make sense with "thinking"
             | enabled
             | 
             | This entirely depends on your use-cases.
        
         | vessenes wrote:
         | o4-mini costs 8x as much as 2.5 flash. I believe its useful
         | context window is also shorter, although I haven't verified
         | this directly.
        
           | mccraveiro wrote:
           | 2.5 flash with reasoning is just 20% cheaper than o4-mini
        
             | vessenes wrote:
              | Good point: reasoning costs more. Also, it's impossible to
              | tell without tests how verbose the reasoning mode is.
        
         | mupuff1234 wrote:
         | Not sure "decimated" is a fitting word for "slightly higher
         | performance on some benchmarks".
        
           | fwip wrote:
           | Perhaps they were using the original meaning of "one-tenth
           | destroyed." :P
        
           | ein0p wrote:
           | 66.8% error rate reduction for o4-mini on AIME2025, and 21%
           | error rate reduction on MMMU isn't "slightly higher". It'll
           | be quite noticeable in practice.
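            | 
            | For anyone checking the arithmetic, "error rate reduction"
            | is (oldErr - newErr) / oldErr. A quick sketch, assuming
            | AIME 2025 scores of roughly 78% for 2.5 Flash and 92.7%
            | for o4-mini (my assumed inputs, not from the announcement):
            | 
            |   // errorRate = 1 - accuracy
            |   function errorRateReduction(oldAcc: number,
            |                               newAcc: number): number {
            |     const oldErr = 1 - oldAcc;
            |     const newErr = 1 - newAcc;
            |     return (oldErr - newErr) / oldErr;
            |   }
            | 
            |   console.log(errorRateReduction(0.780, 0.927)); // ~0.668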
        
         | kfajdsl wrote:
         | Anecdotally o4-mini doesn't perform as well on video
         | understanding tasks in our pipeline, and also in Cursor it
         | seems really not great.
         | 
          | During one session, it read the same file (same lines)
          | several times, ran python -c 'print("skip!")' for no reason,
          | and then got into another file reading loop. Then after
          | asking a
         | hypothetical about the potential performance implications of
         | different ffmpeg flags, it claimed that it ran a test and
         | determined conclusively that one particular set was faster,
         | even though it hadn't even attempted a tool call, let alone
         | have the results from a test that didn't exist.
        
       | xbmcuser wrote:
        | For a non-programmer like me, Google is becoming shockingly
        | good. It is giving working code the first time. I was playing
        | around with it and asked it to write code to scrape some data
        | off a website to analyse. I was expecting it to write something
        | that would scrape the data, and that I would later upload the
        | data to it to analyse. But it actually wrote code that scraped
        | and analysed the data. It was basic categorizing and counting
        | of the data, but I was not expecting it to do that.
        
         | kccqzy wrote:
          | That's the opposite of the experience of my wife, who's in
          | tech but also a non-programmer. She wanted Gemini to write
          | code to do some basic data analysis in a more automated way
          | than Excel. More than once, Gemini wrote a long bash script
          | where some sed invocations were just plain wrong. More than
          | once I've had to debug Gemini-written bash scripts. As a
          | programmer I know bash scripts aren't great for readability,
          | so I told my wife to ask Gemini to write Python. That
          | resulted in higher code quality, but the code still contained
          | bugs that are impossible for a non-programmer to fix.
          | Sometimes asking a follow-up about the bugs would cause
          | Gemini to fix them, but doing so repeatedly would result in
          | Gemini forgetting what was being asked or simply throwing an
          | internal error.
         | 
         | Currently IMO you have to be a programmer to use Gemini to
         | write programs effectively.
        
           | sbarre wrote:
           | I've found that good prompting isn't just about asking for
           | results but also giving hints/advice/direction on how to go
           | about the work.
           | 
            | I suspect that if Gemini is giving you bash scripts, it's
            | because you're not giving it enough direction. As you
            | pointed out, telling it to use Python, or giving it more
            | expectations about how to go about the work or what the
            | output should look like, will give better results.
           | 
           | When I am prompting for technical or data-driven work, I tend
           | to almost walk through what I imagine the process would be,
           | including steps, tools, etc...
        
           | xbmcuser wrote:
            | I had similar experiences a few months back; that is why I
            | am saying it is becoming shockingly good. The 2.5 is a lot
            | better than the 2.0 version. Another thing I have realized:
            | just like Google search in the past, your query has a lot
            | to do with the results you get. So giving an example of
            | what you want gets better results.
        
             | ac29 wrote:
              | > I am saying it is becoming shockingly good. The 2.5 is
              | a lot better than the 2.0 version
              | 
              | Are you specifically talking about 2.5 Flash? It only came
              | out an hour ago; I don't know how you would have enough
              | experience with it already to come to your conclusion.
             | 
             | (I am very impressed with 2.5 Pro, but that is a different
             | model that's been available for several weeks now)
        
               | xbmcuser wrote:
               | I am talking about 2.5 Pro
        
           | 999900000999 wrote:
           | Let's hope that's the case for a while.
           | 
           | I want to be able to just tell chat GPT or whatever to create
           | a full project for me, but I know the moment it can do that
           | without any human intervention, I won't be able to find a
           | job.
        
           | drob518 wrote:
           | IMO, the only thing that's consistent about AIs is how
           | inconsistent they are. Sometimes, I ask them to write code
           | and I'm shocked at how well it works. Other times, I feel
           | like I'm trying to explain to a 5-year-old Alzheimer's
           | patient what I want and it just can't seem to do the simplest
           | stuff. And it's the same AI in both cases.
        
             | greyadept wrote:
             | I wouldn't be surprised if AI tools are frequently
             | throttled in the backend to save on costs, resulting in
             | this type of inconsistency.
        
           | SweetSoftPillow wrote:
           | It must have something to do with the way your wife is
           | prompting. I've noticed this with my friends too. I usually
           | get working code from Gemini 2.5 Pro on the first try, and
           | with a couple of follow-up prompts, it often improves
           | significantly, while my friends seem to struggle
           | communicating their ideas to the AI and get worse results.
           | 
           | Good news: Prompting is a skill you can develop.
        
             | halfmatthalfcat wrote:
             | Or we can just learn to write it ourselves in the same
             | amount of time /shrug
        
               | viraptor wrote:
               | If you're going to need scripts like that every week -
               | sure. If you need it once a year on average... not
               | likely. There's a huge amount of things we could learn
               | but do them so infrequently that we outsource it to other
               | people.
        
               | rgoulter wrote:
               | Right.
               | 
               | This is one case where I've found writing code with LLMs
               | to be effective.
               | 
               | With some unfamiliar tool I don't care about too much
               | (e.g. GitHub Actions YAML or some build script), I just
               | want it to work, & then focus on other things.
               | 
               | I can spend time to try and come up with something that
               | works; something that's robust & idiomatic.. but, likely
               | I won't be able to re-use that knowledge before I forget
               | it.
               | 
               | With an LLM, I'll likely get just as good a result; or if
               | not, will have a good starting point to go from.
        
               | SweetSoftPillow wrote:
               | You can't.
        
               | halfmatthalfcat wrote:
               | Not with that attitude.
        
             | gregorygoc wrote:
             | Is there a website with off the shelf prompts that work?
        
           | Workaccount2 wrote:
            | There is definitely an art to doing it, but the ability is
            | there even if you don't know the language at all.
           | 
           | I have a few programs now that are written in Python (2 by
           | 3.7, one by 2.5) used for business daily, and I can tell you
           | I didn't, and frankly couldn't, check a single line of code.
           | One of them is ~500 LOC, the other two are 2200-2700 LOC.
        
           | yakz wrote:
           | Ask it to write tests with the code and then ask it to fix
           | the errors from the tests rather than just pointing out bugs.
           | If you have an IDE that supports tool use (Claude Code, Roo
           | Code) it can automate this process.
        
           | jiggawatts wrote:
            | The AIs, like many things out there, work like an "evil
            | genie": they'll give you what you asked for. The problem is
            | typically that users ask for the wrong thing.
           | 
           | I've noticed beginners make mistakes like using singular
           | terms when they should have used plural ("find the bug" vs
           | "find the bugs"), or they fail to specify their preferred
           | platform, language, or approach.
           | 
           | You mentioned your wife is using Excel, which is primarily
           | used on Windows desktops and/or with the Microsoft ecosystem
           | of products such as Power BI, PowerShell, Azure, SQL Server,
           | etc...
           | 
           | Yet you mention she got a bash script using sed, both of
           | which are from the Linux / GNU ecosystem. That implies that
           | your wife didn't specify that she wanted a Microsoft-centric
           | solution to her problem!
           | 
            | The correct answer here would likely have been to use
            | Microsoft Fabric, which is an entire bag of data analysis
            | and reporting tools that has data pipelines, automation,
            | publishing, etc...
           | 
           | Or... just use the MashUp engine that's built-in to both
           | Excel and PowerBI, which allows a surprisingly complex set of
           | text, semi-structured, and tabular data processing. It can
           | re-run the import and update graphs and charts with the new
           | data.
           | 
           | PS: This is similar to going up to a Node.js programmer with
           | a request. It doesn't matter what it is, they will recommend
           | writing JavaScript to solve the problem. Similarly, a C++
           | developer will reach for C++ to solve everything they're
           | asked to do. Right now, the AIs strongly prefer Linux,
           | JavaScript, and especially Python for problem solving,
           | because that's the bulk of the open-source code they were
           | trained with.
        
           | dmos62 wrote:
           | Which Gemini was it? I've been using 2.5 Flash all day for
           | programming ClojureScript via roo code and it's been great.
           | Provided I'm using agent orchestration, a memory bank, and
           | having it write docs for code it will work on.
        
         | ant6n wrote:
         | Last time I tried Gemini, it messed with my google photo data
         | plan and family sharing. I wish I could try the AI separate
         | from my Google account.
        
           | jsnell wrote:
           | > I wish I could try the AI separate from my Google account.
           | 
            | If that's a concern, just create another account. It
            | doesn't even require using a separate browser profile; you
            | can be logged into multiple accounts at once and use the
            | account picker in the top right of most of their apps to
            | switch.
        
         | ModernMech wrote:
          | I've been continually disappointed. I've been told it's
          | getting exponentially better and we won't be able to keep up
          | with how good they get, but I'm not convinced. I'm using them
          | every single day and I'm never shocked or awed by their
          | competence, but instead continually vexed that they're not
          | living up to the hype I keep reading.
         | 
         | Case in point: there was a post here recently about
         | implementing a JS algorithm that highlighted headings as you
         | scrolled (side note: can anyone remember what the title was? I
         | can't find it again), but I wanted to test the LLM for that
         | kind of task.
         | 
         | Pretty much no matter what I did, I couldn't get it to give me
         | a solution that would highlight all of the titles down to the
         | very last one.
         | 
         | I knew what the problem was, but even guiding the AI, it
         | couldn't fix the code. I tried multiple AIs, different
         | strategies. The best I could come up with was to guide it step
          | by step on how to fix the code. Even telling it _exactly_
          | what the problem was, it couldn't fix it.
         | 
         | So this goes out to the "you're prompting it wrong" crowd...
          | Can you show me a prompt or a conversation that will get an
          | AI to spit out working code for this task: JavaScript that
          | highlights headings as you scroll, down to the very last one.
          | The challenge is to prompt it to do this without telling it
          | how to implement it.
         | 
         | I figure this should be easy for the AI because this kind of
         | thing is very standard, but maybe I'm just holding it wrong?
        
           | jsnell wrote:
           | Even as a human programmer I don't actually understand your
           | description of the problem well enough to be confident I
           | could correctly guess your intent.
           | 
           | What do you mean by "highlight as you scroll"? I guess you
           | want a single heading highlighted at a time, and it should be
           | somehow depending on the viewport. But even that is
           | ambiguous. Do you want the topmost heading in the viewport?
           | The bottom most? Depending on scroll direction?
           | 
           | This is what I got one-shot from Gemini 2.5 Pro, with my best
           | guess at what you meant:
           | https://gemini.google.com/share/d81c90ab0b9f
           | 
           | It seems pretty good. Handles scrolling via all possible
           | ways, does the highlighting at load too so that the
           | highlighting is in effect for the initial viewport too.
           | 
           | The prompt was "write me some javascript that higlights the
           | topmost heading (h1, h2, etc) in the viewport as the document
           | is scrolled in any way".
           | 
           | So I'm thinking your actual requirements are very different
           | than what you actually wrote. That might explain why you did
           | not have much luck with any LLMs.
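            | 
            | For reference, a sketch of the approach it produced (my
            | reconstruction, not the verbatim output):
            | 
            |   // Highlight the topmost heading in the viewport on any
            |   // scroll or resize, and once for the initial viewport.
            |   const headings = Array.from(
            |     document.querySelectorAll<HTMLElement>("h1, h2, h3"));
            | 
            |   function highlightTopmost(): void {
            |     // First heading (in document order) whose box is at
            |     // least partly inside the viewport.
            |     const topmost = headings.find(h => {
            |       const r = h.getBoundingClientRect();
            |       return r.bottom > 0 && r.top < window.innerHeight;
            |     }) ?? null;
            |     headings.forEach(h =>
            |       h.classList.toggle("highlight", h === topmost));
            |   }
            | 
            |   ["scroll", "resize"].forEach(evt =>
            |     window.addEventListener(evt, highlightTopmost));
            |   highlightTopmost();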
        
             | ModernMech wrote:
             | > Even as a human programmer I don't actually understand
             | your description of the problem well enough to be confident
             | I could correctly guess your intent.
             | 
             | Yeah, you understand what I meant. The code Gemini gave you
             | implements the behavior, and the AI I used gave me pretty
             | much the same thing. There's a problem with the algorithm
             | tho -- if there's a heading too close to the bottom of the
             | page it will never highlight. The page doesn't exhibit the
             | bug because it provides enough padding at the bottom.
             | 
             | But my point wasn't that it couldn't one-shot the code; my
             | point was that I couldn't interrogate it into giving me
             | code that behaved as I wanted. It seemed too anchored to
             | the solution it had provided me, where it said it was
             | offering fixes that didn't do anything, and when I pointed
             | that out it apologized and proceeded to lie about fixing
             | the code again. It appeared to be an infinite loop.
             | 
             | I think what's happened here is the opposite of what you
             | suggest; this is a very common tutorial problem, you can
             | find solutions of the variety you showed me all over the
             | internet, and that's essentially what Gemini gave you. But
             | being tutorial code, it's very basic and tries not to
             | implement a more robust solution that is needed in
             | production websites. When I asked AI for that extra
             | robustness, it didn't want to stray too far from the
             | template, and the bug persisted.
             | 
             | Maybe you can coax it into getting a better result? I want
             | to understand how.
        
               | jsnell wrote:
               | I clearly didn't understand what you meant, because you
               | did in fact have additional unstated requirements that I
               | could not even have imagined existed and were not in any
               | way hinted at by your initial spec.
               | 
               | And I still don't know what you want! Like, you want some
               | kind of special case where the last heading is handled
               | differently. But what kind of special case? You didn't
               | specify. "It's wrong, fix it".
               | 
               | Fix it how? When the page is scrolled all the way to the
               | bottom, should the last heading always be highlighted?
               | That would just move the complaint to the second heading
               | from the bottom if three headings fit on the last screen.
               | Add padding? Can't be that, since it's exactly what this
               | solution already did and you thought it wasn't good
               | enough.
               | 
               | Sorry, I will not be playing another round of this. I
               | don't know if you don't realize how inadequate your
               | specifications are (in which case that's your problem
               | with the LLMs too), or if this is some kind of a bit, but
               | either way it doesn't feel like a good use of my time.
               | 
               | But if your problem is that the LLMs give a bad initial
               | answer, get anchored on it, and can't iterate, just give
               | all the requirements up front. If they're requirements
               | you didn't realize existed until you saw the proposed
               | answer, just start again with a fresh context. That tends
               | to work well for me in Aider.
        
               | ModernMech wrote:
               | > I clearly didn't understand what you meant, because you
               | did in fact have additional unstated requirements
               | 
                | Okay, but the AI understood the requirements; it
                | confirmed to me what I intended it to do, but it
                | couldn't produce code that met its own textual
                | descriptions. It kept returning the tutorial code.
               | 
               | > You didn't specify. "It's wrong, fix it". Fix it how?
               | 
               | Maybe I wasn't clear here but I'm not replying as if I'm
               | prompting you like an AI. The problem domain is described
               | better in the link in the sibling comment. When the AI
               | gave back the initial code, I had inquired the following:
               | how does this work when it gets to the bottom. Let's say
               | I have a heading at the very bottom of the page. Will the
               | "active" one be the bottom one or the top?
               | 
                | It replied:
                | 
                |   What happens at the bottom of the page?
                | 
                |   When you're near the bottom of the page and the last
                |   section (section3, say) starts to enter the top of
                |   the viewport (within 100px), it becomes the active
                |   one -- exactly as you'd want.
                | 
                |   But here's the catch: if the last section is short,
                |   or you scroll all the way to the very bottom (past
                |   the heading), then no section's top may be <= 100px,
                |   especially if the last heading has already scrolled
                |   off the top. In that case, the last matched section
                |   (likely the one before the last) stays active, which
                |   feels wrong.
                | 
                |   We can fix this either by:
                |   - Making the threshold smarter (e.g., using .bottom
                |     or checking if any part of the section is visible).
                |   - Or better: use IntersectionObserver to handle this
                |     robustly.
                | 
                |   Why this is better:
                |   - It triggers when a section is entering the viewport
                |     (around 30% from top).
                |   - Works even when you scroll to the very bottom.
                |   - It's less sensitive to section height or viewport
                |     size.
               | 
               | So it identified the edge case, it identified that the
               | behavior is incorrect and what the cause of that is, and
               | it returned code that purportedly fixed this. But the
               | code it returned exhibited exactly the behavior it said
               | "feels wrong". And in interrogating it. I asked it what
               | was broken and we went line by line:                 Show
               | me exactly what was changed from this to the last which
               | fixed the problem            Perfect -- let's do a line-
               | by-line diff-style breakdown of what changed between the
               | buggy version and the fixed version so we can pinpoint
               | exactly what fixed the issue.
               | 
                | We went line by line and it told me exactly what was
                | wrong and why it was fixed, and confirmed that the
                | provided code produced the expected behavior.
                | 
                |   Why this works:
                |   - We evaluate all visible headings, not just ones
                |     intersecting a line.
                |   - We pick the one that's: just above the activation
                |     line, or just below it, if none are above.
                |   - Handles edge cases like top/bottom of scroll.
               | 
               | But the code doesn't do this. It continued on like this
               | where it proposed fixes, talked about the solution
               | correctly, but wouldn't give code that implemented the
               | solution.
               | 
               | > But if your problem is that the LLMs give a bad initial
               | answer, get anchored on it, and can't iterate, just give
               | all the requirements up front. If they're requirements
               | you didn't realize existed until you saw the proposed
               | answer, just start again with a fresh context. That tends
               | to work well for me in Aider.
               | 
               | Yeah that's what I tend to do as well. I don't tend to
               | get good satisfying results though, to the point where
               | coding it myself seems like the faster more reliable
               | option. I'll keep trying to hold it better and maybe one
               | day it'll work for me. Until then I'm a skeptic.
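                | 
                | For what it's worth, the kind of fix I was fishing
                | for is roughly this (my own sketch, not AI output):
                | special-case the fully-scrolled state so the last
                | heading can actually become active.
                | 
                |   const headings = Array.from(
                |     document.querySelectorAll<HTMLElement>("h2"));
                | 
                |   function setActive(active: HTMLElement): void {
                |     headings.forEach(h =>
                |       h.classList.toggle("active", h === active));
                |   }
                | 
                |   function onScroll(): void {
                |     // At the very bottom the last heading's top may
                |     // never cross the activation line, so force it.
                |     const atBottom =
                |       window.innerHeight + window.scrollY >=
                |       document.documentElement.scrollHeight - 1;
                |     if (atBottom && headings.length > 0) {
                |       setActive(headings[headings.length - 1]);
                |       return;
                |     }
                |     // Otherwise: the last heading above an activation
                |     // line 100px from the top of the viewport.
                |     let current = headings[0];
                |     for (const h of headings) {
                |       if (h.getBoundingClientRect().top <= 100)
                |         current = h;
                |     }
                |     if (current) setActive(current);
                |   }
                | 
                |   window.addEventListener("scroll", onScroll,
                |     { passive: true });
                |   onScroll();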
        
           | croemer wrote:
           | "Overengineered anchor links":
           | https://news.ycombinator.com/item?id=43570324
        
             | ModernMech wrote:
             | Thank you!!
        
       | __alexs wrote:
       | Does billing for the API actually work properly yet?
        
       | alecco wrote:
        | Gemini models are very good, but in my experience they tend to
        | overdo things. When I give it material for context plus
        | something specific to rework, Gemini often reworks the context
        | material as well.
       | 
        | For software it is barely useful, because you want small
        | commits for specific fixes, not a whole refactor/rewrite. I
        | tried many prompts, but it's hard. Even when I give it the
        | function signatures of the APIs the code I want to fix uses,
        | Gemini rewrites the API functions.
       | 
       | If anybody knows a prompt hack to avoid this, I'm all ears.
       | Meanwhile I'm staying with Claude Pro.
        
         | byearthithatius wrote:
          | Yes, it will add INSANE amounts of "robust error handling" to
          | quick scripts where I can be confident about assumptions.
          | This turns my clean 40 lines of Python, where I KNOW the
          | JSONL I am parsing is valid, into 200+ lines filled with ten
          | new try/except statements. Even when I tell it not to do
          | this, it loves to "find and help" in other ways. Quite
          | annoying. But overall it is pretty dang good. It even spotted
          | a bug I missed the other day in a big 400+ line complex
          | data-processing file.
        
           | zhengyi13 wrote:
           | I wonder how much of that sort of thing is driven by having
           | trained their models on their own internal codebases? Because
           | if that's the case, careful and defensive being the default
           | would be unsurprising.
        
           | stavros wrote:
            | I didn't realize this was a bigger trend. I asked it to
            | write a simple testing script that POSTed a string to a
            | local HTTP server as JSON, and it wrote a 40-line script
            | handling any possible error. I just wanted two lines.
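            | 
            | Something in this spirit (endpoint and payload made up)
            | would have been plenty:
            | 
            |   // POST a string as JSON to a local server; log status.
            |   fetch("http://localhost:8000/echo", {
            |     method: "POST",
            |     headers: { "Content-Type": "application/json" },
            |     body: JSON.stringify({ text: "hello" }),
            |   }).then(r => console.log(r.status));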
        
             | jug wrote:
              | Yes; as recently as earlier today, I asked it to provide
              | "naive" code, which helped a bit.
        
             | free_energy_min wrote:
              | Same issue here! It isn't even helpful, because if the
              | code isn't working I want it to fail, not just skip over
              | errors.
        
         | dherikb wrote:
         | I have the same issue using it with Aider.
         | 
          | The model is good at solving problems, but it's very
          | difficult to control the unnecessary changes it makes in the
          | rest of the code. It also adds a lot of unnecessary comments,
          | even when I explicitly say not to.
          | 
          | For now DeepSeek R1 and V3 are working better for me,
          | producing more predictable results and capturing my
          | intentions better (I haven't tried Claude yet).
        
         | w4yai wrote:
          | Here's what I found to be working (not 100%, but it gives
          | much better and more consistent results).
         | 
         | Basically, I ask it to repeat at the start of each message some
         | rules :
         | 
         | "From now on, you must repeat and comply the following rules at
         | the top of all your messages onwards:
         | 
         | - I will never rewrite API functions. Even if I think it's a
         | good idea, it is a bad idea. I will keep the API function as it
         | is and it is perfect like that.
         | 
         | - I will never add extra input validation. Even if I think it's
         | a good idea, it is a bad idea. I will keep the function without
         | validation and it is perfect like that.
         | 
         | - ...
         | 
         | - If I violate any of those rules, I did a bad job. "
         | 
          | Forcing it to repeat things makes the model output more
          | aligned and focused, in my experience.
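          | 
          | Mechanically it's just prepending the same rule block to
          | every turn. A minimal sketch of the idea (the model call is
          | a stand-in, not a real API):
          | 
          |   const RULES = [
          |     "I will never rewrite API functions.",
          |     "I will never add extra input validation.",
          |     "If I violate any of those rules, I did a bad job.",
          |   ];
          | 
          |   function withRules(userMessage: string): string {
          |     return "Repeat and comply with these rules at the top" +
          |       " of your reply:\n- " + RULES.join("\n- ") +
          |       "\n\n" + userMessage;
          |   }
          | 
          |   // Stand-in for whatever chat API you're using.
          |   async function sendToModel(p: string): Promise<string> {
          |     console.log(p); // shows what the model would receive
          |     return "(model reply)";
          |   }
          | 
          |   sendToModel(withRules(
          |     "Fix the bug in parse() without touching the API."));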
        
       | ks2048 wrote:
       | If this announcement is targeting people not up-to-date on the
       | models available, I think they should say what "flash" means. Is
       | there a "Gemini (non-flash)"?
       | 
       | I see the 4 Google model names in the chart here. Are these 4 the
       | main "families" of models to choose from?
       | 
       | - Gemini-Pro-Preview
       | 
       | - Gemini-Flash-Preview
       | 
       | - Gemini-Flash
       | 
       | - Gemini-Flash-Lite
        
         | mwest217 wrote:
         | Gemini has had 4 families of models, in order of decreasing
         | size:
         | 
         | - Ultra
         | 
         | - Pro
         | 
         | - Flash
         | 
         | - Flash-Lite
         | 
          | Versions with `-Preview` at the end haven't had their
          | "official release" and are technically in some form of "early
          | access". I'm not totally clear on exactly what that means,
          | given that they're fully available and, as of 2.5 Pro
          | Preview, have pricing attached. Earlier preview versions were
          | free but had pretty strict rate limiting; now it seems that
          | Preview models are more or less fully usable.
        
           | drob518 wrote:
           | Is GMail still in beta?
        
             | mring33621 wrote:
             | so Sigma...
        
           | jsnell wrote:
           | The free-with-small-rate-limits designator was
           | "experimental", not "preview".
           | 
            | I _think_ the distinction between preview and full release
            | is that preview models have no guarantees on how long
            | they'll be available, while a full release comes with a
            | pre-set discontinuation date. So if you want stability for
            | a production app, you wouldn't want to use a preview model.
        
       | AStonesThrow wrote:
       | I've been leveraging the services of 3 LLMs, mainly: Meta,
       | Gemini, and Copilot.
       | 
       | It depends on what I'm asking. If I'm looking for answers in the
       | realm of history or culture, religion, or I want something
       | creative such as a cute limerick, or a song or dramatic script,
       | I'll ask Copilot. Currently, Copilot has two modes: "Quick
       | Answer"; or "Think Deeply", if you want to wait about 30 seconds
       | for a good answer.
       | 
       | If I want info on a product, a business, an industry or a field
       | of employment, or on education, technology, etc., I'll inquire of
       | Gemini.
       | 
       | Both Copilot and Gemini have interactive voice conversation
       | modes. Thankfully, they will also write a transcript of what we
       | said. They also eagerly attempt to engage the user with further
       | questions and followups, with open questions such as "so what's
       | on your mind tonight?"
       | 
       | And if I want to know about pop stars, film actors, the social
       | world or something related to tourism or recreation in general, I
       | can ask Meta's AI through [Facebook] Messenger.
       | 
       | One thing I found to be extremely helpful and accurate was
       | Gemini's tax advice. I mean, it was way better than human beings
       | at the entry/poverty level. Commercial tax advisors, even when
       | I'd paid for the Premium Deluxe Tax Software from the Biggest
       | Name, they just went to Google stuff for me. I mean, they didn't
       | even seem to know where stuff was on irs.gov. When I asked for a
       | virtual or phone appointment, they were no-shows, with a litany
       | of excuses. I visited 3 offices in person; the first two were
       | closed, and the third one basically served Navajos living off the
       | reservation.
       | 
       | So when I asked Gemini about tax information -- simple stuff like
       | the terminology, definitions, categories of income, and things
       | like that -- Gemini was perfectly capable of giving lucid
       | answers. And citing its sources, so I could immediately go find
       | the IRS.GOV publication and read it "from the horse's mouth".
       | 
       | Oftentimes I'll ask an LLM just to jog my memory or inform me of
       | what specific terminology I should use. Like "Hey Gemini, what's
       | the PDU for Ethernet called?" and when Gemini says it's a "frame"
       | then I have that search term I can plug into Wikipedia for
       | further research. Or, for an introduction or overview to topics
       | I'm unfamiliar with.
       | 
       | LLMs are an important evolutionary step in the general-purpose
       | "search engine" industry. One problem was, you see, that it was
       | dangerous, annoying, or risky to go Googling around and click on
       | all those tempting sites. Google knew this: the dot-com sites and
       | all the SEO sites that surfaced to the top were traps, they were
       | bait, they were sometimes legitimate scams. So the LLM providers
       | are showing us that we can stay safe in a sandbox, without
       | clicking external links, without coughing up information about
       | our interests and setting cookies and revealing our IPv6
       | addresses: we can safely ask a local LLM, or an LLM in a trusted
       | service provider, about whatever piques our fancy. And I am glad
       | for this. I saw y'all complaining about how every search engine
       | was worthless, and the Internet was clogged with blogspam, and
       | there was no real information anymore. Well, perhaps LLMs, for
       | now, are a safe space, a sandbox to play in, where I don't need
       | to worry about drive-by-zero-click malware, or being inundated
       | with Joomla ads, or popups. For now.
        
       | cynicalpeace wrote:
       | 1. The main transformative aspect of LLMs has been in writing
       | code.
       | 
       | 2. LLMs have had less transformative aspects in 2025 than we
       | anticipated back in late 2022.
       | 
       | 3. LLMs are unlikely to be very transformative to society, even
       | as their intelligence increases, because intelligence is a minor
       | changemaker in society. Bigger changemakers are motivation,
       | courage, desire, taste, power, sex and hunger.
       | 
       | 4. LLMs are unlikely to develop these more important traits
       | because they are trained on text, not evolved in a rigamarole of
       | ecological challenges.
        
       | charcircuit wrote:
       | 500 RPD for the free tier is good enough for my coding needs.
       | Nice.
        
       | AbuAssar wrote:
        | I noticed that OpenAI doesn't compare their models to
        | third-party models in their announcement posts, unlike Google,
        | Meta, and the others.
        
         | jskherman wrote:
          | They're doing the Apple strategy: less spotlight for other
          | third parties, and less awareness of how they're lagging
          | behind, so that those already ignorantly locked into OpenAI
          | won't switch. But at this point, why would anyone stay when
          | switching costs are low?
        
       | mmaunder wrote:
       | More great innovation from Google. OpenAI have two major
       | problems.
       | 
       | The first is Google's vertically integrated chip pipeline and
       | deep supply chain and operational knowledge when it comes to
       | creating AI chips and putting them into production. They have a
       | massive cost advantage at every step. This translates into more
       | free services, cheaper paid services, more capabilities due to
       | more affordable compute, and far more growth.
       | 
       | Second problem is data starvation and the unfair advantage that
       | social media has when it comes to a source of continually
       | refreshed knowledge. Now that the foundational model providers
       | have churned through the common crawl and are competing to
       | consume things like video and whatever is left, new data is
       | becoming increasingly valuable as a differentiator, and more
       | importantly, as a provider of sustained value for years to come.
       | 
        | SamA signaled both of these problems when he made noises about
        | building a fab a while back, and more recently by making noises
        | about launching a social media platform off OpenAI. The smart
        | money among his investors knows these issues are fundamental in
        | deciding whether OAI will succeed, and is asking the hard
        | questions.
       | 
       | If the only answer for both is "we'll build it from scratch",
       | OpenAI is in very big trouble. And it seems that that is the best
       | answer that SamA can come up with. I continue to believe that
       | OpenAI will be the Netscape of the AI revolution.
       | 
       | The win is Google's for the taking, if they can get out of their
       | own way.
        
         | jbverschoor wrote:
         | Except that they train their model even when you pay. So yeah..
         | I'd rather not use their "evil"
        
           | dayvigo wrote:
           | Source?
        
             | throwaway314155 wrote:
             | It's right there in the comment.
        
           | mkl wrote:
           | This is false: https://ai.google.dev/gemini-api/terms
        
         | Keyframe wrote:
         | Google has the data and has the hardware, not to mention
          | software and infrastructure talent. Once this Bismarck turns
          | around, and it looks like it is turning, who can parry it for
          | real? They
         | have internet.zip and all the previous versions as well, they
         | have youtube, email, search, books, traffic, maps and business
         | on it, phones and habits around it, even the OG social network,
         | the usenet. It's a sleeping giant starting to wake up and it's
         | already causing commotion, let's see what it does when it
         | drinks morning coffee.
        
           | kriro wrote:
           | Agreed. One of Google's big advantages is the data access and
           | integrations. They are also positioned really well for the
           | "AI as entertainment" sector with youtube which will be huge
            | (imo). They also have deep knowledge in adtech, and
            | injecting ads into AI is an obvious play. As is harvesting
            | AI chat data.
           | 
           | Meta and Google are the long term players to watch as Meta
           | also has similar access (Insta, FB, WhatsApp).
        
             | whoisthemachine wrote:
             | On-demand GenAI could definitely change the meaning of
             | "You" in "Youtube".
        
           | eastbound wrote:
           | They have the Excel spreadsheets of all startups and
           | businesses of the world (well 50/50 with Microsoft).
           | 
           | And Atlassian has all the project data.
        
             | Keyframe wrote:
             | I still can't understand how google missed on github,
             | especially since they were in the same space before with
             | google code. I do understand how they couldn't make a
             | github though.
        
             | jjani wrote:
              | More like 5/95 with Microsoft, and that's being generous;
              | I wouldn't be surprised if it was 1/99. It's basically
              | just hip tech companies and a couple of Fortune 500s that
              | use Google Docs. And even their finance departments often
              | use Excel. HN keeps underestimating how much of the
              | physical world runs on Excel.
        
         | whyenot wrote:
         | Another advantage that Google has is the deep integration of
         | Gemini into Google Office products and Gmail. I was part of a
         | pilot group and got to use a pre-release version and it's
         | really powerful and not something that will be easy for OpenAI
         | to match.
        
           | mmaunder wrote:
           | Agreed. Once they dial in the training for sheets it's going
           | to be incredible. I'm already using notebooklm to upload
           | finance PDFs, then having it generate tabular data and
           | copypasta into sheets, but it's a garage solution compared to
           | just telling it to create or update a sheet with parsed data
           | from other sheets, PDFs, docs, etc.
           | 
           | And as far as gmail goes, I periodically try to ask it to
           | unsubscribe from everything marketing related, and not from
           | my own company, but it's not even close to being there. I
           | think there will continue to be a gap in the market for more
           | aggressive email integration with AI, given how useless email
           | has become. I know A16Z has invested in a startup working on
           | this. I doubt Gmail will integrate as deep as is possible, so
           | the opportunity will remain.
        
           | Workaccount2 wrote:
            | I frankly doubt the future of office products. In the last
            | month I have ditched two separate Excel productivity
            | templates in favor of bespoke wrappers around SQLite
            | databases, written by Claude and Gemini. Easier to use and
            | probably 10x as fast.
           | 
           | You don't need a 50 function swiss army knife when your
           | pocket can just generate the exact tool you need.
        
           | jdgoesmarching wrote:
           | You say deep integration, yet there is still no way to send a
           | Gemini Canvas to Docs without a lot of tedious copy-pasting
           | and formatting because Docs still doesn't actually support
           | markdown. Gemini in Google Office in general has been a
           | massive disappointment for all but the most simplistic of
           | writing tasks.
           | 
           | They can have the most advanced infrastructure in the world,
           | but it doesn't mean much if Google continues its infamous
           | floundering approach to product. But hey, 2.5 pro with Cline
           | is pretty nice.
        
             | whyenot wrote:
              | Maybe I'm misunderstanding, but there is literally a
              | Share button in Canvas right below each response, with
              | the option to export to Docs. Within Docs, you can also
              | click on the Gemini "star" at the upper right to get a
              | prompt and then export into the open document. Note that
              | this is with the "experimental" Gemini 2.5 Pro.
        
             | disgruntledphd2 wrote:
              | Docs supports markdown in comments, where it's the only
              | way to get formatting.
              | 
              | I love Google's product dysfunction sometimes :/
        
           | chucky_z wrote:
           | I have access to this now and I want it to work so bad and
           | it's just proper shit. Absolute rubbish.
           | 
           | They really, truly need to fix this integration. Gemini in
           | Google Docs is barely acceptable, it doesn't work at all (for
           | me) in Gmail, and I've not yet had it do _anything_ other
           | than error in Google Sheets.
        
         | zoogeny wrote:
         | If the battle was between Altman and Pichai I'd have my doubts.
         | 
         | But the battle is between Altman and Hassabis.
         | 
         | I recall some advice on investment from Buffett regarding how
         | he invests in the management team.
        
           | mdp2021 wrote:
           | Could you please expand, on both your points?
        
             | zoogeny wrote:
             | It is more gut feel than a rational or carefully reasoned
             | argument.
             | 
             | I think Pichai has been an exceptional revenue maximizer
             | but he lacks vision. I think he is probably capable of
             | squeezing tremendous revenue out of AI once it has been
             | achieved.
             | 
             | I like Hassabis in a "good vibe" way when I hear him speak.
             | He reminds me of engineers that I have worked with
             | personally and have gained my respect. He feels less like a
             | product focused leader and more of a research focused
             | leader (AlphaZero/AlphaFold) which I think will be critical
             | to continue the advances necessary to push the envelope. I
             | like his focus on games and his background in RL.
             | 
             | Google's war chest of Ad money gives Hassabis the
             | flexibility to invest in non-revenue generating directions
             | in a way that Altman is unlikely to be able to do. Altman
             | made a decision to pivot the company towards product which
             | led to the exodus of early research talent.
        
               | sumedh wrote:
               | > Altman made a decision to pivot the company towards
               | product which led to the exodus of early research talent.
               | 
               | Who was going to fund the research though?
        
               | zoogeny wrote:
               | Fair point, and a good reminder not to pass judgement on
               | the actions of others. It is totally possible that Altman
               | made his own prediction of the future and theorized that
               | the only hope he had of competing with the existing big
               | tech companies to realistically achieve an AI for the
               | masses was to show investors a path to profitability.
               | 
               | I should also give Altman a bit more due in that I find
               | his description of a world augmented by powerful AI to be
               | more inspiring than any similar vision I've heard from
               | Pichai.
               | 
               | But I'm not trying to guess their intentions, I am just
               | stating the situation as I see it. And that situation is
               | one where whatever forces have caused it, OpenAI is
               | clearly investing very heavily in product (e.g. windsurf
               | acquisition, even suggesting building a social network).
               | And that shift in focus seems highly correlated with a
               | loss of significant research talent (as well as a healthy
               | dose of boardroom drama).
        
             | mmaunder wrote:
              | Not sure why their comment was downvoted. Google the
              | names: Hassabis runs DeepMind at Google, which makes
              | Gemini, and he's quite brilliant with an unbelievable
              | track record. Buffett investing in teams shows that there
              | are smart people out there who think good leadership is a
              | good predictor of future success.
        
               | zoogeny wrote:
                | It may not be relevant to everyone, but it is worth
                | noting that his contribution to AlphaFold won Hassabis
                | a Nobel Prize in chemistry.
        
               | mdp2021 wrote:
               | Zoogeny got downvoted? I did not do that. His comments
               | deserved more details anyway (at the level of those
               | kindly provided).
               | 
               | > _Google the names_
               | 
                | Was that a wink at the submission (a milestone from
                | Google)? Read Zoogeny's delightful reply and see
                | whether a search engine result can compare (not to
                | mention that I asked for Zoogeny's insight, not for
                | trivia). And as a listener to Buffett and Munger, I can
                | surely say that they rarely indulge in tautologies.
        
               | zoogeny wrote:
               | I wouldn't worry about downvotes, it isn't possible on HN
               | to downvote direct replies to your message (unlike
               | reddit), so you cannot be accused of downvoting me unless
               | you did so using an alt.
               | 
               | Some people see tech like they see sports teams and they
               | vote for their tribe without considering any other
               | reason. I'm not shy stating my opinion even when it may
               | invite these kinds of responses.
               | 
               | I do think it is important for people to "do their own
               | research" and not take one man's opinion as fact. I
               | recommend people watch a few videos of Hassabis, there
               | are many, and judge his character and intelligence for
               | themselves. They may find they don't vibe with him and
               | genuinely prefer Altman.
        
           | sidibe wrote:
            | Sorry, but my eyes rolled to the back of my head with this
            | one. This is between two teams with tons of smart
            | contributors; the difference is that one is more flexible
            | and able to take risks, while the other has many times more
            | researchers and the world's best and most mature
            | infrastructure/tooling. It's not a CEO vs CEO battle.
        
             | zoogeny wrote:
             | I think it requires a nuanced take but allow me to provide
             | some counter-examples.
             | 
             | The first is CEO pay rates. Another is the highest paid
             | public employees (which tend to be coaches at state
             | schools). This is evidence that the market highly values
             | managers.
             | 
             | Another is systemic failures within enterprises. When
             | Boeing had a few very public plane crashes, a certain
             | narrative suggested that the transition from highly capable
             | engineer managers to financial focus managers contributed
             | to the problem. A similar narrative has been used to
             | explain the decline of Intel.
             | 
             | Consider the return of Steve Jobs to Apple. Or the turn
             | around at Microsoft with Nadella.
             | 
             | All of these are complex cases that don't submit to an easy
             | analysis. Success and failure are definitely multi-factor
             | and rarely can be traced to a single definitive cause.
             | 
             | Perhaps another way to look at it would be: what percentage
             | of the success of highly complex organizations can be
             | attributed to management? To what degree can poor
             | management decisions contribute to the failure of an
             | otherwise capable organization?
             | 
             | How much you choose to weight those factors is entirely up
             | to you.
             | 
             | edit: I was also thinking about the way we think about the
             | advantage of exceptional generals/admirals in military
             | analysis. Or the effect a president can have on the
             | direction of a country.
        
         | throwup238 wrote:
         | Nobody has really talked about what I think is an advantage
         | just as powerful as the custom chips: Google Books. They
         | already won a landmark fair use lawsuit against book
         | publishers, digitized more books than anyone on earth, and used
         | their Captcha service to crowdsource its OCR. They've got the
         | best* legal cover and all of the best sources of human
         | knowledge already there. Then Youtube for video.
         | 
         | The chips of course push them over the top. I don't know how
         | much Deep Research is costing them but it's by far the best
         | experience with AI I've had so far with a generous 20/day rate
         | limit. At this point I must be using up at least 5-10 compute
         | hours a _day_. Until about a week ago I had almost completely
         | written off Google.
         | 
         | * For what it's worth, I don't know. IANAL
        
           | dynm wrote:
            | The amount of text in books is surprisingly finite. My best
            | estimate was that there are ~10^13 tokens available in all
            | books (https://dynomight.net/scaling/#scaling-data), which
            | is less than frontier models are already being trained on.
            | On the other hand, book tokens are probably much "better"
            | than random internet tokens. Wikipedia for example seems to
            | get much higher weight than other sources, and it's only
            | ~3x10^10 tokens.
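            | 
            | The back-of-envelope behind ~10^13, with my own rough
            | inputs (not the post's exact numbers):
            | 
            |   const books = 1.3e8;       // ~distinct books in print
            |   const wordsPerBook = 6e4;  // a typical book length
            |   const tokensPerWord = 1.3; // common BPE rule of thumb
            |   const total = books * wordsPerBook * tokensPerWord;
            |   console.log(total.toExponential(1)); // "1.0e+13"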
        
             | dr_dshiv wrote:
             | We need more books! On it...
        
               | kupopuffs wrote:
               | _opens up his favorite chat_
        
           | paxys wrote:
           | LibGen already exists, and all the top LLM publishers use it.
           | I don't know if Google's own book index provides a big
           | technical or legal advantage.
        
             | disgruntledphd2 wrote:
             | I'd be very surprised if the Google books index wasn't much
             | bigger and more diverse than libgen.
        
               | og_kalu wrote:
               | Anna's Archive is at 43M Books and 98M Papers [1]. The
               | book total is nearly double what Google has.
               | 
               | Google's scanning project basically stalled after the
               | legal battle. It's a very fascinating read [2].
               | 
               | [1] https://annas-archive.org/
               | 
               | [2] https://web.archive.org/web/20170719004247/https://ww
               | w.theat...
        
           | jofzar wrote:
           | Something that is not specifically called out but is also
           | super relevant is actually the transcription of YouTube
           | videos.
           | 
            | Every video is machine-transcribed and stored, and for
            | larger videos the author will often transcribe them
            | themselves.
           | 
            | This is something they already have; it doesn't need any
            | more "work" to get, unlike for a competitor.
        
           | jppittma wrote:
           | I would think the biggest advantage is YouTube. There's a lot
           | of modern content for analysis that's uncontaminated by LLMs.
        
         | peterjliu wrote:
         | another advantage is people want the Google bot to crawl their
         | pages, unlike most AI companies
        
           | mmaunder wrote:
           | This is an underrated comment. Yes it's a big advantage and
           | probably a measurable pain point for Anthropic and OpenAI. In
           | fact you could just do a 1% survey of robots.txt out there
           | and get a reasonable picture. Maybe a fun project for an
           | HN'er.
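            | 
            | A minimal sketch of that survey, using only Python's stdlib
            | robotparser; the domain list is a placeholder (swap in a
            | real 1% sample):
            | 
            |     from urllib import robotparser
            | 
            |     DOMAINS = ["example.com", "example.org"]  # placeholder sample
            |     BOTS = ["Googlebot", "GPTBot", "ClaudeBot"]
            | 
            |     for domain in DOMAINS:
            |         rp = robotparser.RobotFileParser(f"https://{domain}/robots.txt")
            |         try:
            |             rp.read()  # fetch and parse the site's robots.txt
            |         except OSError:
            |             continue  # skip unreachable hosts
            |         # which crawlers may fetch the site root?
            |         print(domain, {bot: rp.can_fetch(bot, f"https://{domain}/")
            |                        for bot in BOTS})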
        
           | jiocrag wrote:
           | Excellent point. If they can figure out how to either
           | remunerate or drive traffic to third parties in conjunction
           | with this, it would be huge.
        
           | newfocogi wrote:
           | This is right on. I work for a company with somewhat of a
           | data moat and AI aspirations. We spend a lot of time blocking
           | everyone's bots except for Google. We have people whose
            | entire job it is to make it faster for Google to access our
           | data. We exist because Google accesses our data. We can't not
           | let them have it.
        
           | CobrastanJorji wrote:
           | Reddit was an interesting case here. They knew that they had
           | particularly good AI training data, and they were able to
           | hold it hostage from the Google crawler, which was an awfully
           | high risk play given how important Google search results are
           | to Reddit ads, but they likely knew that Reddit search
           | results were also really important to Google. I would love to
           | be able to watch those negotiations on each side; what a
           | crazy high stakes negotiation that must've been.
        
             | mattlondon wrote:
             | Particularly good training data?
             | 
             | You can't mean the bottom-of-the-barrel dross that people
             | post on Reddit, so not sure what data you are referring to?
             | Click-stream?
        
               | CobrastanJorji wrote:
               | Say what you will, but there's a lot of good answers to
               | real questions people have that's on Reddit. There's a
               | whole thing where people say "oh Google search results
               | are bad, but if you append the word 'REDDIT' to your
               | search, you'll get the right answer." You can see that
                | most of these agents rely pretty heavily on stuff they
               | find on Reddit.
               | 
               | Of course, that's also a big reason why Google search
               | results suggest putting glue on pizza.
        
         | stefan_ wrote:
         | I don't know man, for months now people keep telling me on HN
         | how "Google is winning", yet no normal person I ever asked
         | knows what the fuck "Gemini" is. I don't know what they are
         | winning, it might be internet points for all I know.
         | 
         | Actually, some of the people polled recalled the Google AI
         | efforts by their expert system recommending glue on pizza and
         | smoking in pregnancy. It's a big joke.
        
           | mmaunder wrote:
           | Try uploading a bunch of PDF bank statements to notebooklm
           | and ask it questions. Or the results of blood work. It's jaw
           | dropping. e.g. uploaded 7 brokerage account statements as
           | PDFs in a mess of formats and asked it to generate table
           | summary data which it nailed, and then asked it to generate
           | actual trades to go from current position to a new position
           | in shortest path, and it nailed that too.
           | 
           | Biggest issue we have when using notebooklm is a lack of
           | ambition when it comes to the questions we're asking. And the
            | pro version supports up to 300 documents.
           | 
            | Hell, we uploaded the entire EU Cyber Resilience Act and
           | asked the same questions we were going to ask our big name
           | legal firm, and it nailed every one.
           | 
           | But you actually make a fair point, which I'm seeing too and
           | I find quite exciting. And it's that even among my early
           | adopter and technology minded friends, adoption of the most
           | powerful AI tools is very low. e.g. many of them don't even
           | know that notebookLM exists. My interpretation on this is
           | that it's VERY early days, which is suuuuuper exciting for us
           | builders and innovators here on HN.
        
           | kube-system wrote:
           | While there are some first-party B2C applications like chat
           | front-ends built using LLMs, once mature, the end game is
           | almost certainly that these are going to be B2B products
           | integrated into other things. The future here goes a lot
           | further than ChatGPT.
        
           | shmoogy wrote:
           | That was ages ago.
           | 
           | Their new models excel at many things. Image editing, parsing
           | PDFs, and coding are what I use it for. It's significantly
           | cheaper than the closest competing models (Gemini 2.5 pro,
           | and flash experimental with image generation).
           | 
           | Highly recommend testing against openai and anthropic models
           | - you'll likely be pleasantly surprised.
        
         | labrador wrote:
         | > If the only answer for both is "we'll build it from scratch",
         | OpenAI is in very big trouble
         | 
         | They could buy Google+ code from Google and resurrect it with
          | OpenAI branding. Alternatively, they could partner with Bluesky.
        
           | parsimo2010 wrote:
           | I don't think the issue is solving the technical
           | implementation of a new social media platform. The issue is
           | whether a new social media platform from OpenAI will deliver
           | the kind of value that existing platforms deliver. If they
           | promise investors that they'll get TikTok/Meta/YouTube levels
           | of content+interaction (and all the data that comes with it),
           | but deliver Mastodon levels, then they are in trouble.
        
         | onlyrealcuzzo wrote:
         | > The smart money among his investors know these issues to be
         | fundamental in deciding if OAI will succeed or not, and are
         | asking the hard questions.
         | 
         | OpenAI has already succeeded.
         | 
         | If it ends up being a $100B company instead of a $10T company,
         | that is success. By a very large margin.
         | 
         | It's hard to imagine a world in which OpenAI just goes bankrupt
         | and ends up being worth nothing.
        
           | bdangubic wrote:
           | it goes bankrupt when the cost of running the business
            | outweighs the earnings in the long run.
        
           | samuel wrote:
           | I can, and I would say it's a likely scenario, say 30%. If
           | they don't have a significant edge over their competitors in
           | the capabilities of their models, what's left? A money losing
           | web app, and some API services that I'm sure aren't very
           | profitable either. They can't compete with Google, Grok,
           | Meta, MS, Amazon... They just can't.
           | 
            | They could end up being the AltaVista of this era.
        
         | dyauspitr wrote:
         | I haven't heard this much positive sentiment about Google in a
         | while. Making something freely available really turns public
         | sentiment around.
        
       | mark_l_watson wrote:
       | Nice! Low price, even with reasoning enabled. I have been working
       | on a short new book titled "Practical AI with Google: A Solo
       | Knowledge Worker's Guide to Gemini, AI Studio, and LLM APIs" but
       | with all of Google's recent announcements it might not be a short
       | book.
        
       | serjester wrote:
       | Just ran it on one of our internal PDF (3 pages, medium
       | difficulty) to json benchmarks:
       | 
        | gemini-flash-2.0: 60 ish% accuracy, 6,250 pages per dollar
        | 
        | gemini-2.5-flash-preview (no thinking): 80 ish% accuracy, 1,700
        | pages per dollar
        | 
        | gemini-2.5-flash-preview (with thinking): 80 ish% accuracy (not
        | sure what's going on here), 350 pages per dollar
        | 
        | gemini-2.5-pro: 90 ish% accuracy, 150 pages per dollar
       | 
       | I do wish they separated the thinking variant from the regular
       | one - it's incredibly confusing when a model parameter
       | dramatically impacts pricing.
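        | 
        | For reference, the thinking/no-thinking split is toggled by a
        | single request parameter. A minimal sketch, assuming the
        | google-genai Python client and the preview model name (both
        | may change):
        | 
        |     from google import genai
        |     from google.genai import types
        | 
        |     client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key
        |     resp = client.models.generate_content(
        |         model="gemini-2.5-flash-preview-04-17",  # assumed preview name
        |         contents="Extract the line items from this page as JSON: ...",
        |         config=types.GenerateContentConfig(
        |             # a budget of 0 disables thinking; omit to let it think
        |             thinking_config=types.ThinkingConfig(thinking_budget=0),
        |         ),
        |     )
        |     print(resp.text)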
        
         | ValveFan6969 wrote:
          | I have been having similar performance issues. I believe they
          | intentionally made a worse model (Gemini 2.5) to get more
          | money out of you. However, there is a way you can make money
          | off of Gemini 2.5.
          | 
          | If you set the thinking parameter lower and lower, you can
          | make the model spew absolute nonsense for the first response.
          | It costs 10 cents per input/output, and sometimes you get a
          | response so bad that your clients will ask for more and more
          | corrections.
        
           | mpalmer wrote:
           | Wow, what apps have you made so I know never to use them?
        
       | zoogeny wrote:
       | Google making Gemini 2.5 Pro (Experimental) free was a big deal.
        | I haven't tried the more expensive OpenAI models, so I can only
        | compare to the free models of theirs that I have used in the
        | past.
       | 
       | Gemini 2.5 Pro is so much of a step up (IME) that I've become
       | sold on Google's models in general. It not only is smarter than
       | me on most of the subjects I engage with it, it also isn't
       | completely obsequious. The model pushes back on me rather than
       | contorting itself to find a way to agree.
       | 
       | 100% of my casual AI usage is now in Gemini and I look forward to
       | asking it questions on deep topics because it consistently
        | provides me with insight. I am building new tools with a mind
        | to optimize my usage and increase its value to me.
        
         | PerusingAround wrote:
          | This comment is exactly my experience; I feel as if I had
          | written it myself.
        
         | cjohnson318 wrote:
         | Yeah, my wife pays for ChatGPT, but Gemini is fine enough for
         | me.
        
           | qwertox wrote:
            | Just be aware that if you don't add a key (and set up
            | billing) you're granting Google the right to train on your
            | data, and to have people read it and decide how to use it
            | for training.
        
             | energy123 wrote:
             | I thought if you turn off App Activity then that's good
             | enough to protect your data?
        
               | voxic11 wrote:
               | Nope, not if you are in the US
               | https://ai.google.dev/gemini-api/terms#data-use-unpaid
        
             | Graphon1 wrote:
              | > to have people read it and decide how to use it for
              | training.
             | 
             | Not that I have any actual insight. but doesn't it seem
             | more likely that it will not be a human, but a model?
             | Models training models.
        
               | qwertox wrote:
               | > To help with quality and improve our products, human
               | reviewers may read, annotate, and process your API input
               | and output. Google takes steps to protect your privacy as
               | part of this process. This includes disconnecting this
               | data from your Google Account, API key, and Cloud project
               | before reviewers see or annotate it. Do not submit
               | sensitive, confidential, or personal information to the
               | Unpaid Services.
        
             | HDThoreaun wrote:
             | Unless you have the enterprise sub of openAI theyre
             | training on your data too
        
         | dr_kiszonka wrote:
         | I was a big fan of that model but it has been replaced in AI
         | Studio by its preview version, which, by comparison, is pretty
         | bad. I hope Google makes the release version much closer to the
         | experimental one.
        
           | zoogeny wrote:
           | I can confirm the model name in Run Settings has been updated
           | to "Gemini 2.5 Pro Preview ..." when it used to be "Gemini
           | 2.5 Pro (Experimental) ...".
           | 
           | I cannot confirm if the quality is downgraded since I haven't
           | had enough time with it. But if what you are saying is
           | correct, I would be very sad. My big fear is the full-fat
           | Gemini 2.5 Pro will be prohibitively expensive, but a dumbed
           | down model (for the sake of cost) would also be saddening.
        
           | dieortin wrote:
           | The preview version is exactly the same as the experimental
           | one afaik
        
           | gundmc wrote:
           | The AI Studio product lead said on Twitter that it is exactly
            | the same model, just renamed for clarity when pricing was
           | announced
        
         | jeeeb wrote:
         | After comparing Gemini Pro and Claude Sonnet 3.7 coding answers
         | side by side a few times, I decided to cancel my Anthropic
         | subscription and just stick to Gemini.
        
           | wcarss wrote:
           | Google has killed so many amazing businesses -- entire
           | industries, even, by giving people something expensive for
           | free until the competition dies, and then they enshittify
           | hard.
           | 
           | It's cool to have access to it, but please be careful not to
           | mistake corporate loss leaders for authentic products.
        
             | JPKab wrote:
             | True. They are ONLY good when they have competition. The
             | sense of complacency that creeps in is so obvious as a
             | customer.
             | 
             | To this day, the Google Home (or is it called Nest now?)
              | speaker is the only physical product I've ever owned that
              | lost features over time. I used to be able to play the
             | audio of a Youtube video (like a podcast) through it, but
             | then Google decided that it was very very important that I
             | only be able to play a Youtube video through a device with
             | a screen, because it is imperative that I see a still image
             | when I play a longform history podcast.
             | 
             | Obviously, this is a silly and highly specific example, but
             | it is emblematic of how they neglect or enshittify massive
             | swathes of their products as soon as the executive team
             | loses interest and puts their A team on some shiny new
             | object.
        
               | bitpush wrote:
               | The experience on Sonos is terrible. There are countless
                | examples of people sinking 1000s of dollars into the
                | Sonos ecosystem, and the new app update has rendered them
               | useless.
        
               | nl wrote:
               | It's mostly fixed now (5 room Sonos setup here). It's
               | also a lot better at not dropping speakers off its
               | network
        
               | average_r_user wrote:
               | I'm experiencing the same problem with my Google Home
               | ecosystem. One day I can turn off the living room lights
               | with the simple phrase "Turn off Living Room Lights," and
               | then randomly for two straight days it doesn't understand
               | my command
        
               | freedomben wrote:
               | Preach it my friend. For years on the Google Home Hub (or
               | Nest Hub or whatever) I could tell it to "favorite my
               | photo" of what is on the screen. This allowed me to
               | incrementally build a great list of my favorite photos on
               | Google Photos and added a ton of value to my life. At
               | some point that broke, and now it just says, "Sorry, I
               | can't do that yet". Infuriating
        
             | mark_l_watson wrote:
             | In this case, Google is a large investor in Anthropic.
             | 
             | I agree that giving away access to expensive models long
             | term is not a good idea on several fronts. Personally, I
             | subscribe to Gemini Advanced and I pay for using the Gemini
             | APIs.
             | 
             | EDIT: a very good deal, at $10/month is
             | https://apps.abacus.ai/chatllm/ that gives you access to
             | almost all commercial models as well as the best open
             | weight models. I have never come close at all to using my
             | monthly credits with them. If you like to experiment with
             | many models the service is a lot of fun.
        
               | F7F7F7 wrote:
               | The problem with tools like this is that somewhere in the
               | chain between you and the LLM are token reducing
               | "features". Whether it's the system prompt, a cheaper LLM
               | middleman, or some other cost saving measure.
               | 
               | You'll never know what that something is. For me, I can't
               | help but think that I'm getting an inferior service.
        
               | revnode wrote:
               | You can self host something like https://big-agi.com/ and
               | grab your own keys from various providers. You end up
               | with the above, but without the pitfalls you mentioned.
        
               | mark_l_watson wrote:
                | big-AGI does look cool, and supports a different use case.
               | ABACUS.AI takes your $10/month and gives you credits that
               | go towards their costs of using OpenAI, Anthropic,
               | Gemini, etc. Use of smaller open models use very few
               | credits.
               | 
                | They also support an application development framework
               | that looks interesting but I have never used it.
        
               | mark_l_watson wrote:
               | You might be correct about cost savings techniques in
               | their processing pipeline. But they also add
               | functionality: they bake web search into all models which
               | is convenient. I have no affiliation with ABACUS.AI, I am
               | just a happy customer. They currently let me play with 25
               | models.
        
               | freedomben wrote:
               | If anyone from Kagi is on, I'd love to know, does Kagi do
               | that?
        
             | bredren wrote:
             | (Public) corporate loss leaders? Cause they are all likely
             | corporate.
             | 
              | Anthropic is also subsidizing queries, no? The new
              | "5x" plan is illustrative of this?
             | 
             | No doubt anthropic's chat ux is the best right now, but it
             | isn't so far ahead on that or holding some UX moat that I
             | can tell.
        
             | pdntspa wrote:
             | The usage limit for experimental gets used up pretty fast
             | in a vibe-coding situation. I found myself setting up an
             | API account with billing enabled just to keep going.
        
             | gexla wrote:
             | It's not free. And it's legit one of the best models. And
              | it was Google employees who were among the authors of the
             | paper that's most recognized as kicking all this off. They
             | give somewhat limited access in AIStudio (I have only hit
             | the limits via API access, so I don't know what the chat UI
             | limits are.) Don't they all do this? Maybe harder limits
             | and no free API access. But I think most people don't even
             | know about AIStudio.
        
             | bossyTeacher wrote:
              | Just look at Chrome to see Bard/Gemini's future. HN
              | folks didn't care about Chrome then, but cry now about
              | Google's increasingly hostile development of Chrome.
             | 
             | Look at Android.
             | 
              | HN behaviour is more like a kid who sees the candy, wants
              | the candy, and eats as much as they can without worrying
              | about the damaging effect that sugar will have on their
              | health. Then the diabetes diagnosis arrives, and they
              | complain.
        
             | lxgr wrote:
             | How would I know if it's useful to me without being able to
             | trial it?
             | 
             | Googles previous approach (Pro models available only to
             | Gemini Advanced subscribers, and Advanced trials can't be
             | stacked with Google One paid storage, or rather they
              | convert the already paid storage portion to a _paid_, much
             | shorter Advanced subscription!) was mind-bogglingly stupid.
             | 
             | Having a free tier on all models is the reasonable option
             | here.
        
           | blueyes wrote:
           | One of the main advantages Anthropic currently has over
           | Google is the tooling that comes with Claude Code. It may not
           | generate better code, and it has a lower complexity ceiling,
           | but it can automatically find and search files, and figure
           | out how to fix a syntax error fast.
        
             | bayarearefugee wrote:
             | As another person that cancelled my Claude and switched to
             | Gemini, I agree that Claude Code is very nice, but beyond
             | some initial exploration I never felt comfortable using it
             | for real work because Claude 3.7 is far too eager to
             | overengineer half-baked solutions that extend far beyond
             | what you asked it to do in the first place.
             | 
             | Paying real API money for Claude to jump the gun on
             | solutions invalidated the advantage of having a tool as
             | nice as Claude Code, at least for me, I admit everyone's
             | mileage will vary.
        
               | roygbiv2 wrote:
                | I wanted some PowerShell code to do some SharePoint
                | uploading. It created a 1000-line logging module that
               | allowed me to log things at different levels like info,
               | debug, error etc. Not really what I wanted.
        
               | neuah wrote:
               | Exactly my experience as well. Started out loving it but
               | it almost moves too fast - building in functionality that
               | i might want eventually but isn't yet appropriate for
               | where the project is in terms of testing, or is just in
               | completely the wrong place in the architecture. I try to
               | give very direct and specific prompts but it still has
               | the tendency to overreach. Of course it's likely that
               | with more use i will learn better how to rein it in.
        
               | Hugsun wrote:
               | I've experienced this a lot as well. I also just
                | yesterday had an interesting _argument_ with Claude.
               | 
               | It put an expensive API call inside a useEffect hook. I
               | wanted the call elsewhere and it fought me on it pretty
               | aggressively. Instead of removing the call, it started
               | changing comments and function names to say that the call
               | was just loading already fetched data from a cache (which
               | was not true). I could not find a way to tell it to
                | remove that API call from the useEffect hook; it just
               | wrote more and more motivated excuses in the surrounding
               | comments. It would have been very funny if it weren't so
               | expensive.
        
               | freedomben wrote:
               | Geez, I'm not one of the people who think AI is going to
               | wake up and wipe us out, but experiences like yours do
                | give me pause. Right now the AI isn't in the driver's seat
               | and can only assert itself through verbal expression, but
               | I know it's only a matter of time. We already saw Cursor
               | themselves get a taste of this. To be clear I'm not
               | suggesting the AI is sentient and malicious - I don't
               | believe that at all. I think it's been
               | trained/programmed/tuned to do this, though not
               | intentionally, but the nature of these tools is they will
                | surprise us.
        
               | arrowsmith wrote:
               | > We already saw Cursor themselves get a taste of this.
               | 
               | Sorry what do you mean by this?
        
               | tempoponet wrote:
               | Earlier this week a Cursor AI support agent told a user
               | they could only use Cursor on one machine at a time,
               | causing the user to cancel their subscription.
        
               | Jensson wrote:
               | > but the nature of these tools is they will surprise us
               | 
               | Models used to do this much much more than now, so what
               | it did doesn't surprise us.
               | 
                | The nature of these tools is to copy what we have already
                | written. It has seen many threads where developers argue
                | and dig in. They try to train the AI not to do that, but
                | sometimes it still happens, and then it just roleplays as
                | the developer who refuses to listen to anything you say.
        
               | btbuildem wrote:
               | "Don't be a keener. Do not do anything I did not ask you
               | to do" are def part of my prompts when using Claude
        
               | Sonnigeszeug wrote:
                | What's your setup/workflow then?
               | 
               | Any ide integration?
        
               | tough wrote:
                | Open Codex (a Codex fork) that supports Gemini and
                | OpenRouter providers:
                | https://github.com/ymichael/open-codex
                | 
                | Google models on the CLI are great.
        
             | WiSaGaN wrote:
             | Also the "project" feature in claude improves experience
             | significantly for coder, where you can customize your
             | workflow. Would be great if gemini has this feature.
        
             | energy123 wrote:
              | Google needs to fix their Gemini web app at a basic
              | level. It's slow, gets stuck on "Show thinking", and
              | rejects 200k-token prompts sent in one shot. AI Studio is
              | in much better shape.
        
               | Graphon1 wrote:
               | But have you tried any other interfaces for Gemini? Like
                | Gemini Code Assist in VS Code? Or Gemini-backed
               | Aider?
        
               | roygbiv2 wrote:
               | Have you tried them? Which one is fairly simple but just
               | works?
        
               | johnisgood wrote:
                | I hate how I can copy-paste long text into Claude (it
                | becomes a pasted text) and it is accepted, but in Gemini
                | it is limited.
        
               | Workaccount2 wrote:
                | You can paste it into a text file and upload that. A
                | little annoying compared to Claude, but it does work.
        
               | johnisgood wrote:
               | Thanks, will give it a try.
        
               | xbmcuser wrote:
                | Uploading files to Google is now great. I uploaded my
                | Python script and the text data files I was using the
                | script to process, and asked it how best to optimize the
                | code. It actually ran the Python code on the data files,
                | recommended changes, and then, when prompted, ran the
                | script again to show the new results. At first I thought
                | it might be hallucinating, but no, the data was correct.
        
               | johnisgood wrote:
               | Yeah "they" run Python code now quite well. They generate
               | some output using Python "internally" (albeit shows you
               | the code).
        
               | shrisukhani wrote:
               | +1 on this. Improving Gemini apps and live mode will go
               | such a long way for them. Google actually has the best
               | model line-up now but the apps and APIs hold them back so
               | much.
        
             | mogili wrote:
             | I use roo code with Gemini to get similar results for free
        
               | ssd532 wrote:
                | Do its agentic features work with any API? I had tried
                | this or Cline, and it was clear that they work
                | effectively only with Claude's tooling support.
        
             | igor47 wrote:
             | I've switched to aider with the --watch-files flag. Being
             | able to use models in nvim with no additional tooling is
             | pretty sweet
        
               | mediaman wrote:
               | That's really cool. I've been looking for a nicer
               | solution to use with nvim.
        
               | aitchnyu wrote:
                | Typing `//use this as reference ai` in one file and
                | `//copy this row to x ai!` in another will add those
                | functions/files to context and act in both places. I do
                | wish Aider would write `working on your request...`
                | under my comment, though; for now I have to keep the
                | Aider window in sight. The autocomplete, "add to
                | context", and "enter your instructions" flows of other
                | apps feel clunky.
        
             | julianeon wrote:
             | Related:
             | 
             | Only Claude (to my knowledge) has a desktop app which can
             | directly, and usually quite intelligently, modify files and
             | create repos on your desktop. It's the only "agentic"
             | option among the major players.
             | 
             | "Claude, make me an app which will accept Stripe payments
             | and sell an ebook about coding in Python; first create the
             | app, then the ebook."
             | 
             | It would take a few passes but Claude could do this;
             | obviously you can't do that with an API alone. That
             | capability alone is worth $30/month in my opinion.
        
               | indexerror wrote:
               | OpenAI just released Codex, which is basically the same
               | as Claude Code.
        
               | hiciu wrote:
               | It looks the same, but for some reason Claude Code is
               | much more capable. Codex got lost in my source code and
                | hallucinated a bunch of stuff; Claude on the same task just
               | went to town, burned money and delivered.
               | 
               | Of course, this is only my experience and codex is still
               | very young. I really hope it becomes as capable as
               | Claude.
        
               | rockwotj wrote:
                | Part of it is probably that Claude is just better at
                | coding than what OpenAI has available. I am considering
                | trying to hack support for Gemini into Codex and playing
                | around with it.
        
               | lytedev wrote:
               | I was doing this last night with open-codex, a fork.
               | https://github.com/ymichael/open-codex
        
               | thrdbndndn wrote:
               | Copilot agent mode?
        
               | xvinci wrote:
               | Maybe I am not understanding something here.
               | 
                | But there are third-party options available that do the
                | very same thing (e.g. https://aider.chat/ ), which allow
                | you to plug in a model of your choice (or even a
                | combination thereof, e.g. DeepSeek as architect and
                | Claude as code writer).
                | 
                | Therefore the advantage of the model provider offering
                | such a thing doesn't matter, no?
        
               | jm547ster wrote:
                | Aider is not agentic - it is interactive by design.
                | Copilot agent mode and Cline would be better comparisons.
        
               | tough wrote:
                | OpenAI launched Codex 2 days ago; there are open forks
                | already that support other providers too.
                | 
                | There are also Claude Code proxies to run it on local
                | LLMs.
                | 
                | You can just do things.
        
               | int_19h wrote:
               | A first party app, sure, but there's no shortage of third
               | party options. Cursor, Windsurf/Codeium etc. Even VSCode
               | has agent mode now.
        
               | dingnuts wrote:
               | > first create the app, then the ebook."
               | 
               | > It would take a few passes but Claude could do this;
               | 
               | I'm sorry but absolutely nothing I've seen from using
               | Claude indicates that you could give it a vague prompt
               | like that and have it actually produce anything worth
               | reading.
               | 
               | Can it output a book's worth of bullshit with that
               | prompt? Yes. But if you think "write a book about Python"
               | is where we are in the state of the art in language
               | models in terms of the prompt you need to get a coherent
               | product, I want some of whatever you are smoking because
               | that has got to be the good shit
        
             | vladmdgolam wrote:
             | There are at least 10 projects currently aiming to recreate
             | Claude Code, but for Gemini. For example, geminicodes.co by
             | NotebookLM's founding PM Raiza Martin
        
             | mrinterweb wrote:
              | I don't understand the appeal of investing in learning and
              | adapting your workflow to use an AI tool that is so tightly
             | coupled to a single LLM provider, when there are other
             | great AI tools available that are not locked to a single
             | LLM provider. I would guess aider is the closest thing to
             | claude code, but you can use pretty much any LLM.
             | 
             | The LLM field is moving so fast that what is the leading
             | frontier model today, may not be the same tomorrow.
             | 
             | Pricing is another important consideration.
             | https://aider.chat/docs/leaderboards/
        
               | smallnamespace wrote:
               | All the AI tools end up converging on a similar workflow:
               | type what you want and interrupt if you're not getting
               | what you want.
        
             | mdhb wrote:
             | Firebase Studio is the Google equivalent
        
           | mamp wrote:
            | I've been using Gemini 2.5 and Claude 3.7 for Rust
            | development, and I have been very impressed with Claude,
            | which wasn't the case for some architectural discussions,
            | where Gemini impressed with its structure and scope. OpenAI
            | 4.5 and o1 have been disappointing in both contexts.
           | 
           | Gemini doesn't seem to be as keen to agree with me so I find
           | it makes small improvements where Claude and OpenAI will go
           | along with initial suggestions until specifically asked to
           | make improvements.
        
             | yousif_123123 wrote:
              | I have noticed Gemini not accepting an instruction to
              | "leave all other code the same but just modify this part"
              | on code that included use of an alpha API with a
              | different interface than the current API Gemini knows.
              | No matter how I prompted 2.5 Pro, I couldn't get it to
              | respect my use of the alpha API; it would just think I
              | must be wrong.
             | 
             | So I think patterns from the training data are still
             | overriding some actual logic/intelligence in the model. Or
             | the Google assistant fine-tuning is messing it up.
        
               | Workaccount2 wrote:
               | I have been using gemini daily for coding for the last
               | week, and I swear that they are pulling levers and A/B
                | testing in the background. Which is a very Google thing
               | to do. They did the same thing with assistant, which I
               | was a pretty heavy user of back in the day (I was driving
               | a lot).
        
           | onlyrealcuzzo wrote:
           | Yes, IME, Anthropic seemed to be ahead of Google by a decent
           | amount with Sonnet 3.5 vs 1.5 Pro.
           | 
           | However, Sonnet 3.7 seemed like a very small increase,
           | whereas 2.5 Pro seemed like quite a leap.
           | 
           | Now, IME, Google seems to be comfortably ahead.
           | 
           | 2.5 Pro is a little slow, though.
           | 
           | I'm not sure which model Google uses for the AI answers on
           | search, but I find myself using Search for a lot of things I
           | might ask Gemini (via 2.5 Pro) if it was as fast as Search's
           | AI answers.
        
             | dmix wrote:
                | How's the speed of Gemini vs 3.7?
        
               | benhurmarcel wrote:
               | I use both, Gemini 2.5 Pro is significantly slower than
               | Claude 3.7.
        
               | rockwotj wrote:
                | Yeah, I have read that Gemini 2.5 Pro is a much bigger model.
        
           | Graphon1 wrote:
           | Just curious, what tool do you use to interface with these
           | LLMs? Cursor? or Aider? or...
        
             | speedgoose wrote:
              | I'm on GitHub Copilot with VS Code Insiders, mostly because
              | I don't have to subscribe to one more thing.
              | 
              | They're pretty quick to let you use the latest models
              | nowadays.
        
               | nicr_22 wrote:
               | I really like the open source Cline extension. It
               | supports most of the model APIs, just need to copy/paste
               | an API key.
        
           | jessep wrote:
           | I have had a few epic refactoring failures with Gemini
           | relative to Claude.
           | 
           | For example: I asked both to change a bunch of code into
           | functions to pass into a `pipe` type function, and Gemini
           | truly seemed to have no idea what it was supposed to do, and
           | Claude just did it.
           | 
           | Maybe there was some user error or something, but after that
           | I haven't really used Gemini.
           | 
            | I'm curious whether people who are using Gemini and loving
            | it are using it mostly for one-shotting, or working with it
            | more closely, like a pair programmer. I could buy that it
            | could be good at one but bad at the other.
        
             | Asraelite wrote:
             | This has been my experience too. Gemini might be better for
             | vibe coding or architecture or whatever, but Claude
             | consistently feels better for serious coding. That is, when
             | I know exactly how I want something implemented in a large
             | existing codebase, and I go through the full cycle of
             | implementation, refinement, bug fixing, and testing,
             | guiding the AI along the way.
             | 
             | It also seems to be better at incorporating knowledge from
             | documentation and existing examples when provided.
        
               | int_19h wrote:
               | My experience has been exactly the opposite - Sonnet did
               | fine on trivial tasks, but couldn't e.g. fix a bug end-
               | to-end (from bug description in the tracker to
               | implementing the fix and adding tests) properly because
               | it couldn't understand how the relevant code worked,
               | whereas Gemini would consistently figure out the root
               | cause and write decent fix & tests.
               | 
               | Perhaps this is down to specific tools and their prompts?
               | In my case, this was Cursor used in agent mode.
               | 
               | Or perhaps it's about the languages involved - my
               | experiments were with TypeScript and C++.
        
               | Asraelite wrote:
               | > Gemini would consistently figure out the root cause and
               | write decent fix & tests.
               | 
               | I feel like you might be using it differently to me. I
               | generally don't ask AI to find the cause of a bug,
               | because it's quite bad at that. I use it to identify
               | relevant parts of the code that could be involved in the
               | bug, and then I come up with my own hypotheses for the
               | cause. Then I use AI to help write tests to validate
               | these hypotheses. I mostly use Rust.
        
               | int_19h wrote:
               | I used to use them mostly in "smart code completion" mode
               | myself until very recently. But with all the AI IDEs
               | adding agentic mode, I was curious to see how well that
               | fares if I let it drive.
               | 
               | And we aren't talking about trivial bugs here. For
               | TypeScript, the most impressive bug it handled to date
                | was an async race condition due to a missing await
                | causing a property to be overwritten with an invalid
                | value. For that
               | one I actually had to do some manual debugging and tell
               | it what I observed, but given that info, it was able to
               | locate the problem in the code all by itself and fix it
               | correctly and come up with a way to test it as well.
               | 
               | For C++, the codebase in question was gdb, the bug was a
               | test issue, and it correctly found problematic code based
               | solely on the test log (but I had to prod it a bit in the
               | right direction for the fix).
               | 
               | I should note that this is Gemini Pro 2.5 specifically.
               | When I tried Google's models previously (for all kinds of
               | tasks), I was very unimpressed - it was noticeably worse
               | than other SOTA models, so I was very skeptical going
               | into this. Indeed, I started with Sonnet precisely
               | because my past experience indicated that it was the best
               | option, and I only tried Gemini after Sonnet fumbled.
        
               | Asraelite wrote:
               | I use it for basically everything I can, not just code
               | completion, including end-to-end bug fixes when it makes
               | sense. But most of the time even the current Gemini and
               | Claude models fail with the hard things.
               | 
               | It might be because most bugs that you would encounter in
               | other languages don't occur in the first place in Rust
               | because of the stronger type system. The race condition
               | one you mentioned wouldn't be possible for example. If
               | something like that would occur, it's a compiler error
               | and the AI fixes it while still in the initial
               | implementation stage by looking at the linter errors. I
               | also put a lot of effort into trying to use coding
               | patterns that do as much validation as possible within
               | the type system. So in the end all that's left are the
               | more difficult bugs where a human is needed to assist
               | (for now at least, I'm confident that the models are only
               | going to get better).
        
               | int_19h wrote:
               | Race conditions can span across processes (think async
               | process communication).
               | 
               | That said I do wonder if the problems you're seeing are
               | simply because there isn't that much Rust in the training
               | set for the models - because, well, there's relatively
               | little of it overall when you compare it to something
               | like C++ or JS.
        
           | sleiben wrote:
            | Same here. Especially for native app development with Swift
            | I had way better results and just stuck with Gemini-2.5-*.
        
           | yieldcrv wrote:
           | I also cancelled my Anthropic yesterday, not because of
           | Gemini but because it was the absolute _worst_ time for
           | Anthropic to limit their Pro plan to upsell their Max plan
           | when there is so much competition out there
           | 
           | Manus.im also does code generation in a nice UI, but I'll
           | probably be using Gemini and Deepseek
           | 
           | No Moat strikes again
        
         | fsndz wrote:
         | More and more people are coming to the realisation that Google
         | is actually winning at the model level right now.
        
           | zaphirplane wrote:
            | What's with the Google cheer squad in this thread? Usually
            | it's "Google lost its way and is evil."
            | 
            | Can't be employees, because usually there is a disclaimer.
        
             | pjerem wrote:
             | Google can be evil and release impressive language models.
             | The same way as Apple releasing incredible hardware with
             | good privacy while also being a totally insufferable and
             | arrogant company.
        
             | crowbahr wrote:
              | Google employees only have to add a disclaimer when
              | they're identified as Google employees.
             | 
             | So shit like "as a googler" requires "my opinions are my
             | own yadda yadda"
        
           | MagicMoonlight wrote:
           | I haven't met a single person that uses Gemini. Companies are
           | using Copilot and individuals are using ChatGPT.
           | 
           | Also, why would I want Google to spy on my AI usage? They're
           | evil.
        
             | fsndz wrote:
             | why is Google more evil than say OpenAI ?
        
         | m3kw9 wrote:
          | Using Claude Code and Codex CLI and then Aider with Gemini 2.5
          | Pro: Aider is much faster because you feed in the files
          | instead of using tools that start doing who knows what,
          | spending 10x the tokens. I tried a relatively simple refactor
          | which needed around 7 files changed; only Aider with 2.5 got
          | it, and in the first shot, whereas both Codex and Claude Code
          | completely fumbled it.
        
         | goshx wrote:
         | Same here! It is borderline stubborn at times and I need to
         | prove it wrong. Still, it is the best model to use with Cursor,
         | in my experience.
        
         | teleforce wrote:
          | > obsequious
          | 
          | Thanks for the new word; I had to look it up.
          | 
          | "obedient or attentive to an excessive or servile degree"
          | 
          | Apparently an AI that mindlessly follows your logic and
          | instructions without reasoning and articulation is not good
          | enough.
        
           | nemomarx wrote:
            | I think here it's referring to a common problem where the
            | AI agrees with your position too easily, and/or instantly
            | changes its answer if you tell it the answer is wrong
            | (therefore providing no stable true answer if you ask it
            | about a fact).
            | 
            | Also the slightly over-cheery tone, maybe.
        
             | lylah69 wrote:
             | I like to do this with Claude. It takes 5 back & forths to
             | get an uncertain answer.
             | 
             | Is there a way to tackle this?
        
           | zoogeny wrote:
           | It's a bit of a fancy way to say "yes man". Like in
           | corporations or politics, if a leader surrounds themselves
           | with "yes men".
           | 
           | A synonym would be sycophantic which would be "behaving or
           | done in an obsequious way in order to gain advantage." The
           | connotation is the other party misrepresents their own
           | opinion in order to gain favor or avoid disapproval from
           | someone of a higher status. Like when a subordinate tries to
           | guess what their superior wants to hear instead of providing
           | an unbiased response.
           | 
           | I think that accurately describes my experience with some
           | LLMs due to heavy handed RLHF towards agreeableness.
           | 
           | In fact, I think obsequious is a better word since it doesn't
           | have the cynical connotation of sycophant. LLMs don't have a
           | motive and obsequious describes the behavior without
           | specifying the intent.
        
             | teleforce wrote:
                | Yes, those are the first two words that came to my mind
                | when I read the meaning. The Gen Z word now, I think, is
                | "simp".
        
               | zoogeny wrote:
               | Yeah, it is very close. But I feel simp has a bit of a
               | sexual feel to it. Like a guy who does favors for a girl
               | expecting affection in return, or donates a lot of money
               | to an OnlyFans or Twitch streamer. I also see simp used
               | where we used to call it white-knighting (e.g. "to simp
               | for").
               | 
               | Obsequious is a bit more general. You could imagine
               | applying it to a waiter or valet who is annoyingly
               | helpful. I don't think it would feel right to use the
               | word simp in that case.
               | 
               | In my day we would call it sucking up. A bit before my
               | time (would sound old timey to me) people called it boot
               | licking. In the novel "Catcher in the Rye", the
               | protagonist uses the word "phony" in a similar way. This
               | kind of behavior is universally disliked so there is a
               | lot slang for it.
        
               | snthpy wrote:
               | Thanks, as an old timer TIL about simp.
        
           | tkgally wrote:
           | Another useful word in this context is "sycophancy," meaning
           | excessive flattery or insincere agreement. Amanda Askell of
           | Anthropic has used it to describe a trait they try to
           | suppress in Claude:
           | 
           | https://youtube.com/watch?v=ugvHCXCOmm4&t=10286
        
             | davidsainez wrote:
             | The second example she uses is really important. You (used
              | to) see this a lot on Stack Overflow, where an inexperienced
             | programmer asks how to do some convoluted thing. Sure, you
             | can explain how to do the thing while maintaining their
             | artificial constraints. But much more useful is to say "you
             | probably want to approach the problem like this instead".
              | It is surely a difficult and context-dependent problem.
        
               | pinoy420 wrote:
               | XY problem
        
             | snthpy wrote:
             | Interesting that Americans appear to hold their AI models
             | to a higher standard than their politicians.
        
               | brookst wrote:
               | Different Americans.
        
               | syndeo wrote:
               | Lots of folks in tech have different opinions than you
               | may expect. Many will either keep quiet or play along to
               | keep the peace/team cohesion, but you really never know
               | if they actually agree deep down.
               | 
               | Their career, livelihoods, ability to support their
               | families, etc. are ultimately on the line, so they'll pay
               | lip service if they have to. Consider it part of the job
               | at that point; personal beliefs are often left at the
               | door.
        
           | sans_souse wrote:
           | I wonder if anyone here will know this one; I learned the
           | word "obsequious" over a decade ago while working the line of
           | a restaurant. I used to listen to the 2p2 (2 plus 2) poker
           | podcasts during prep and they had a regular feature with
           | David Sklansky (iirc) giving tips, stories, advice etc. This
           | particular one he simply gave the word "obsequious" and
           | defined it later. I remember my sous chef and I were debating
           | what it could mean and I guessed it right. I still can't
            | remember what it had to do with poker, but that's beside
            | the point.
           | 
           | Maybe I can locate it
        
             | sicromoft wrote:
             | I didn't hear that one but I am a fan of Sklansky. And I
             | also have a very vivid memory of learning the word, when I
             | first heard the song Turn Around by They Might Be Giants.
             | The connection with the song burned it into my memory.
        
         | UltraSane wrote:
         | I had a very interesting long debate/discussion with Gemini 2.5
         | Pro about the Synapse-Evolve bank debacle among other things.
         | It really feels like debating a very knowledgeable and smart
         | human.
        
           | rat9988 wrote:
           | You didn't have a debate, you just researched a question.
        
             | zoogeny wrote:
                | One man's debate is another man's research.
        
               | rat9988 wrote:
                | Indeed, but research isn't necessarily a debate. In
                | this case, it was not.
        
             | UltraSane wrote:
              | All right, Mr. Pedantic. Very complex linear algebra created
             | a very convincing illusion of a debate. You happy now?
             | 
             | But good LLMs will take a position and push back at your
             | arguments.
        
         | jofzar wrote:
         | My work doesn't have access to 2.5 pro and all these posts are
         | just making me want it so much more.
         | 
         | I hate how slow things are sometimes.
        
           | basch wrote:
           | Can't you just go into aistudio with any free gmail account?
        
             | sciurus wrote:
              | For many workplaces, it's not just that they don't pay for
             | a service, it's that using it is against policy. If I tried
             | to paste some code into ChatGPT, for example, our data loss
             | prevention spyware would block it and I'd soon be having an
             | uncomfortable conversation with our security team.
             | 
             | (We do have access to GitHub Copilot)
        
               | Atotalnoob wrote:
               | Good news then, your GitHub admins can enable Gemini for
               | you without issue.
        
               | d1sxeyes wrote:
               | "Without issue" is an optimistic perspective on how this
               | works in many organisations.
        
         | i_love_retros wrote:
          | Why is it free / so cheap? (I seem to be getting charged a few
          | cents a day using it with aider, so not free, but still crazy
          | cheap compared to Sonnet.)
        
           | brendanfinan wrote:
           | we know how Google makes money
        
             | d1sxeyes wrote:
             | Give it a few months and it will ignore all your questions
             | and just ask if you've watched Rampart.
        
               | disgruntledphd2 wrote:
               | To be fair, Google do have a cost advantage here as
               | they've built their own hardware.
        
         | redox99 wrote:
         | I've had many disappointing results with gemini 2.5 pro. For
         | general queries possibly involving search, chatgpt and grok
         | work better for me.
         | 
         | For code, gemini is very buggy in cursor, so I use Claude 3.7.
         | But it might be partly cursor's fault.
        
         | rgoulter wrote:
         | The _1 million_ token context window also means you can just
          | copy/paste so much source code or log output.
        
         | crossroadsguy wrote:
         | One difference, and imho that's a big difference -- you can't
          | use any of Google's chatbots/models without being logged
         | in, unlike chatgpt.
        
         | casey2 wrote:
         | It's a big deal, but not in the way that you think. A race to
          | the bottom is humanity's best defense against fast takeoff.
        
         | instagraham wrote:
         | obsequious is such a nice word for this context, only possible
         | in the AI age.
         | 
         | i'd find the same word improper to describe human beings -
         | other words like plaintive, obedient and compliant often do the
         | job better and are less obscure.
         | 
         | here it feels like a word whose time has come.
        
         | _blk wrote:
         | Have you tried Grok 3? It's a bit verbose for my taste even
         | when prompted to be brief but answers seem better/more
         | researched and less opinionated. It's also more willing to
         | answer questions where the other models block an answer.
        
           | fuzzylightbulb wrote:
           | A lot of people don't want to patronize the businesses of an
           | unabashed Nazi sympathizer. There are more important things
           | in life than model output quality.
        
           | zoogeny wrote:
           | I have not tried any of the Grok models but that is probably
           | because I am rarely on X.
           | 
           | I have to admit I have a bias where I think Google is
           | "business" while Grok is for lols. But I should probably take
            | the time to assess it since I would prefer to have an opinion
           | based on experience rather than vibes.
        
         | MetaWhirledPeas wrote:
         | > 100% of my casual AI usage is now in Gemini and I look
         | forward to asking it questions on deep topics because it
         | consistently provides me with insight.
         | 
         | It's probably great for lots of things but it doesn't seem very
         | good for recent news. I asked it about recent accusations
         | around xAI and methane gas turbines and it had no clue what I
         | was talking about. I asked the same question to Grok and it
         | gave me all sorts of details.
        
           | ramesh31 wrote:
           | >It's probably great for lots of things but it doesn't seem
           | very good for recent news.
           | 
            | You are missing the point here. The LLM is just the
            | "reasoning engine" for agents now. Its corpus of facts is
            | meaningless, and shouldn't really be relied upon for
           | anything. But in conjunction with a tool calling agentic
           | process, with access to the web, what you described is now
           | trivially doable. Single shot LLM usage is not really
           | anything anyone should be doing anymore.
        
             | darksaints wrote:
             | That's all fine and dandy, but if you google anything
             | related to llm agents, you get 1000 answers to 100
             | questions, companies hawking their new "visual programming"
             | agent composers, and a ton of videos of douchebags trying
             | to be the Steve Jobs of AI. The concept I'm sure is fine,
             | but execution of agentic anything is still the Wild Wild
             | West and nobody knows what they're really doing.
        
               | ramesh31 wrote:
               | Indeed there is a mountain of snake oil out there at this
               | point, but the underlying concepts are extremely simple,
               | and can be implemented directly without frameworks.
               | 
               | I generally point people to Anthropic's seminal blog post
               | on the topic:
               | https://www.anthropic.com/engineering/building-effective-
               | age...
        
             | MetaWhirledPeas wrote:
             | > You are missing the point here.
             | 
             | I'm just discussing the GP's topic of casual use. Casual
             | use implies heading over to an already-hosted prompt and
             | typing in questions. Implementing my own 'agentic process'
             | does not sound very casual to me.
        
               | ramesh31 wrote:
               | > Implementing my own 'agentic process' does not sound
               | very casual to me.
               | 
               | It really is though. This can be as simple as using
               | Claude desktop with a web search tool.
        
           | arizen wrote:
           | This was my experience as well.
           | 
            | Gemini performs the best on coding tasks, while giving
            | underwhelming responses on recent news.
            | 
            | Grok was merely OK for coding tasks but, being linked to X,
            | provided the best responses on recent events.
        
       | minimaxir wrote:
       | One hidden note from Gemini 2.5 Flash when diving deep into the
       | documentation: for image inputs, not only can the model be
        | instructed to generate 2D bounding boxes of relevant subjects,
       | but it can also create segmentation masks!
       | https://ai.google.dev/gemini-api/docs/image-understanding#se...
       | 
       | At this price point with the Flash model, creating segmentation
       | masks is pretty nifty.
       | 
       | The segmentation masks are a bit of a galaxy brain implementation
       | by generating a b64 string representing the mask:
       | https://colab.research.google.com/github/google-gemini/cookb...
       | 
       | I am trying to test it in AI Studio but it sometimes errors out,
       | likely because it tries to decode the b64 lol.
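        | 
        | For anyone curious, here's a minimal sketch of decoding one of
        | those masks (assuming, per the docs, the model returns JSON
        | items with a "box_2d", a "label", and a base64-encoded PNG in a
        | "mask" field):
        | 
        |     import base64, io, json
        |     from PIL import Image
        | 
        |     # `text` is the model's JSON output, with any markdown
        |     # fences stripped
        |     items = json.loads(text)
        |     for item in items:
        |         # drop a data-URI prefix if one is present
        |         b64 = item["mask"].split(",")[-1]
        |         mask = Image.open(io.BytesIO(base64.b64decode(b64)))
        |         print(item["label"], mask.size)  # grayscale probability map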
        
         | behnamoh wrote:
         | Wait, did they just kill YOLO, at least for time-insensitive
         | tasks?
        
           | minimaxir wrote:
           | YOLO is probably still cheaper if bounding boxes are your
           | main goal. Good segmentation models that work for arbitrary
           | labels, however, are much more expensive to set up and run,
           | so this type of approach could be an interesting alternative
           | depending on performance.
        
           | daemonologist wrote:
           | No, the speed of YOLO/DETR inference makes it cheap as well -
           | probably at least five or six orders of magnitude cheaper.
           | Edit: After some experimentation, Gemini also seems to not
           | perform nearly as well as a purpose-tuned detection model.
           | 
           | It'll be interesting to test this capability and see how it
            | evolves though. At some point you might be able to use it as a
           | "teacher" to generate training data for new tasks.
        
           | vunderba wrote:
           | Well no. You can run/host YOLO which means not having to
           | submit potentially sensitive information to a company that
           | generates a large amount of revenue from targeted
           | advertising.
        
         | daemonologist wrote:
          | Interestingly if you run this in Gemini (instead of AI Studio)
          | you get:
          | 
          |     I am sorry, but I was unable to generate the segmentation
          |     masks for _ in the image due to an internal error with the
          |     tool required for this task.
         | 
         | (Not sure if that's a real or hallucinated error.)
        
         | ipsum2 wrote:
          | The performance is basically so bad it's unusable, though;
          | segmentation models and object detection models are still the
          | best, for now.
        
         | msp26 wrote:
         | I've had mixed results with the bounding boxes even on 2.5 pro.
         | On complex images where a lot of boxes need to be drawn they're
         | in the general region but miss the exact location of objects.
        
         | simonw wrote:
         | This is SO cool. I built an interactive tool for trying this
         | out (bring your own Gemini API key) here:
         | https://tools.simonwillison.net/gemini-mask
         | 
         | More details plus a screenshot of the tool working here:
         | https://simonwillison.net/2025/Apr/18/gemini-image-segmentat...
         | 
         | I vibe coded it using Claude and O3.
        
         | xnx wrote:
         | There is a starter app in AI Studio that demos this:
         | https://aistudio.google.com/apps/bundled/spatial-understandi...
        
       | simonw wrote:
       | I spotted something interesting in the Python API library code:
       | 
       | https://github.com/googleapis/python-genai/blob/473bf4b6b5a6...
        |     class ThinkingConfig(_common.BaseModel):
        |         """The thinking features configuration."""
        | 
        |         include_thoughts: Optional[bool] = Field(
        |             default=None,
        |             description="""Indicates whether to include thoughts in
        |             the response. If true, thoughts are returned only if the
        |             model supports thought and thoughts are available.""",
        |         )
        |         thinking_budget: Optional[int] = Field(
        |             default=None,
        |             description="""Indicates the thinking budget in tokens.""",
        |         )
       | 
       | That thinking_budget thing is documented, but what's the deal
       | with include_thoughts? It sounds like it's an option to have the
       | API return the thought summary... but I can't figure out how to
       | get it to work, and I've not found documentation or example code
       | that uses it.
       | 
       | Anyone managed to get Gemini to spit out thought summaries in its
       | API using this option?
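        | 
        | For reference, here's a sketch of how I'd expect it to be wired
        | up with the google-genai client, based on the field names above
        | (hypothetical usage; I haven't been able to confirm the option
        | is actually honored):
        | 
        |     from google import genai
        |     from google.genai import types
        | 
        |     client = genai.Client(api_key="...")
        |     response = client.models.generate_content(
        |         model="gemini-2.5-flash-preview-04-17",
        |         contents="What is 123 * 456?",
        |         config=types.GenerateContentConfig(
        |             thinking_config=types.ThinkingConfig(
        |                 include_thoughts=True,  # the undocumented option
        |                 thinking_budget=1024,   # documented budget, in tokens
        |             ),
        |         ),
        |     )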
        
         | phillypham wrote:
         | They removed the docs and support for it
         | https://github.com/googleapis/python-
         | genai/commit/af3b339a9d....
         | 
         | You can see the thoughts in AI Studio UI as per
         | https://ai.google.dev/gemini-api/docs/thinking#debugging-
         | and....
        
         | lemming wrote:
         | I maintain an alternative client which I build from the API
         | definitions at https://github.com/googleapis/googleapis, which
         | according to https://github.com/googleapis/python-
         | genai/issues/345 should be the right place. But neither the AI
         | Studio nor the Vertex definitions even have ThinkingConfig yet
         | - very frustrating. In general it's amazing how much API
         | munging is required to get a working client from the public API
         | definitions.
        
         | qwertox wrote:
          | In AI Studio the Flash models have two toggles: Enable thinking
          | and Set thinking budget. If thinking budget is enabled, you can
          | set the max number of tokens it can use to think, else it's
          | Auto.
        
         | Deathmax wrote:
         | It is gated behind the GOOGLE_INTERNAL visibility flag, which
         | only internal Google projects and Cursor have at the moment as
         | far as I know.
        
         | msp26 wrote:
          | The API won't give you the "thinking" tokens; those are only
          | visible in AI Studio. Probably to try to stop distillation,
          | which is very disappointing. I find reading the CoT to be
          | incredibly informative for identifying failure modes.
         | 
         | > Hey Everyone,
         | 
         | > Moving forward, our team has made a decision to only show
         | thoughts in Google AI Studio. Meaning, we no longer return
         | thoughts via the Gemini API. Here is the updated doc to reflect
         | that.
         | 
         | https://discuss.ai.google.dev/t/thoughts-are-missing-cot-not...
         | 
         | ---
         | 
         | After I wrote all of that I see that the API docs page looks
         | different today and now says:
         | 
         | >Note that a summarized version of the thinking process is
         | available through both the API and Google AI Studio.
         | 
         | https://ai.google.dev/gemini-api/docs/thinking
         | 
         | Maybe they just updated it? Or people aren't on the same page
         | at Google idk
         | 
         | Previously it said
         | 
         | > Models with thinking capabilities are available in Google AI
         | Studio and through the Gemini API. Note that the thinking
         | process is visible within Google AI Studio but is not provided
         | as part of the API output.
         | 
         | https://web.archive.org/web/20250409174840/https://ai.google...
        
       | deanmoriarty wrote:
        | Genuine naive question: when it comes to Google, HN generally has
        | a negative view of it (pick any random story on Chrome, ads,
       | search, web, working at faang, etc. and this should be obvious
       | from the comments), yet when it comes to AI there is a somewhat
       | notable "cheering effect" for Google to win the AI race that goes
       | beyond a conventional appreciation of a healthy competitive
       | landscape, which may appear as a bit of a double standard.
       | 
       | Why is this? Is it because OpenAI is seen as such a negative
       | player in this ecosystem that Google "gets a pass on this one"?
       | 
       | And bonus question: what do people think will happen to OpenAI if
       | Google wins the race? Do you think they'll literally just go
       | bust?
        
         | antirez wrote:
          | Maybe because Google is largely responsible for most of the
          | results we are seeing now, having paid for the research. I'm not
          | a Google fan, on the web side or in their idea of what software
          | engineering is, but they deserve to win the AI race, because
          | right now all the other players have contributed a lot less
          | public research than Google did. Also, with Gemini 2.5 Pro,
          | there was a big hype moment, because the model is of
          | unprecedented ability.
        
           | wkat4242 wrote:
           | Maybe they deserve it but it would be really bad for the
           | world. Because they will enshittify the hell out of it once
           | they're established. That's their MO.
           | 
           | I don't want Google to have a stranglehold over yet another
           | type of online service. So I avoid them.
           | 
           | And things are going so fast now, whatever Google has today
           | that might be better than the rest, in two months the rest
           | will have it too. Of course Google will have something new
           | again. But being 2 months behind isn't a huge deal. I don't
           | have to have the 'winning' product. In fact most of my AI
           | tasks go to an 8b llama 3.1 model. It's about on par with gpt
           | 3.5 but that's fine.
        
             | visarga wrote:
             | The situation with LLMs is much different than search,
             | Google doesn't have such a large lead here. LLMs are social
             | things, they learn from each other, any provider with SOTA
             | model will see its abilities leaked through synthetic
             | training data. That's what GPT-4 did for a year, against
             | the wishes of OpenAI, powering up millions of open model
             | finetunes.
        
         | 01100011 wrote:
         | Didn't Google invent the transformer?
         | 
         | I think a lot of us see Google as both an evil advertiser and
         | as an innovator. Google winning AI is sort of nostalgic for
          | those of us who once cheered the "Do No Evil" (now mostly "Do
         | Know Evil") company.
         | 
         | I also like how Google is making quiet progress while other
         | companies take their latest incremental improvement and promote
         | it as hard as they can.
        
         | pkaye wrote:
          | I think for a while some people felt the Google AI models were
          | worse, but now they're getting much better. On the other hand,
          | Google has their own hardware, so they can drive down the costs
          | of using the models, which keeps pressure on OpenAI to remain
          | cost competitive. Then you have Anthropic, which has very good
          | models but is very expensive. But I've heard they are working
          | with Amazon to build a data center with Amazon's custom AI
          | chips, so maybe they can bring down their costs. In the end all
          | these
         | maybe they can bring down their costs. In the end all these
         | companies will need a good model and lower cost hardware to
         | succeed.
        
         | brap wrote:
         | I am cheering for the old Google to make a comeback and it
         | seems like the AI race has genuinely sparked something positive
         | inside Google.
        
         | wyre wrote:
         | Gemini is just that good. From my usage it is much smarter than
         | DeepSeek or Claude 3.7 Thinking models.
         | 
         | A lot of Google's market share across its services comes from
         | the monopoly effects Google has. The quality of Gemini 2.5 is
         | noticeably smarter than its competitors so I see the applause
         | for the quality of the LLM and not for Google.
         | 
         | I think it's way too early to say anything about who is winning
         | the race. There is still a long way to go; o3 scores highest in
         | Humanity's Last Exam (https://agi.safe.ai/) at 20%, 2.5 scores
         | 18%.
        
         | sothatsit wrote:
         | 2.5 Pro is free, and I'm sure there's a lot of people who have
         | just never tried the best models because they don't want to pay
         | for them. So 2.5 Pro probably blows their socks off.
         | 
         | Whereas, if you've been paying for access to the best models
         | from OpenAI and Anthropic all along, 2.5 Pro doesn't feel like
         | such a drastic step-change. But going from free models to 2.5
         | Pro is a crazy difference. I also think this is why DeepSeek
         | got so much attention so quickly - because it was free.
        
         | julianeon wrote:
         | It's been a while since they won something the "old" Google
         | way: by building a superior product that is #1 on its merits.
         | 
         | In that sense Gemini is a throwback: there's no trick - it's
         | objectively better than everything else.
        
         | sagarpatil wrote:
         | Most of us weren't using Gemini pro models (1.0, 1.5, 2.0) but
         | the recent 2.5 pro is such a huge step up. It's better than 3.7
         | sonnet for coding. Better than o1, o3-mini models and now o3
         | and o4-mini. It's become my daily driver. It does everything I
          | need with almost 100% accuracy, is cheap and fast, has a 1
          | million token context window, uses Google web search for
          | grounding, can fetch
         | YouTube video transcripts, can fetch website content, works in
         | google workspace: Gmail, Docs, Sheets. Really hard to beat this
         | combo. Oh and if you subscribe to their AI plan it comes with 2
         | TB drive storage.
        
         | oezi wrote:
         | The key is Gemini being free through AI Studio. This makes
         | their technical improvement more impressive when OpenAI sells
         | their best models at ridiculous prices.
         | 
          | Whether Google is engaging in price dumping as a monopolist
          | remains to be seen, but it feels like it.
         | 
         | The LLM race is fast paced and no moat has developed. People
         | are switching on a whim if better models (by some margin) show
         | up. When will OpenAI, Anthropic or DeepSeek counter 2.5 Pro?
         | And will it be before Google releases the next Pro?
         | 
         | OpenAI commands a large chunk of the consumer market and they
         | have considerable funds after their last round. They won't fold
         | this or next year.
         | 
         | If Google wants to win this they must come up with a product
          | strategy integrating AI into search without seriously damaging
          | their existing search business too much. This is hard.
        
         | int_19h wrote:
         | I dislike Google rather strongly due to their ad-based business
         | model, and I was previously very skeptical of their AI
         | offerings because of very lackluster performance compared to
         | OpenAI and Claude. But I can't help but be impressed with
         | Gemini Pro 2.5 for "deep research" and agentic coding. I have
         | subscriptions with all three so that I can keep up with SOTA,
         | but if I had to choose only one to keep, right now it'd be
         | Gemini.
         | 
         | That said I still don't "cheer" for them and I would really
         | rather someone else win the race. But that is orthogonal to
         | recognition of observed objective superiority.
        
         | greentea23 wrote:
         | I prefer OpenAI and Anthropic big time because they are fresh
         | players with less dominance over other aspects of digital life.
         | Not having to login to an insidious tracker like Google is
         | worth significantly worse performance. Although I have little
         | FOMO here avoiding Gemini because evaluating these models on
         | real world use cases remains quite subjective imo.
        
         | jonas21 wrote:
         | A lot of the negativity toward Google stems from the fact that
         | they're the big, dominant player in search, ads, browsers,
         | etc., rather than anything that they've done or any particular
         | attribute of the company.
         | 
         | In AI, they're still seen as being behind OpenAI and others, so
         | we don't see the same level of negativity.
        
         | summerlight wrote:
          | Because Google has now brought real competition to the field.
          | GPT was the king and Claude had been the only meaningful
          | challenger for a while, but OpenAI didn't care about Anthropic;
          | it was just obsessed with Google. Gemini took quite some time to
          | set up its pipeline, so the initial versions weren't enough to
          | push the frontier; you remember the days when Google released a
          | new model and OpenAI just responded with some old model from its
          | silo within a day, only to crush it. That does not happen
          | anymore, and they're forced to develop better models.
        
         | CephalopodMD wrote:
         | As a googler working in LLM space, this feels like revisionist
         | history to me haha! I remember a completely different
         | environment only a few months ago when Anthropic was the
         | darling child, and before that it was OpenAI (and for like 4
         | weeks somewhere in there, it was Deepseek). For literally years
         | at this point, every time Bard or Gemini would make a major
         | release, it would be largely ignored or put down in favor of
         | the next "big thing" OpenAI was doing or Claude saturating
         | coding benchmarks, never mind that Google was often just behind
         | with the exact same tech ready to go, in some cases only
         | missing their demo release by literally 1 day (remember live
         | voice?). And every time this happened, folks would be posting
         | things to the effect of "LOL I can't believe Google is losing
         | the AI race - didn't they invent this?", "this is like
         | Microsoft dropping the ball on mobile", "Google is getting
         | their lunch eaten by scrappy upstarts," etc. I can't lie, it
         | stings a bit when that's what you work on all day.
         | 
         | 2.5 was quite good. Not stupidly good like the jump from GPT 2
         | to 3 or 3.5 to 4, but really good. It was a big jump in ELO and
         | benchmarks. People like it, and I think it's just
         | psychologically satisfying that the player everybody would have
         | expected to win the AI race is currently in the lead. Gemini
         | finally gets a day in the sun.
         | 
          | I'm sure this will change whenever somebody comes up with
         | the next big idea though. It probably won't take much to beat
         | Gemini in the long run. There is literally zero moat.
        
       | krembo wrote:
        | How is this sustainable for Google from a business POV? It feels
        | like Google is shooting itself in the foot while "winning" the AI
        | race. In my experience, Google has lost 99% of the ads it used to
        | show me in the search engine.
        
         | tomr75 wrote:
         | someone else will do it if they don't
        
         | aoeusnth1 wrote:
         | Their inference costs are the lowest in the business.
        
       | zenGull wrote:
        | I've been paying for Google's pro LLM for about six months. At $20
        | it feels steep considering the free version is very good. I do
        | DevOps work, and it's been very helpful. I've tried GPT, Copilot,
        | Mixtral, Claude, etc., and Gemini 1.5 Pro was what sold me. The
        | new 2.0 stuff is even better. Anecdotally, Gemini seems to forget
        | to add stuff but doesn't hallucinate as much. I've been doing
        | some pretty complex scripting this last week purely on Gemini 2.0
        | Flash and it's been really, really good.
        
       | jdthedisciple wrote:
       | Very excited to try it, but it _is_ noteworthy that o4-mini is
       | _strictly better_ according to the very benchmarks shown by
       | Google here.
       | 
       | Of course it's about 4x as expensive too (I believe), but still,
       | given the release of openai/codex as well, o4-mini will remain a
       | strong competitor for now.
        
       | thimabi wrote:
       | I find it baffling that Google offers such impressive models
       | through the API and even the free AI Studio with fine-grained
       | control, yet the models used in the Gemini app feel much worse.
       | 
       | Over the past few weeks, I've been using Gemini Advanced on my
       | Workspace account. There, the models think for shorter times,
       | provide shorter outputs, and even their context window is far
       | from the advertised 1 million tokens. It makes me think that
       | Google is intentionally limiting the Gemini app.
       | 
       | Perhaps the goal is to steer users toward the API or AI Studio,
       | with the free tier that involves data collection for training
       | purposes.
        
         | Alifatisk wrote:
          | Google lacks marketing for AI Studio; it has only recently
          | become widely known through word of mouth.
        
           | thimabi wrote:
           | That does work in Google's favor. Users who are technical
           | enough to want a better model eventually learn about AI
           | Studio, while the rest are none the wiser.
        
         | _delirium wrote:
         | This might have changed after you posted your comment, but it
         | looks like 2.5 Pro and 2.5 Flash are available in the Gemini
         | app now, both web and mobile.
        
           | thimabi wrote:
           | Oh, I didn't mean to say that these models were unavailable
           | through the app or website. Rather, I've realized that using
           | them through the API or AI Studio yields much better results
           | -- even in the free tier.
           | 
           | You can check that by trying prompts with complex
           | instructions and long inputs/outputs.
           | 
           | For instance, ask Gemini to generate notes from a specific
           | source (say, a book or class transcription). Or ask it to
           | translate a long article, full of idiomatic expressions,
           | while maintaining high fidelity to the source. You will see
           | that the very same Gemini models are underutilized on the app
           | or the website, while their performance is stellar on the API
           | or AI Studio.
        
       | bingdig wrote:
        | It appears that this impacted gemini-2.5-pro-preview-03-25
        | somehow? Grounding with Google search no longer works.
        | 
        | I had a workflow running that would pull news articles from the
        | past 24 hours. It now refuses to believe the current date is
        | 2025-04-17. Even with search turned on, when I ask it what the
        | date is, it always replies with some date in July 2024.
        
       | Alifatisk wrote:
       | No matter how good the new Gemini models have become, my bad
       | experience with early Gemini is still stuck with me and I am
       | afraid I still suffer from confirmation bias. Whenever I just
       | look at the Gemini app, I already assume it's going to be a bad
       | experience.
        
       | thallavajhula wrote:
       | At this point, at the current pace of AI model development, I
       | feel like I can't tell which one is better. I usually end up
       | using multiple LLMs to get a task done to my taste. They're all
       | equally good and bad. It's like using GCP vs AWS vs Azure all
       | over again, except in the AI space.
        
       | simonw wrote:
       | An often overlooked feature of the Gemini models is that they can
       | write and execute Python code directly via their API.
       | 
       | My llm-gemini plugin supports that:
       | https://github.com/simonw/llm-gemini                 uv tool
       | install llm       llm install llm-gemini       llm keys set
       | gemini       # paste key here       llm -m gemini-2.5-flash-
       | preview-04-17 \         -o code_excution 1 \         'render a
       | mandelbrot fractal in ascii art'
       | 
       | I ran that just now and got this:
       | https://gist.github.com/simonw/cb431005c0e0535343d6977a7c470...
       | 
       | They don't charge anything extra for code execution, you just pay
       | for input and output tokens. The above example used 10 input,
       | 1,531 output which is $0.15/million for input and $3.50/million
       | output for Gemini 2.5 Flash with thinking enabled, so 0.536 cents
       | (just over half a cent) for this prompt.
        
         | blahgeek wrote:
         | > An often overlooked feature of the Gemini models is that they
         | can write and execute Python code directly via their API.
         | 
         | Could you elaborate? I thought function calling is a common
         | feature among models from different providers
        
           | WiSaGaN wrote:
            | This common feature requires the user of the API to implement
            | the tool; in this case, the user is responsible for running
            | the code the API outputs. The post you replied to suggests
            | that Gemini will run the code for the user behind the API
            | call.
        
             | tempoponet wrote:
             | That was how I read it as well, as if it had a built-in
             | lambda type service in the cloud.
             | 
             | If we're just talking about some API support to call python
             | scripts, that's pretty basic to wire up with any model that
             | supports tool use.
        
           | simonw wrote:
           | The Gemini API runs the Python code for you as part of your
           | single API call, without you having to handle the tool call
           | request yourself.
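            | 
            | Roughly, it looks something like this with the google-genai
            | client, if I'm reading their code-execution docs right (a
            | sketch, not gospel; the response parts then interleave text,
            | the generated code, and its output):
            | 
            |     from google import genai
            |     from google.genai import types
            | 
            |     client = genai.Client(api_key="...")
            |     response = client.models.generate_content(
            |         model="gemini-2.5-flash-preview-04-17",
            |         contents="render a mandelbrot fractal in ascii art",
            |         config=types.GenerateContentConfig(
            |             # enable the server-side Python sandbox
            |             tools=[types.Tool(
            |                 code_execution=types.ToolCodeExecution()
            |             )],
            |         ),
            |     )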
        
             | tempaccount420 wrote:
             | This is so much cheaper than re-prompting each tool use.
             | 
             | I wish this was extended to things like: you could give the
             | model an API endpoint that it can call to execute JS code,
             | and the only requirement is that your API has to respond
             | within 5 seconds (maybe less actually).
             | 
             | I wonder if this is what OpenAI is planning to do in the
             | upcoming API update to support tools in o3.
        
               | danpalmer wrote:
                | I imagine there wouldn't be much of a cost to the
                | provider on the API call there, so much longer times may
               | be possible. It's not like this would hold up the LLM in
               | any way, execution would get suspended while the call is
               | made and the TPU/GPU will serve another request.
        
               | suchar wrote:
                | They need to keep the KV cache to avoid prompt
                | reprocessing, so they would need to move it to RAM/NVMe
                | during longer API calls to use the GPU for another
                | request.
        
         | pantsforbirds wrote:
          | See a full example run in a few commands using uv, think "wow,
          | I bet that Simon guy from Twitter would love this" ... it's
          | already him.
        
         | throaway920181 wrote:
         | I wish Gemini could do this with Go. It generates plenty of
         | junk/non-parseable code and I have to feed it the error
         | messages and hope it properly corrects it.
        
       | lleymrl651 wrote:
       | good
        
       | djrj477dhsnv wrote:
       | Why are most comments here only comparing to Claude and just a
       | few to ChatGPT and none to Grok?
       | 
       | Grok 3 has been my main LLM since its release. Is it not as good
       | as I thought it was?
        
         | jofzar wrote:
         | IMO I will not use Grok while it's owned and related to Elon,
          | not only do I not trust their privacy and data usage (not that
          | I "really" trust OpenAI/Google etc.), I just despise him.
         | 
         | It would have to be very significantly better for me to use it.
        
         | dyauspitr wrote:
         | Grok just isn't the best out there.
        
       | WiSaGaN wrote:
       | Interesting that the output price per 1M tokens is $0.6 for non-
        | reasoning, but $3.5 for reasoning. This seems to defy the common
        | assumption of how hybrid reasoning models work: you tweak the
        | <think> token probability to control how much thinking it does,
        | but underneath it's the same model and the same inference code
        | path.
        
       | michaelbrave wrote:
       | Yesterday I started working through How to design programs, and
       | set up a chat with Gemini 2.5 asking it to be my tutor as I go
       | through it and to help answer my questions if I don't understand
       | a part of the book. It has been knowledgeable, helpful and
       | capable of breaking down complex things that I couldn't
       | understand into understandable things. Fantastic all around.
        
       | zenkey wrote:
       | Google is totally back in the game now, but it's still going to
       | take a lot more for them at this point to overcome OpenAI's
       | "first-mover advantage" (clearly the favorite among younger users
       | atm).
        
         | sweca wrote:
         | Google Pixel marketing is doing wonders for Gemini in young
         | populations. I have been seeing a lot more of their phones in
         | my generation's hands.
        
       | sinuhe69 wrote:
       | I'm not familiar with Python internals, so when I tried to
        | convert a public AI model (not an LLM) to run locally, I got some
        | problems no other AI could help with. I asked Gemini 2.5 and it
        | pinpointed the problem immediately. Its solution was not
        | practical, but I guess it also works.
        
       | ashu1461 wrote:
        | One place where I feel Gemini models lag is function calling and
        | predicting correct arguments to function calls. Is there a
        | benchmark which scores models on this?
        
       | upmind wrote:
       | It's a shame that Gemini doesn't seem to have as much hype as
       | GPT, I hope they gain more market share.
        
       | menshiki wrote:
       | As a person mostly using AI for everyday tasks and business-
       | related research, it's very impressive how quickly they've
       | progressed. I would consider all models before 2.0 totally
       | unusable. Their web interface, however, is so much worse than
       | that of the ChatGPT macOS app.
        
         | gcbirzan wrote:
         | Some aren't even at 2.0, and the version numbers aren't related
         | in any way to their... generation? Also, what is so good about
         | the ChatGPT app, specifically on macOS that makes it better?
        
       | convivialdingo wrote:
       | Dang - Google finally made a quality model that doesn't make me
       | want to throw my computer out a window. It's honest, neutral and
       | clearly not trained by the ideologically rabid anti-bias but
       | actually super biased regime.
       | 
       | Did I miss a revolt or something in googley land? A Google model
       | saying "free speech is valuable and diverse opinions are good" is
       | frankly bizarre to see.
        
         | convivialdingo wrote:
         | Downvote me all you want - the fact remains that previous
         | Google models were so riddled with guardrails and political
         | correctness that it was practically impossible to use for
         | anything besides code and clean business data. Random text and
         | opinion would trigger a filter and shut down output.
         | 
         | Even this model criticizes the failures of the previous models.
        
           | tempaccount420 wrote:
           | Yes, something definitely changed. It's still a little
           | biased, it's kind of like OpenAI before Trump became
           | president.
        
       | camkego wrote:
       | The pricing table image in the article really should have
        | included Gemini 2.5 Pro. Sure, it could be after Flash to the
        | right, but it would help people understand the price/performance
       | benefits of 2.5 Flash.
        
       | wanderr wrote:
       | Gemini has the annoying habit of delegating tasks to me. Most
       | recently I was trying to find out how to do something in
       | FastRawViewer that I couldn't find a straightforward answer on.
       | After hallucinating a bunch of settings and menus that don't
       | exist, it told me to read the manual and check the user forums.
       | So much for saving me time.
        
       | hubraumhugo wrote:
       | You can get your HN profile analyzed and roasted by it. It's
       | pretty funny :) https://hn-wrapped.kadoa.com/
       | 
       | I'll add a selection for different models soon.
        
         | demaga wrote:
         | Didn't expect to be roasted by AI this morning. Nice one
        
         | Alifatisk wrote:
         | How is this relevant to Gemini 2.5 Flash? I guess it's using it
         | or something?
        
         | few wrote:
         | This is cool.
         | 
         | Does it only use a few recent comments or entire history? I'm
         | trying to figure out where it figured out my city when I
         | thought I was careful not to reveal it. I'm scrolling back
         | pages without finding where I said it in the past. Could it
         | have inferred it based on other information or hallucinated it?
         | 
         | I wonder if there's a more opsec-focused version of this.
        
         | x187463 wrote:
         | _Personal Projects_
         | 
         | Will finally implement that gravity in TTE, despite vowing not
         | to. We all know how well developers keep promises.
         | 
         |  _Knowledge Growth_
         | 
         | Will achieve enlightenment on the true meaning of
         | 'enshittification', likely after attempting to watch a single
         | YouTube video without Premium.
         | 
         | I found these actually funny. Cool project.
        
       | 131hn wrote:
       | If OpenAI offers Codex and Anthropic offers Claude Code, is there
       | a CLI integration that Google recommends for using Gemini 2.5?
       | That's what's keeping me, for now, with the other two.
        
       | yawaramin wrote:
       | I just asked "why is 'Good Friday' so called?" and it got stuck.
       | Flash 2.0 worked though.
        
       | uninformed-me00 wrote:
       | I want to think that this is all great, but the fact that this is
        | also one of the best ways to collect unsuspecting users' data by
        | default without explicit consent just doesn't feel right -- that
        | applies to most people who would never have a chance of reading
        | this comment.
       | 
       | I don't want to be angry but screw these default opt-in to have
       | your privacy violated free stuff.
       | 
       | Before you jump in to say you can pay to keep your privacy, stop
       | and read again.
        
       | sgt wrote:
       | How are they able to remain so competitive and will it last? The
       | pricing almost seems too good to be true in terms of what they
       | claim you get.
        
         | sweca wrote:
         | Custom TPUs ftw
        
       | egorfine wrote:
        | I always overlook anything Google due to the fact that they are
        | the opposite of "Don't be evil" and because their developer
        | console (Google Cloud) is incredibly hostile to humans.
       | 
       | Today I reluctantly clicked on their "AI Studio" link in the
        | press release and I was pleasantly surprised to discover that AI
       | Studio has nothing in common with their typical UI/UX. It's nice
       | and I love it!
        
         | brap wrote:
         | To be fair the UX of all GCP/AWS/Azure is ass. If you don't
         | know exactly what you're looking for, good luck navigating that
         | mess.
        
       | techwiz137 wrote:
       | I had a heart attack moment thinking they were bringing some form
       | of Adobe Flash back.
        
       | latemedium wrote:
       | There's an important difference between Gemini and Claude that
       | I'm not sure how to quantify. I often use shell-connected LLMs
       | (LLMs with a shell tool enabled) to take care of basic CSV
       | munging / file-sorting tasks for me - I work in data science so
       | there's a lot of this. When I ask Claude to do something, it
       | carefully looks at all the directories and files before doing
       | anything. Gemini, on the other hand, blindly jumps in and just
       | starts moving stuff around. Claude executes more tools and is a
       | little slower, but it almost always gets the right answer because
       | it appropriately gathers the right context before really trying
       | to solve the problem. Gemini doesn't seem to do this at all, but
       | it makes a world of difference for my set of problems. Curious to
        | see if others have had the same experience or if it's just a quirk
       | of my particular set of tasks
        
         | energy123 wrote:
         | What's a shell connected LLM and how to do that?
        
           | kmacdough wrote:
           | Look up Claude Code, Cursor, Aider and VSCode's agent
           | integration. Generally, tools to use AI more actively for
           | development. There are others as well. Plenty of info around.
           | Here's not the place for a tutorial.
        
       | rjurney wrote:
       | I am building a knowledge graph using BAML [baml-py] to extract
       | documents [it's opinionated towards docs] and then PySpark to ETL
       | the data into a node / edge list. GPT4o got few relations...
       | Gemini 2.5 got so many it was nuts, all accurate but not all from
        | the article! I had to rein it in and instruct it not to build so
       | vast a graph. Really cool, it knows a LOT about semiconductors :)
        
       | profsummergig wrote:
       | I tried this prompt in both Gemini 2.5 Pro, and in ChatGPT.
       | 
       | "Draw me a timeline of all the dynasties of China. Imagine a
       | horizontal line. Start from the leftmost point and draw segments
       | for the start and end of each dynasty. For periods where multiple
       | dynasties existed simultaneously draw parallel lines or boxes to
       | represent the concurrent rule."
       | 
       | Gemini's response: "I'm just a language model, so I can't help
       | you with that."
       | 
       | ChatGPT's response: an actual visual timeline.
        
         | renewiltord wrote:
         | All the communities where people think LLMs are junk love
         | Gemini. Makes me sceptical that the enthusiasm is useful
         | signal.
         | 
         | I found the full 2.0 useful for transcription of images. Very
         | good OCR. But not a good assistant. Stalls often and once it
         | has, loses context easily.
        
           | thegeomaster wrote:
           | Is it possible that a community of people who are constantly
           | pushing LLMs to their limits would be most aware of their
           | limitations, and so more inclined to think they are junk?
           | 
           | In terms of business utility, Google has had great releases
           | ever since the 2.0 family. Their models have never missed
            | _some_ mark --- either a good price/performance ratio,
           | insane speeds, novel modalities (they still have the only API
           | for autoregressive image generation atm), state-of-the-art
           | long context support and coding ability (Gemini 2.5), etc.
           | 
           | However, most average users are using these models through a
           | chat-like UI, or via generic tools like Cursor, which don't
           | really optimize their pipelines to capture the strengths of
           | different models. This way, it's very difficult to judge a
           | model objectively. Just look at the obscene sycophancy
           | exhibited by chatgpt-4o-latest and how it lifted LMArena
           | scores.
        
             | renewiltord wrote:
             | Just the fact that everyone on HN is always telling us how
             | LLMs are useless but that Gemini is the best of them
             | convinces me of the opposite. No one who can't find a use
             | for this technology is really informed on the subject. Hard
             | to take them seriously.
        
         | ncr100 wrote:
         | Worked for me in 2.5 Flash, text only:
         | 
         | https://g.co/gemini/share/bcc257f9b0a0
        
       | asim wrote:
       | I just wish the whole industry would stop using terms like
       | thinking and reasoning. This is not what's happening. If we could
       | come up with more appropriate terms that don't treat these models
       | like they're human then we'd be in a much better place. That
       | aside, it's cool to see the advancement of Google's offering.
        
         | dbbk wrote:
         | Thinking perhaps, but why not reasoning?
        
         | lyu07282 wrote:
         | Do you think any machine will ever be able to think and/or
         | reason? Or is that a uniquely human thing? and do you have a
         | rational standard to judge when something is reasoning or
         | thinking, or just vibes?
         | 
         | I'm asking because I wonder how much of that common attitude is
         | just a sort of species-chauvinism. You are feeling anxious
          | because machines are getting smarter, and angry because "they"
          | are taking your job away; but the machine doesn't do that, it's
          | people with an ideology who do that, and you should be
         | angry at that instead.
        
       | aerhardt wrote:
       | I am only on OpenAI because they have a native Mac app. Call me
       | old-school but my preferred workflow is still for the most part
       | just asking narrow questions and copying-pasting back and forth.
        | I've been playing with Junie (JetBrains' AI agent) for a couple
       | of days, but I still don't trust agents to run loose in my
       | codebase for any sizeable amount of work.
       | 
       | Does anyone know if Google is planning native apps? Or any
       | wrapping interfaces that work well on a Mac?
        
         | sweca wrote:
         | Raycast[0] has Gemini support in their AI offering and it's
         | native, fast and intuitive.
         | 
         | [0] https://raycast.com/ai
        
       | sweca wrote:
       | Honestly, the _best_ part about Gemini, especially as a consumer
        | product, is their super lax (or nonexistent) rate limits. They
       | never have capacity issues, unlike Claude which always feels slow
       | or sometimes outright rejects requests during peak hours. Gemini
       | is constantly speedy and has extremely generous context window
       | limits on the Gemini apps.
        
         | onlyrealcuzzo wrote:
         | Interesting. I use Claude quite a bit, and haven't encountered
         | this.
         | 
         | Is this the free version of Claude or the paid version?
         | 
         | When are peak hours typically (in what timezone)?
        
       | bossyTeacher wrote:
       | Is everyone on here solely evaluating the models on their
       | programming capabilities? I understand this is HN but vibe coding
       | LLM tools won't be able to sustain the LLM industry (let's not
       | call it AI please)
        
       | barfingclouds wrote:
       | I just need the Gemini app to allow push to talk :( Otherwise
       | it's not usable for me in the way I want it to be
        
       ___________________________________________________________________
       (page generated 2025-04-18 23:01 UTC)