[HN Gopher] Gemini 2.5 Flash
___________________________________________________________________
Gemini 2.5 Flash
Author : meetpateltech
Score : 995 points
Date : 2025-04-17 19:03 UTC (1 day ago)
(HTM) web link (developers.googleblog.com)
(TXT) w3m dump (developers.googleblog.com)
| xnx wrote:
| 50% price increase from Gemini 2.0 Flash. That sounds like a lot,
| but Flash is still so cheap when compared to other models of this
| (or lesser) quality. https://developers.googleblog.com/en/start-
| building-with-gem...
| akudha wrote:
| Is this cheaper than DeepSeek? Am I reading this right?
| vdfs wrote:
| Only if you don't use reasoning
| Tiberium wrote:
| del
| Havoc wrote:
| You may want to consult Gemini on those percentage calcs: .10
| to .15 is not 25%.
| swyx wrote:
| done pretty much in line with the price/ELO Pareto frontier
| https://x.com/swyx/status/1912959140743586206/photo/1
| xnx wrote:
| Love that chart! Am I imagining that I saw a version of that
| somewhere that even showed how the boundary has moved out
| over time?
| swyx wrote:
| https://x.com/swyx/status/1882933368444309723
|
| https://x.com/swyx/status/1830866865884991999 (scroll up)
| oezi wrote:
| So if I see it right, Flash 2.5 doesn't push the Pareto front
| forward, right? It just sits between 2.5 Pro and 2.0 Flash.
|
| https://storage.googleapis.com/gweb-developer-goog-blog-
| asse...
| swyx wrote:
| yeah, but 1) it's useful to have the point there on the curve
| if you need it, and 2) intelligence is multidimensional; maybe
| in 2.5 Flash you get qualitatively a better set of
| capabilities for your needs than 2.5 Pro
| onlyrealcuzzo wrote:
| Why isn't Phi-3, Llama 3, or Mistral in the comparison?
|
| Aren't there a lot of hosted options? How do they compare in
| terms of cost?
| byefruit wrote:
| It's interesting that there's nearly a 6x price difference
| between reasoning and no reasoning.
|
| This implies it's not a hybrid model that can just skip reasoning
| steps if requested.
|
| Anyone know what else they might be doing?
|
| Reasoning means contexts will be longer (for thinking tokens), and
| there's an increase in inference cost with a longer context,
| but it's not going to be 6x.
|
| Or is it just market pricing?
| vineyardmike wrote:
| Based on their graph, it does look explicitly priced along
| their "Pareto Frontier" curve. I'm guessing that is guiding the
| price more than their underlying costs.
|
| It's smart because it gives them room to drop prices later and
| compete once other companies actually get to a similar quality.
| jsnell wrote:
| > This implies it's not a hybrid model that can just skip
| reasoning steps if requested.
|
| It clearly is, since most of the post is dedicated to the
| tunability (both manual and automatic) of the reasoning budget.
|
| I don't know what they're doing with this pricing, and the blog
| post does not do a good job explaining.
|
| Could it be that they're not counting thinking tokens as output
| tokens (since you don't get access to the full thinking trace
| anyway), and this is basically amortizing the thinking-token
| spend over the actual output tokens? That doesn't make sense
| either, because then the user has no incentive to use anything
| except 0/max thinking budgets.
| RobinL wrote:
| Does anyone know how this pricing works? Supposing I have a
| classification prompt where I need the response to be a binary
| yes/no. I need one token of output, but reasoning will
| obviously add far more than 6 additional tokens. Is it still a
| 6x price multiplier? That doesn't seem to make sense, but neither
| does paying 6x more for every token, including reasoning ones.
| coder543 wrote:
| "When you have thinking turned on, all output tokens
| (including thoughts) are charged at the $3.50 / 1M rate"[0]
|
| [0]: https://x.com/OfficialLoganK/status/1912981986085323231
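|
| So for the binary-classification case above, a rough worked
| example (rates assumed from this thread: $0.15/1M input,
| $3.50/1M output with thinking on, and roughly 6x less, ~$0.60/1M,
| with it off; token counts hypothetical):
|
|     # 500 prompt tokens, 1 answer token, ~200 thinking tokens
|     cost_off = 500/1e6 * 0.15 + 1/1e6 * 0.60          # ~$0.000076
|     cost_on  = 500/1e6 * 0.15 + (200 + 1)/1e6 * 3.50  # ~$0.000778
|
| So the multiplier comes from the thinking tokens themselves, not
| a flat 6x on your one answer token.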
| punkpeye wrote:
| This is cool, but rate limits on all of these preview models are
| a PITA.
| Layvier wrote:
| Agreed, it's not even possible to run an eval dataset. If
| someone from Google sees this, please at least increase the
| burst rate limit.
| punkpeye wrote:
| It is not without rate limits, but we do have elevated limits
| for our accounts through:
|
| https://glama.ai/models/gemini-2.5-flash-preview-04-17
|
| So if you just want to run evals, that should do it.
|
| Though the first couple of days after a model comes out are
| usually pretty rough because everyone tries to run their evals.
| punkpeye wrote:
| What I am noticing with every new Gemini model that comes
| out is that the time to first token (TTFT) is not great. I
| guess it is because they gradually transfer compute power
| from old models to new models as the demand increases.
| Filligree wrote:
| If you're imagining that 2.5Pro gets dynamically loaded
| during the time to first token, then you're vastly
| overestimating what's physically possible.
|
| It's more likely a latency-throughput tradeoff. Your
| query might get put inside a large batch, for example.
| Layvier wrote:
| That's very interesting, thanks for sharing!
| arnaudsm wrote:
| Gemini Flash models have the least hype, but in my experience
| they offer the best bang for the buck and multimodal tooling
| in production.
|
| Google is silently winning the AI race.
| belter wrote:
| > Google is silently winning the AI race.
|
| That is what we keep hearing here... I cancelled my account after
| the last Gemini, and can't help noticing the new one they are
| offering for free...
| arnaudsm wrote:
| Sorry I was talking of B2B APIs for my YC startup. Gemini is
| still far behind for consumers indeed.
| JeremyNT wrote:
| I use Gemini almost exclusively as a normal user. What am I
| missing out on that they are far behind on?
|
| It seems shockingly good and I've watched it get much
| better up to 2.5 Pro.
| arnaudsm wrote:
| Mostly brand recognition and the earlier Geminis had more
| refusals.
|
| As a consumer, I also really miss the Advanced voice mode
| of ChatGPT, which is the most transformative tech in my
| daily life. It's the only frontier model with true audio-
| to-audio.
| wavewrangler wrote:
| What do you mean miss? You don't have the budget to keep
| something you truly miss for $20? What am in missing here
| / I don't mean to criticize I am just curious is all. I
| would reword but I have to go
| what_ever wrote:
| What is true audio-to-audio in this case?
| jorvi wrote:
| > and the earlier Geminis had more refusals.
|
| It's more so that almost every company is running a
| classifier on their web chat's output.
|
| It isn't actually the model refusing; rather, if the
| classifier hits a threshold, it'll swap the model's output
| with "Sorry, let's talk about something else."
|
| This is most apparent with DeepSeek. If you use their web
| chat with V3 and then jailbreak it, you'll get uncensored
| output, but it is then swapped with "Let's talk about
| something else" halfway through the output. And if you
| ask the model, it has no idea its previous output got
| swapped, and you can even ask it to build on its previous
| answer. But if you use the API, you can push it pretty
| far with a simple jailbreak.
|
| These classifiers are virtually always run on a separate
| track, meaning you cannot jailbreak them.
|
| If you use an API, you only have to deal with the
| inherent training data bias, neutering by tuning and
| neutering by pre-prompt. The last two are, depending on
| the model, fairly trivial to overcome.
|
| I still think the first big AI company that has the guts
| to say "our LLM is like a pen and brush, what you write
| or draw with it is on you" and publishes a completely
| unneutered model will be the one to take a huge slice of
| marketshare. If I had to bet on anyone doing that, it
| would be xAI with Grok. And by not neutering it, the
| model will perform better in SFW tasks too.
| whistle650 wrote:
| Have you tried the Gemini Live audio-to-audio in the free
| Gemini iOS app? I find it feels far more natural than
| ChatGPT Advanced Voice Mode.
| Jensson wrote:
| > and the earlier Geminis had more refusals.
|
| You can turn those off. Google lets you decide how much
| it censors, and you can completely turn it off.
|
| It has separate sliders for sexually explicit, hate,
| dangerous, and harassment content. It is by far the best at
| this, since sometimes you want those refusals/filters.
| int_19h wrote:
| They used to be, but not anymore, not since Gemini Pro 2.5.
| Their "deep research" offering is the best available on the
| market right now, IMO - better than both ChatGPT and
| Claude.
| Layvier wrote:
| Absolutely. So many use cases for it, and it's so
| cheap/fast/reliable
| danielbln wrote:
| I want to use these almost-too-cheap-to-meter models like
| Flash more. What are some interesting use cases for them?
| SparkyMcUnicorn wrote:
| And stellar OCR performance. Flash 2.0 is cheaper and more
| accurate than AWS Textract, Google Document AI, etc.
|
| Not only in benchmarks[0], but in my own production usage.
|
| [0] https://getomni.ai/ocr-benchmark
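|
| As a rough illustration of the pattern (assuming the google-genai
| Python SDK; the model name, file, and prompt are just examples):
|
|     from google import genai
|     from google.genai import types
|
|     client = genai.Client()  # reads GEMINI_API_KEY from the env
|     img = open("invoice.png", "rb").read()
|     resp = client.models.generate_content(
|         model="gemini-2.0-flash",
|         contents=[types.Part.from_bytes(data=img,
|                                         mime_type="image/png"),
|                   "Transcribe all text in this document."],
|     )
|     print(resp.text)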
| Fairburn wrote:
| Sorry, but no. Gemini isn't the fastest horse yet. And its
| use within their ecosystem means it isn't geared to the masses
| outside of their bubble. They are not leading the race, but they
| are a contender.
| spruce_tips wrote:
| i have a high volume task i wrote an eval for and was
| pleasantly surprised at 2.0 flash's cost to value ratio,
| especially compared to gpt4.1-mini/nano:
|
|     model                  | accuracy | input $/1M | output $/1M
|     Gemini Flash 2.0 Lite  | 67%      | $0.075     | $0.30
|     Gemini Flash 2.0       | 93%      | $0.10      | $0.40
|     GPT-4.1-mini           | 93%      | $0.40      | $1.60
|     GPT-4.1-nano           | 43%      | $0.10      | $0.40
|
| excited to try out 2.5 flash
| jay_kyburz wrote:
| Can I ask a serious question: what task are you writing where
| it's OK to get a 7% error rate? I can't get my head around how
| this can be used.
| spruce_tips wrote:
| low stakes text classification, but it's something that
| needs to be done and couldn't be done in reasonable time
| frames or at reasonable price points by humans
| omneity wrote:
| In my case, I have workloads like this where it's possible
| to verify the correctness of the result after inference, so
| any success rate is better than 0 as it's possible to
| identify the "good ones".
| nonethewiser wrote:
| Aren't you basically just saying you are able to measure
| the error rate? I mean that's good, but already a given
| in this scenario where he's reporting the 7% error rate.
| jsnell wrote:
| No. If you're able to verify correctness of individual
| items of work, you can accept the 93% of verified items
| as-is and send the remaining 7% to some more expensive
| slow path.
|
| That's very different from just knowing the aggregate
| error rate.
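|
| In code, the accept-or-escalate pattern looks roughly like this
| (a sketch; the three callables are hypothetical stand-ins for
| whatever cheap model, check, and expensive model you use):
|
|     def classify(item, cheap_model, verify, expensive_model):
|         label = cheap_model(item)      # e.g. Flash, ~93% accurate
|         if verify(item, label):        # cheap deterministic check
|             return label               # accept verified work as-is
|         return expensive_model(item)   # slow path for the rest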
| yjftsjthsd-h wrote:
| No, it's anything that's harder to write than verify. A
| simple example is a logic puzzle; it's hard to come up
| with a solution, but once you have a possible answer it's
| really easy to check it. In fact, it can be easier to vet
| _multiple_ answers and tell the machine to try again than
| solve it once manually.
| 16bytes wrote:
| There are tons of AI/ML use-cases where 7% is acceptable.
|
| Historically speaking, if you had a 15% word error rate in
| speech recognition, it would generally be considered
| useful. 7% would be performing well, and <5% would be near
| the top of the market.
|
| Typically, your error rate just needs to be below the
| usefulness threshold and in many cases the cost of errors
| is pretty small.
| muzani wrote:
| I expect some manual correction after the work is done. I
| actually mentally counted all the times I pressed backspace
| while writing this paragraph, and it comes down to 45. I'm
| not counting the next paragraph or changing the number.
|
| Humans make a ton of errors as well. I didn't even notice
| how many I was making here until I started counting. AI
| is super useful for just getting a first draft out, not
| for the final work.
| sroussey wrote:
| You could be OCRing a page that includes a summation line,
| then add up all the numbers and check against the sum.
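|
| A minimal sketch of that consistency check (names hypothetical):
|
|     def ocr_is_consistent(line_items, printed_total, tol=0.005):
|         # accept the OCR result only if the extracted line items
|         # add up to the total printed on the page
|         return abs(sum(line_items) - printed_total) <= tol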
| 42lux wrote:
| The API is free, and it's great for everyday tasks. So yes
| there is no better bang for the buck.
| drusepth wrote:
| Wait, the API is free? I thought you had to use their web
| interface for it to be free. How do you use the API for free?
| mlboss wrote:
| using aistudio.google.com
| spruce_tips wrote:
| create an API key and don't set up billing. pretty low rate
| limits, and they use your data
| dcre wrote:
| You can get an API key and they don't bill you. Free tier
| rate limits for some models (even decent ones like Gemini
| 2.0 Flash) are quite high.
|
| https://ai.google.dev/gemini-api/docs/pricing
|
| https://ai.google.dev/gemini-api/docs/rate-limits#free-tier
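|
| Minimal usage sketch, assuming the google-genai Python SDK and a
| free-tier key with no billing attached:
|
|     import os
|     from google import genai
|
|     client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
|     resp = client.models.generate_content(
|         model="gemini-2.0-flash", contents="Say hello")
|     print(resp.text)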
| NoahZuniga wrote:
| The rate limits I've encountered with free api keys have
| been way lower than the limits advertised.
| jmacd wrote:
| I agree. I found it unusable for anything but casual
| usage due to the rate limiting. I wonder if I am just
| missing something?
| tempthrow wrote:
| I think it's the small TPM limits. I'll be way under the
| 10-30 requests per minute while using Cline, but it
| appears that the input tokens count towards the rate
| limit so I'll find myself limited to one message a minute
| if I let the conversation go on for too long, ironically
| due to Gemini's long context window. AFAIK Cline doesn't
| currently offer an option to limit the context explosion
| to lower than model capacity.
| nolok wrote:
| I'm pretty sure that's a Google Maps level of free, where
| once in control they will massively bill it
| dcre wrote:
| There is no reason to expect the other entrants in the
| market to drop out and give them monopoly power. The paid
| tier is also among the cheapest. People say it's because
| they built their own inference hardware and are
| genuinely able to serve it cheaper.
| midasz wrote:
| I use Gemini 2.5 pro experimental via openrouter in my
| openwebui for free. Was using sonnet 3.7 but I don't notice
| much difference so just default to the free thing now.
| statements wrote:
| Absolutely agree. Granted, it is task dependent. But when it
| comes to classification and attribute extraction, I've been
| using 2.0 Flash with huge success across massive datasets. It
| would not even be viable cost-wise with other models.
| sethkim wrote:
| How "huge" are these datasets? Did you build your own tooling
| to accomplish this?
| xnx wrote:
| Shhhh. You're going to give away the secret weapon!
| gambiting wrote:
| In my experience they are as dumb as a bag of bricks. The other
| day I asked "can you edit a picture if I upload one"
|
| And it replied "sure, here is a picture of a photo editing
| prompt:"
|
| https://g.co/gemini/share/5e298e7d7613
|
| It's like "baby's first AI". The only good thing about it is
| that it's free.
| JFingleton wrote:
| Prompt engineering is a thing.
|
| Learning how to "speak llm" will give you great results.
| There's loads of online resources that will teach you. Think
| of it like learning a new API.
| abletonlive wrote:
| for now. one would hope that this is a transitory moment in
| llms and that we can just use intuition in the future.
| asadotzler wrote:
| LLM's whole thing is language. They make great translators
| and perform all kinds of other language tasks well, but
| somehow they can't interpret my English language prompts
| unless I go to school to learn how to speak LLM-flavored
| English?
|
| WTF?
| pplante wrote:
| I like to think of my interactions with an LLM like I'm
| explaining a request to a junior engineer or non
| engineering person. You have to be more verbose to
| someone who has zero context in order for them to execute
| a task correctly. The LLM only has the context you
| provided so they fail hard like a junior engineer would
| at a complicated task with no experience.
| JFingleton wrote:
| They are not humans - so yeah I can totally see having to
| "go to school" to learn how to interact with them.
| int_19h wrote:
| It's a natural language processor, yes. It's not AGI. It
| has numerous limitations that have to be recognized and
| worked around to make use of it. Doesn't mean that it's
| not useful, though.
| th0ma5 wrote:
| You have the right perspective. All of these people hand-
| waving away the core issue here don't realize their own
| biases. The best of these things tout as much as 97%
| accuracy on tasks, but if a person was completely randomly
| wrong in 3% of what they say, you'd call an ambulance and
| no doctor would be able to diagnose their condition. (The
| kinds of errors that people make with brain injuries are
| a major diagnostic tool, and the kinds of errors are known
| for major types of common injuries... Conversely, there
| is no way to tell within an LLM system whether any specific
| token is actually correct or not, and its incorrectness is
| not even categorizable.)
| gambiting wrote:
| This was using Gemini on my phone - which both Samsung and
| Google advertise as "just talk to it".
| ghurtado wrote:
| > in my experience they are as dumb as a bag of bricks
|
| In my experience, anyone that describes LLMs using terms of
| actual human intelligence is bound to struggle using the
| tool.
|
| Sometimes I wonder if these people enjoy feeling "smarter"
| when the LLM fails to give them what they want.
| mdp2021 wrote:
| If those people are a subset of those who demand actual
| intelligence, they will very often feel frustrated.
| nowittyusername wrote:
| It's because Google hasn't realized the value of training the
| model on information about its own capabilities and metadata.
| This is my biggest pet peeve about the way Google trains
| these models.
| rvz wrote:
| Google has always been winning the AI race, ever since DeepMind
| was properly put to use to develop their AI models instead of
| the team that built Bard (the Google AI team).
| GaggiX wrote:
| Flash models are really good even for an end user, because of
| how fast they are and how well they perform.
| ghurtado wrote:
| I know it's a single data point, but yesterday I showed it a
| diagram of my fairly complex micropython program, (including
| RP2 specific features, DMA and PIO) and it was able to describe
| in detail not just the structure of the program, but also
| exactly what it does and how it does it. This is before seeing
| a single line of code, just going by boxes and arrows.
|
| The other AIs I have shown the same diagram to, have all
| struggled to make sense of it.
| redbell wrote:
| > Google is silently winning the AI race
|
| Yep, I agree! This convinced me:
| https://news.ycombinator.com/item?id=43661235
| ramesh31 wrote:
| >"Google is silently winning the AI race."
|
| It's not surprising. What was surprising honestly was how they
| were caught off guard by OpenAI. It feels like in 2022 just
| about all the big players had a GPT-3 level system in the works
| internally, but SamA and co. knew they had a winning hand at
| the time, and just showed their cards first.
| wkat4242 wrote:
| True and their first mover advantage still works pretty well.
| Despite "ChatGPT" being a really uncool name in terms of
| marketing. People remember it because they were the first to
| wow them.
| golergka wrote:
| It feels more authentically engineer-coded.
| kaoD wrote:
| How is ChatGPT bad in terms of marketing? It's recognizable
| and rolls off the tongue in many many many languages.
|
| Gemini is what sucks from a marketing perspective. Generic-
| ass name.
| simonw wrote:
| Generative Pre-trained Transformer is a horrible term to
| have an acronym for.
| kaoD wrote:
| Do you think the mass market thinks GPT is an acronym?
| It's just a name. Currently synonymous with AI.
|
| Ask anyone outside the tech bubble about "Gemini" though.
| You'll get astrology.
| wkat4242 wrote:
| True I guess they treat it just like SMS.
|
| I still think they'd have taken off more if they'd given
| it a catchy name from the start and made the interface a
| bit more consumer friendly.
| russellbeattie wrote:
| I have to say, I never doubted it would happen. They've been at
| the forefront of AI and ML for well over a decade. Their
| scientists were the authors of the "Attention is all you need"
| paper, among thousands of others. A Google Scholar search
| produces endless results. There just seemed to be a disconnect
| between the research and product areas of the company. I think
| they've got that worked out now.
|
| They're getting their ass kicked in court though, which might
| be making them much less aggressive than they would be
| otherwise, or at least quieter about it.
| Nihilartikel wrote:
| 100% agree. I had Gemini flash 2 chew through thousands of
| points of nasty unstructured client data and it did a 'better
| than human intern' level conversion into clean structured
| output for about $30 of API usage. I am sold. 2.5 pro
| experimental is a different league though for coding. I'm
| leveraging it for massive refactoring now and it is almost
| magical.
| jdthedisciple wrote:
| > thousands of points of nasty unstructured client data
|
| What I always wonder in these kinds of cases is: What makes
| you confident the AI actually did a good job since presumably
| you haven't looked at the thousands of client data yourself?
|
| For all you know it made up 50% of the result.
| golergka wrote:
| Many types of data have very easily checkable aggregates.
| Think accounting books.
| pamplemoose wrote:
| You take a sample and check
| tominous wrote:
| In my case I had hundreds of invoices in a not-very-
| consistent PDF format which I had contemporaneously tracked
| in spreadsheets. After data extraction (pdftotext + OpenAI
| API), I cross-checked against the spreadsheets, and for any
| discrepancies I reviewed the original PDFs and old bank
| statements.
|
| The main issue I had was it was surprisingly hard to get
| the model to consistently strip commas from dollar values,
| which broke the csv output I asked for. I gave up on prompt
| engineering it to perfection, and just looped around it
| with a regex check.
|
| Otherwise, accuracy was extremely good and it surfaced a
| few errors in my spreadsheets over the years.
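|
| For what it's worth, the regex loop was along these lines (the
| pattern here is illustrative, not the exact one I used):
|
|     import re
|
|     def strip_amount_commas(line: str) -> str:
|         # "$1,234.56" -> "$1234.56" so the value no longer
|         # breaks the comma-separated output
|         return re.sub(r"\$(\d{1,3}(?:,\d{3})+(?:\.\d{2})?)",
|                       lambda m: "$" + m.group(1).replace(",", ""),
|                       line)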
| jofzar wrote:
| I hope there is a future where CSV commas don't screw up
| data. I know it will never happen, but it's a nightmare.
|
| Everyone has a story of a CSV formatting nightmare.
| summerlight wrote:
| Though the same logic can be applied everywhere, right?
| Even if it's done by human interns, you need to audit
| everything to be 100% confident or just have some trust in
| them.
| andrei_says_ wrote:
| Not the same logic because interns can make meaning out
| of the data - that's built-in error correction.
|
| They also remember what they did - if you spot one
| misunderstanding, there's a chance they'll be able to
| check all similar scenarios.
|
| Comparing the mechanics of an LLM to human intelligence
| shows deep misunderstanding of one, the other, or both -
| if done in good faith of course.
| summerlight wrote:
| Not sure why you're trying to conflate intellectual
| capability problems into this and complicate the
| argument? The problem layout is the same. You delegate
| the work to someone, so you cannot understand all the
| details. This creates a fundamental tension between trust
| and confidence. The parameters might be different due
| to intellectual capability, but whomever you delegate
| to, you cannot evade this trade-off.
|
| BTW, not sure if you have experience delegating work
| to human interns or new grads and being rewarded with
| disastrous results? I've done that multiple times and
| don't trust anyone too much. This is why we typically
| develop review processes, guardrails, etc.
| Nihilartikel wrote:
| For what it's worth, I did check over many hundreds of
| them. Formatted things for side by side comparison and
| ordered by some heuristics of data nastiness.
|
| It wasn't a one shot deal at all. I found the ambiguous
| modalities in the data and hand corrected examples to
| include in the prompt. After about 10 corrections and some
| exposition about the cases it seemed to misunderstand, it
| got really good. Edit: not too different from a feedback
| loop with an intern ;)
| jofzar wrote:
| It also depends on what you are using the data for; if it's
| for non-precise, data-based decisions then it's fine,
| especially if you're looking for "vibe" based decisions before
| dedicating time to "actually" process the data for
| confirmation.
|
| $30 to get a view into data that would take at least x
| many hours of someone's time is actually super cheap,
| especially if the decision from that result is then to invest
| or not invest the x many hours to confirm it.
| mediaman wrote:
| This was solved a hundred years ago.
|
| It's the same problem factories have: they produce a lot of
| parts, and it's very expensive to put a full operator or
| more on a machine to do 100% part inspection. And the
| machines aren't perfect, so we can't just trust that they
| work.
|
| So starting in the 1920s, Walter Shewhart and W. Edwards Deming
| came up with Statistical Process Control. We accept the
| quality of the product produced based on the variance we
| see of samples, and how they measure against upper and
| lower control limits.
|
| Based on that, we can estimate a "good parts rate" (which
| later got used in ideas like Six Sigma to describe the
| probability of bad parts being passed).
|
| The software industry was built on determinism, but now
| software engineers will need to learn the statistical
| methods created by engineers who have forever lived in the
| stochastic world of making physical products.
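|
| The same math transfers directly. A sketch of the acceptance
| check (normal approximation to the binomial; z=3 for a rough
| 3-sigma control limit):
|
|     import math
|
|     def error_rate_ucl(defects: int, n: int, z: float = 3.0):
|         # rough upper control limit on the error rate,
|         # estimated from a spot-check sample
|         p = defects / n
|         return p + z * math.sqrt(p * (1 - p) / n)
|
|     print(error_rate_ucl(7, 100))  # ~0.147 from a 100-item check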
| thawawaycold wrote:
| I hope you're being sarcastic. SPC is necessary because
| mechanical parts have physical tolerances and
| manufacturing processes are affected by unavoidable
| statistical variations; it is beyond idiotic to be
| provided with a machine that can execute deterministic,
| repeatable processes and then throw that all into the
| gutter for mere convenience, justifying that simply
| because "the time is ripe for SWE to learn statistics"
| int_19h wrote:
| We don't know how to implement a "deterministic,
| repeatable process" that can look at a bug in a repo and
| implement a fix end-to-end.
| thawawaycold wrote:
| that is not what OP was talking about though.
| rorytbyrne wrote:
| LLMs are literally stochastic, so the point is the same
| no matter what the example application is.
| warkdarrior wrote:
| Humans are literally stochastic, so the point is the same
| no matter what the example application is.
| perching_aix wrote:
| The deterministic, repeatable process of human (and now
| machine) judgement and semantic processing?
| visarga wrote:
| In my professional opinion they can extract data at 85-95%
| accuracy.
| FooBarWidget wrote:
| You can use AI to verify its own work. Last time I split a
| C++ header file into header + implementation file. I
| noticed some code got rewritten in a wrong manner, so I
| asked it to compare the new implementation file against the
| original header file, but to do so one method at a time.
| For each method, say whether the code is exactly the same
| and has the same behavior, ignoring superficial syntax
| changes and renames. Took me a few times to get the prompt
| right, though.
| cdelsolar wrote:
| what tool are you using 2.5-pro-exp through? Cline? Or the
| browser directly?
| Nihilartikel wrote:
| For 2.5 pro exp I've been attaching files into AIStudio in
| the browser in some cases. In others, I have been using
| vscode's Gemini Code Assist which I believe recently
| started using 2.5 Pro. Though at one point I noticed that
| it was acting noticeably dumber, and over in the corner,
| sure enough it warned that it had reverted to 2.0 due to
| heavy traffic.
|
| For the bulk data processing I just used the python API and
| Jupyter notebooks to build things out, since it was a one-
| time effort.
| manmal wrote:
| Copilot experimental (needs VSCode Insiders) has it. I've
| thought about trying aider --watch-files though; it also
| works with multiple files.
| roygbiv2 wrote:
| Isn't it better to get gemini to create a tool to format the
| data? Or was it in such a state that that would have been
| impossible?
| tcgv wrote:
| > I'm leveraging it for massive refactoring now and it is
| almost magical.
|
| Can you share more about your strategy for "massive
| refactoring" with Gemini?
|
| Like the steps in general for processing your codebase, and
| even your main goals for the refactoring.
| no_wizard wrote:
| I remember everyone saying its a two horse race between Google
| and OpenAI, then DeepSeek happened.
|
| Never count out the possibility of a dark horse competitor
| ripping the sod right out from under them.
| nonethewiser wrote:
| How is DeepSeek doing though? It seemed like they probably
| just ingested ChatGPT. https://www.forbes.com/sites/torconsta
| ntino/2025/03/03/deeps...
|
| Still impressive but would really put a cap on expectations
| for them.
| gs17 wrote:
| They supposedly have a new R2 model coming within a month.
| FooBarWidget wrote:
| Everybody else also trains on ChatGPT data, have you never
| heard of public ChatGPT conversation data sets? Yes they
| trained on ChatGPT data. No it's not "just".
| bhl wrote:
| It's cheap but also lazy. It sometimes generates empty strings
| or empty arrays for tool calls, and then I just re-route the
| request to a stronger model for the tool call.
|
| I've spent a lot of time on prompts and tool-calls to get Flash
| models to reason and execute well. When I give the same context
| to stronger models like 4o or Gemini 2.5 Pro, they're able to get
| to the same answers in fewer steps but at higher token cost.
|
| Which is to be expected: more guardrails for smaller, weaker
| models. But then it's a tradeoff; no easy way to pick which
| models to use.
|
| Instead of SQL optimization, it's now model optimization.
| paulcole wrote:
| > Google is silently winning the AI race.
|
| It's not clear to me what either the "race" or "winning" is.
|
| I use ChatGPT for 99% of my personal and professional use. I've
| just gotten used to the interface and quirks. It's a good
| consumer product that I like to pay $20/month for and use. My
| work doesn't require much in the way of monthly tokens but I
| just pay for the OpenAI API and use that.
|
| Is that winning? Becoming the de facto "AI" tool for consumers?
|
| Or is the race to become what's used by developers inside of
| apps and software?
|
| The race isn't to have the best model (I don't think) because
| it seems like the 3rd best model is very very good for many
| people's uses.
| transformi wrote:
| Bad day going on at Google.
|
| First the declaration of illegal monopoly...
|
| And now... Google's latest innovation: programmable overthinking.
|
| With Gemini 2.5 Flash, you too can now set a thinking_budget--
| because nothing says "state-of-the-art AI" like manually capping
| how long it's allowed to reason. Truly the dream: debugging a
| production outage at 2am wondering if your LLM didn't answer
| correctly because you cheaped out on tokens. lol.
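|
| For reference, the knob looks roughly like this (assuming the
| google-genai SDK's ThinkingConfig; a budget of 0 turns thinking
| off entirely):
|
|     from google import genai
|     from google.genai import types
|
|     client = genai.Client()
|     resp = client.models.generate_content(
|         model="gemini-2.5-flash-preview-04-17",
|         contents="Answer yes or no: is 1013 prime?",
|         config=types.GenerateContentConfig(
|             thinking_config=types.ThinkingConfig(thinking_budget=0)),
|     )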
|
| "Turn thinking off for better performance." That's not a model
| config, that's a metaphor for Google's entire AI strategy lately.
|
| At this point, Gemini isn't an AI product--it's a latency-cost-
| quality compromise simulator with a text interface. Meanwhile,
| OpenAI and Anthropic are out here just... cooking the benchmarks
| danielbln wrote:
| Google's Gemini 2.5 Pro model is incredibly strong; it's on par
| with and at times better than Claude 3.7 in coding performance,
| and being able to ingest entire videos into the context is
| something I haven't seen elsewhere either. Google AI products
| have been anywhere between bad (Bard) and lackluster (Gemini
| 1.5), but 2.5 is a contender in all dimensions. Google is also
| the only player that owns the entire stack: research,
| software, data, and compute hardware. I think they were slow to
| start but they've closed the gap since.
| bsmith wrote:
| Using AI to debug code at 2am sounds like pure insanity.
| mring33621 wrote:
| the new normal
| spiderice wrote:
| They're suggesting you'll be up at 2am debugging code because
| your AI code failed. Not that you'll be using AI to do the
| debugging.
| hmaxwell wrote:
| I did some testing this morning:
|
| Prompt: "can you find any mistakes on my codebase? I put one in
| there on purpose" + 70,000 tokens of codebase where in one line I
| have an include for a non-existent file.
|
| Claude 3.7 Sonnet (Try 1 - new chat) Claimed it found a race
| condition in the api of the admin interface that would be
| triggered if two admins were to change the room order at the same
| time. Claude suggested I group all sql queries in a single
| transaction. I looked at the code and found that it already used
| a transaction for all queries. I said: the order_update api is
| already done with a transaction. Claude replied: "You're
| absolutely right, and I apologize for my mistake. I was incorrect
| to claim there was a race condition issue. The transaction
| ensures atomicity and consistency of the updates, and the SQL
| queries are properly structured for their intended purpose."
|
| Claude 3.7 Sonnet (Try 2 - new chat) Found a comment in the admin
| ui javascript code that suggested a potential alternative to
| event handler cleanup that was not implemented because I decided
| to go with a cleaner route. Then asked "Is this the issue you
| intentionally included, or would you like me to look for other
| potential problems?" I said: "The comment merely suggests an
| alternative, right?" claude said: "Yes, you're absolutely right.
| The comment is merely suggesting an alternative approach that
| isn't being used in the code, rather than indicating a mistake.
| So there's no actual bug or mistake in this part of the code -
| just documentation of different possible approaches. I apologize
| for misinterpreting this as an issue!"
|
| Claude 3.7 Sonnet (Try 3 - new chat) When processing items out of
| the database to generate QR codes in the admin interface, Claude
| says that my code both attempts to generate QR codes with
| undefined data AS WELL AS saying that my error handling skips
| undefined data. Claude contradicts itself within 2 sentences.
| When asking about clarification Claude replies: Looking at the
| code more carefully, I see that the code actually has proper
| error handling. I incorrectly stated that it "still attempts to
| call generateQRCode()" in the first part of my analysis, which
| was wrong. The code properly handles the case when there's no
| data-room attribute.
|
| Gemini Advanced 2.5 Pro (Try 1 - new chat) Found the intentional
| error and said I should stop putting db creds/api keys into the
| codebase.
|
| Gemini Advanced 2.5 Pro (Try 2 - new chat) Found the intentional
| error and said I should stop putting db creds/api keys into the
| codebase.
|
| Gemini Advanced 2.5 Pro (Try 3 - new chat) Found the intentional
| error and said I should stop putting db creds/api keys into the
| codebase.
|
| o4-mini-high and o4-mini and o3 and 4.5 and 4o - "The message you
| submitted was too long, please reload the conversation and submit
| something shorter."
| Tiberium wrote:
| The thread is about 2.5 Flash though, not 2.5 Pro. Maybe you
| can try again with 2.5 Flash specifically? Even though it's a
| small model.
| dyauspitr wrote:
| I don't particularly care about the non frontier models
| though, I found the comment very useful.
| airstrike wrote:
| Have you tried Claude Code?
| danielbln wrote:
| Those responses are very Claude, too. 3.7 has powered our
| agentic workflows for weeks, but I've been using almost only
| Gemini for the last week and feel the output is generally
| better. It's gotten much better at agentic workflows (using
| 2.0 in an agent setup was not working well at all) and I prefer
| its tuning over Claude's: more to the point and less
| meandering.
| rendang wrote:
| 3 different answers in 3 tries for Claude? Makes me curious how
| many times you'd get the same answer if you asked 10/20/100
| times
| bambax wrote:
| > _codebase where in one line I have an include for a non-
| existent file_
|
| Ok but you don't need AI for this; almost any IDE will issue a
| warning for that kind of error...
| fandorin wrote:
| how did you put your whole codebase in a prompt for gemini?
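|
| (One common approach, not necessarily what the parent did:
| concatenate the files with path headers and paste the result
| into AI Studio. Paths here are hypothetical.)
|
|     import pathlib
|
|     parts = [f"--- {p} ---\n{p.read_text()}"
|              for p in sorted(pathlib.Path("src").rglob("*.py"))]
|     prompt = ("can you find any mistakes in my codebase?\n\n"
|               + "\n\n".join(parts))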
| Workaccount2 wrote:
| OpenAI might win the college students but it looks like Google
| will lock in enterprise.
| xnx wrote:
| ChatGPT seems to have a name recognition / first-mover
| advantage with college students now, but is there any reason to
| think that will stick when today's high school students are
| using Gemini on their Chromebooks?
| gundmc wrote:
| Funny you should say that. Google just announced today that
| they are giving all college students one year of free Gemini
| advanced. I wonder how much that will actually move the needle
| among the youth.
| Workaccount2 wrote:
| My guess is that they will use it and still call it
| "ChatGPT"...
| xnx wrote:
| Chat Gemini Pretrained Transformer
| tantalor wrote:
| Pass the Kleenex. Can I get a Band-Aid? Here's a Sharpie. I
| need a Chapstick. Let me Xerox that. Toss me that Frisbee.
| drob518 wrote:
| Exactly.
| esafak wrote:
| Do you prefer those brands or just use their names? I
| google stuff on Kagi...
| drob518 wrote:
| And every professor just groaned at the thought of having to
| read yet another AI-generated term paper.
| jay_kyburz wrote:
| They should just get AI to mark them. I genuinely think
| this is one thing AI would do better than humans.
| mdp2021 wrote:
| Grading papers definitely requires intelligence.
| jay_kyburz wrote:
| My partner marked a PhD thesis yesterday and there was a
| spelling mistake in the title.
|
| There is some level of analysis and feedback that an LLM
| could provide before a human reviews it, even if it's
| just a fancy spelling checker.
| mdp2021 wrote:
| I'd like to burst into a post with a number of the
| unbelievably akin mishandlings of academic tasks that were
| reported to me; I do have a number of prize-worthy
| anecdotes that compete with yours. Nonetheless, let us
| fight farce with rigour.
|
| Even when the tasks are not in-depth, but easier to
| assess, you still require a /reliable evaluator/. LLMs
| are not. Could they at least be employed as a virtual
| assistant, "parse and suggest, then I'll check"? If so,
| not randomly ("pick a bot"), but in full awareness of the
| specific instrument. That stage is not here.
| bufferoverflow wrote:
| Take-home assignments are basically obsolete. Students who
| want to cheat, can do so easily. Of course, in the end,
| they cheat themselves, but that's not the point.
| anovick wrote:
| * Only in the U.S.
| superfrank wrote:
| Is there really lock in with AI models?
|
| I built a product that uses an LLM, and I got curious about the
| quality of the output from different models. It took me a
| weekend to go from just using OpenAI's API to having Gemini,
| Claude, and DeepSeek all as options and a lot of that time was
| research on what model from each provider that I wanted to use.
| pydry wrote:
| For enterprise practically any SaaS gets used as one more
| thing to lock them into a platform they already have a
| relationship with (either AWS, GCP or Azure).
|
| It's actually pretty dangerous for the industry to have this
| much vertical integration. Tech could end up like the car
| industry.
| superfrank wrote:
| I'm aware of that. I'm an EM for a large tech company that
| sells multiple enterprise SaaS products.
|
| You're right that the lock-in happens because of
| relationships, but most big enterprise SaaS companies have
| relationships with multiple vendors. My company has
| relationships with AWS, Azure, and GCP, and we're currently
| using products from all of them in different products. Even
| on my specific product we're using all three.
|
| When you've already got those relationships, the lock in is
| more about switching costs. The time it takes to switch,
| the knowledge needed to train people internally on the
| differences after the switch, and the actual cost of the
| new service vs the old one.
|
| With AI models the time to switch from OpenAI to Gemini is
| negligible and there's little retraining needed. If the
| Google models (now or in the future) are comparable in
| price and do a better job than OpenAI models, I don't see
| where the lock in is coming from.
| drob518 wrote:
| There isn't much of a lock-in, and that's part of the problem
| the industry is going to face. Everyone is spending gobs of
| money on training and if someone else creates a better one
| next week, the users can just swap it right in. We're going
| to have another tech crash for AI companies, similar to what
| happened in 2001 for .coms. Some will be winners but they
| won't all be.
| ein0p wrote:
| How will it lock in the enterprise if its market share of
| enterprise customers is half that of Azure (Azure also sells
| OpenAI inference, btw), and one third that of AWS?
| kccqzy wrote:
| The same reason why people enjoy BigQuery enough that their
| only use of GCP is BigQuery while they put their general
| compute spend on AWS.
|
| In other words, I believe talking about cloud market share as
| a whole is misleading. One cloud could have one product
| that's so compelling that people use that one product even
| when they use other clouds for more commoditized products.
| asadm wrote:
| funny thing about younglings, they will migrate to something
| else as fast as they came to you.
| drob518 wrote:
| I read about that on Facebook.
| Oras wrote:
| Enterprise has already been won by Microsoft (Azure), which
| runs on OpenAI.
| r00fus wrote:
| That isn't what I'm seeing with my clientele (lots of
| startups and mature non-tech companies). Most are using Azure
| but very few have started to engage AI outside the periphery.
| jimbob45 wrote:
| Came to say this. No respectable CTO would ever push a Google
| product to their superiors knowing Google will kill it in 1-3
| years and they'll look foolish for having pushed it.
| edaemon wrote:
| It seems more and more like AI is less of a product and more of
| a feature. Most people aren't going to care or even know about
| the model or the company who made it, they're just going to use
| the AI features built into the products they already use.
| esafak wrote:
| That's going to be true until we reach AGI, when there will
| be a qualitative difference and we will lose our ability to
| discern which is better since they're too far ahead of us.
| statements wrote:
| Interesting to note that this might be the only model with a
| knowledge cutoff as recent as January 2025.
| Tiberium wrote:
| Gemini 2.5 Pro has the same knowledge cutoff specified, but in
| reality on more niche topics it's still limited to ~middle of
| 2024.
| brightball wrote:
| Isn't Grok 3 basically real time now?
| Tiberium wrote:
| That's the web version (which has tools like search plugged
| in), other models in their official frontends (Gemini on
| gemini.google.com, GPT/o models on chatgpt.com) are also
| "real time". But when served over API, most of those models
| are just static.
| bearjaws wrote:
| No LLM is real time, and in fact, even a 2025 cutoff isn't
| entirely realistic. Without guidance about, say, a new version
| of a framework, it will frequently "reference" documentation
| from old versions and use that.
|
| It's somewhat real time when it searches the web, of course
| that data is getting populated into context rather than in
| training.
| jiocrag wrote:
| Not at all. The model weights and training data remain the
| same, it's just RAG'ing real-time twitter data into its
| context window when returning results. It's like a worse
| version of Perplexity.
| flashblaze wrote:
| Why worse? Doesn't Grok also search the web along with
| Twitter?
| ein0p wrote:
| Absolutely decimated on metrics by o4-mini, straight out of the
| gate, and not even that much cheaper on output tokens (o4-mini's
| thinking can't be turned off IIRC).
| gundmc wrote:
| It's good to see some actual competition on this price range! A
| lot of Flash 2.5's edge will depend on how well the dynamic
| reasoning works. It's also helpful to have _significantly_
| lower input token cost for large-context use cases.
| rfw300 wrote:
| o4-mini does look to be a better model, but this is actually a
| lot cheaper! It's ~7x cheaper for both input and output tokens.
| ein0p wrote:
| These small models only make sense with "thinking" enabled.
| And once you enable that, much of the cost advantage
| vanishes, for output tokens.
| overfeed wrote:
| > These small models only make sense with "thinking"
| enabled
|
| This entirely depends on your use-cases.
| vessenes wrote:
| o4-mini costs 8x as much as 2.5 flash. I believe its useful
| context window is also shorter, although I haven't verified
| this directly.
| mccraveiro wrote:
| 2.5 flash with reasoning is just 20% cheaper than o4-mini
| vessenes wrote:
| Good point: reasoning costs more. Also impossible to tell
| without tests is how verbose the reasoning mode is
| mupuff1234 wrote:
| Not sure "decimated" is a fitting word for "slightly higher
| performance on some benchmarks".
| fwip wrote:
| Perhaps they were using the original meaning of "one-tenth
| destroyed." :P
| ein0p wrote:
| 66.8% error rate reduction for o4-mini on AIME2025, and 21%
| error rate reduction on MMMU isn't "slightly higher". It'll
| be quite noticeable in practice.
| kfajdsl wrote:
| Anecdotally o4-mini doesn't perform as well on video
| understanding tasks in our pipeline, and also in Cursor it
| seems really not great.
|
| During one session, it read the same file (same lines) several
| times, ran python -c 'print("skip!")' for no reason, and then
| got into another file reading loop. Then after asking a
| hypothetical about the potential performance implications of
| different ffmpeg flags, it claimed that it ran a test and
| determined conclusively that one particular set was faster,
| even though it hadn't even attempted a tool call, let alone
| have the results from a test that didn't exist.
| xbmcuser wrote:
| For a non-programmer like me, Google is becoming shockingly good.
| It is giving working code the first time. I was playing around
| with it and asked it to write code to scrape some data off a
| website to analyse. I was expecting it to write something that
| would scrape the data, and later I would upload the data to it to
| analyse. But it actually wrote code that scraped and analysed the
| data. It was basic categorizing and counting of the data, but I
| was not expecting it to do that.
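|
| The kind of code it produced was roughly this shape (an
| illustrative sketch; the URL and selector are placeholders):
|
|     import collections
|     import requests
|     from bs4 import BeautifulSoup
|
|     html = requests.get("https://example.com/listings").text
|     soup = BeautifulSoup(html, "html.parser")
|     cats = [el.get_text(strip=True)
|             for el in soup.select(".category")]
|     # categorize and count, as described above
|     print(collections.Counter(cats).most_common())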
| kccqzy wrote:
| That's the opposite experience of my wife who's in tech but
| also a non programmer. She wanted to ask Gemini to write code
| to do some basic data analysis things in a more automated way
| than Excel. More than once, Gemini wrote a long bash script
| where some sed invocations are just plain wrong. More than once
| I've had to debug Gemini-written bash scripts. As a programmer
| I knew how bash scripts aren't great for readability so I told
| my wife to ask Gemini to write Python. It resulted in higher
| code quality, but still contained bugs that are impossible for
| a non programmer to fix. Sometimes asking a follow up about the
| bugs would cause Gemini to fix it, but doing so repeatedly will
| result in Gemini forgetting what's being asked or simply
| throwing an internal error.
|
| Currently IMO you have to be a programmer to use Gemini to
| write programs effectively.
| sbarre wrote:
| I've found that good prompting isn't just about asking for
| results but also giving hints/advice/direction on how to go
| about the work.
|
| I suspect that if Gemini is giving you bash scripts, it's
| because you're not giving it enough direction. As you
| pointed out, telling it to use Python, or giving it more
| expectations about how to go about the work or what the output
| should look like, will give better results.
|
| When I am prompting for technical or data-driven work, I tend
| to almost walk through what I imagine the process would be,
| including steps, tools, etc...
| xbmcuser wrote:
| I had similar experiences few months back that is why I am
| saying it is becoming shockingly good the 2.5 is a lot better
| than the 2.0 version. Another thing I have realized just like
| google search in the past your query has a lot to do with the
| results you get. So an example of what you want works at
| getting better results
| ac29 wrote:
| > I am saying it is becoming shockingly good. The 2.5 is a
| lot better than the 2.0 version
|
| Are you specifically talking about 2.5 Flash? It only came
| out an hour ago; I don't know how you would have enough
| experience with it already to come to your conclusion.
|
| (I am very impressed with 2.5 Pro, but that is a different
| model that's been available for several weeks now)
| xbmcuser wrote:
| I am talking about 2.5 Pro
| 999900000999 wrote:
| Let's hope that's the case for a while.
|
| I want to be able to just tell chat GPT or whatever to create
| a full project for me, but I know the moment it can do that
| without any human intervention, I won't be able to find a
| job.
| drob518 wrote:
| IMO, the only thing that's consistent about AIs is how
| inconsistent they are. Sometimes, I ask them to write code
| and I'm shocked at how well it works. Other times, I feel
| like I'm trying to explain to a 5-year-old Alzheimer's
| patient what I want and it just can't seem to do the simplest
| stuff. And it's the same AI in both cases.
| greyadept wrote:
| I wouldn't be surprised if AI tools are frequently
| throttled in the backend to save on costs, resulting in
| this type of inconsistency.
| SweetSoftPillow wrote:
| It must have something to do with the way your wife is
| prompting. I've noticed this with my friends too. I usually
| get working code from Gemini 2.5 Pro on the first try, and
| with a couple of follow-up prompts, it often improves
| significantly, while my friends seem to struggle
| communicating their ideas to the AI and get worse results.
|
| Good news: Prompting is a skill you can develop.
| halfmatthalfcat wrote:
| Or we can just learn to write it ourselves in the same
| amount of time /shrug
| viraptor wrote:
| If you're going to need scripts like that every week -
| sure. If you need it once a year on average... not
| likely. There's a huge amount of things we could learn
| but do them so infrequently that we outsource it to other
| people.
| rgoulter wrote:
| Right.
|
| This is one case where I've found writing code with LLMs
| to be effective.
|
| With some unfamiliar tool I don't care about too much
| (e.g. GitHub Actions YAML or some build script), I just
| want it to work, & then focus on other things.
|
| I can spend time to try and come up with something that
| works; something that's robust & idiomatic.. but, likely
| I won't be able to re-use that knowledge before I forget
| it.
|
| With an LLM, I'll likely get just as good a result; or if
| not, will have a good starting point to go from.
| SweetSoftPillow wrote:
| You can't.
| halfmatthalfcat wrote:
| Not with that attitude.
| gregorygoc wrote:
| Is there a website with off the shelf prompts that work?
| Workaccount2 wrote:
| There is definitely an art to doing it, but the ability is
| definitely there even if you don't know the language at all.
|
| I have a few programs now that are written in Python (2 by
| 3.7, one by 2.5) used for business daily, and I can tell you
| I didn't, and frankly couldn't, check a single line of code.
| One of them is ~500 LOC, the other two are 2200-2700 LOC.
| yakz wrote:
| Ask it to write tests with the code and then ask it to fix
| the errors from the tests rather than just pointing out bugs.
| If you have an IDE that supports tool use (Claude Code, Roo
| Code) it can automate this process.
| jiggawatts wrote:
| The AIs like many things out there work like an "evil genie".
| They'll give you what you asked for. The problem is typically
| that users ask for the wrong thing.
|
| I've noticed beginners make mistakes like using singular
| terms when they should have used plural ("find the bug" vs
| "find the bugs"), or they fail to specify their preferred
| platform, language, or approach.
|
| You mentioned your wife is using Excel, which is primarily
| used on Windows desktops and/or with the Microsoft ecosystem
| of products such as Power BI, PowerShell, Azure, SQL Server,
| etc...
|
| Yet you mention she got a bash script using sed, both of
| which are from the Linux / GNU ecosystem. That implies that
| your wife didn't specify that she wanted a Microsoft-centric
| solution to her problem!
|
| The correct answer here would have likely to have been to use
| Microsoft Fabric, which is an entire bag of data analysis and
| reporting tools that has data pipelines, automation,
| publishing, etc...
|
| Or... just use the MashUp engine that's built-in to both
| Excel and PowerBI, which allows a surprisingly complex set of
| text, semi-structured, and tabular data processing. It can
| re-run the import and update graphs and charts with the new
| data.
|
| PS: This is similar to going up to a Node.js programmer with
| a request. It doesn't matter what it is, they will recommend
| writing JavaScript to solve the problem. Similarly, a C++
| developer will reach for C++ to solve everything they're
| asked to do. Right now, the AIs strongly prefer Linux,
| JavaScript, and especially Python for problem solving,
| because that's the bulk of the open-source code they were
| trained with.
| dmos62 wrote:
| Which Gemini was it? I've been using 2.5 Flash all day for
| programming ClojureScript via roo code and it's been great.
| Provided I'm using agent orchestration, a memory bank, and
| having it write docs for code it will work on.
| ant6n wrote:
| Last time I tried Gemini, it messed with my google photo data
| plan and family sharing. I wish I could try the AI separate
| from my Google account.
| jsnell wrote:
| > I wish I could try the AI separate from my Google account.
|
| If that's a concern, just create another account. Doesn't
| even require using a separate browser profile, you can be
| logged into multiple accounts at once and use the account
| picker in the top right of most their apps to switch.
| ModernMech wrote:
| I've been continually disappointed. I've been told it's getting
| exponentially better and we won't be able to keep up with how
| good they get, but I'm not convinced. I'm using them every
| single day, and I'm never shocked or awed by their competence, but
| instead continually vexed that they're not living up to the hype
| I keep reading.
|
| Case in point: there was a post here recently about
| implementing a JS algorithm that highlighted headings as you
| scrolled (side note: can anyone remember what the title was? I
| can't find it again), but I wanted to test the LLM for that
| kind of task.
|
| Pretty much no matter what I did, I couldn't get it to give me
| a solution that would highlight all of the titles down to the
| very last one.
|
| I knew what the problem was, but even guiding the AI, it
| couldn't fix the code. I tried multiple AIs, different
| strategies. The best I could come up with was to guide it step
| by step on how to fix the code. Even telling it _exactly_ what
| the problem was, it couldn't fix it.
|
| So this goes out to the "you're prompting it wrong" crowd...
| Can you show me a prompt or a conversation that will get an AI
| to spit out working code for this task: JavaScript that
| highlights headings as you scroll, down to the very last one. The
| challenge is to prompt it to do this without telling it how to
| implement it.
|
| I figure this should be easy for the AI because this kind of
| thing is very standard, but maybe I'm just holding it wrong?
| jsnell wrote:
| Even as a human programmer I don't actually understand your
| description of the problem well enough to be confident I
| could correctly guess your intent.
|
| What do you mean by "highlight as you scroll"? I guess you
| want a single heading highlighted at a time, and it should be
| somehow depending on the viewport. But even that is
| ambiguous. Do you want the topmost heading in the viewport?
| The bottom most? Depending on scroll direction?
|
| This is what I got one-shot from Gemini 2.5 Pro, with my best
| guess at what you meant:
| https://gemini.google.com/share/d81c90ab0b9f
|
| It seems pretty good. It handles scrolling in all possible
| ways, and does the highlighting at load so that it's in
| effect for the initial viewport too.
|
| The prompt was "write me some javascript that higlights the
| topmost heading (h1, h2, etc) in the viewport as the document
| is scrolled in any way".
|
| So I'm thinking your actual requirements are very different
| than what you actually wrote. That might explain why you did
| not have much luck with any LLMs.
| ModernMech wrote:
| > Even as a human programmer I don't actually understand
| your description of the problem well enough to be confident
| I could correctly guess your intent.
|
| Yeah, you understand what I meant. The code Gemini gave you
| implements the behavior, and the AI I used gave me pretty
| much the same thing. There's a problem with the algorithm
| tho -- if there's a heading too close to the bottom of the
| page it will never highlight. The page doesn't exhibit the
| bug because it provides enough padding at the bottom.
|
| But my point wasn't that it couldn't one-shot the code; my
| point was that I couldn't interrogate it into giving me
| code that behaved as I wanted. It seemed too anchored to
| the solution it had provided: it kept offering "fixes" that
| didn't change anything, and when I pointed that out it
| apologized and proceeded to lie about fixing the code
| again. It appeared to be stuck in an infinite loop.
|
| I think what's happened here is the opposite of what you
| suggest; this is a very common tutorial problem, you can
| find solutions of the variety you showed me all over the
| internet, and that's essentially what Gemini gave you. But
| being tutorial code, it's very basic and tries not to
| implement a more robust solution that is needed in
| production websites. When I asked AI for that extra
| robustness, it didn't want to stray too far from the
| template, and the bug persisted.
|
| Maybe you can coax it into getting a better result? I want
| to understand how.
| jsnell wrote:
| I clearly didn't understand what you meant, because you
| did in fact have additional unstated requirements that I
| could not even have imagined existed and were not in any
| way hinted at by your initial spec.
|
| And I still don't know what you want! Like, you want some
| kind of special case where the last heading is handled
| differently. But what kind of special case? You didn't
| specify. "It's wrong, fix it".
|
| Fix it how? When the page is scrolled all the way to the
| bottom, should the last heading always be highlighted?
| That would just move the complaint to the second heading
| from the bottom if three headings fit on the last screen.
| Add padding? Can't be that, since it's exactly what this
| solution already did and you thought it wasn't good
| enough.
|
| Sorry, I will not be playing another round of this. I
| don't know if you don't realize how inadequate your
| specifications are (in which case that's your problem
| with the LLMs too), or if this is some kind of a bit, but
| either way it doesn't feel like a good use of my time.
|
| But if your problem is that the LLMs give a bad initial
| answer, get anchored on it, and can't iterate, just give
| all the requirements up front. If they're requirements
| you didn't realize existed until you saw the proposed
| answer, just start again with a fresh context. That tends
| to work well for me in Aider.
| ModernMech wrote:
| > I clearly didn't understand what you meant, because you
| did in fact have additional unstated requirements
|
| Okay, but the AI understood the requirements; it confirmed
| to me what I intended it to do, but it couldn't produce
| code that met its own textual descriptions. It kept
| returning the tutorial code.
|
| > You didn't specify. "It's wrong, fix it". Fix it how?
|
| Maybe I wasn't clear here, but I'm not replying as if I'm
| prompting you like an AI. The problem domain is described
| better in the link in the sibling comment. When the AI gave
| back the initial code, I had inquired the following:
|
|     How does this work when it gets to the bottom? Let's
|     say I have a heading at the very bottom of the page.
|     Will the "active" one be the bottom one or the top?
|
| It replied:
|
|     What happens at the bottom of the page? When you're
|     near the bottom of the page and the last section
|     (section3, say) starts to enter the top of the viewport
|     (within 100px), it becomes the active one -- exactly as
|     you'd want. But here's the catch: if the last section
|     is short, or you scroll all the way to the very bottom
|     (past the heading), then no section's top may be <=
|     100px, especially if the last heading has already
|     scrolled off the top. In that case, the last matched
|     section (likely the one before the last) stays active,
|     which feels wrong.
|
|     We can fix this either by:
|     - Making the threshold smarter (e.g., using .bottom or
|       checking if any part of the section is visible).
|     - Or better: use IntersectionObserver to handle this
|       robustly.
|
|     Why this is better:
|     - It triggers when a section is entering the viewport
|       (around 30% from top).
|     - Works even when you scroll to the very bottom.
|     - It's less sensitive to section height or viewport
|       size.
|
| So it identified the edge case, it identified that the
| behavior is incorrect and what the cause of that is, and it
| returned code that purportedly fixed this. But the code it
| returned exhibited exactly the behavior it said "feels
| wrong". In interrogating it, I asked what was broken and we
| went line by line:
|
|     Me: Show me exactly what was changed from this to the
|     last which fixed the problem
|
|     It: Perfect -- let's do a line-by-line diff-style
|     breakdown of what changed between the buggy version and
|     the fixed version so we can pinpoint exactly what fixed
|     the issue.
|
| We went line by line and it told me what exactly was
| wrong and why it's fixed, and confirmed that the provided
| code produced the expected behavior. Why
| this works: We evaluate all visible headings, not
| just ones intersecting a line. We pick the one
| that's: just above the activation line, or
| just below it, if none are above Handles edge cases
| like top/bottom of scroll
|
| But the code doesn't do this. It continued on like this:
| proposing fixes, talking about the solution correctly, but
| never giving code that implemented the solution.
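|
| To be concrete, its description seems to call for something
| like this (my own untested sketch, not anything the model
| actually produced):
|
|     // pick the last heading above the activation line,
|     // else the first one below it; force the last heading
|     // when the page is scrolled all the way to the bottom
|     const headings =
|       [...document.querySelectorAll('h1, h2, h3')];
|     function updateActive() {
|       const LINE = 100;
|       const atBottom = window.innerHeight + window.scrollY >=
|             document.documentElement.scrollHeight - 2;
|       let active = null;
|       for (const h of headings) {
|         const top = h.getBoundingClientRect().top;
|         if (top <= LINE) active = h;
|         else { if (!active) active = h; break; }
|       }
|       if (atBottom) active = headings[headings.length - 1];
|       headings.forEach(h =>
|         h.classList.toggle('active', h === active));
|     }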
|
| > But if your problem is that the LLMs give a bad initial
| answer, get anchored on it, and can't iterate, just give
| all the requirements up front. If they're requirements
| you didn't realize existed until you saw the proposed
| answer, just start again with a fresh context. That tends
| to work well for me in Aider.
|
| Yeah, that's what I tend to do as well. I don't tend to get
| good, satisfying results though, to the point where coding
| it myself seems like the faster, more reliable option. I'll
| keep trying to hold it better and maybe one day it'll work
| for me. Until then I'm a skeptic.
| croemer wrote:
| "Overengineered anchor links":
| https://news.ycombinator.com/item?id=43570324
| ModernMech wrote:
| Thank you!!
| __alexs wrote:
| Does billing for the API actually work properly yet?
| alecco wrote:
| Gemini models are very good, but in my experience they tend to
| overdo things. When I give it material for context plus one
| specific thing to rework, Gemini often reworks the context too.
|
| For software it is barely useful because you want small commits
| for specific fixes not a whole refactor/rewrite. I tried many
| prompts but it's hard. Even when I give it function signatures of
| the APIs the code I want to fix uses, Gemini rewrites the API
| functions.
|
| If anybody knows a prompt hack to avoid this, I'm all ears.
| Meanwhile I'm staying with Claude Pro.
| byearthithatius wrote:
| Yes, it will add INSANE amounts of "robust error handling" to
| quick scripts where I can be confident about assumptions. This
| turns my clean 40 lines of Python, where I KNOW the JSONL I am
| parsing is valid, into 200+ lines filled with ten new try/except
| statements. Even when I tell it not to do this, it loves to
| "find and help" in other ways. Quite annoying. But overall it
| is pretty dang good. It even spotted a bug I missed the other
| day in a big 400+ line complex data processing file.
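|
| For contrast, the whole "clean" version of a parse like that is
| basically just (file name made up):
|
|     import json
|
|     # no try/except on purpose: the JSONL is known-valid
|     with open("events.jsonl") as f:
|         records = [json.loads(line) for line in f]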
| zhengyi13 wrote:
| I wonder how much of that sort of thing is driven by having
| trained their models on their own internal codebases? Because
| if that's the case, careful and defensive being the default
| would be unsurprising.
| stavros wrote:
| I didn't realize this was a bigger trend, I asked it to write
| a simple testing script that POSTed a string to a local HTTP
| server as JSON, and it wrote a 40 line script, handling any
| possible error. I just wanted two lines.
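|
| Something like this was all I was after (assuming the
| `requests` package and a server on port 8000):
|
|     import requests
|     requests.post("http://localhost:8000/", json={"text": "hi"})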
| jug wrote:
| Yes, as recently as earlier today, I asked it to provide
| "naive" code, which helped a bit.
| free_energy_min wrote:
| Same issue here! It isn't even helpful, because if the code
| isn't working I want it to fail, not just skip over errors.
| dherikb wrote:
| I have the same issue using it with Aider.
|
| The model is good at solving problems, but it is very difficult
| to control the unnecessary changes the model makes in the rest
| of the code. It also adds a lot of unnecessary comments, even
| when I explicitly say not to add them.
|
| For now DeepSeek R1 and V3 are working better for me, producing
| more predictable results and capturing my intentions better
| (haven't tried Claude yet).
| w4yai wrote:
| Here's what I found to be working (not 100%, but it gives much
| better and more consistent results).
|
| Basically, I ask it to repeat at the start of each message some
| rules:
|
| "From now on, you must repeat and comply the following rules at
| the top of all your messages onwards:
|
| - I will never rewrite API functions. Even if I think it's a
| good idea, it is a bad idea. I will keep the API function as it
| is and it is perfect like that.
|
| - I will never add extra input validation. Even if I think it's
| a good idea, it is a bad idea. I will keep the function without
| validation and it is perfect like that.
|
| - ...
|
| - If I violate any of those rules, I did a bad job. "
|
| Forcing it to repeat things make the model output more aligned
| and focused in my experience.
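|
| If you're on the API rather than the chat app, the same rules
| can also go into the system instruction. A sketch with the
| google-genai Python SDK (the model name and rules here are
| placeholders):
|
|     from google import genai
|     from google.genai import types
|
|     RULES = ("You must repeat and comply with these rules at "
|              "the top of all your messages: never rewrite API "
|              "functions; never add extra input validation; ...")
|
|     client = genai.Client(api_key="YOUR_KEY")
|     response = client.models.generate_content(
|         model="gemini-2.5-flash-preview-04-17",  # placeholder
|         config=types.GenerateContentConfig(
|             system_instruction=RULES),
|         contents="Fix the bug in this function: ...",
|     )
|     print(response.text)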
| ks2048 wrote:
| If this announcement is targeting people not up-to-date on the
| models available, I think they should say what "flash" means. Is
| there a "Gemini (non-flash)"?
|
| I see the 4 Google model names in the chart here. Are these 4 the
| main "families" of models to choose from?
|
| - Gemini-Pro-Preview
|
| - Gemini-Flash-Preview
|
| - Gemini-Flash
|
| - Gemini-Flash-Lite
| mwest217 wrote:
| Gemini has had 4 families of models, in order of decreasing
| size:
|
| - Ultra
|
| - Pro
|
| - Flash
|
| - Flash-Lite
|
| Versions with `-Preview` at the end haven't had their "official
| release" and are technically in some form of "early access"
| (though I'm not totally clear on exactly what that means, given
| that they're fully available and, as of 2.5 Pro Preview, have
| pricing attached - earlier Preview versions were free but had
| pretty strict rate limits, whereas now Preview models seem more
| or less fully usable).
| drob518 wrote:
| Is GMail still in beta?
| mring33621 wrote:
| so Sigma...
| jsnell wrote:
| The free-with-small-rate-limits designator was
| "experimental", not "preview".
|
| I _think_ the distinction between preview and full release is
| that the preview models have no guarantees on how long they'll
| be available, while a full release comes with a pre-set
| discontinuation date. So if you want stability for a
| production app, you wouldn't want to use a preview model.
| AStonesThrow wrote:
| I've been leveraging the services of 3 LLMs, mainly: Meta,
| Gemini, and Copilot.
|
| It depends on what I'm asking. If I'm looking for answers in the
| realm of history or culture, religion, or I want something
| creative such as a cute limerick, or a song or dramatic script,
| I'll ask Copilot. Currently, Copilot has two modes: "Quick
| Answer", or "Think Deeply" if you want to wait about 30 seconds
| for a good answer.
|
| If I want info on a product, a business, an industry or a field
| of employment, or on education, technology, etc., I'll inquire of
| Gemini.
|
| Both Copilot and Gemini have interactive voice conversation
| modes. Thankfully, they will also write a transcript of what we
| said. They also eagerly attempt to engage the user with further
| questions and followups, with open questions such as "so what's
| on your mind tonight?"
|
| And if I want to know about pop stars, film actors, the social
| world or something related to tourism or recreation in general, I
| can ask Meta's AI through [Facebook] Messenger.
|
| One thing I found to be extremely helpful and accurate was
| Gemini's tax advice. I mean, it was way better than human beings
| at the entry/poverty level. Commercial tax advisors, even when
| I'd paid for the Premium Deluxe Tax Software from the Biggest
| Name, they just went and Googled stuff for me. I mean, they didn't
| even seem to know where stuff was on irs.gov. When I asked for a
| virtual or phone appointment, they were no-shows, with a litany
| of excuses. I visited 3 offices in person; the first two were
| closed, and the third one basically served Navajos living off the
| reservation.
|
| So when I asked Gemini about tax information -- simple stuff like
| the terminology, definitions, categories of income, and things
| like that -- Gemini was perfectly capable of giving lucid
| answers. And citing its sources, so I could immediately go find
| the IRS.GOV publication and read it "from the horse's mouth".
|
| Oftentimes I'll ask an LLM just to jog my memory or inform me of
| what specific terminology I should use. Like "Hey Gemini, what's
| the PDU for Ethernet called?" and when Gemini says it's a "frame"
| then I have that search term I can plug into Wikipedia for
| further research. Or, for an introduction or overview to topics
| I'm unfamiliar with.
|
| LLMs are an important evolutionary step in the general-purpose
| "search engine" industry. One problem was, you see, that it was
| dangerous, annoying, or risky to go Googling around and click on
| all those tempting sites. Google knew this: the dot-com sites and
| all the SEO sites that surfaced to the top were traps, they were
| bait, they were sometimes legitimate scams. So the LLM providers
| are showing us that we can stay safe in a sandbox, without
| clicking external links, without coughing up information about
| our interests and setting cookies and revealing our IPv6
| addresses: we can safely ask a local LLM, or an LLM in a trusted
| service provider, about whatever piques our fancy. And I am glad
| for this. I saw y'all complaining about how every search engine
| was worthless, and the Internet was clogged with blogspam, and
| there was no real information anymore. Well, perhaps LLMs, for
| now, are a safe space, a sandbox to play in, where I don't need
| to worry about drive-by-zero-click malware, or being inundated
| with Joomla ads, or popups. For now.
| cynicalpeace wrote:
| 1. The main transformative aspect of LLMs has been in writing
| code.
|
| 2. LLMs have had less transformative aspects in 2025 than we
| anticipated back in late 2022.
|
| 3. LLMs are unlikely to be very transformative to society, even
| as their intelligence increases, because intelligence is a minor
| changemaker in society. Bigger changemakers are motivation,
| courage, desire, taste, power, sex and hunger.
|
| 4. LLMs are unlikely to develop these more important traits
| because they are trained on text, not evolved in a rigamarole of
| ecological challenges.
| charcircuit wrote:
| 500 RPD for the free tier is good enough for my coding needs.
| Nice.
| AbuAssar wrote:
| I noticed that OpenAI doesn't compare its models to third-party
| models in its announcement posts, unlike Google, Meta, and the
| others.
| jskherman wrote:
| They're doing the Apple strategy: less spotlight for third
| parties, and less awareness of how they're lagging behind, so
| that those already locked into OpenAI won't switch. But at this
| point, why would anyone stay when switching costs are so low?
| mmaunder wrote:
| More great innovation from Google. OpenAI have two major
| problems.
|
| The first is Google's vertically integrated chip pipeline and
| deep supply chain and operational knowledge when it comes to
| creating AI chips and putting them into production. They have a
| massive cost advantage at every step. This translates into more
| free services, cheaper paid services, more capabilities due to
| more affordable compute, and far more growth.
|
| Second problem is data starvation and the unfair advantage that
| social media has when it comes to a source of continually
| refreshed knowledge. Now that the foundational model providers
| have churned through the common crawl and are competing to
| consume things like video and whatever is left, new data is
| becoming increasingly valuable as a differentiator, and more
| importantly, as a provider of sustained value for years to come.
|
| SamA has signaled awareness of both of these problems: he made
| noises about building a fab a while back, and is more recently
| making noises about launching a social media platform off
| OpenAI. The smart money among his investors knows these issues
| to be fundamental in deciding whether OAI will succeed, and is
| asking the hard questions.
|
| If the only answer for both is "we'll build it from scratch",
| OpenAI is in very big trouble. And it seems that that is the best
| answer that SamA can come up with. I continue to believe that
| OpenAI will be the Netscape of the AI revolution.
|
| The win is Google's for the taking, if they can get out of their
| own way.
| jbverschoor wrote:
| Except that they train their model even when you pay. So yeah..
| I'd rather not use their "evil"
| dayvigo wrote:
| Source?
| throwaway314155 wrote:
| It's right there in the comment.
| mkl wrote:
| This is false: https://ai.google.dev/gemini-api/terms
| Keyframe wrote:
| Google has the data and has the hardware, not to mention
| software and infrastructure talent. Once this Bismarck turns
| around (and it looks like it is turning), who can parry it for
| real? They
| have internet.zip and all the previous versions as well, they
| have youtube, email, search, books, traffic, maps and business
| on it, phones and habits around it, even the OG social network,
| the usenet. It's a sleeping giant starting to wake up and it's
| already causing commotion, let's see what it does when it
| drinks morning coffee.
| kriro wrote:
| Agreed. One of Google's big advantages is the data access and
| integrations. They are also positioned really well for the
| "AI as entertainment" sector with youtube which will be huge
| (imo). They also have the adtech knowledge, and injecting ads
| into AI is an obvious play. As is harvesting AI chat data.
|
| Meta and Google are the long term players to watch as Meta
| also has similar access (Insta, FB, WhatsApp).
| whoisthemachine wrote:
| On-demand GenAI could definitely change the meaning of
| "You" in "Youtube".
| eastbound wrote:
| They have the Excel spreadsheets of all startups and
| businesses of the world (well 50/50 with Microsoft).
|
| And Atlassian has all the project data.
| Keyframe wrote:
| I still can't understand how Google missed out on GitHub,
| especially since they were in the same space before with
| Google Code. I do understand how they couldn't make a
| GitHub themselves, though.
| jjani wrote:
| More like 5/95 with Microsoft - and that's being generous,
| I wouldn't be surprised if it was 1/99. It's basically just
| hip tech companies and a couple of Fortune 500s that use
| Google Docs. And even their finance departments often use
| Excel. HN keeps underestimating how the whole physical
| world runs on Excel.
| whyenot wrote:
| Another advantage that Google has is the deep integration of
| Gemini into Google Office products and Gmail. I was part of a
| pilot group and got to use a pre-release version and it's
| really powerful and not something that will be easy for OpenAI
| to match.
| mmaunder wrote:
| Agreed. Once they dial in the training for sheets it's going
| to be incredible. I'm already using notebooklm to upload
| finance PDFs, then having it generate tabular data and
| copypasta into sheets, but it's a garage solution compared to
| just telling it to create or update a sheet with parsed data
| from other sheets, PDFs, docs, etc.
|
| And as far as gmail goes, I periodically try to ask it to
| unsubscribe from everything marketing related, and not from
| my own company, but it's not even close to being there. I
| think there will continue to be a gap in the market for more
| aggressive email integration with AI, given how useless email
| has become. I know A16Z has invested in a startup working on
| this. I doubt Gmail will integrate as deeply as is possible, so
| the opportunity will remain.
| Workaccount2 wrote:
| I frankly doubt the future of office products. In the last
| month I have ditched two separate Excel productivity
| templates in favor of bespoke wrappers on SQLite databases,
| written by Claude and Gemini. Easier to use and probably 10x
| as fast.
|
| You don't need a 50 function swiss army knife when your
| pocket can just generate the exact tool you need.
| jdgoesmarching wrote:
| You say deep integration, yet there is still no way to send a
| Gemini Canvas to Docs without a lot of tedious copy-pasting
| and formatting because Docs still doesn't actually support
| markdown. Gemini in Google Office in general has been a
| massive disappointment for all but the most simplistic of
| writing tasks.
|
| They can have the most advanced infrastructure in the world,
| but it doesn't mean much if Google continues its infamous
| floundering approach to product. But hey, 2.5 pro with Cline
| is pretty nice.
| whyenot wrote:
| Maybe I'm misunderstanding, but there is literally a Share
| button in Canvas right below each response with the option
| to export to Docs. Within Docs, you can also click on the
| Gemini "star" at the upper right to get a prompt and then
| also export into the open document. Note that this is with
| the "experimental" Gemini 2.5 Pro.
| disgruntledphd2 wrote:
| Docs supports markdown in comments, where it's the only way
| to get formatting.
|
| I love Google's product dysfunction sometimes :/
| chucky_z wrote:
| I have access to this now and I want it to work so bad and
| it's just proper shit. Absolute rubbish.
|
| They really, truly need to fix this integration. Gemini in
| Google Docs is barely acceptable, it doesn't work at all (for
| me) in Gmail, and I've not yet had it do _anything_ other
| than error in Google Sheets.
| zoogeny wrote:
| If the battle was between Altman and Pichai I'd have my doubts.
|
| But the battle is between Altman and Hassabis.
|
| I recall some advice on investment from Buffett regarding how
| he invests in the management team.
| mdp2021 wrote:
| Could you please expand, on both your points?
| zoogeny wrote:
| It is more gut feel than a rational or carefully reasoned
| argument.
|
| I think Pichai has been an exceptional revenue maximizer
| but he lacks vision. I think he is probably capable of
| squeezing tremendous revenue out of AI once it has been
| achieved.
|
| I like Hassabis in a "good vibe" way when I hear him speak.
| He reminds me of engineers that I have worked with
| personally and have gained my respect. He feels less like a
| product focused leader and more of a research focused
| leader (AlphaZero/AlphaFold) which I think will be critical
| to continue the advances necessary to push the envelope. I
| like his focus on games and his background in RL.
|
| Google's war chest of Ad money gives Hassabis the
| flexibility to invest in non-revenue generating directions
| in a way that Altman is unlikely to be able to do. Altman
| made a decision to pivot the company towards product which
| led to the exodus of early research talent.
| sumedh wrote:
| > Altman made a decision to pivot the company towards
| product which led to the exodus of early research talent.
|
| Who was going to fund the research though?
| zoogeny wrote:
| Fair point, and a good reminder not to pass judgement on
| the actions of others. It is totally possible that Altman
| made his own prediction of the future and theorized that
| the only hope he had of competing with the existing big
| tech companies to realistically achieve an AI for the
| masses was to show investors a path to profitability.
|
| I should also give Altman a bit more due in that I find
| his description of a world augmented by powerful AI to be
| more inspiring than any similar vision I've heard from
| Pichai.
|
| But I'm not trying to guess their intentions, I am just
| stating the situation as I see it. And that situation is
| one where whatever forces have caused it, OpenAI is
| clearly investing very heavily in product (e.g. windsurf
| acquisition, even suggesting building a social network).
| And that shift in focus seems highly correlated with a
| loss of significant research talent (as well as a healthy
| dose of boardroom drama).
| mmaunder wrote:
| Not sure why their comment was downvoted. Google the names:
| Hassabis runs DeepMind at Google, which makes Gemini, and
| he's quite brilliant with an unbelievable track record.
| Buffett investing in teams points out that there are smart
| people out there who think good leadership is a good
| predictor of future success.
| zoogeny wrote:
| It may not be relevant to everyone, but it is worth
| noting that his contribution to AlphaFold won Hassabis a
| Nobel prize in chemistry.
| mdp2021 wrote:
| Zoogeny got downvoted? I did not do that. His comments
| deserved more details anyway (at the level of those
| kindly provided).
|
| > _Google the names_
|
| Was that a wink about the submission (a milestone from
| Google)? Read Zoogeny's delightful reply and see whether a
| search engine result can compare (not to mention that I
| asked for Zoogeny's insight, not for trivia). And as a
| listener to Buffett and Munger, I can surely say that they
| rarely indulge in tautologies.
| zoogeny wrote:
| I wouldn't worry about downvotes, it isn't possible on HN
| to downvote direct replies to your message (unlike
| reddit), so you cannot be accused of downvoting me unless
| you did so using an alt.
|
| Some people see tech like they see sports teams and they
| vote for their tribe without considering any other
| reason. I'm not shy stating my opinion even when it may
| invite these kinds of responses.
|
| I do think it is important for people to "do their own
| research" and not take one man's opinion as fact. I
| recommend people watch a few videos of Hassabis, there
| are many, and judge his character and intelligence for
| themselves. They may find they don't vibe with him and
| genuinely prefer Altman.
| sidibe wrote:
| Sorry but my eyes rolled to the back of my head with this
| one. This is between two teams with tons of smart
| contributors, but the difference is one is more flexible and
| able to take risks vs the other that has many times more
| researchers and the world's best and most mature
| infrastructure/tooling. It's not a CEO vs CEO battle.
| zoogeny wrote:
| I think it requires a nuanced take but allow me to provide
| some counter-examples.
|
| The first is CEO pay rates. Another is the highest paid
| public employees (which tend to be coaches at state
| schools). This is evidence that the market highly values
| managers.
|
| Another is systemic failures within enterprises. When
| Boeing had a few very public plane crashes, a certain
| narrative suggested that the transition from highly capable
| engineer managers to financial focus managers contributed
| to the problem. A similar narrative has been used to
| explain the decline of Intel.
|
| Consider the return of Steve Jobs to Apple. Or the turn
| around at Microsoft with Nadella.
|
| All of these are complex cases that don't submit to an easy
| analysis. Success and failure are definitely multi-factor
| and rarely can be traced to a single definitive cause.
|
| Perhaps another way to look at it would be: what percentage
| of the success of highly complex organizations can be
| attributed to management? To what degree can poor
| management decisions contribute to the failure of an
| otherwise capable organization?
|
| How much you choose to weight those factors is entirely up
| to you.
|
| edit: I was also thinking about the way we think about the
| advantage of exceptional generals/admirals in military
| analysis. Or the effect a president can have on the
| direction of a country.
| throwup238 wrote:
| Nobody has really talked about what I think is an advantage
| just as powerful as the custom chips: Google Books. They
| already won a landmark fair use lawsuit against book
| publishers, digitized more books than anyone on earth, and used
| their Captcha service to crowdsource its OCR. They've got the
| best* legal cover and all of the best sources of human
| knowledge already there. Then Youtube for video.
|
| The chips of course push them over the top. I don't know how
| much Deep Research is costing them but it's by far the best
| experience with AI I've had so far with a generous 20/day rate
| limit. At this point I must be using up at least 5-10 compute
| hours a _day_. Until about a week ago I had almost completely
| written off Google.
|
| * For what it's worth, I don't know. IANAL
| dynm wrote:
| The amount of text in books is surprisingly finite. My best
| estimate was that there are ~10^13 tokens available in all
| books (https://dynomight.net/scaling/#scaling-data), which is
| less than frontier models are already being trained on. On
| the other hand, book tokens are probably much "better" than
| random internet tokens. Wikipedia, for example, seems to get
| much higher weight than other sources, and it's only ~3x10^10
| tokens.
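|
| (Back-of-envelope: Google once estimated ~1.3x10^8 distinct
| books; at very roughly 10^5 tokens per book, that's
| 1.3x10^8 x 10^5, i.e. on the order of 10^13 tokens.)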
| dr_dshiv wrote:
| We need more books! On it...
| kupopuffs wrote:
| _opens up his favorite chat_
| paxys wrote:
| LibGen already exists, and all the top LLM publishers use it.
| I don't know if Google's own book index provides a big
| technical or legal advantage.
| disgruntledphd2 wrote:
| I'd be very surprised if the Google books index wasn't much
| bigger and more diverse than libgen.
| og_kalu wrote:
| Anna's Archive is at 43M Books and 98M Papers [1]. The
| book total is nearly double what Google has.
|
| Google's scanning project basically stalled after the
| legal battle. It's a very fascinating read [2].
|
| [1] https://annas-archive.org/
|
| [2] https://web.archive.org/web/20170719004247/https://ww
| w.theat...
| jofzar wrote:
| Something that is not specifically called out but is also
| super relevant is actually the transcription of YouTube
| videos.
|
| Every video is machine-transcribed and stored, and for larger
| videos the author will often transcribe them themselves.
|
| This is something they already have; it doesn't need any more
| "work" to get, unlike for a competitor.
| jppittma wrote:
| I would think the biggest advantage is YouTube. There's a lot
| of modern content for analysis that's uncontaminated by LLMs.
| peterjliu wrote:
| another advantage is that people want the Google bot to crawl
| their pages, unlike most AI companies' bots
| mmaunder wrote:
| This is an underrated comment. Yes it's a big advantage and
| probably a measurable pain point for Anthropic and OpenAI. In
| fact you could just do a 1% survey of robots.txt out there
| and get a reasonable picture. Maybe a fun project for an
| HN'er.
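|
| The core of that survey is a few lines of stdlib Python (a
| rough sketch: the bot names and domain list are assumptions,
| and there's no error handling):
|
|     import urllib.robotparser
|
|     BOTS = ["Googlebot", "GPTBot", "ClaudeBot", "CCBot"]
|
|     def blocked_bots(domain):
|         rp = urllib.robotparser.RobotFileParser(
|             f"https://{domain}/robots.txt")
|         rp.read()  # fetch and parse the file
|         return [b for b in BOTS
|                 if not rp.can_fetch(b, f"https://{domain}/")]
|
|     print(blocked_bots("example.com"))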
| jiocrag wrote:
| Excellent point. If they can figure out how to either
| remunerate or drive traffic to third parties in conjunction
| with this, it would be huge.
| newfocogi wrote:
| This is right on. I work for a company with somewhat of a
| data moat and AI aspirations. We spend a lot of time blocking
| everyone's bots except for Google. We have people whose
| entire job is it to make it faster for Google to access our
| data. We exist because Google accesses our data. We can't not
| let them have it.
| CobrastanJorji wrote:
| Reddit was an interesting case here. They knew that they had
| particularly good AI training data, and they were able to
| hold it hostage from the Google crawler, which was an awfully
| high risk play given how important Google search results are
| to Reddit ads, but they likely knew that Reddit search
| results were also really important to Google. I would love to
| be able to watch those negotiations on each side; what a
| crazy high stakes negotiation that must've been.
| mattlondon wrote:
| Particularly good training data?
|
| You can't mean the bottom-of-the-barrel dross that people
| post on Reddit, so not sure what data you are referring to?
| Click-stream?
| CobrastanJorji wrote:
| Say what you will, but there's a lot of good answers to
| real questions people have that's on Reddit. There's a
| whole thing where people say "oh Google search results
| are bad, but if you append the word 'REDDIT' to your
| search, you'll get the right answer." You can see that
| most of these agents rely pretty heavily on stuff they
| find on Reddit.
|
| Of course, that's also a big reason why Google search
| results suggest putting glue on pizza.
| stefan_ wrote:
| I don't know man, for months now people keep telling me on HN
| how "Google is winning", yet no normal person I ever asked
| knows what the fuck "Gemini" is. I don't know what they are
| winning, it might be internet points for all I know.
|
| Actually, some of the people polled recalled the Google AI
| efforts by their expert system recommending glue on pizza and
| smoking in pregnancy. It's a big joke.
| mmaunder wrote:
| Try uploading a bunch of PDF bank statements to notebooklm
| and ask it questions. Or the results of blood work. It's jaw
| dropping. e.g. uploaded 7 brokerage account statements as
| PDFs in a mess of formats and asked it to generate table
| summary data which it nailed, and then asked it to generate
| actual trades to go from current position to a new position
| in shortest path, and it nailed that too.
|
| Biggest issue we have when using notebooklm is a lack of
| ambition when it comes to the questions we're asking. And the
| pro version supports up to 300 documents.
|
| Hell, we uploaded the entire Euro Cyber Resilience Act and
| asked the same questions we were going to ask our big name
| legal firm, and it nailed every one.
|
| But you actually make a fair point, which I'm seeing too and
| I find quite exciting. And it's that even among my early
| adopter and technology minded friends, adoption of the most
| powerful AI tools is very low. e.g. many of them don't even
| know that notebookLM exists. My interpretation on this is
| that it's VERY early days, which is suuuuuper exciting for us
| builders and innovators here on HN.
| kube-system wrote:
| While there are some first-party B2C applications like chat
| front-ends built using LLMs, once mature, the end game is
| almost certainly that these are going to be B2B products
| integrated into other things. The future here goes a lot
| further than ChatGPT.
| shmoogy wrote:
| That was ages ago.
|
| Their new models excel at many things; image editing, parsing
| PDFs, and coding are what I use them for. They're
| significantly cheaper than the closest competing models (I
| use Gemini 2.5 Pro, and Flash experimental with image
| generation).
|
| Highly recommend testing against openai and anthropic models
| - you'll likely be pleasantly surprised.
| labrador wrote:
| > If the only answer for both is "we'll build it from scratch",
| OpenAI is in very big trouble
|
| They could buy Google+ code from Google and resurrect it with
| OpenAI branding. Alternatively, they could partner with Bluesky.
| parsimo2010 wrote:
| I don't think the issue is solving the technical
| implementation of a new social media platform. The issue is
| whether a new social media platform from OpenAI will deliver
| the kind of value that existing platforms deliver. If they
| promise investors that they'll get TikTok/Meta/YouTube levels
| of content+interaction (and all the data that comes with it),
| but deliver Mastodon levels, then they are in trouble.
| onlyrealcuzzo wrote:
| > The smart money among his investors know these issues to be
| fundamental in deciding if OAI will succeed or not, and are
| asking the hard questions.
|
| OpenAI has already succeeded.
|
| If it ends up being a $100B company instead of a $10T company,
| that is success. By a very large margin.
|
| It's hard to imagine a world in which OpenAI just goes bankrupt
| and ends up being worth nothing.
| bdangubic wrote:
| it goes bankrupt when the cost of running the business
| outweighs the earnings in the long run
| samuel wrote:
| I can, and I would say it's a likely scenario, say 30%. If
| they don't have a significant edge over their competitors in
| the capabilities of their models, what's left? A money losing
| web app, and some API services that I'm sure aren't very
| profitable either. They can't compete with Google, Grok,
| Meta, MS, Amazon... They just can't.
|
| They could end up being the AltaVista of this era.
| dyauspitr wrote:
| I haven't heard this much positive sentiment about Google in a
| while. Making something freely available really turns public
| sentiment around.
| mark_l_watson wrote:
| Nice! Low price, even with reasoning enabled. I have been working
| on a short new book titled "Practical AI with Google: A Solo
| Knowledge Worker's Guide to Gemini, AI Studio, and LLM APIs" but
| with all of Google's recent announcements it might not be a short
| book.
| serjester wrote:
| Just ran it on one of our internal PDF (3 pages, medium
| difficulty) to json benchmarks:
|
| gemini-flash-2.0: ~60% accuracy, 6,250 pages per dollar
|
| gemini-2.5-flash-preview (no thinking): ~80% accuracy, 1,700
| pages per dollar
|
| gemini-2.5-flash-preview (with thinking): ~80% accuracy (not
| sure what's going on here), 350 pages per dollar
|
| gemini-flash-2.5: ~90% accuracy, 150 pages per dollar
|
| I do wish they separated the thinking variant from the regular
| one - it's incredibly confusing when a model parameter
| dramatically impacts pricing.
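|
| For anyone else confused by this: you can at least pin the
| variant explicitly via the thinking budget. A sketch with the
| google-genai Python SDK (model name may differ):
|
|     from google import genai
|     from google.genai import types
|
|     client = genai.Client(api_key="YOUR_KEY")
|     response = client.models.generate_content(
|         model="gemini-2.5-flash-preview-04-17",
|         contents="Extract this page to JSON: ...",
|         config=types.GenerateContentConfig(
|             # 0 disables thinking; raise it for the
|             # (pricier) thinking variant
|             thinking_config=types.ThinkingConfig(
|                 thinking_budget=0)),
|     )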
| ValveFan6969 wrote:
| I have been having similar performance issues, I believe they
| intentionally made a worse model (Gemini 2.5) to get more money
| out of you. However, there is a way where you can make money
| off of Gemini 2.5.
|
| If you set the thinking parameter lower and lower, you can make
| the model spew absolute nonsense for the first response. It
| costs 10 cents per input / output, and sometimes you get a
| response that was just so bad your clients will ask for more
| and more corrections.
| mpalmer wrote:
| Wow, what apps have you made so I know never to use them?
| zoogeny wrote:
| Google making Gemini 2.5 Pro (Experimental) free was a big deal.
| I haven't tried the more expensive OpenAI models so I can't even
| compare, only to the free models I have used of theirs in the
| past.
|
| Gemini 2.5 Pro is so much of a step up (IME) that I've become
| sold on Google's models in general. It not only is smarter than
| me on most of the subjects I engage with it, it also isn't
| completely obsequious. The model pushes back on me rather than
| contorting itself to find a way to agree.
|
| 100% of my casual AI usage is now in Gemini and I look forward to
| asking it questions on deep topics because it consistently
| provides me with insight. I am building new tools with a mind
| to optimizing my usage and increasing its value to me.
| PerusingAround wrote:
| This comment is exactly my experience; I feel as if I had
| written it myself.
| cjohnson318 wrote:
| Yeah, my wife pays for ChatGPT, but Gemini is fine enough for
| me.
| qwertox wrote:
| Just be aware that if you don't add a key (and set up
| billing) you're granting Google the right to train on your
| data, and to have people read it and decide how to use it
| for training.
| energy123 wrote:
| I thought if you turn off App Activity then that's good
| enough to protect your data?
| voxic11 wrote:
| Nope, not if you are in the US
| https://ai.google.dev/gemini-api/terms#data-use-unpaid
| Graphon1 wrote:
| > To have persons read them and decide how to use them for
| training.
|
| Not that I have any actual insight. but doesn't it seem
| more likely that it will not be a human, but a model?
| Models training models.
| qwertox wrote:
| > To help with quality and improve our products, human
| reviewers may read, annotate, and process your API input
| and output. Google takes steps to protect your privacy as
| part of this process. This includes disconnecting this
| data from your Google Account, API key, and Cloud project
| before reviewers see or annotate it. Do not submit
| sensitive, confidential, or personal information to the
| Unpaid Services.
| HDThoreaun wrote:
| Unless you have the enterprise sub of OpenAI, they're
| training on your data too
| dr_kiszonka wrote:
| I was a big fan of that model but it has been replaced in AI
| Studio by its preview version, which, by comparison, is pretty
| bad. I hope Google makes the release version much closer to the
| experimental one.
| zoogeny wrote:
| I can confirm the model name in Run Settings has been updated
| to "Gemini 2.5 Pro Preview ..." when it used to be "Gemini
| 2.5 Pro (Experimental) ...".
|
| I cannot confirm if the quality is downgraded since I haven't
| had enough time with it. But if what you are saying is
| correct, I would be very sad. My big fear is the full-fat
| Gemini 2.5 Pro will be prohibitively expensive, but a dumbed
| down model (for the sake of cost) would also be saddening.
| dieortin wrote:
| The preview version is exactly the same as the experimental
| one afaik
| gundmc wrote:
| The AI Studio product lead said on Twitter that it is exactly
| the same model just renamed for clarity when pricing was
| announced
| jeeeb wrote:
| After comparing Gemini Pro and Claude Sonnet 3.7 coding answers
| side by side a few times, I decided to cancel my Anthropic
| subscription and just stick to Gemini.
| wcarss wrote:
| Google has killed so many amazing businesses -- entire
| industries, even, by giving people something expensive for
| free until the competition dies, and then they enshittify
| hard.
|
| It's cool to have access to it, but please be careful not to
| mistake corporate loss leaders for authentic products.
| JPKab wrote:
| True. They are ONLY good when they have competition. The
| sense of complacency that creeps in is so obvious as a
| customer.
|
| To this day, the Google Home (or is it called Nest now?)
| speaker is the only physical product I've ever owned that
| lost features over time. I used to be able to play the
| audio of a Youtube video (like a podcast) through it, but
| then Google decided that it was very very important that I
| only be able to play a Youtube video through a device with
| a screen, because it is imperative that I see a still image
| when I play a longform history podcast.
|
| Obviously, this is a silly and highly specific example, but
| it is emblematic of how they neglect or enshittify massive
| swathes of their products as soon as the executive team
| loses interest and puts their A team on some shiny new
| object.
| bitpush wrote:
| The experience on Sonos is terrible. There are countless
| examples of people sinking 1000s of dollars into the Sonos
| ecosystem, and the new app update has rendered them
| useless.
| nl wrote:
| It's mostly fixed now (5 room Sonos setup here). It's
| also a lot better at not dropping speakers off its
| network
| average_r_user wrote:
| I'm experiencing the same problem with my Google Home
| ecosystem. One day I can turn off the living room lights
| with the simple phrase "Turn off Living Room Lights," and
| then randomly for two straight days it doesn't understand
| my command
| freedomben wrote:
| Preach it my friend. For years on the Google Home Hub (or
| Nest Hub or whatever) I could tell it to "favorite my
| photo" of what is on the screen. This allowed me to
| incrementally build a great list of my favorite photos on
| Google Photos and added a ton of value to my life. At
| some point that broke, and now it just says, "Sorry, I
| can't do that yet". Infuriating
| mark_l_watson wrote:
| In this case, Google is a large investor in Anthropic.
|
| I agree that giving away access to expensive models long
| term is not a good idea on several fronts. Personally, I
| subscribe to Gemini Advanced and I pay for using the Gemini
| APIs.
|
| EDIT: a very good deal, at $10/month is
| https://apps.abacus.ai/chatllm/ that gives you access to
| almost all commercial models as well as the best open
| weight models. I have never come close at all to using my
| monthly credits with them. If you like to experiment with
| many models the service is a lot of fun.
| F7F7F7 wrote:
| The problem with tools like this is that somewhere in the
| chain between you and the LLM are token reducing
| "features". Whether it's the system prompt, a cheaper LLM
| middleman, or some other cost saving measure.
|
| You'll never know what that something is. For me, I can't
| help but think that I'm getting an inferior service.
| revnode wrote:
| You can self host something like https://big-agi.com/ and
| grab your own keys from various providers. You end up
| with the above, but without the pitfalls you mentioned.
| mark_l_watson wrote:
| big-AGI does look cool, and supports a different use case.
| ABACUS.AI takes your $10/month and gives you credits that
| go towards their costs of using OpenAI, Anthropic,
| Gemini, etc. Use of smaller open models use very few
| credits.
|
| They also support an application development framework
| that looks interesting but I have never used it.
| mark_l_watson wrote:
| You might be correct about cost savings techniques in
| their processing pipeline. But they also add
| functionality: they bake web search into all models which
| is convenient. I have no affiliation with ABACUS.AI, I am
| just a happy customer. They currently let me play with 25
| models.
| freedomben wrote:
| If anyone from Kagi is on, I'd love to know, does Kagi do
| that?
| bredren wrote:
| (Public) corporate loss leaders? Cause they are all likely
| corporate.
|
| Also, Anthropic is subsidizing queries, no? The new
| "5x" plan illustrative of this?
|
| No doubt anthropic's chat ux is the best right now, but it
| isn't so far ahead on that or holding some UX moat that I
| can tell.
| pdntspa wrote:
| The usage limit for experimental gets used up pretty fast
| in a vibe-coding situation. I found myself setting up an
| API account with billing enabled just to keep going.
| gexla wrote:
| It's not free. And it's legit one of the best models. And
| it was a Google employee who was among the authors of the
| paper that's most recognized as kicking all this off. They
| give somewhat limited access in AIStudio (I have only hit
| the limits via API access, so I don't know what the chat UI
| limits are.) Don't they all do this? Maybe harder limits
| and no free API access. But I think most people don't even
| know about AIStudio.
| bossyTeacher wrote:
| Just look at Chrome to see Bard/Gemini's future. HN folks
| didn't care about Chrome then, but cry about Google's
| increasingly hostile development of Chrome now.
|
| Look at Android.
|
| HN behaviour is more like a kid who sees the candy, wants
| the candy and eats as much as it can without worrying about
| the damaging effect that sugar will have on their health.
| Then, the diabetes diagnosis arrives and they complain
| lxgr wrote:
| How would I know if it's useful to me without being able to
| trial it?
|
| Googles previous approach (Pro models available only to
| Gemini Advanced subscribers, and Advanced trials can't be
| stacked with Google One paid storage, or rather they
| convert the already paid storage portion to a _paid_ , much
| shorter Advanced subscription!) was mind-bogglingly stupid.
|
| Having a free tier on all models is the reasonable option
| here.
| blueyes wrote:
| One of the main advantages Anthropic currently has over
| Google is the tooling that comes with Claude Code. It may not
| generate better code, and it has a lower complexity ceiling,
| but it can automatically find and search files, and figure
| out how to fix a syntax error fast.
| bayarearefugee wrote:
| As another person that cancelled my Claude and switched to
| Gemini, I agree that Claude Code is very nice, but beyond
| some initial exploration I never felt comfortable using it
| for real work because Claude 3.7 is far too eager to
| overengineer half-baked solutions that extend far beyond
| what you asked it to do in the first place.
|
| Paying real API money for Claude to jump the gun on
| solutions invalidated the advantage of having a tool as
| nice as Claude Code, at least for me, I admit everyone's
| mileage will vary.
| roygbiv2 wrote:
| I wanted some powershell code to do some sharepoint
| uploading. It created a 1000 line logging module that
| allowed me to log things at different levels like info,
| debug, error etc. Not really what I wanted.
| neuah wrote:
| Exactly my experience as well. Started out loving it but
| it almost moves too fast - building in functionality that
| I might want eventually but isn't yet appropriate for
| where the project is in terms of testing, or is just in
| completely the wrong place in the architecture. I try to
| give very direct and specific prompts but it still has
| the tendency to overreach. Of course it's likely that
| with more use i will learn better how to rein it in.
| Hugsun wrote:
| I've experienced this a lot as well. I also just
| yesterday had an interesting _argument_ with Claude.
|
| It put an expensive API call inside a useEffect hook. I
| wanted the call elsewhere and it fought me on it pretty
| aggressively. Instead of removing the call, it started
| changing comments and function names to say that the call
| was just loading already fetched data from a cache (which
| was not true). I could not find a way to tell it to
| remove that API call from the useEffect hook, It just
| wrote more and more motivated excuses in the surrounding
| comments. It would have been very funny if it weren't so
| expensive.
| freedomben wrote:
| Geez, I'm not one of the people who think AI is going to
| wake up and wipe us out, but experiences like yours do
| give me pause. Right now the AI isn't in the drivers seat
| and can only assert itself through verbal expression, but
| I know it's only a matter of time. We already saw Cursor
| themselves get a taste of this. To be clear I'm not
| suggesting the AI is sentient and malicious - I don't
| believe that at all. I think it's been
| trained/programmed/tuned to do this, though not
| intentionally, but the nature of these tools is they will
| surprise us
| arrowsmith wrote:
| > We already saw Cursor themselves get a taste of this.
|
| Sorry what do you mean by this?
| tempoponet wrote:
| Earlier this week a Cursor AI support agent told a user
| they could only use Cursor on one machine at a time,
| causing the user to cancel their subscription.
| Jensson wrote:
| > but the nature of these tools is they will surprise us
|
| Models used to do this much much more than now, so what
| it did doesn't surprise us.
|
| The nature of these tools is to copy what we have already
| written. It has seen many threads where developers argue
| and dig in, they try to train the AI not to do that but
| sometimes it still happens and then it just roleplays as
| the developer that refuses to listen to anything you say.
| btbuildem wrote:
| "Don't be a keener. Do not do anything I did not ask you
| to do" are def part of my prompts when using Claude
| Sonnigeszeug wrote:
| What's your setup/workflow then?
|
| Any IDE integration?
| tough wrote:
| Open Codex (a Codex fork) supports Gemini and OpenRouter
| providers: https://github.com/ymichael/open-codex
|
| Google models on the CLI are great.
| WiSaGaN wrote:
| Also the "project" feature in claude improves experience
| significantly for coder, where you can customize your
| workflow. Would be great if gemini has this feature.
| energy123 wrote:
| Google needs to fix the Gemini web app at a basic level.
| It's slow, gets stuck on Show Thinking, and rejects 200k-token
| prompts sent in one shot. AI Studio is in much better shape.
| Graphon1 wrote:
| But have you tried any other interfaces for Gemini? Like
| the Gemini Code Assistant in VSCode? Or Gemini-backed
| Aider?
| roygbiv2 wrote:
| Have you tried them? Which one is fairly simple but just
| works?
| johnisgood wrote:
| I hate how I can copy-paste long text into Claude (it
| becomes a "pasted text" attachment) and it is accepted,
| but in Gemini the paste length is limited.
| Workaccount2 wrote:
| You can paste it in a text file and upload that. A little
| annoying compared to claude, but does work.
| johnisgood wrote:
| Thanks, will give it a try.
| xbmcuser wrote:
| Uploading files to Gemini is now great. I uploaded my
| Python script and the text data files I was using the
| script to process, and asked it how best to optimize the
| code. It actually ran the Python code on the data files,
| recommended changes, then when prompted ran the script
| again to show the new results. At first I thought it might
| be hallucinating, but no, the data was correct.
| johnisgood wrote:
| Yeah "they" run Python code now quite well. They generate
| some output using Python "internally" (albeit shows you
| the code).
| shrisukhani wrote:
| +1 on this. Improving Gemini apps and live mode will go
| such a long way for them. Google actually has the best
| model line-up now but the apps and APIs hold them back so
| much.
| mogili wrote:
| I use roo code with Gemini to get similar results for free
| ssd532 wrote:
| Do its agentic features work with any API? I had tried this
| with Cline, and it was clear that they work effectively
| only with Claude's tooling support.
| igor47 wrote:
| I've switched to aider with the --watch-files flag. Being
| able to use models in nvim with no additional tooling is
| pretty sweet
| mediaman wrote:
| That's really cool. I've been looking for a nicer
| solution to use with nvim.
| aitchnyu wrote:
| Typing `//use this as reference ai` in one file and
| `//copy this row to x ai!` and it will add those
| functions/files to context and act on both places.
| Altough I wish Aider would write `working on your
| request...` under my comment, now I have to keep Aider
| window in sight. Autocomplete and "add to context" and
| "enter your instructions" of other apps feel clunky.
| julianeon wrote:
| Related:
|
| Only Claude (to my knowledge) has a desktop app which can
| directly, and usually quite intelligently, modify files and
| create repos on your desktop. It's the only "agentic"
| option among the major players.
|
| "Claude, make me an app which will accept Stripe payments
| and sell an ebook about coding in Python; first create the
| app, then the ebook."
|
| It would take a few passes but Claude could do this;
| obviously you can't do that with an API alone. That
| capability alone is worth $30/month in my opinion.
| indexerror wrote:
| OpenAI just released Codex, which is basically the same
| as Claude Code.
| hiciu wrote:
| It looks the same, but for some reason Claude Code is
| much more capable. Codex got lost in my source code and
| hallucinated a bunch of stuff; Claude on the same task just
| went to town, burned money and delivered.
|
| Of course, this is only my experience and codex is still
| very young. I really hope it becomes as capable as
| Claude.
| rockwotj wrote:
| Part of it is probably that Claude is just better at
| coding than what OpenAI has available. I am considering
| trying to hack Gemini support into Codex and playing
| around with it.
| lytedev wrote:
| I was doing this last night with open-codex, a fork.
| https://github.com/ymichael/open-codex
| thrdbndndn wrote:
| Copilot agent mode?
| xvinci wrote:
| Maybe I am not understanding something here.
|
| But there are third-party options available that do the
| very same thing (e.g. https://aider.chat/ ) and allow
| you to plug in a model (or even a combination thereof
| e.g. deepseek as architect and claude as code writer) of
| your choice.
|
| Therefore the advantage of the model provider providing
| such a thing doesn't matter, no?
| jm547ster wrote:
| Aider is not agentic - it is interactive by design. Copilot
| agent mode and Cline would be better comparisons.
| tough wrote:
| OpenAI launched Codex 2 days ago, and there are already
| open forks that support other providers too.
|
| There are also Claude Code proxies to run it on local LLMs.
|
| You can just do things.
| int_19h wrote:
| A first party app, sure, but there's no shortage of third
| party options. Cursor, Windsurf/Codeium etc. Even VSCode
| has agent mode now.
| dingnuts wrote:
| > first create the app, then the ebook."
|
| > It would take a few passes but Claude could do this;
|
| I'm sorry but absolutely nothing I've seen from using
| Claude indicates that you could give it a vague prompt
| like that and have it actually produce anything worth
| reading.
|
| Can it output a book's worth of bullshit with that
| prompt? Yes. But if you think "write a book about Python"
| is where we are in the state of the art in language
| models in terms of the prompt you need to get a coherent
| product, I want some of whatever you are smoking because
| that has got to be the good shit
| vladmdgolam wrote:
| There are at least 10 projects currently aiming to recreate
| Claude Code, but for Gemini. For example, geminicodes.co by
| NotebookLM's founding PM Raiza Martin
| mrinterweb wrote:
| I don't understand the appeal of investing in learning and
| adapting your workflow to use an AI tool that is so tightly
| coupled to a single LLM provider, when there are other
| great AI tools available that are not locked to a single
| LLM provider. I would guess aider is the closest thing to
| claude code, but you can use pretty much any LLM.
|
| The LLM field is moving so fast that what is the leading
| frontier model today, may not be the same tomorrow.
|
| Pricing is another important consideration.
| https://aider.chat/docs/leaderboards/
| smallnamespace wrote:
| All the AI tools end up converging on a similar workflow:
| type what you want and interrupt if you're not getting
| what you want.
| mdhb wrote:
| Firebase Studio is the Google equivalent
| mamp wrote:
| I've been using Gemini 2.5 and Claude 3.7 for Rust
| development and I have been very impressed with Claude,
| which wasn't the case for some architectural discussions,
| where Gemini impressed with its structure and scope. OpenAI
| 4.5 and o1 have been disappointing in both contexts.
|
| Gemini doesn't seem to be as keen to agree with me so I find
| it makes small improvements where Claude and OpenAI will go
| along with initial suggestions until specifically asked to
| make improvements.
| yousif_123123 wrote:
| I have noticed Gemini not accepting an instruction to
| "leave all other code the same but just modify this part"
| on code that included use of an alpha API with a
| different interface than what Gemini knows as the correct
| current API. No matter how I prompted 2.5 Pro, I couldn't
| get it to respect my use of the alpha API; it would just
| think I must be wrong.
|
| So I think patterns from the training data are still
| overriding some actual logic/intelligence in the model. Or
| the Google assistant fine-tuning is messing it up.
| Workaccount2 wrote:
| I have been using gemini daily for coding for the last
| week, and I swear that they are pulling levers and A/B
| testing in the background. Which is a very google thing
| to do. They did the same thing with assistant, which I
| was a pretty heavy user of back in the day (I was driving
| a lot).
| onlyrealcuzzo wrote:
| Yes, IME, Anthropic seemed to be ahead of Google by a decent
| amount with Sonnet 3.5 vs 1.5 Pro.
|
| However, Sonnet 3.7 seemed like a very small increase,
| whereas 2.5 Pro seemed like quite a leap.
|
| Now, IME, Google seems to be comfortably ahead.
|
| 2.5 Pro is a little slow, though.
|
| I'm not sure which model Google uses for the AI answers on
| search, but I find myself using Search for a lot of things I
| might ask Gemini (via 2.5 Pro) if it was as fast as Search's
| AI answers.
| dmix wrote:
| How's the speed of Gemini vs 3.7?
| benhurmarcel wrote:
| I use both, Gemini 2.5 Pro is significantly slower than
| Claude 3.7.
| rockwotj wrote:
| Yeah, I have read Gemini Pro 2.5 is a much bigger model.
| Graphon1 wrote:
| Just curious, what tool do you use to interface with these
| LLMs? Cursor? or Aider? or...
| speedgoose wrote:
| I'm on GitHub Copilot with VsCode Insiders, mostly because
| I don't have to subscribe to one more thing.
|
| They're pretty quick to let you use the latest models
| nowadays.
| nicr_22 wrote:
| I really like the open source Cline extension. It
| supports most of the model APIs, just need to copy/paste
| an API key.
| jessep wrote:
| I have had a few epic refactoring failures with Gemini
| relative to Claude.
|
| For example: I asked both to change a bunch of code into
| functions to pass into a `pipe` type function, and Gemini
| truly seemed to have no idea what it was supposed to do, and
| Claude just did it.
|
| Maybe there was some user error or something, but after that
| I haven't really used Gemini.
|
| I'm curious whether people who are using Gemini and loving
| it are using it mostly for one-shotting, or if they're
| working with it more closely, like a pair programmer. I
| could buy that it could be good at one but bad at the other.
| Asraelite wrote:
| This has been my experience too. Gemini might be better for
| vibe coding or architecture or whatever, but Claude
| consistently feels better for serious coding. That is, when
| I know exactly how I want something implemented in a large
| existing codebase, and I go through the full cycle of
| implementation, refinement, bug fixing, and testing,
| guiding the AI along the way.
|
| It also seems to be better at incorporating knowledge from
| documentation and existing examples when provided.
| int_19h wrote:
| My experience has been exactly the opposite - Sonnet did
| fine on trivial tasks, but couldn't e.g. fix a bug end-
| to-end (from bug description in the tracker to
| implementing the fix and adding tests) properly because
| it couldn't understand how the relevant code worked,
| whereas Gemini would consistently figure out the root
| cause and write decent fix & tests.
|
| Perhaps this is down to specific tools and their prompts?
| In my case, this was Cursor used in agent mode.
|
| Or perhaps it's about the languages involved - my
| experiments were with TypeScript and C++.
| Asraelite wrote:
| > Gemini would consistently figure out the root cause and
| write decent fix & tests.
|
| I feel like you might be using it differently to me. I
| generally don't ask AI to find the cause of a bug,
| because it's quite bad at that. I use it to identify
| relevant parts of the code that could be involved in the
| bug, and then I come up with my own hypotheses for the
| cause. Then I use AI to help write tests to validate
| these hypotheses. I mostly use Rust.
| int_19h wrote:
| I used to use them mostly in "smart code completion" mode
| myself until very recently. But with all the AI IDEs
| adding agentic mode, I was curious to see how well that
| fares if I let it drive.
|
| And we aren't talking about trivial bugs here. For
| TypeScript, the most impressive bug it handled to date
| was an async race condition due to missing await causing
| a property to be overwritten with invalid value. For that
| one I actually had to do some manual debugging and tell
| it what I observed, but given that info, it was able to
| locate the problem in the code all by itself and fix it
| correctly and come up with a way to test it as well.
|
| For C++, the codebase in question was gdb, the bug was a
| test issue, and it correctly found problematic code based
| solely on the test log (but I had to prod it a bit in the
| right direction for the fix).
|
| I should note that this is Gemini Pro 2.5 specifically.
| When I tried Google's models previously (for all kinds of
| tasks), I was very unimpressed - it was noticeably worse
| than other SOTA models, so I was very skeptical going
| into this. Indeed, I started with Sonnet precisely
| because my past experience indicated that it was the best
| option, and I only tried Gemini after Sonnet fumbled.
| Asraelite wrote:
| I use it for basically everything I can, not just code
| completion, including end-to-end bug fixes when it makes
| sense. But most of the time even the current Gemini and
| Claude models fail with the hard things.
|
| It might be because most bugs that you would encounter in
| other languages don't occur in the first place in Rust
| because of the stronger type system. The race condition
| one you mentioned wouldn't be possible for example. If
| something like that would occur, it's a compiler error
| and the AI fixes it while still in the initial
| implementation stage by looking at the linter errors. I
| also put a lot of effort into trying to use coding
| patterns that do as much validation as possible within
| the type system. So in the end all that's left are the
| more difficult bugs where a human is needed to assist
| (for now at least, I'm confident that the models are only
| going to get better).
| int_19h wrote:
| Race conditions can span across processes (think async
| process communication).
|
| That said I do wonder if the problems you're seeing are
| simply because there isn't that much Rust in the training
| set for the models - because, well, there's relatively
| little of it overall when you compare it to something
| like C++ or JS.
| sleiben wrote:
| Same here. Especially for native app development with Swift
| I had way better results, so I just stuck with Gemini-2.5-*
| yieldcrv wrote:
| I also cancelled my Anthropic yesterday, not because of
| Gemini but because it was the absolute _worst_ time for
| Anthropic to limit their Pro plan to upsell their Max plan
| when there is so much competition out there
|
| Manus.im also does code generation in a nice UI, but I'll
| probably be using Gemini and Deepseek
|
| No Moat strikes again
| fsndz wrote:
| More and more people are coming to the realisation that Google
| is actually winning at the model level right now.
| zaphirplane wrote:
| What's with the Google cheer squad in this thread, usually
| it's Google lost its way and is evil.
|
| Can't be employees cause usually there is a disclaimer
| pjerem wrote:
| Google can be evil and release impressive language models.
| The same way as Apple releasing incredible hardware with
| good privacy while also being a totally insufferable and
| arrogant company.
| crowbahr wrote:
| Google employees only have to disclaimer when they're
| identified as Google employees.
|
| So shit like "as a googler" requires "my opinions are my
| own yadda yadda"
| MagicMoonlight wrote:
| I haven't met a single person that uses Gemini. Companies are
| using Copilot and individuals are using ChatGPT.
|
| Also, why would I want Google to spy on my AI usage? They're
| evil.
| fsndz wrote:
| why is Google more evil than say OpenAI ?
| m3kw9 wrote:
| Having used Claude Code and Codex CLI, and then Aider with
| Gemini 2.5 Pro: Aider is much faster because you feed in the
| files directly instead of using tools to start doing who
| knows what, spending 10x the tokens. I tried a relatively
| simple refactor which needed around 7 files changed; only
| Aider with 2.5 got it, and on the first shot, whereas both
| Codex and Claude Code completely fumbled it.
| goshx wrote:
| Same here! It is borderline stubborn at times and I need to
| prove it wrong. Still, it is the best model to use with Cursor,
| in my experience.
| teleforce wrote:
| >obsequious
|
| Thanks for the new word, I had to look it up.
|
| "obedient or attentive to an excessive or servile degree"
|
| Apparently it means an AI that mindlessly follows your
| logic and instructions, without reasoning and articulation,
| is not good enough.
| nemomarx wrote:
| I think here it's referring to a common problem where the
| AI agrees with your position too easily, and/or instantly
| changes its answer if you tell it the answer is wrong
| (therefore providing no stable true answer if you ask it
| about a fact).
|
| Also the slightly over cheery tone maybe.
| lylah69 wrote:
| I like to do this with Claude. It takes 5 back & forths to
| get an uncertain answer.
|
| Is there a way to tackle this?
| zoogeny wrote:
| It's a bit of a fancy way to say "yes man". Like in
| corporations or politics, if a leader surrounds themselves
| with "yes men".
|
| A synonym would be sycophantic which would be "behaving or
| done in an obsequious way in order to gain advantage." The
| connotation is the other party misrepresents their own
| opinion in order to gain favor or avoid disapproval from
| someone of a higher status. Like when a subordinate tries to
| guess what their superior wants to hear instead of providing
| an unbiased response.
|
| I think that accurately describes my experience with some
| LLMs due to heavy handed RLHF towards agreeableness.
|
| In fact, I think obsequious is a better word since it doesn't
| have the cynical connotation of sycophant. LLMs don't have a
| motive and obsequious describes the behavior without
| specifying the intent.
| teleforce wrote:
| Yes, those are the first two words that came to my mind
| when I read the meaning. The Gen Z word now, I think, is
| "simp".
| zoogeny wrote:
| Yeah, it is very close. But I feel simp has a bit of a
| sexual feel to it. Like a guy who does favors for a girl
| expecting affection in return, or donates a lot of money
| to an OnlyFans or Twitch streamer. I also see simp used
| where we used to call it white-knighting (e.g. "to simp
| for").
|
| Obsequious is a bit more general. You could imagine
| applying it to a waiter or valet who is annoyingly
| helpful. I don't think it would feel right to use the
| word simp in that case.
|
| In my day we would call it sucking up. A bit before my
| time (would sound old timey to me) people called it boot
| licking. In the novel "Catcher in the Rye", the
| protagonist uses the word "phony" in a similar way. This
| kind of behavior is universally disliked so there is a
| lot of slang for it.
| snthpy wrote:
| Thanks, as an old timer TIL about simp.
| tkgally wrote:
| Another useful word in this context is "sycophancy," meaning
| excessive flattery or insincere agreement. Amanda Askell of
| Anthropic has used it to describe a trait they try to
| suppress in Claude:
|
| https://youtube.com/watch?v=ugvHCXCOmm4&t=10286
| davidsainez wrote:
| The second example she uses is really important. You (used
| to) see this a lot in stackoverflow where an inexperienced
| programmer asks how to do some convoluted thing. Sure, you
| can explain how to do the thing while maintaining their
| artificial constraints. But much more useful is to say "you
| probably want to approach the problem like this instead".
| It is surely a difficult problem and context dependent.
| pinoy420 wrote:
| XY problem
| snthpy wrote:
| Interesting that Americans appear to hold their AI models
| to a higher standard than their politicians.
| brookst wrote:
| Different Americans.
| syndeo wrote:
| Lots of folks in tech have different opinions than you
| may expect. Many will either keep quiet or play along to
| keep the peace/team cohesion, but you really never know
| if they actually agree deep down.
|
| Their career, livelihoods, ability to support their
| families, etc. are ultimately on the line, so they'll pay
| lip service if they have to. Consider it part of the job
| at that point; personal beliefs are often left at the
| door.
| sans_souse wrote:
| I wonder if anyone here will know this one; I learned the
| word "obsequious" over a decade ago while working the line of
| a restaurant. I used to listen to the 2p2 (2 plus 2) poker
| podcasts during prep and they had a regular feature with
| David Sklansky (iirc) giving tips, stories, advice etc. This
| particular one he simply gave the word "obsequious" and
| defined it later. I remember my sous chef and I were debating
| what it could mean and I guessed it right. I still can't
| remember what it had to do with poker, but that's beside
| the point.
|
| Maybe I can locate it
| sicromoft wrote:
| I didn't hear that one but I am a fan of Sklansky. And I
| also have a very vivid memory of learning the word, when I
| first heard the song Turn Around by They Might Be Giants.
| The connection with the song burned it into my memory.
| UltraSane wrote:
| I had a very interesting long debate/discussion with Gemini 2.5
| Pro about the Synapse-Evolve bank debacle among other things.
| It really feels like debating a very knowledgeable and smart
| human.
| rat9988 wrote:
| You didn't have a debate, you just researched a question.
| zoogeny wrote:
| One mans debate is another mans research.
| rat9988 wrote:
| Indeed, but research isn't necessarily a debate. In
| this case, it was not.
| UltraSane wrote:
| All right, Mr. Pedantic. Very complex linear algebra
| created a very convincing illusion of a debate. You happy
| now?
|
| But good LLMs will take a position and push back at your
| arguments.
| jofzar wrote:
| My work doesn't have access to 2.5 pro and all these posts are
| just making me want it so much more.
|
| I hate how slow things are sometimes.
| basch wrote:
| Can't you just go into aistudio with any free gmail account?
| sciurus wrote:
| For many workplaces, it's not just that they don't pay for
| a service, it's that using it is against policy. If I tried
| to paste some code into ChatGPT, for example, our data loss
| prevention spyware would block it and I'd soon be having an
| uncomfortable conversation with our security team.
|
| (We do have access to GitHub Copilot)
| Atotalnoob wrote:
| Good news then, your GitHub admins can enable Gemini for
| you without issue.
| d1sxeyes wrote:
| "Without issue" is an optimistic perspective on how this
| works in many organisations.
| i_love_retros wrote:
| Why is it free / so cheap? (I seem to be getting charged a
| few cents a day using it with Aider, so not free, but still
| crazy cheap compared to Sonnet.)
| brendanfinan wrote:
| we know how Google makes money
| d1sxeyes wrote:
| Give it a few months and it will ignore all your questions
| and just ask if you've watched Rampart.
| disgruntledphd2 wrote:
| To be fair, Google do have a cost advantage here as
| they've built their own hardware.
| redox99 wrote:
| I've had many disappointing results with gemini 2.5 pro. For
| general queries possibly involving search, chatgpt and grok
| work better for me.
|
| For code, gemini is very buggy in cursor, so I use Claude 3.7.
| But it might be partly cursor's fault.
| rgoulter wrote:
| The _1 million_ token context window also means you can
| just copy/paste so much source code or log output.
| crossroadsguy wrote:
| One difference, and IMHO that's a big difference -- you
| can't use any of Google's chatbots/models without being
| logged in, unlike ChatGPT.
| casey2 wrote:
| It's a big deal, but not in the way that you think. A race
| to the bottom is humanity's best defense against fast
| takeoff.
| instagraham wrote:
| obsequious is such a nice word for this context, only possible
| in the AI age.
|
| I'd find the same word improper to describe human beings -
| other words like plaintive, obedient and compliant often do
| the job better and are less obscure.
|
| Here it feels like a word whose time has come.
| _blk wrote:
| Have you tried Grok 3? It's a bit verbose for my taste even
| when prompted to be brief but answers seem better/more
| researched and less opinionated. It's also more willing to
| answer questions where the other models block an answer.
| fuzzylightbulb wrote:
| A lot of people don't want to patronize the businesses of an
| unabashed Nazi sympathizer. There are more important things
| in life than model output quality.
| zoogeny wrote:
| I have not tried any of the Grok models but that is probably
| because I am rarely on X.
|
| I have to admit I have a bias where I think Google is
| "business" while Grok is for lols. But I should probably take
| the time to assess it since I would prefer to have an opinion
| based on experience rather than vibes.
| MetaWhirledPeas wrote:
| > 100% of my casual AI usage is now in Gemini and I look
| forward to asking it questions on deep topics because it
| consistently provides me with insight.
|
| It's probably great for lots of things but it doesn't seem very
| good for recent news. I asked it about recent accusations
| around xAI and methane gas turbines and it had no clue what I
| was talking about. I asked the same question to Grok and it
| gave me all sorts of details.
| ramesh31 wrote:
| >It's probably great for lots of things but it doesn't seem
| very good for recent news.
|
| You are missing the point here. The LLM is just the
| "reasoning engine" for agents now. Its corpus of facts are
| meaningless, and shouldn't really be relied upon for
| anything. But in conjunction with a tool calling agentic
| process, with access to the web, what you described is now
| trivially doable. Single shot LLM usage is not really
| anything anyone should be doing anymore.
| darksaints wrote:
| That's all fine and dandy, but if you google anything
| related to llm agents, you get 1000 answers to 100
| questions, companies hawking their new "visual programming"
| agent composers, and a ton of videos of douchebags trying
| to be the Steve Jobs of AI. The concept I'm sure is fine,
| but execution of agentic anything is still the Wild Wild
| West and nobody knows what they're really doing.
| ramesh31 wrote:
| Indeed there is a mountain of snake oil out there at this
| point, but the underlying concepts are extremely simple,
| and can be implemented directly without frameworks.
|
| I generally point people to Anthropic's seminal blog post
| on the topic:
| https://www.anthropic.com/engineering/building-effective-
| age...
| MetaWhirledPeas wrote:
| > You are missing the point here.
|
| I'm just discussing the GP's topic of casual use. Casual
| use implies heading over to an already-hosted prompt and
| typing in questions. Implementing my own 'agentic process'
| does not sound very casual to me.
| ramesh31 wrote:
| > Implementing my own 'agentic process' does not sound
| very casual to me.
|
| It really is though. This can be as simple as using
| Claude desktop with a web search tool.
| arizen wrote:
| This was my experience as well.
|
| Gemini performs the best on coding tasks, while giving
| underwhelming responses on recent news.
|
| Grok, meanwhile, was OK for coding tasks but, being linked
| to X, provided the best responses on recent events.
| minimaxir wrote:
| One hidden note from Gemini 2.5 Flash when diving deep into the
| documentation: for image inputs, not only can the model be
| instructed to generate 2D bounding boxes of relevant subjects,
| but it can also create segmentation masks!
| https://ai.google.dev/gemini-api/docs/image-understanding#se...
|
| At this price point with the Flash model, creating segmentation
| masks is pretty nifty.
|
| The segmentation masks are a bit of a galaxy brain implementation
| by generating a b64 string representing the mask:
| https://colab.research.google.com/github/google-gemini/cookb...
|
| I am trying to test it in AI Studio but it sometimes errors out,
| likely because it tries to decode the b64 lol.
| behnamoh wrote:
| Wait, did they just kill YOLO, at least for time-insensitive
| tasks?
| minimaxir wrote:
| YOLO is probably still cheaper if bounding boxes are your
| main goal. Good segmentation models that work for arbitrary
| labels, however, are much more expensive to set up and run,
| so this type of approach could be an interesting alternative
| depending on performance.
| daemonologist wrote:
| No, the speed of YOLO/DETR inference makes it cheap as well -
| probably at least five or six orders of magnitude cheaper.
| Edit: After some experimentation, Gemini also seems to not
| perform nearly as well as a purpose-tuned detection model.
|
| It'll be interesting to test this capability and see how it
| evolves though. At some point you might be able to use it as a
| "teacher" to generate training data for new tasks.
| vunderba wrote:
| Well no. You can run/host YOLO which means not having to
| submit potentially sensitive information to a company that
| generates a large amount of revenue from targeted
| advertising.
| daemonologist wrote:
| Interestingly if you run this in Gemini (instead of AI Studio)
| you get:
|
|     I am sorry, but I was unable to generate the segmentation
|     masks for _ in the image due to an internal error with the
|     tool required for this task.
|
| (Not sure if that's a real or hallucinated error.)
| ipsum2 wrote:
| The performance is basically so bad it's unusable though,
| segmentation models and object detection models are still the
| best, for now.
| msp26 wrote:
| I've had mixed results with the bounding boxes even on 2.5 pro.
| On complex images where a lot of boxes need to be drawn they're
| in the general region but miss the exact location of objects.
| simonw wrote:
| This is SO cool. I built an interactive tool for trying this
| out (bring your own Gemini API key) here:
| https://tools.simonwillison.net/gemini-mask
|
| More details plus a screenshot of the tool working here:
| https://simonwillison.net/2025/Apr/18/gemini-image-segmentat...
|
| I vibe coded it using Claude and O3.
| xnx wrote:
| There is a starter app in AI Studio that demos this:
| https://aistudio.google.com/apps/bundled/spatial-understandi...
| simonw wrote:
| I spotted something interesting in the Python API library code:
|
| https://github.com/googleapis/python-genai/blob/473bf4b6b5a6...
|     class ThinkingConfig(_common.BaseModel):
|         """The thinking features configuration."""
|
|         include_thoughts: Optional[bool] = Field(
|             default=None,
|             description="""Indicates whether to include thoughts in
|     the response. If true, thoughts are returned only if the model
|     supports thought and thoughts are available.
|     """,
|         )
|         thinking_budget: Optional[int] = Field(
|             default=None,
|             description="""Indicates the thinking budget in tokens.
|     """,
|         )
|
| That thinking_budget thing is documented, but what's the deal
| with include_thoughts? It sounds like it's an option to have the
| API return the thought summary... but I can't figure out how to
| get it to work, and I've not found documentation or example code
| that uses it.
|
| Anyone managed to get Gemini to spit out thought summaries in its
| API using this option?
| phillypham wrote:
| They removed the docs and support for it
| https://github.com/googleapis/python-
| genai/commit/af3b339a9d....
|
| You can see the thoughts in AI Studio UI as per
| https://ai.google.dev/gemini-api/docs/thinking#debugging-
| and....
| lemming wrote:
| I maintain an alternative client which I build from the API
| definitions at https://github.com/googleapis/googleapis, which
| according to https://github.com/googleapis/python-
| genai/issues/345 should be the right place. But neither the AI
| Studio nor the Vertex definitions even have ThinkingConfig yet
| - very frustrating. In general it's amazing how much API
| munging is required to get a working client from the public API
| definitions.
| qwertox wrote:
| In AI Studio the Flash models have two toggles: Enable
| thinking and Set thinking budget. If thinking budget is
| enabled, you can set the max number of tokens it can use to
| think; otherwise it's Auto.
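|
| The same budget is exposed in the API via thinking_config; a
| minimal sketch with the google-genai SDK (model name and
| budget value are illustrative):
|
|     from google import genai
|     from google.genai import types
|
|     client = genai.Client(api_key="YOUR_KEY")
|     resp = client.models.generate_content(
|         model="gemini-2.5-flash-preview-04-17",
|         contents="How many primes are there below 100?",
|         config=types.GenerateContentConfig(
|             # 0 disables thinking; omit the config for Auto
|             thinking_config=types.ThinkingConfig(thinking_budget=1024)
|         ),
|     )
|     print(resp.text)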
| Deathmax wrote:
| It is gated behind the GOOGLE_INTERNAL visibility flag, which
| only internal Google projects and Cursor have at the moment as
| far as I know.
| msp26 wrote:
| The API won't give you the "thinking" tokens, those are only
| visible on AI studio. Probably to try to stop distillation,
| very disappointing. I find reading the cot to be incredibly
| informative to identify failure modes.
|
| > Hey Everyone,
|
| > Moving forward, our team has made a decision to only show
| thoughts in Google AI Studio. Meaning, we no longer return
| thoughts via the Gemini API. Here is the updated doc to reflect
| that.
|
| https://discuss.ai.google.dev/t/thoughts-are-missing-cot-not...
|
| ---
|
| After I wrote all of that I see that the API docs page looks
| different today and now says:
|
| >Note that a summarized version of the thinking process is
| available through both the API and Google AI Studio.
|
| https://ai.google.dev/gemini-api/docs/thinking
|
| Maybe they just updated it? Or people aren't on the same page
| at Google idk
|
| Previously it said
|
| > Models with thinking capabilities are available in Google AI
| Studio and through the Gemini API. Note that the thinking
| process is visible within Google AI Studio but is not provided
| as part of the API output.
|
| https://web.archive.org/web/20250409174840/https://ai.google...
| deanmoriarty wrote:
| Genuine naive question: when it comes to Google, HN generally
| has a negative view of it (pick any random story on Chrome, ads,
| search, web, working at faang, etc. and this should be obvious
| from the comments), yet when it comes to AI there is a somewhat
| notable "cheering effect" for Google to win the AI race that goes
| beyond a conventional appreciation of a healthy competitive
| landscape, which may appear as a bit of a double standard.
|
| Why is this? Is it because OpenAI is seen as such a negative
| player in this ecosystem that Google "gets a pass on this one"?
|
| And bonus question: what do people think will happen to OpenAI if
| Google wins the race? Do you think they'll literally just go
| bust?
| antirez wrote:
| Maybe because Google, by paying for the research, is largely
| responsible for most of the results we are seeing now. I'm not
| a Google fan, on the web side or in their idea of what software
| engineering is, but they deserve to win the AI race, because
| right now all the other players have provided a lot less public
| research than Google did. Also, with Gemini 2.5 Pro, there was
| a big hype moment, because the model is of unseen ability.
| wkat4242 wrote:
| Maybe they deserve it but it would be really bad for the
| world. Because they will enshittify the hell out of it once
| they're established. That's their MO.
|
| I don't want Google to have a stranglehold over yet another
| type of online service. So I avoid them.
|
| And things are going so fast now, whatever Google has today
| that might be better than the rest, in two months the rest
| will have it too. Of course Google will have something new
| again. But being 2 months behind isn't a huge deal. I don't
| have to have the 'winning' product. In fact most of my AI
| tasks go to an 8b llama 3.1 model. It's about on par with gpt
| 3.5 but that's fine.
| visarga wrote:
| The situation with LLMs is much different from search;
| Google doesn't have such a large lead here. LLMs are social
| things, they learn from each other: any provider with a
| SOTA model will see its abilities leaked through synthetic
| training data. That's what GPT-4 did for a year, against
| the wishes of OpenAI, powering up millions of open model
| finetunes.
| 01100011 wrote:
| Didn't Google invent the transformer?
|
| I think a lot of us see Google as both an evil advertiser and
| as an innovator. Google winning AI is sort of nostalgic for
| those of us who once cheered the "Do No Evil" (now mostly "Do
| Know Evil") company.
|
| I also like how Google is making quiet progress while other
| companies take their latest incremental improvement and promote
| it as hard as they can.
| pkaye wrote:
| I think for a while some people felt the Google AI models
| were worse, but now they're getting much better. On the other
| hand, Google has their own hardware, so they can drive down
| the costs of using the models, which keeps pressure on OpenAI
| to remain cost competitive. Then you have Anthropic, which
| has very good models but is very expensive. But I've heard
| they are working with Amazon to build a data center with
| Amazon's custom AI chips, so maybe they can bring down their
| costs. In the end all these companies will need a good model
| and lower-cost hardware to succeed.
| brap wrote:
| I am cheering for the old Google to make a comeback and it
| seems like the AI race has genuinely sparked something positive
| inside Google.
| wyre wrote:
| Gemini is just that good. From my usage it is much smarter than
| DeepSeek or Claude 3.7 Thinking models.
|
| A lot of Google's market share across its services comes from
| the monopoly effects Google has. The quality of Gemini 2.5 is
| noticeably smarter than its competitors so I see the applause
| for the quality of the LLM and not for Google.
|
| I think it's way too early to say anything about who is winning
| the race. There is still a long way to go; o3 scores highest in
| Humanity's Last Exam (https://agi.safe.ai/) at 20%, 2.5 scores
| 18%.
| sothatsit wrote:
| 2.5 Pro is free, and I'm sure there's a lot of people who have
| just never tried the best models because they don't want to pay
| for them. So 2.5 Pro probably blows their socks off.
|
| Whereas, if you've been paying for access to the best models
| from OpenAI and Anthropic all along, 2.5 Pro doesn't feel like
| such a drastic step-change. But going from free models to 2.5
| Pro is a crazy difference. I also think this is why DeepSeek
| got so much attention so quickly - because it was free.
| julianeon wrote:
| It's been a while since they won something the "old" Google
| way: by building a superior product that is #1 on its merits.
|
| In that sense Gemini is a throwback: there's no trick - it's
| objectively better than everything else.
| sagarpatil wrote:
| Most of us weren't using Gemini pro models (1.0, 1.5, 2.0) but
| the recent 2.5 pro is such a huge step up. It's better than 3.7
| sonnet for coding. Better than o1, o3-mini models and now o3
| and o4-mini. It's become my daily driver. It does everything I
| need with almost 100% accuracy, is cheap, fast, 1 million
| context window, uses google web search for grounding, can fetch
| YouTube video transcripts, can fetch website content, works in
| google workspace: Gmail, Docs, Sheets. Really hard to beat this
| combo. Oh and if you subscribe to their AI plan it comes with 2
| TB drive storage.
| oezi wrote:
| The key is Gemini being free through AI Studio. This makes
| their technical improvement more impressive when OpenAI sells
| their best models at ridiculous prices.
|
| Whether Google is engaging in price dumping as a monopolist
| remains to be seen, but it feels like it.
|
| The LLM race is fast paced and no moat has developed. People
| are switching on a whim if better models (by some margin) show
| up. When will OpenAI, Anthropic or DeepSeek counter 2.5 Pro?
| And will it be before Google releases the next Pro?
|
| OpenAI commands a large chunk of the consumer market and they
| have considerable funds after their last round. They won't fold
| this or next year.
|
| If Google wants to win this, they must come up with a product
| strategy that integrates AI into their search business without
| seriously damaging it. This is hard.
| int_19h wrote:
| I dislike Google rather strongly due to their ad-based business
| model, and I was previously very skeptical of their AI
| offerings because of very lackluster performance compared to
| OpenAI and Claude. But I can't help but be impressed with
| Gemini Pro 2.5 for "deep research" and agentic coding. I have
| subscriptions with all three so that I can keep up with SOTA,
| but if I had to choose only one to keep, right now it'd be
| Gemini.
|
| That said I still don't "cheer" for them and I would really
| rather someone else win the race. But that is orthogonal to
| recognition of observed objective superiority.
| greentea23 wrote:
| I prefer OpenAI and Anthropic big time because they are fresh
| players with less dominance over other aspects of digital life.
| Not having to login to an insidious tracker like Google is
| worth significantly worse performance. Although I have little
| FOMO here avoiding Gemini because evaluating these models on
| real world use cases remains quite subjective imo.
| jonas21 wrote:
| A lot of the negativity toward Google stems from the fact that
| they're the big, dominant player in search, ads, browsers,
| etc., rather than anything that they've done or any particular
| attribute of the company.
|
| In AI, they're still seen as being behind OpenAI and others, so
| we don't see the same level of negativity.
| summerlight wrote:
| Because now it has brought real competition to the field. GPT
| was the king, and Claude had been the only meaningful
| challenger for a while, but OpenAI didn't care about Anthropic
| and was just obsessed with Google. Gemini took quite some time
| to set up its pipeline, so the initial versions were not
| enough to push the frontier; remember the days when Google
| released a new model and OpenAI responded within a day with
| some old model from their silo, only to crush it. That does
| not happen anymore, and they're forced to develop a better
| model.
| CephalopodMD wrote:
| As a googler working in LLM space, this feels like revisionist
| history to me haha! I remember a completely different
| environment only a few months ago when Anthropic was the
| darling child, and before that it was OpenAI (and for like 4
| weeks somewhere in there, it was Deepseek). For literally years
| at this point, every time Bard or Gemini would make a major
| release, it would be largely ignored or put down in favor of
| the next "big thing" OpenAI was doing or Claude saturating
| coding benchmarks, never mind that Google was often just behind
| with the exact same tech ready to go, in some cases only
| missing their demo release by literally 1 day (remember live
| voice?). And every time this happened, folks would be posting
| things to the effect of "LOL I can't believe Google is losing
| the AI race - didn't they invent this?", "this is like
| Microsoft dropping the ball on mobile", "Google is getting
| their lunch eaten by scrappy upstarts," etc. I can't lie, it
| stings a bit when that's what you work on all day.
|
| 2.5 was quite good. Not stupidly good like the jump from GPT 2
| to 3 or 3.5 to 4, but really good. It was a big jump in ELO and
| benchmarks. People like it, and I think it's just
| psychologically satisfying that the player everybody would have
| expected to win the AI race is currently in the lead. Gemini
| finally gets a day in the sun.
|
| I'm sure this will change with whenever somebody comes up with
| the next big idea though. It probably won't take much to beat
| Gemini in the long run. There is literally zero moat.
| krembo wrote:
| How is this sustainable for Google from a business POV? It
| feels like Google is shooting itself in the foot while
| "winning" the AI race. From my experience, Google has lost 99%
| of the ads it used to show me in the search engine.
| tomr75 wrote:
| someone else will do it if they don't
| aoeusnth1 wrote:
| Their inference costs are the lowest in the business.
| zenGull wrote:
| I've been paying for Google's pro LLM for about six months.
| At $20 it feels steep considering the free version is very
| good. I do devops work, and it's been very helpful. I've
| tried GPT, Copilot, Mixtral, Claude, etc., and Gemini 1.5 Pro
| was what sold me. The new 2.0 stuff is even better.
| Anecdotally, Gemini seems to forget to add stuff but doesn't
| hallucinate as much. I've been doing some pretty complex
| scripting this last week purely on Gemini 2.0 Flash and it's
| been really, really good.
| jdthedisciple wrote:
| Very excited to try it, but it _is_ noteworthy that o4-mini is
| _strictly better_ according to the very benchmarks shown by
| Google here.
|
| Of course it's about 4x as expensive too (I believe), but still,
| given the release of openai/codex as well, o4-mini will remain a
| strong competitor for now.
| thimabi wrote:
| I find it baffling that Google offers such impressive models
| through the API and even the free AI Studio with fine-grained
| control, yet the models used in the Gemini app feel much worse.
|
| Over the past few weeks, I've been using Gemini Advanced on my
| Workspace account. There, the models think for shorter times,
| provide shorter outputs, and even their context window is far
| from the advertised 1 million tokens. It makes me think that
| Google is intentionally limiting the Gemini app.
|
| Perhaps the goal is to steer users toward the API or AI Studio,
| with the free tier that involves data collection for training
| purposes.
| Alifatisk wrote:
| Google lacks marketing for AI Studio; it has only recently
| become widely known through word of mouth.
| thimabi wrote:
| That does work in Google's favor. Users who are technical
| enough to want a better model eventually learn about AI
| Studio, while the rest are none the wiser.
| _delirium wrote:
| This might have changed after you posted your comment, but it
| looks like 2.5 Pro and 2.5 Flash are available in the Gemini
| app now, both web and mobile.
| thimabi wrote:
| Oh, I didn't mean to say that these models were unavailable
| through the app or website. Rather, I've realized that using
| them through the API or AI Studio yields much better results
| -- even in the free tier.
|
| You can check that by trying prompts with complex
| instructions and long inputs/outputs.
|
| For instance, ask Gemini to generate notes from a specific
| source (say, a book or class transcription). Or ask it to
| translate a long article, full of idiomatic expressions,
| while maintaining high fidelity to the source. You will see
| that the very same Gemini models are underutilized on the app
| or the website, while their performance is stellar on the API
| or AI Studio.
| bingdig wrote:
| It appears that this impacted gemini-2.5-pro-preview-03-25
| somehow? Grounding with Google Search no longer works.
|
| I had a workflow running that would pull news articles from the
| past 24 hours. It now refuses to believe the current date is
| 2025-04-17. Even with search turned on, when I ask it what the
| date is, it always replies with a date sometime in July 2024.
| Alifatisk wrote:
| No matter how good the new Gemini models have become, my bad
| experience with early Gemini is still stuck with me and I am
| afraid I still suffer from confirmation bias. Whenever I just
| look at the Gemini app, I already assume it's going to be a bad
| experience.
| thallavajhula wrote:
| At this point, at the current pace of AI model development, I
| feel like I can't tell which one is better. I usually end up
| using multiple LLMs to get a task done to my taste. They're all
| equally good and bad. It's like using GCP vs AWS vs Azure all
| over again, except in the AI space.
| simonw wrote:
| An often overlooked feature of the Gemini models is that they can
| write and execute Python code directly via their API.
|
| My llm-gemini plugin supports that:
| https://github.com/simonw/llm-gemini
|
|     uv tool install llm
|     llm install llm-gemini
|     llm keys set gemini # paste key here
|     llm -m gemini-2.5-flash-preview-04-17 \
|       -o code_execution 1 \
|       'render a mandelbrot fractal in ascii art'
|
| I ran that just now and got this:
| https://gist.github.com/simonw/cb431005c0e0535343d6977a7c470...
|
| They don't charge anything extra for code execution, you just pay
| for input and output tokens. The above example used 10 input,
| 1,531 output which is $0.15/million for input and $3.50/million
| output for Gemini 2.5 Flash with thinking enabled, so 0.536 cents
| (just over half a cent) for this prompt.
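|
| Spelled out:
|
|     10    / 1_000_000 * 0.15  # = $0.0000015 (input)
|     1_531 / 1_000_000 * 3.50  # = $0.0053585 (output)
|                               # total ~= $0.00536, ~0.536 cents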
| blahgeek wrote:
| > An often overlooked feature of the Gemini models is that they
| can write and execute Python code directly via their API.
|
| Could you elaborate? I thought function calling is a common
| feature among models from different providers
| WiSaGaN wrote:
| This common feature requires the user of the API to
| implement the tool; in this case, the user is responsible
| for running the code the API outputs. The post you replied
| to suggests that Gemini will run the code for the user
| behind the API call.
| tempoponet wrote:
| That was how I read it as well, as if it had a built-in
| lambda type service in the cloud.
|
| If we're just talking about some API support to call python
| scripts, that's pretty basic to wire up with any model that
| supports tool use.
| simonw wrote:
| The Gemini API runs the Python code for you as part of your
| single API call, without you having to handle the tool call
| request yourself.
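|
| A minimal sketch of the same call with the google-genai SDK
| -- the tool wiring follows the current code-execution docs,
| but take the exact types as an assumption:
|
|     from google import genai
|     from google.genai import types
|
|     client = genai.Client(api_key="YOUR_KEY")
|     resp = client.models.generate_content(
|         model="gemini-2.5-flash-preview-04-17",
|         contents="render a mandelbrot fractal in ascii art",
|         config=types.GenerateContentConfig(
|             # server-side code execution: the API writes *and*
|             # runs the Python, then folds the result into the
|             # response parts
|             tools=[types.Tool(code_execution=types.ToolCodeExecution())]
|         ),
|     )
|     print(resp.text)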
| tempaccount420 wrote:
| This is so much cheaper than re-prompting each tool use.
|
| I wish this was extended to things like: you could give the
| model an API endpoint that it can call to execute JS code,
| and the only requirement is that your API has to respond
| within 5 seconds (maybe less actually).
|
| I wonder if this is what OpenAI is planning to do in the
| upcoming API update to support tools in o3.
| danpalmer wrote:
| I imagine there wouldn't be much of a cost to the
| provider on the API call there, so much longer times may
| be possible. It's not like this would hold up the LLM in
| any way; execution would get suspended while the call is
| made and the TPU/GPU will serve another request.
| suchar wrote:
| They need to keep the KV cache to avoid prompt
| reprocessing, so they would need to move it to RAM/NVMe
| during longer API calls to use the GPU for another request.
| pantsforbirds wrote:
| I see an example run fully in a few commands using uv,
| think "wow, I bet that Simon guy from Twitter would love
| this" ... and it's already him.
| throaway920181 wrote:
| I wish Gemini could do this with Go. It generates plenty of
| junk/non-parseable code and I have to feed it the error
| messages and hope it properly corrects it.
| lleymrl651 wrote:
| good
| djrj477dhsnv wrote:
| Why are most comments here only comparing to Claude and just a
| few to ChatGPT and none to Grok?
|
| Grok 3 has been my main LLM since its release. Is it not as good
| as I thought it was?
| jofzar wrote:
| IMO I will not use Grok while it's owned by and related to
| Elon. Not only do I not trust their privacy and data usage
| (not that I "really" trust OpenAI/Google etc.), I just
| despise him.
|
| It would have to be very significantly better for me to use it.
| dyauspitr wrote:
| Grok just isn't the best out there.
| WiSaGaN wrote:
| Interesting that the output price per 1M tokens is $0.6 for
| non-reasoning but $3.5 for reasoning. This seems to defy the
| common assumption of how reasoning models work: you tweak the
| <think> token probability to control how much thinking it
| does, but underneath it's the same model and the same
| inference code path.
| michaelbrave wrote:
| Yesterday I started working through How to design programs, and
| set up a chat with Gemini 2.5 asking it to be my tutor as I go
| through it and to help answer my questions if I don't understand
| a part of the book. It has been knowledgeable, helpful and
| capable of breaking down complex things that I couldn't
| understand into understandable things. Fantastic all around.
| zenkey wrote:
| Google is totally back in the game now, but it's still going to
| take a lot more for them at this point to overcome OpenAI's
| "first-mover advantage" (clearly the favorite among younger users
| atm).
| sweca wrote:
| Google Pixel marketing is doing wonders for Gemini in young
| populations. I have been seeing a lot more of their phones in
| my generation's hands.
| sinuhe69 wrote:
| I'm not familiar with Python internals, so when I tried to
| convert a public AI model (not an LLM) to run locally, I hit
| some problems no other AI could help with. I asked Gemini 2.5
| and it pinpointed the problem immediately. Its solution was
| not practical, but I guess it also works.
| ashu1461 wrote:
| One place where I feel Gemini models lag is function calling
| and predicting correct arguments to function calls. Is there
| a benchmark which scores models on this?
| upmind wrote:
| It's a shame that Gemini doesn't seem to have as much hype as
| GPT, I hope they gain more market share.
| menshiki wrote:
| As a person mostly using AI for everyday tasks and business-
| related research, it's very impressive how quickly they've
| progressed. I would consider all models before 2.0 totally
| unusable. Their web interface, however, is so much worse than
| that of the ChatGPT macOS app.
| gcbirzan wrote:
| Some aren't even at 2.0, and the version numbers aren't related
| in any way to their... generation? Also, what is so good about
| the ChatGPT app, specifically on macOS that makes it better?
| convivialdingo wrote:
| Dang - Google finally made a quality model that doesn't make me
| want to throw my computer out a window. It's honest, neutral and
| clearly not trained by the ideologically rabid anti-bias but
| actually super biased regime.
|
| Did I miss a revolt or something in googley land? A Google model
| saying "free speech is valuable and diverse opinions are good" is
| frankly bizarre to see.
| convivialdingo wrote:
| Downvote me all you want - the fact remains that previous
| Google models were so riddled with guardrails and political
| correctness that it was practically impossible to use for
| anything besides code and clean business data. Random text and
| opinion would trigger a filter and shut down output.
|
| Even this model criticizes the failures of the previous models.
| tempaccount420 wrote:
| Yes, something definitely changed. It's still a little
| biased, it's kind of like OpenAI before Trump became
| president.
| camkego wrote:
| The pricing table image in the article really should have
| included Gemini 2.5 pro. Sure, it could be after Flash to the
| right, but it would help people understand the price performance
| benefits of 2.5 Flash.
| wanderr wrote:
| Gemini has the annoying habit of delegating tasks to me. Most
| recently I was trying to find out how to do something in
| FastRawViewer for which I couldn't find a straightforward answer.
| After hallucinating a bunch of settings and menus that don't
| exist, it told me to read the manual and check the user forums.
| So much for saving me time.
| hubraumhugo wrote:
| You can get your HN profile analyzed and roasted by it. It's
| pretty funny :) https://hn-wrapped.kadoa.com/
|
| I'll add a selection for different models soon.
| demaga wrote:
| Didn't expect to be roasted by AI this morning. Nice one
| Alifatisk wrote:
| How is this relevant to Gemini 2.5 Flash? I guess it's using it
| or something?
| few wrote:
| This is cool.
|
| Does it only use a few recent comments or the entire history?
| I'm trying to figure out how it figured out my city when I
| thought I was careful not to reveal it. I'm scrolling back
| pages without finding where I said it in the past. Could it
| have inferred it from other information, or hallucinated it?
|
| I wonder if there's a more opsec-focused version of this.
| x187463 wrote:
| _Personal Projects_
|
| Will finally implement that gravity in TTE, despite vowing not
| to. We all know how well developers keep promises.
|
| _Knowledge Growth_
|
| Will achieve enlightenment on the true meaning of
| 'enshittification', likely after attempting to watch a single
| YouTube video without Premium.
|
| I found these actually funny. Cool project.
| 131hn wrote:
| If OpenAI offers Codex and Anthropic offers Claude Code, is there
| a CLI integration that Google recommends for using Gemini 2.5?
| That's what's keeping me, for now, with the other two.
| yawaramin wrote:
| I just asked "why is 'Good Friday' so called?" and it got stuck.
| Flash 2.0 worked though.
| uninformed-me00 wrote:
| I want to think that this is all great, but the fact that this
| is also one of the best ways to collect unsuspecting user data
| by default, without explicit consent, just doesn't feel right
| -- and that applies to most people, who would never have a
| chance of reading this comment.
|
| I don't want to be angry, but screw this free stuff that opts
| you in by default to having your privacy violated.
|
| Before you jump in to say you can pay to keep your privacy, stop
| and read again.
| sgt wrote:
| How are they able to remain so competitive and will it last? The
| pricing almost seems too good to be true in terms of what they
| claim you get.
| sweca wrote:
| Custom TPUs ftw
| egorfine wrote:
| I always overlook anything Google due to the fact that they
| are the opposite of "Don't be evil" and because their developer
| console (Google Cloud) is incredibly hostile to humans.
|
| Today I reluctantly clicked on their "AI Studio" link in the
| press-release and I was pleasantly surprised to discover that AI
| Studio has nothing in common with their typical UI/UX. It's nice
| and I love it!
| brap wrote:
| To be fair the UX of all GCP/AWS/Azure is ass. If you don't
| know exactly what you're looking for, good luck navigating that
| mess.
| techwiz137 wrote:
| I had a heart attack moment thinking they were bringing some form
| of Adobe Flash back.
| latemedium wrote:
| There's an important difference between Gemini and Claude that
| I'm not sure how to quantify. I often use shell-connected LLMs
| (LLMs with a shell tool enabled) to take care of basic CSV
| munging / file-sorting tasks for me - I work in data science so
| there's a lot of this. When I ask Claude to do something, it
| carefully looks at all the directories and files before doing
| anything. Gemini, on the other hand, blindly jumps in and just
| starts moving stuff around. Claude executes more tools and is a
| little slower, but it almost always gets the right answer because
| it appropriately gathers the right context before really trying
| to solve the problem. Gemini doesn't seem to do this at all, but
| it makes a world of difference for my set of problems. Curious to
| see if others have had the same experience or if it's just a
| quirk of my particular set of tasks.
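|
| For anyone wondering what "shell tool enabled" means in
| practice, it can be as simple as handing the model one
| callable function. A sketch using the google-genai SDK's
| automatic function calling -- names and prompt are
| illustrative, and a real setup wants sandboxing and
| confirmation prompts:
|
|     import subprocess
|     from google import genai
|     from google.genai import types
|
|     def run_shell(command: str) -> dict:
|         """Run a shell command and return its output."""
|         r = subprocess.run(command, shell=True, capture_output=True,
|                            text=True, timeout=60)
|         return {"stdout": r.stdout, "stderr": r.stderr}
|
|     client = genai.Client(api_key="YOUR_KEY")
|     resp = client.models.generate_content(
|         model="gemini-2.5-flash-preview-04-17",
|         contents="List the CSV files here and report column names.",
|         # the SDK wraps plain Python functions as tools and loops
|         # the call/respond cycle automatically
|         config=types.GenerateContentConfig(tools=[run_shell]),
|     )
|     print(resp.text)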
| energy123 wrote:
| What's a shell-connected LLM and how do you do that?
| kmacdough wrote:
| Look up Claude Code, Cursor, Aider and VSCode's agent
| integration. Generally, tools to use AI more actively for
| development. There are others as well. Plenty of info around.
| Here's not the place for a tutorial.
| rjurney wrote:
| I am building a knowledge graph using BAML [baml-py] to extract
| documents [it's opinionated towards docs] and then PySpark to
| ETL the data into a node/edge list. GPT-4o got few relations...
| Gemini 2.5 got so many it was nuts, all accurate but not all
| from the article! I had to rein it in and instruct it not to
| build so vast a graph. Really cool; it knows a LOT about
| semiconductors :)
| profsummergig wrote:
| I tried this prompt in both Gemini 2.5 Pro, and in ChatGPT.
|
| "Draw me a timeline of all the dynasties of China. Imagine a
| horizontal line. Start from the leftmost point and draw segments
| for the start and end of each dynasty. For periods where multiple
| dynasties existed simultaneously draw parallel lines or boxes to
| represent the concurrent rule."
|
| Gemini's response: "I'm just a language model, so I can't help
| you with that."
|
| ChatGPT's response: an actual visual timeline.
| renewiltord wrote:
| All the communities where people think LLMs are junk love
| Gemini. Makes me sceptical that the enthusiasm is useful
| signal.
|
| I found the full 2.0 useful for transcription of images. Very
| good OCR. But not a good assistant. Stalls often and once it
| has, loses context easily.
| thegeomaster wrote:
| Is it possible that a community of people who are constantly
| pushing LLMs to their limits would be most aware of their
| limitations, and so more inclined to think they are junk?
|
| In terms of business utility, Google has had great releases
| ever since the 2.0 family. Their models have always hit
| _some_ mark --- either a good price/performance ratio,
| insane speeds, novel modalities (they still have the only API
| for autoregressive image generation atm), state-of-the-art
| long context support and coding ability (Gemini 2.5), etc.
|
| However, most average users are using these models through a
| chat-like UI, or via generic tools like Cursor, which don't
| really optimize their pipelines to capture the strengths of
| different models. This way, it's very difficult to judge a
| model objectively. Just look at the obscene sycophancy
| exhibited by chatgpt-4o-latest and how it lifted LMArena
| scores.
| renewiltord wrote:
| Just the fact that everyone on HN is always telling us how
| LLMs are useless but that Gemini is the best of them
| convinces me of the opposite. No one who can't find a use
| for this technology is really informed on the subject. Hard
| to take them seriously.
| ncr100 wrote:
| Worked for me in 2.5 Flash, text only:
|
| https://g.co/gemini/share/bcc257f9b0a0
| asim wrote:
| I just wish the whole industry would stop using terms like
| thinking and reasoning. This is not what's happening. If we could
| come up with more appropriate terms that don't treat these models
| like they're human then we'd be in a much better place. That
| aside, it's cool to see the advancement of Google's offering.
| dbbk wrote:
| Thinking perhaps, but why not reasoning?
| lyu07282 wrote:
| Do you think any machine will ever be able to think and/or
| reason? Or is that a uniquely human thing? And do you have a
| rational standard to judge when something is reasoning or
| thinking, or is it just vibes?
|
| I'm asking because I wonder how much of that common attitude is
| just a sort of species-chauvinism. You are feeling anxious
| because machines are getting smarter; you are feeling anger
| because "they" are taking your job away. But the machine
| doesn't do that, it's people with an ideology that do that,
| and you should be angry at that instead.
| aerhardt wrote:
| I am only on OpenAI because they have a native Mac app. Call me
| old-school but my preferred workflow is still for the most part
| just asking narrow questions and copying-pasting back and forth.
| I've been playing with Junie (Jetbrain's AI agent) for a couple
| of days, but I still don't trust agents to run loose in my
| codebase for any sizeable amount of work.
|
| Does anyone know if Google is planning native apps? Or any
| wrapping interfaces that work well on a Mac?
| sweca wrote:
| Raycast[0] has Gemini support in their AI offering and it's
| native, fast and intuitive.
|
| [0] https://raycast.com/ai
| sweca wrote:
| Honestly, the _best_ part about Gemini, especially as a
| consumer product, is its super lax rate limits, or lack
| thereof. They never have capacity issues, unlike Claude,
| which always feels slow or sometimes outright rejects
| requests during peak hours. Gemini is consistently speedy
| and has extremely generous context window limits in the
| Gemini apps.
| onlyrealcuzzo wrote:
| Interesting. I use Claude quite a bit, and haven't encountered
| this.
|
| Is this the free version of Claude or the paid version?
|
| When are peak hours typically (in what timezone)?
| bossyTeacher wrote:
| Is everyone on here solely evaluating the models on their
| programming capabilities? I understand this is HN but vibe coding
| LLM tools won't be able to sustain the LLM industry (let's not
| call it AI please)
| barfingclouds wrote:
| I just need the Gemini app to allow push to talk :( Otherwise
| it's not usable for me in the way I want it to be
___________________________________________________________________
(page generated 2025-04-18 23:01 UTC)