[HN Gopher] Claude for Excel
___________________________________________________________________
Claude for Excel
Author : meetpateltech
Score : 662 points
Date   : 2025-10-27 16:09 UTC (1 day ago)
(HTM) web link (www.claude.com)
(TXT) w3m dump (www.claude.com)
| cube00 wrote:
| [flagged]
| sdsd wrote:
| Okay. But then you could say the same for a human: isn't your
| brain just a cloud of matter and electricity that reacts to
| senses deterministically?
| cube00 wrote:
| > isn't your brain just a cloud of matter and electricity
| that just reacts to senses deterministically?
|
| LLMs are not deterministic.
|
| I'd argue over the short term humans are more deterministic.
| I ask a human the same question multiple times and I get the
| same answer. I ask an LLM and each answer could be very
| different depending on its "temperature".
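The temperature point above can be made concrete: sampling from a softmax at temperature 0 reduces to argmax and is repeatable, while higher temperatures spread probability mass across tokens. A toy sketch (made-up logits, not a real model):

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index from logits; temperature 0 means greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(logits)), weights=probs)[0]

logits = [2.0, 1.0, 0.5]

# Temperature 0: the same answer on every run.
greedy = {sample_token(logits, 0, random.Random(i)) for i in range(100)}

# Temperature 1: answers vary run to run.
varied = {sample_token(logits, 1.0, random.Random(i)) for i in range(100)}
```

With these logits, the greedy set collapses to a single token while the temperature-1 set contains several.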
| krzyk wrote:
| If you ask a human the same question repeatedly, you'll get
| different answers. I think that by the third time you'll get
| "I already answered that" etc.
| worldsayshi wrote:
| We hardly react to things deterministically.
|
| But I agree with the sentiment. It seems it is more important
| than ever to agree on what it means to understand something.
| qwertox wrote:
| I'm having a bad day today. I'm 100% certain that today I'll
| react completely different to any tiny issue compared to how
| I did yesterday.
| sdsd wrote:
| Right, if you change the input to your function, you get a
| different output. By that logic, the function `(def (add a
| b) (+ a b))` isn't deterministic.
| NDizzle wrote:
| I mean - try clicking the CoPilot button and see what it can
| actually do. Last I checked, it told me it couldn't change any
| of the actual data itself, but it could give you suggestions.
| Low bar for excellence here.
| baal80spam wrote:
| OK then. Groks?
| dang wrote:
| " _Eschew flamebait. Avoid generic tangents._ "
|
| https://news.ycombinator.com/newsguidelines.html
| d--b wrote:
| Ok, they weren't confident enough to let the model actually edit
| the spreadsheet. Phew..
|
| Only a matter of time before someone does it though.
| cube00 wrote:
| When I think how easily I can misclick and stuff up a
| spreadsheet, I can't begin to imagine all the subtle ways LLMs
| will screw them up.
|
| Unlike code, where it's all on display, these formulas are
| hidden in each cell; you won't see the problem unless you click
| on the cell, so you'll have a hard time finding the cause.
| tln wrote:
| I wish Gemini could edit more in Google sheets and docs.
|
| Little stuff like splitting text more intelligently or
| following the formatting seen elsewhere would be very
| satisfying.
| password4321 wrote:
| How well does change tracking work in Excel... how hard would
| it be to review LLM changes?
|
| AFAIK there is no 'git for Excel' to diff and undo, especially
| not built-in (i.e. 'for free' both cost-wise and in settings
| where add-ons/macros aren't allowed security-wise).
|
| My limited experience has been that it is difficult to keep
| LLMs from changing random things besides what they're asked to
| change, which could cause big problems if untrackable in
| Excel.
| NewsaHackO wrote:
| I thought there was track changes on all office products.
| Most Office documents are zip files of XML files and assets,
| so I'd imagine it would be possible to rollback changes.
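That zip-of-XML observation is easy to verify with the standard library alone. A rough sketch: the file name below mirrors the real .xlsx layout, but the XML is deliberately simplified:

```python
import io
import zipfile

def make_workbook(sheet_xml):
    """Build a minimal in-memory zip shaped like an .xlsx container."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("xl/worksheets/sheet1.xml", sheet_xml)
    return buf

def diff_sheets(old, new, name="xl/worksheets/sheet1.xml"):
    """Return (old_xml, new_xml) if the sheet changed, else None."""
    with zipfile.ZipFile(old) as zo, zipfile.ZipFile(new) as zn:
        a, b = zo.read(name), zn.read(name)
    return None if a == b else (a.decode(), b.decode())

before = make_workbook('<row r="1"><c r="A1"><v>42</v></c></row>')
after = make_workbook('<row r="1"><c r="A1"><v>43</v></c></row>')

change = diff_sheets(before, after)
```

Real change tracking would diff the XML structurally rather than byte-for-byte, but the container really is just a zip you can open and compare.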
| strange_quark wrote:
| Yet more evidence of the bubble burst being imminent. If any of
| these companies really had some almost-AGI system internally,
| they wouldn't be spending any effort making f'ing Excel plugins.
| Or at the very least, they'd be writing their own Excel because
| AI is so amazing at coding, right?
| qsort wrote:
| You wouldn't believe the amount of shit that runs on Excel.
| efields wrote:
| This. I work in Pharma. Excel and faxes.
| powvans wrote:
| Yes. I once interviewed a developer who's previous job was
| maintaining the .NET application that used an Excel sheet as
| the brain for decisions about where to drill for oil on the
| sea floor. No one understood what was in the Excel sheet. It
| was built by a geologist who was long gone. The engineering
| team understood the inputs and outputs. That's all they
| needed to know.
| mwigdahl wrote:
| Years ago when I worked for an engineering consulting
| company we had to work with a similarly complex, opaque
| Excel spreadsheet from General Electric modeling the
| operation of a nuclear power plant in exacting detail.
|
| Same deal there -- the original author was a genius and was
| the only person who knew how it was set up or how it
| worked.
| cube00 wrote:
| I spotted a custom dialog in an Excel spreadsheet in a
| medical context the other day; I was horrified.
| dickersnoodle wrote:
| Sic
| strange_quark wrote:
| I think you're misunderstanding me. This might be something
| somewhat useful, I don't know, and I'm not judging it based
| on that.
|
| What I'm saying is that if you really believed we were 2,
| maybe 3 years tops from AGI or the singularity or whatever
| you would spend 0 effort serving what already seems to be a
| domain that is already served by 3rd parties that are already
| using your models! An excel wrapper for an LLM isn't exactly
| cutting edge AI research.
|
| They're desperate to find something that someone will pay a
| meaningful amount of money for that even remotely justifies
| their valuation and continued investment.
| FergusArgyll wrote:
| A program that can do excel for you _is_ almost AGI
| pton_xd wrote:
| The fine tuning will continue until we reach AGI.
| amlib wrote:
| The fine tuning will continue until we reach the torment
| nexus, at best
| HDThoreaun wrote:
| The current valuations do not require AGI. They require
| products like this that will replace scores of people doing
| computer based grunt work. MSFT is worth $4 trillion off the
| back of enterprise productivity software, the AI labs just need
| some of that money.
| ipaddr wrote:
| You make a great point. Where are all the complex
| applications? They haven't been able to create their own office
| suite or word processor or really anything aside from a
| Halloween matching game in JS. You would think we would have
| some complex application they can point to, but nothing.
| mitjam wrote:
| Excel is living business knowledge stuck in private SharePoint
| sites; tapping into it might kick off a nice data flywheel, not
| to speak of the nice TAM.
| jawns wrote:
| Gemini already has its hooks in Google Sheets, and to be honest,
| I've found it very helpful in constructing semi-complicated Excel
| formulas.
|
| Being able to select a few rows and then use plain language to
| describe what I want done is a time saver, even though I could
| probably muddle through the formulas if I needed to.
| break_the_bank wrote:
| I would recommend trying TabTabTab at https://tabtabtab.ai/
|
| It is an entire agent loop. You can ask it to build a multi
| sheet analysis of your favorite stock and it will. We are
| seeing a lot of early adopters use it for financial modeling,
| research automation, and internal reporting tasks that used to
| take hours.
| dangoodmanUT wrote:
| Gemini integrations into Google Workspace feel like they're
| using Gemini 1.5 Flash; it's so comically bad at understanding
| and generating.
| gumby271 wrote:
| Last time I tried using Gemini in Google Sheets it hallucinated
| a bunch of fake data, then gave me a summary that included all
| that fake data. I'd given it a bunch of transaction data, and
| asked it to group the records into different categories for
| budgeting. When asking it to give the largest values in each
| category, all the values that came back were fake. I'm not sure
| I'd really trust it to touch a spreadsheet after that.
| genrader wrote:
| you should:
|
| - stop using the free plan
| - don't use gemini flash for these tasks
| - learn how to do things over time and know that all ai
| models have improved significantly every few months
| ipaddr wrote:
| Or not use it.
| frankfrank13 wrote:
| I have had the opposite experience. I've never had Gemini give
| me something useful in sheets, and I'm not asking for
| complicated things. Like "group this data by day" or "give me
| p50 and p90"
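For comparison, the p50/p90 request being described is a few lines outside a spreadsheet too; a sketch with made-up latency numbers:

```python
import statistics

latencies = [12, 18, 25, 31, 40, 55, 63, 70, 88, 120]

# p50 is just the median.
p50 = statistics.median(latencies)

# quantiles with n=10 returns the 9 decile cut points; index 8 is p90.
deciles = statistics.quantiles(latencies, n=10, method="inclusive")
p90 = deciles[8]
```

(`method="inclusive"` interpolates within the observed data, the same convention most spreadsheet PERCENTILE functions use.)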
| break_the_bank wrote:
| I forgot to add: you can try TabTabTab without installing
| anything as well.
|
| To see something much more powerful on Google Sheets than
| Gemini for free, you can add "try@tabtabtab.ai" to your sheet,
| and make a comment tagging "try@tabtabtab.ai" and see it in
| action.
|
| If that is too much just go to ttt.new!
| soared wrote:
| It's interesting to me that this page talks a lot about
| "debugging models" etc. I would've expected (from the title) this
| to be going after the average excel user, similar to how chatgpt
| went after everyday people.
|
| I would've expected "make a vlookup or pivot table that tells me
| x" or "make this data look good for a slide deck" to be easier
| problems to solve.
| burkaman wrote:
| I think this is aiming to be Claude Code for people who use
| Excel as a programming environment.
| layer8 wrote:
| The issue is that the average Excel user doesn't quite have the
| skills to validate and double-check the Excel formulas that
| Claude would produce, and to correct them if needed. It would
| be similar to a non-programmer vibe-coding an app. And that's
| really not what you want to happen for professionally used
| Excel sheets.
| soared wrote:
| IMO that is exactly what people want. At my work everyone
| uses LLMs constantly and the trade-off of imperfect
| information is understood. People double-check it, etc., but
| the information search is so much faster; even if it finds the
| right Confluence page but misquotes it, it still sends me the
| link.
|
| For easy spreadsheet stuff (which 80% of average white
| collars workers are doing when using excel) I'd imagine the
| same approach. Try to do what I want, and even if you're half
| wrong the good 50% is still worth it and a better starting
| point.
|
| Vibe coding an app is like vibe coding a "model in excel".
| Sure you could try, but most people just need to vibe code a
| pivot table
| extr wrote:
| I think Anthropic themselves are actually having trouble
| imagining how this could be used. Coders think like coders -
| they are imagining the primary use case being managing large
| Excel sheets that are like big programs. In reality most Excel
| worksheets are more like tiny, one-off programs. More like
| scripts than applications. AI is very very good at scripts.
| burkaman wrote:
| I'm excited to see what national disasters will be caused by
| auto-generated Excel sheets that nobody on the planet
| understands. A few selections from past HN threads to prime your
| imagination:
|
| Thousands of unreported COVID cases:
| https://news.ycombinator.com/item?id=24689247
|
| Thousands of errors in genetics research papers:
| https://news.ycombinator.com/item?id=41540950
|
| Wrong winner announced in national election:
| https://news.ycombinator.com/item?id=36197280
|
| Countries across the world implement counter-productive economic
| austerity programs:
| https://en.wikipedia.org/wiki/Growth_in_a_Time_of_Debt#Metho...
| HPsquared wrote:
| Especially combined with the dynamic array formulas that have
| recently been added (LET, LAMBDA etc). You can have much more
| going on within each cell now. Think whole temporary data
| structures. The "evaluate formula" dialog doesn't quite cut it
| anymore for debugging.
| malthaus wrote:
| from my experience in the corporate world, i'd trust an excel
| generated / checked by an LLM more than i would one that has
| been organically grown over years in a big corporation where
| nobody ever checks or even can check anything because its one
| big growing pile of technical debt people just accept as
| working
| whalesalad wrote:
| I just want Claude inside of Metabase.
| adamfeldman wrote:
| https://www.metabase.com/features/metabot-ai
| asdev wrote:
| George Hotz said there's 5 tiers of AI systems, Tier 1 - Data
| centers, Tier 2 - fabs, Tier 3 - chip makers, Tier 4 - frontier
| labs, Tier 5 - Model wrappers. He said Tier 4 is going to eat all
| the value of Tier 5, and that Tier 5 is worthless. It's looking
| like that's going to be the case
| matsur wrote:
| People were saying the same thing about AWS vs SaaS ("AWS
| wrappers") a decade ago and none of that came to pass. Same
| will be true here.
| tln wrote:
| Claude is a model wrapper, no?
| piperswe wrote:
| Anthropic is a frontier lab, and Claude is a frontier model
| tln wrote:
| Anthropic models are Sonnet / Haiku / Opus
|
| https://docs.claude.com/en/docs/about-claude/models/overview
| piperswe wrote:
| Okay, Claude is a _family_ of frontier models then. IMO
| that's a pedantic distinction in this context.
| extr wrote:
| George Hotz says a lot of things. I think he's directionally
| correct but you could apply this argument to tech as a whole.
| Even outside of AI, there are plenty of niches where domain-
| specific solutions matter quite a bit but are too small for the
| big players to focus on.
| rudedogg wrote:
| Tier 5 requires domain expertise until we reach AGI or
| something very different from the latest LLMs.
|
| I don't think the frontier labs have the bandwidth or domain
| knowledge (or dare I say skills) to do tier 5 tasks well. Even
| their chat UIs leave a lot to be desired and that should be
| their core competency.
| benatkin wrote:
| Interesting. I found a reference to this in a tweet [1], and it
| looks to be from a podcast. While I'm not extremely
| knowledgeable, I'd put it like this: Tier 1 - fabs, Tier 2 -
| chip makers, Tier 3 - data centers, Tier 4 - frontier labs,
| Tier 5 - model wrappers
|
| However I would think more of elite data centers rather than
| commodity data centers. That's because I see Tier 4 being
| deeply involved in their data centers and thinking of buying
| the chips to feed their data centers. I wouldn't be so inclined
| to throw in my opinion immediately if I found an article
| showing this ordering of the tiers, but being a tweet of a
| podcast it might have just been a rough draft.
|
| 1: https://x.com/tbpn/status/1935072881425400016
| mediaman wrote:
| That is a common refrain by people who have no domain expertise
| in anything outside of tech.
|
| Spend a few years in an insurance company, a manufacturing
| plant, or a hospital, and then the assertion that the frontier
| labs will figure it out appears patently absurd. (After all, it
| takes humans years to understand just a part of these
| institutions, and they have good-functioning memory.)
|
| This belief that tier 5 is useless is itself a tell of a
| vulnerability: the LLMs are advancing fastest in domain-
| expertise-free generalized technical knowledge; if you have no
| domain expertise outside of tech, you are most vulnerable to
| their march of capability, and it is those with domain
| expertise who will rely increasingly less on those who have
| nothing to offer but generalized technical knowledge.
| asdev wrote:
| yeah but if Anthropic/OpenAI dedicate resources to gaining
| domain expertise then any tier 5 is dead in the water. For
| example, they recently hired a bunch of finance professionals
| to make specialized models for financial modeling. Any
| startup in that space will be wiped out
| HDThoreaun wrote:
| I dont think the claim is exactly that tier 5 is useless more
| that tier 5 synergizes so well with tier 4 that all the
| popular tier 5 products will eventually be made by the tier 4
| companies.
| mitjam wrote:
| Andrew Ng argued in 2023
| (https://www.youtube.com/watch?v=5p248yoa3oE ) that the
| underlying tiers depend on the app tier's success.
|
| That OpenAI is now apparently striving to become the next big
| app-layer company could hint at George Hotz being right, but
| only if the bets work out. I'm glad that there is competition
| at the frontier-labs tier.
| extr wrote:
| What is with the negativity in these comments? This is a huge,
| huge surface area that touches a large percentage of white collar
| work. Even just basic automation/scaffolding of spreadsheets
| would be a big productivity boost for many employees.
|
| My wife works in insurance operations - everyone she manages from
| the top down lives in Excel. For line employees a large
| percentage of their job is something like "Look at this internal
| system, export the data to excel, combine it with some other
| internal system, do some basic interpretation, verify it, make a
| recommendation". Computer Use + Excel Use isn't there yet...but
| these jobs are going to be the first on the chopping block as
| these integrations mature. No offense to these people but Sonnet
| 4.5 is already at the level where it would be able to replicate
| or beat the level of analysis they typically provide.
| cube00 wrote:
| I don't trust LLMs to do the kind of precise deterministic work
| you need in a spreadsheet.
|
| It's one thing to fudge the language in a report summary, it
| can be subjective, however numbers are not subjective. It's
| widely known LLMs are terrible at even basic maths.
|
| Even Google's own AI summary admits it which I was surprised
| at, marketing won't be happy.
|
| _Yes, it is true that LLMs are often bad at math because they
| don't "understand" it as a logical system but rather process
| it as text, relying on pattern recognition from their training
| data._
| extr wrote:
| Seems like you're very confused about what this work
| typically entails. The job of these employees is not mental
| arithmetic. It's closer to:
|
| - Log in to the internal system that handles customer
| policies
|
| - Find all policies that were bound in the last 30 days
|
| - Log in to the internal system that manages customer
| payments
|
| - Verify that for all policies bound, there exists a
| corresponding payment that roughly matches the premium.
|
| - Flag any divergences above X% for accounting/finance to
| follow up on.
|
| Practically this involves munging a few CSVs, maybe typing in
| a few things, setting up some XLOOKUPs, IF formulas,
| conditional formatting, etc.
|
| Will AI replace the entire job? No...but that's not the goal.
| Does it have to be perfect? Also no...the existing employees
| performing this work are also not perfect, and in fact
| sometimes their accuracy is quite poor.
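The reconciliation described above is simple enough to state as code. A hypothetical sketch; the record shapes and field names are invented for illustration:

```python
def reconcile(policies, payments, tolerance=0.05):
    """Flag policies whose payment diverges from the premium by more than
    `tolerance`, or that have no matching payment at all.

    `policies` and `payments` are hypothetical dicts keyed by policy id."""
    flagged = []
    for pid, premium in policies.items():
        paid = payments.get(pid)
        if paid is None:
            flagged.append((pid, "no payment found"))
        elif abs(paid - premium) / premium > tolerance:
            flagged.append((pid, f"paid {paid}, premium {premium}"))
    return flagged

policies = {"P-100": 1200.0, "P-101": 800.0, "P-102": 450.0}
payments = {"P-100": 1199.0, "P-101": 640.0}  # P-102 was never paid

issues = reconcile(policies, payments)
```

In practice this is a couple of XLOOKUPs and an IF column; the point is that the logic is checkable either way.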
| Ntrails wrote:
| Checking someone else's spreadsheet is a fucking nightmare.
| If your company has extremely good standards it's less
| miserable because at least the formatting etc will be
| consistent...
|
| The one thing LLMs should consistently do is ensure that
| formatting is correct. Which will help greatly in the
| checking process. But no, I generally don't trust them to
| do sensible things with basic formulation. Not a week ago
| GPT 5 got confused whether a plus or a minus was necessary
| in a basic question of "I'm 323 days old, when is my
| birthday?"
| xmprt wrote:
| I think you have a misunderstanding of the types of
| things that LLMs are good at. Yes you're 100% right that
| they can't do math. Yet they're quite proficient at basic
| coding. Most Excel work is similar to basic coding so I
| think this is an area where they might actually be pretty
| well suited.
|
| My concern would be more with how to check the work (ie,
| make sure that the formulas are correct and no columns
| are missed) because Excel hides all that. Unlike code,
| there's no easy way to generate the diff of a spreadsheet
| or rely on Git history. But that's different from the
| concerns that you have.
| collingreen wrote:
| I've built spreadsheet diff tools on Google sheets
| multiple times. As the need grows I think we will see
| diffs, commits, and review tools reach customers.
| break_the_bank wrote:
| hey Collin! I am working on an AI agent on Google Sheets,
| I am curious if any of your designs are out in the
| public. We are trying to re-think what diffs should look
| like and want to make something nicer than what we
| currently have, so I'm curious.
| collingreen wrote:
| Hi! Nothing public nor generic enough to be a good
| building block. I found myself often frustrated by the
| tools that came out of the box but I believe better apis
| could make this slightly easier to solve.
|
| The UX of spreadsheet diffs is a hard one to solve
| because of how weird the calculation loops are and how
| complicated the relationship between fields might be.
|
| I've never tried to solve this for a real end user before
| in a generic way - all my past work here was for internal
| ability to audit changes and rollback catastrophes. I
| took a lot of shortcuts by knowing which cells are input
| data vs various steps of calculations -- maybe part of
| your ux is being able to define that on a sheet by sheet
| basis? Then you could show how different data (same
| formulas) changed outputs or how different formulas (same
| data) did differently?
|
| Spreadsheets are basically weird app platforms at this
| point so you might not be able to create a single
| experience that is both deep and generic. On the other
| hand maybe treating it as an app is the unlock? Get your
| AI to noodle on what the whole thing is for, then show
| diff between before and after stable states (after all
| calculation loops stabilize or are killed) side by side
| with actual diffs of actual formulas? I feel like Id want
| to see a diff as a live final spreadsheet and be able to
| click on changed cells and see up the chain of their
| calculations to the ancestors that were modified.
|
| Fun problem that sounds extremely complicated. Good luck
| distilling it!
| alfalfasprout wrote:
| proficient != near-flawless.
|
| > Most Excel work is similar to basic coding so I think
| this is an area where they might actually be pretty well
| suited.
|
| This is a hot take. One I'm not sure many would agree
| with.
| mguerville wrote:
| Excel work of people who make a living because of their
| excel skills (Bankers, VCs, Finance pros) is truly on the
| spectrum of basic coding. Excel use by others (Strategy,
| HR, etc.) is more like crude UI to manipulate small
| datasets (filter, sort, add, share and collaborate).
| Source: have lived both lives.
| Wowfunhappy wrote:
| > Yes you're 100% right that they can't do math.
|
| The model ought to be calling out to some sort of tool to
| do the math--effectively writing code, which it can do.
| I'm surprised the major LLM frontends aren't always doing
| this by now.
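A sketch of that idea, using the 323-days-old birthday question from upthread: route the date arithmetic to code, where the plus-vs-minus choice is explicit (`today` is pinned here so the result is reproducible):

```python
from datetime import date, timedelta

def birth_date_from_age_in_days(today, age_days):
    """If someone is `age_days` old today, their birth date is today MINUS
    that many days; subtraction, not addition, is exactly the sign choice a
    language model can fumble when it pattern-matches instead of computing."""
    return today - timedelta(days=age_days)

today = date(2025, 10, 27)
born = birth_date_from_age_in_days(today, 323)
```

A tool-calling model would emit this computation and read back the result rather than guessing the date token by token.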
| mapt wrote:
| So do it in basic code, where typing G53
| instead of G$53 doesn't crash a mass transit network
| because somebody's algorithm forgot to order enough fuel
| this month.
| mr_toad wrote:
| > Most Excel work is similar to basic coding
|
| Excel is similar to coding in BASIC, a giant hairy ball
| of tangled wool.
| klausnrooster wrote:
| MS Office Tools menu has a "Spreadsheet Compare"
| application. It is quite good for diffing 2 spreadsheets.
| Of course it cannot catch logic errors, human or ML.
| runarberg wrote:
| > The one thing LLMs should consistently do is ensure
| that formatting is correct.
|
| In JavaScript (and I assume most other programming
| languages) this is the job of static analysis tools (like
| eslint, prettier, typescript, etc.). I'm not aware of any
| LLM-based tools which perform static analysis with as
| good results as the traditional tools. Is static
| analysis not a thing in the spreadsheet world? Are the
| tools which do static analysis on spreadsheets
| subpar, or do they offer some disadvantage not seen in
| other programming languages? And if so, are LLMs any better?
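Nothing as mature as eslint seems to exist for spreadsheets, but a crude lint pass over formula text is easy to imagine. A hypothetical sketch that flags the G53-vs-G$53 class of inconsistency mentioned elsewhere in the thread; the input shape is invented:

```python
import re

def lint_mixed_refs(formulas):
    """Flag cells referenced both with and without $-anchoring.

    `formulas` maps cell address -> formula string (a made-up input shape;
    a real tool would parse these out of the workbook XML)."""
    seen = {}  # normalized ref -> set of raw spellings
    ref_pattern = re.compile(r"\$?[A-Z]{1,3}\$?[0-9]+")
    for formula in formulas.values():
        for raw in ref_pattern.findall(formula):
            seen.setdefault(raw.replace("$", ""), set()).add(raw)
    return sorted(ref for ref, raws in seen.items() if len(raws) > 1)

sheet = {
    "H2": "=G53*B2",
    "H3": "=G$53*B3",   # same cell, inconsistently anchored
    "H4": "=SUM(A1:A9)",
}

warnings = lint_mixed_refs(sheet)
```

A real linter would need a proper formula parser (ranges, sheet names, structured references), but even this level of mechanical checking is the kind of deterministic pass you'd want before, or instead of, trusting an LLM's judgment.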
| eric-burel wrote:
| Just use a normal static analysis tool and shove the
| result to an LLM. I believe Anthropic properly figured
| that agents are the key, in addition to models, contrary
| to OpenAI that is run by a psycho that only believes in
| training the bigger model.
| koliber wrote:
| Maybe LLMs will enable a new type of work in
| spreadsheets. Just like in coding we have PR reviews,
| with an LLM it should be possible to do a spreadsheet
| review. Ask the LLM to try to understand the intent and
| point out places where the spreadsheet deviates from the
| intent. Also ask the LLM to narrate the spreadsheet so it
| can be understood.
| Insanity wrote:
| That first condition "try to understand the intent" is
| where it could go wrong. Maybe it thinks the spreadsheet
| aligns with the intent, but it misunderstood the intent.
|
| LLMs are a lossy validation, and while they work
| sometimes, when they fail they usually do so 'silently'.
| monkeydust wrote:
| Maybe we need some kind of method or framework to develop
| intent. Most things that go wrong in knowledge work
| are down to a lack of common understanding of intent.
| lossolo wrote:
| Last time, I gave claude an invoice and asked it to change
| one item on it, it did so nicely and gave me the new
| invoice. Good thing I noticed it had also changed the bank
| account number..
|
| The more complicated the spreadsheet and the more
| dependencies it has, the greater the room for error. These
| are probabilistic machines. You can use them, I use them
| all the time for different things, but you need to treat
| them like employees you can't even trust to copy a bank
| account number correctly.
| mikeyouse wrote:
| We've tried to gently use them to automate some of our
| report generation and PDF->Invoice workflows and it's a
| nightmare of silent changes and absence of logic.. basic
| things like specifically telling it "debits need to match
| credits" and "balance sheets need to balance" that are
| ignored.
| wholinator2 wrote:
| Yeah, asking llm to edit one specific thing in a large or
| complex document/ codebase is like those repeated "give
| me the exact same image" gifs. It's fundamentally a
| statistical model so the only thing we can be _certain_
| of is that _it's not_. It might get the desired change
| 100% correct but it's only gonna get the entire document
| 99.5%.
| onion2k wrote:
| Something that Claude Sonnet does when you use it to code
| is write scripts to test whether or not something is
| working. If it does that for Excel (e.g. some form of
| verification) it should be fine.
|
| Besides, using AI is an exercise in a "trust but verify"
| approach to getting work done. If you asked a junior to
| do the task you'd check their output. Same goes for AI.
| dpoloncsak wrote:
| Sysadmin of a small company. I get asked pretty often to
| help with a pivot table, vlookup, or just general excel
| functions (and smartsheet, these users LOVE smartsheet)
| toomuchtodo wrote:
| Indeed, in a small enough org, the sysadmin/technologist
| becomes support of last resort for all the things.
| JumpCrisscross wrote:
| > _these users LOVE smartsheet_
|
| I hate smartsheet...
|
| Excel or R. (Or more often, regex followed by pen and
| paper followed by more regex.)
| dpoloncsak wrote:
| They're coming to me for pivot tables....
|
| Handing them regex would be like giving a monkey a
| bazooka
| AvAn12 wrote:
| > "Does it have to be perfect?"
|
| Actually, yes. This kind of management reporting is either
| (1) going to end up in the books and records of the company
| - big trouble if things have to be restated in the future
| or (2) support important decisions by leadership -- who
| will be very much less than happy if analysis turns out to
| have been wrong.
|
| A lot of what ties up the time of business analysts is
| ticking and tying everything to ensure that mistakes are
| not made and that analytics and interpretations are
| consistent from one period to the next. The math and
| queries are simple - the details and correctness are hard.
| 2b3a51 wrote:
| There is another aspect to this kind of activity.
|
| Sometimes there can be an advantage in leading or lagging
| some aspects of internal accounting data for a time
| period. Basically sitting on credits or debits to some
| accounts for a period of weeks. The tacit knowledge to
| know when to sit on a transaction and when to action it
| is generally not written down in formal terms.
|
| I'm not sure how these shenanigans will translate into an
| ai driven system.
| AvAn12 wrote:
| That's the kind of thing that can get a company into a
| lot of trouble with its auditors and shareholders. Not
| that I am offering accounting advice of course. And yeah,
| one cannot "blame" an AI system or try to AI-wash any
| dodgy practices.
| iamacyborg wrote:
| > Sometimes there can be an advantage in leading or
| lagging some aspects of internal accounting data for a
| time period.
|
| This worked famously well for Enron.
| extr wrote:
| Speak for yourself and your own use cases. There are a
| huge diversity of workflows with which to apply
| automation in any medium to large business. They all have
| differing needs. Many Excel workflows I'm personally
| familiar with already incorporate a "human review" step.
| Telling a business leader that they can now jump straight
| to that step, even if it requires 2x human review, with
| AI doing all of the most tedious and low-stakes prework,
| is a clear win.
| Revanche1367 wrote:
| >Speak for yourself and your own use cases
|
| Take your own advice.
| extr wrote:
| I'm taking a much weaker position than the respondent:
| LLMs are useful for many classes of problem that do not
| require zero shot perfect accuracy. They are useful in
| contexts where the cost of building scaffolding around
| them to get their accuracy to an acceptable level is less
| than the cost of hiring humans to do the same work to the
| same degree of accuracy.
|
| This is basic business and engineering 101.
| Barbing wrote:
| >LLMs are useful for many classes of problem that do not
| require zero shot perfect accuracy. They are useful in
| contexts where the cost of building scaffolding around
| them to get their accuracy to an acceptable level is less
| than the cost of hiring humans to do the same work to the
| same degree of accuracy.
|
| Well said. Concise and essentially inarguable, at least
| to the extent it means LLMs are here to stay in the
| business world whether anyone likes it or not (barring
| the unforeseen, e.g. regulation or another pressure).
| jacksnipe wrote:
| Is this not belligerently ignoring the fact that this
| work is already done imperfectly? I can't tell you how
| many serious errors I've caught in just a short time of
| automating the generation of complex spreadsheets from
| financial data. All of them had already been checked by
| multiple analysts, and all of them contained serious
| errors (in different places!)
| harrall wrote:
| There are actually different classes of errors, though:
| errors in the process itself versus errors that
| happen when performing the process.
|
| For example, if I ask you to tabulate orders via a query
| but you forget to include an entire table, that is a
| major error of process, but the query itself is
| consistently error-free.
|
| Reducing error and mistakes is very much modeling where
| error can happen. I never trust an LLM to interpret data
| from a spreadsheet because I cannot verify every
| individual result, but I am willing to ask an LLM to
| write a macro that tabulates the data because I can
| verify the algorithm and the macro result will always be
| consistent.
|
| Using Claude to interpret the data directly for me is
| scary because those kinds of errors are neither
| verifiable nor consistent. At least with the "missing
| table" example, that error may make the analysis
| completely bunk but once it is corrected, it is always
| correct.
| AvAn12 wrote:
| Very much agreed
| AvAn12 wrote:
| No belligerence intended! Yes, processes are faulty today
| even with maker-checker and other QA procedures. To me it
| seems the main value of LLMs in a spreadsheet-heavy
| process is acceleration - which is great! What is harder
| is quality assurance - like the example someone gave
| regarding deciding when and how to include or exclude
| certain tables, date ranges, calc, etc. Properly
| recording expert judgment and then consistently applying
| that judgment over time is key. I'm not sure that is the
| kind of thing LLMs are great at, even ignoring their
| stochastic nature. Let's figure out how to get best use
| out of the new kit - and like everything else, focus on
| achieving continuously improving outcomes.
| next_xibalba wrote:
| The use cases for spreadsheets are much more diverse than
| that. In my experience, spreadsheets are just as often used for
| calculation. Many of them do require high accuracy, rely on
| determinism, and necessitate the understanding of maths
| ranging from basic arithmetic to statistics and engineering
| formulas. Financial models, for example, must be built up
| from ground truth and need to always use the right formulas
| with the right inputs to generate meaningful outputs.
|
| I have personally worked with spreadsheet based financial
| models that use 100k+ rows x dozens of columns and involve
| 1000s of formulas that transform those data into the
| desired outputs. There was very little tolerance for
| mistakes.
|
| That said, humans, working in these use cases, make
| mistakes >0% of the time. The question I often have with
| the incorporation of AI into human workflows is, will we
| eventually come to accept a certain level of error from
| them in the way we do for humans?
| jay_kyburz wrote:
| >Does it have to be perfect? Also no.
|
| Yeah, but it could be perfect, so why are there humans in
| loop at all? That is all just math!
| mrcwinn wrote:
| I couldn't agree more. I get all my perfectly deterministic
| work output from human beings!
| goatlover wrote:
| If only we had created some device that could perform
| deterministic calculations and then wrote software that
| made it easy for humans to use such calculations.
| bryanrasmussen wrote:
| ok but humans are idiots, if only we could make some sort
| of Alternate Idiot, a non-human but every bit as
| generally stupid as humans are! This A.I would be able to
| do every stupid thing humans did with the device that
| performed deterministic calculations only many times
| faster!
| baconbrand wrote:
| Yes and when the AI did that all the stupid humans could
| accept its output without question. This would save the
| humans a lot of work and thought and personal
| responsibility for any mistakes! See also Israel's
| Lavender for an exciting example of this in action.
| laweijfmvo wrote:
| I don't trust humans to do the kind of precise deterministic
| work you need in a spreadsheet!
| baconbrand wrote:
| Right, we shouldn't use humans or LLMs. We should use
| regular deterministic computer programs.
|
| For cases where that is not available, we should use a
| human and never an LLM.
| extr wrote:
| "regular deterministic computer programs" - otherwise
| known as the SUM function in Microsoft Excel
| davidpolberger wrote:
| I like to use Claude Code to write deterministic computer
| programs for me, which then perform the actual work. It
| saves a lot of time.
|
| I had a big backlog of "nice to have scripts" I wanted to
| write for years, but couldn't find the time and energy
| for. A couple of months after I started using Claude
| Code, most of them exist.
| baconbrand wrote:
| That's great and the only legitimate use case here. I
| suspect Microsoft will not try to limit customers to just
| writing scripts and will instead allow and perhaps even
| encourage them to let the AI go ham on a bunch of raw
| data with no intermediary code that could be reviewed.
|
| Just a suspicion.
| doug_durham wrote:
| Sure, but this isn't requiring that the LLM do any math. The
| LLM is writing formulas and code to do the math. They are
| very good at that. And like any automated system you need to
| review the work.
| causal wrote:
| Exactly, and if it can be done in a way that helps users
| better understand their own spreadsheets (which are often
| extremely complex codebases in a single file!) then this
| could be a huge use case for Claude.
| bg24 wrote:
| "I don't trust LLMs to do the kind of precise deterministic
| work" => I think the LLM is not doing the precise
| arithmetic. It is the agent, with lots of knowledge
| (skills) and tools. Precise deterministic work is done by
| tools (deterministic code). Skills bring domain knowledge
| and how to sequence a task. The agent executes it. The LLM
| predicts the next token.
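A toy sketch of that division of labor. The "model" here is hard-coded to stand in for a token prediction (tool name, prompt, and numbers are all invented); the point is that the arithmetic itself is done by ordinary deterministic code, not by the LLM.

```python
def sum_column(values):
    """The deterministic tool: exact arithmetic, same answer every run."""
    return sum(values)

TOOLS = {"sum_column": sum_column}

def fake_model(prompt):
    # A real LLM would emit a structure like this as text/JSON.
    return {"tool": "sum_column", "args": [[1200, 99, 3]]}

def agent(prompt):
    call = fake_model(prompt)                  # LLM: pick tool + args
    return TOOLS[call["tool"]](*call["args"])  # tool: exact answer

print(agent("Total the Amount column"))  # 1302
```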
| zarmin wrote:
| >I don't trust LLMs to do the kind of precise deterministic
| work
|
| not just in a spreadsheet, any kind of deterministic work at
| all.
|
| find me a reliable way around this. i don't think there is
| one. mcp/functions are a band aid and not consistent enough
| when precision is important.
|
| after almost three years of using LLMs, i have not found a
| single case where i didn't have to review its output, which
| takes as long or longer than doing it by hand.
|
| ML/AI is not my domain, so my knowledge is not deep nor
| technical. this is just my experience. do we need a new
| architecture to solve these problems?
| baconbrand wrote:
| ML/AI is not my domain but you don't have to get all that
| technical to understand that LLMs run on probability. We
| need a new architecture to solve these problems.
| chpatrick wrote:
| They're not great at arithmetic but at abstract mathematics
| and numerical coding they're pretty good actually.
| mhh__ wrote:
| If LLMs can replace mathematica for me when I'm doing affine
| yield curve calculations they can do a DCF for some banker
| idiots
| sdeframond wrote:
| > I don't trust LLMs to do the kind of precise deterministic
| work you need in a spreadsheet.
|
| Rightly so! But LLMs can still make you faster. Just don't
| expect _too much_ from it.
| mbreese wrote:
| I don't see the issue so much as the deterministic precision
| of an LLM, but the lack of observability of spreadsheets.
| Just looking at two different spreadsheets, it's impossible
| to see what changes were made. It's not like programming
| where you can run a `git diff` to see what changes an LLM
| agent made to a source code file. Or even a word processing
| document where the text changes are clear.
|
| Spreadsheets work because the user sees the results of
| complex interconnected values and calculations. For the user,
| that complexity is hidden away and left in the background.
| The user just sees the results.
|
| This would be a nightmare for most users to validate what
| changes an LLM made to a spreadsheet. There could be
| fundamental changes to a formula that could easily be hidden.
|
| For me, that the concern with spreadsheets and LLMs - which
| is just as much a concern with spreadsheets themselves. Try
| collaborating with someone on a spreadsheet for modeling and
| you'll know how frustrating it can be to try and figure out
| what changes were made.
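One hedged workaround sketch for the observability problem: flatten each version of a sheet to stable "cell: formula" lines, then diff them like source code. The cell dicts below are stand-ins for values read from two saved workbook versions (e.g. via a library such as openpyxl).

```python
import difflib

# Two versions of the "same" sheet; only A3's formula was subtly edited.
v1 = {"A1": "100", "A2": "200", "A3": "=SUM(A1:A2)"}
v2 = {"A1": "100", "A2": "200", "A3": "=SUM(A1:A1)"}

def flatten(cells):
    # One "address: formula" line per cell, sorted for a stable diff.
    return [f"{addr}: {cells[addr]}" for addr in sorted(cells)]

diff = list(difflib.unified_diff(flatten(v1), flatten(v2),
                                 "before.xlsx", "after.xlsx",
                                 lineterm=""))
print("\n".join(diff))
```

The hidden formula change now shows up as an ordinary `-`/`+` pair, the way a `git diff` would surface it in source code.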
| informal007 wrote:
| You might trust them when the precision is extremely high
| and others agree with that.
|
| High precision is possible because they can achieve it
| through multiple cross-validations.
| prisonguard wrote:
| ChatGPT is actively being used as a calculator.
| game_the0ry wrote:
| _> I don 't trust LLMs to do the kind of precise
| deterministic work you need in a spreadsheet._
|
| I was thinking along the same lines, but I could not
| articulate as well as you did.
|
| Spreadsheet work is deterministic; LLM output is
| probabilistic. The two should be distinguished.
|
| Still, it's a productivity boost, which is always good.
| Kiro wrote:
| Most real-world spreadsheets I've worked with were fragile
| and sloppy, not precise and deterministic. Programmers always
| get shocked when they realize how many important things are
| built on extremely messy spreadsheets, and that people simply
| accept it. They'd rather just spend human hours correcting
| discrepancies than try to build something maintainable.
| bonoboTP wrote:
| Usually this is very hard because the tasks and the job
| often subtly shift in somewhat unpredictable and
| unforeseen ways, and there is no neat clean abstraction
| that you can just implement as an application. Too
| heterogeneous, too messy, too many exceptions. If you
| develop some clean elegant solution, next week there will
| be something that your shiny app doesn't allow and they'd
| have to submit a feature request or whatever.
|
| In Excel, it's possible to just ad hoc adjust things and
| make it up as you go. It's not clean but very adaptable and
| flexible.
| MangoCoffee wrote:
| LLMs are just a tool, though. Humans still have to verify
| them, like with every other tool out there.
| A4ET8a8uTh0_v2 wrote:
| Eh, yes. In theory. In practice, and this is what I have
| experienced personally, bosses seem to think that you now
| have interns, so you should be able to do 5x the output.
| Guess what that means: no verification, or a rubber stamp.
| brookst wrote:
| Do you trust humans to be precise and deterministic, or even
| to be especially good at math?
|
| This is talking about applying LLMs to formula creation and
| references, which they are actually pretty good at.
| Definitely not about replacing the spreadsheet's calculation
| engine.
| amrocha wrote:
| I trust humans not to be able to shoot the company in the
| foot without even realizing it.
|
| Why are we suddenly ok with giving every underpaid and
| exploited employee a footgun and expecting them to be
| responsible with it???
| onion2k wrote:
| _It 's widely known LLMs are terrible at even basic maths._
|
| Claude for Excel isn't doing maths. It's doing Excel. If the
| llm is bad at maths then teaching it to use a tool that's
| good at maths seems sensible.
| pavel_lishin wrote:
| My concern is that my insurance company will reject a claim, or
| worse, because of something an LLM did to a spreadsheet.
|
| Now, granted, that can also happen because Alex fat-fingered
| something in a cell, but that's something that's much easier to
| track down and reverse.
| manquer wrote:
| They are already doing that with AI, rejecting claims at
| higher rates than before.
|
| Privatized insurance will always find a way to pay out
| less if it can get away with it. It is just the nature of
| having the trifecta of profit motive, socialized risk,
| and light regulation.
| philipallstar wrote:
| > It is just the nature of having the trifecta of profit
| motive, socialized risk and light regulation.
|
| It's the nature of everything. They agree to pay you for
| something. It's nothing specific to "profit motive" in the
| sense you mean it.
| manquer wrote:
| I should have been clearer - profit maximization above
| all else, as long as it is mostly legal. Neither profit
| nor profit maximization at all costs is the nature of
| everything.
|
| There are many other entity types - unions[1],
| cooperatives, public sector companies, quasi-government
| entities, PBCs, non-profits - that all offer insurance
| and can occasionally do it well.
|
| We even have some in the US, and don't consider them
| communism - like the FDIC, or things like Social
| Security and unemployment insurance.
|
| At some level, aren't government and taxation themselves
| nothing but insurance? We agree to pay taxes to mitigate
| a variety of risks, from foreign invasion down to smaller
| things like getting robbed on the street.
|
| [1] Historically, worker collectives or unions self-
| organized to socialize the risks of major work-ending
| injuries or death.
|
| Armies from ancient to modern have operated on this
| insurance - the two ingredients that made them not
| mercenaries: a form of long-term insurance benefit
| (education, pension, land, etc.) for themselves or their
| family members in the event of death, and sovereign
| immunity for their actions.
| JumpCrisscross wrote:
| > _They are already doing that with AI, rejecting claims
| at higher rates than before_
|
| Source?
| nartho wrote:
| Haven't risk based models been a thing for the last 15-20
| years ?
| keernan wrote:
| >>They are already doing that with AI, rejecting claims
| at higher rates than before.
|
| That's a feature, not a bug.
| elpakal wrote:
| This is a great application of this quote. Insurance
| providers have 0 incentive to make their AI "good" at
| processing claims, in fact it's easy to see how "bad" AI
| can lead to a justification to deny more claims.
| bonoboTP wrote:
| The question is how you define good. They surely want the
| AI to be good in the sense that it rejects all claims
| that they think they can get away with rejecting. But it
| should not reject those where rejection likely results in
| litigation and losing and having to pay damages.
| jimbokun wrote:
| Couldn't they accomplish the same thing by rejecting a
| certain percentage of claims totally at random?
| manquer wrote:
| That would be illegal though; the goal is to do this
| legally, after all.
|
| We also have to remember all claims aren't equal, i.e.
| some claims end up being way costlier than others. You
| can achieve similar % margin outcomes by putting up a ton
| of friction: preconditions, multiple appeals processes,
| prior authorization for prior authorization, reviews by
| administrative doctors who have no expertise in the field
| being reviewed and don't have to disclose their identity,
| and so on.
|
| While the U.S. system is the most extreme, or evolved, it
| is not unique; it is what you get when you privatize
| insurance. Any country with private insurance has some
| lighter version of this and is on the same journey.
|
| Not that public health systems or insurance a la the NHS
| in the UK, or Germany's, work well - they are
| underfunded and mismanaged, with waits of months to see
| a specialist, and so on.
|
| We have to choose our poison - unless you are rich, of
| course; then the U.S. system is by far the best. People
| travel to the U.S. to get the kind of care that is not
| possible anywhere else.
| jimbokun wrote:
| Why does saying "AI did it" make it legal, if the outcome
| is the same?
| nxobject wrote:
| > While the U.S. system is the most extreme, or evolved,
| it is not unique; it is what you get when you privatize
| insurance. Any country with private insurance has some
| lighter version of this and is on the same journey.
|
| I disagree with the statement that healthcare insurance
| is predominantly privatized in the US: Medicare and
| Medicaid, at least in 2023, outspent private plans for
| healthcare spending by about ~10% [1]; this is before
| accounting for government subsidies for private plans.
| And boy, does America have a very unique relationship
| with these programs.
|
| https://www.healthsystemtracker.org/chart-collection/u-s-
| spe...
| manquer wrote:
| It is more nuanced. For example, Medicare Advantage (Part
| C) is paid for with Medicare money, but it is profitable
| private operators who provide the plans and service it -
| a fast-growing part of Medicare.
|
| John Oliver had an excellent segment coincidentally
| yesterday on this topic.
|
| While the government pays for it, it is not managed or
| run by the government - so how do we classify the
| program, as public or private?
| jimbokun wrote:
| That's a great and thorough analysis!
|
| My take away is that as public health costs are
| overtaking private insurance and at the same time doing a
| better job controlling costs per enrollee, it makes more
| and more sense just to have the government insure
| everyone.
|
| I can't see what argument the private insurers have in
| their favor.
| smithkl42 wrote:
| If you think that insurance companies have "light
| regulation", I shudder to think of what "heavy regulation"
| would look like. (Source: I'm the CTO at an insurance
| company.)
| lotsofpulp wrote:
| They have too much regulation, and too little auditing
| (at least in the managed healthcare business).
| nxobject wrote:
| I agree, _and_ I can see where it comes from (at least at
| the state level). The cycle is: bad trend happens that
| has deep root causes (let 's say PE buying rural
| hospitals because of reduced Medicaid/Medicare
| reimbursements); legislators (rightfully) say "this
| shouldn't happen", but don't have the ability to address
| the deep root causes so they simply regulate healthcare
| M&As - now you have a bandaid on a problem that's going
| to pop up elsewhere.
| lotsofpulp wrote:
| I mean even in the simple stuff, like denying payment for
| healthcare that should have been covered. CMS will come
| by and audit a handful of cases, out of millions, every
| few years.
|
| So obviously the company that prioritizes accuracy of
| coverage decisions by spending money on extra labor to
| audit itself is wasting money. Which means insureds have
| to waste more time getting the payment for healthcare
| they need.
| manquer wrote:
| By "light" I did not mean the quantity of paperwork you
| have to do, but rather whether you are allowed to do the
| things you want to do as a company.
|
| More compliance or reporting requirements usually tend to
| favor the larger existing players who can afford them,
| and they are also used to make life difficult for the end
| user and to reject more claims.
|
| It is the kind of thing that keeps you and me busy, but
| major investors don't care about it at all; the cost of
| compliance, or the lack of it, is no more than a rounding
| error on the balance sheet, and the fines or penalties
| are puny and laughable.
|
| The enormous profits year on year for decades now, and
| the amount of consolidation allowed in the industry, show
| that the industry is able to do pretty much what it
| wants. That is what I meant by light regulation.
| smithkl42 wrote:
| I'm not sure we're looking at the same industry. Overall,
| insurance company profit margins are in the single
| digits, usually low single digits - and in many segments,
| they're frequently not profitable at all. To take one
| example, 2024 was the first profitable year for
| homeowners insurance companies since 2019, and even then,
| the segment's entire profit margin was 0.3% (not 3% -
| 0.3%).
|
| https://riskandinsurance.com/us-pc-insurance-industry-
| posts-...
| bonoboTP wrote:
| It's an accounting 101 thing to use all tricks in the
| book to reduce the reported profit, to avoid paying taxes
| on that profit.
| zetazzed wrote:
| The total profit of ALL US health insurance companies
| added together was $9bln in 2024:
| https://content.naic.org/sites/default/files/2024-annual-
| hea.... This is a profit margin of 0.8% down from 2.2% in
| the previous year.
|
| Meta alone made $62bln in 2024:
| https://investor.atmeta.com/investor-news/press-release-
| deta...
|
| So it's weird to see folks on a tech site talking about
| how enormous all the profits are in health insurance, and
| citations with numbers would be helpful to the
| discussion.
|
| I worked in insurance-related tech for some time, and the
| providers (hospitals, large physician groups) and
| employers who actually pay for insurance have significant
| market power in most regions, limiting what insurers can
| charge.
| wombatpm wrote:
| Wait until a company has to restate earnings because of a bug
| in a Claudified Excel spreadsheet.
| doctorpangloss wrote:
| > What is with the negativity in these comments?
|
| Some people - normal people - understand the difference between
| the holistic experience of a mathematically informed opinion
| and an actual model.
|
| It's just that normal people always wanted the holistic
| experience of an answer. Hardly anyone wants a right answer.
| They have an answer in their heads, and they want a defensible
| journey to that answer. That is the purpose of Excel in 95% of
| places it is used.
|
| Lately people have been calling this "sycophancy." This was
| always the problem. Sycophancy is the product.
|
| Claude Excel is leaning deeply into this garbage.
| extr wrote:
| It seems to me the answer is more so "People on HN are so
| far removed from the real use cases for this kind of
| automation they simply have no idea what they're talking
| about".
| genrader wrote:
| This is so correct it hurts
| intended wrote:
| I used to live in excel.
|
| The issue isn't in creating a new monstrosity in excel.
|
| The issue is the poor SoB who has to spelunk through the damn
| thing to figure out what it does.
|
| Excel is the sweet spot of just enough to be useful, capable
| enough to be extensible, yet gated enough to ensure everyone
| doesn't auto run foreign macros (or whatever horror is more
| appropriate).
|
| In the simplest terms - it's not excel, it's the business
| logic. If an excel file works, it's because there's someone who
| "gets" it in the firm.
| extr wrote:
| I used to live in Excel too. I've trudged through plenty of
| awful worksheets. The output I've seen from AI is actually
| more neatly organized than most of what I used to receive in
| outlook. Most of that wasn't hyper-sophisticated cap table
| analyses. It was analysis from a Jr Analyst or line employee
| trying to combine a few different data sources to get some
| signal on how XYZ function of the business was performing. AI
| automation is perfectly suitable for this.
| intended wrote:
| How?
|
| Neat formatting didn't save any model from having the wrong
| formula pasted in.
|
| Being neat was never a substitute for being well rested, or
| sufficiently caffeinated.
|
| Have you seen how AI functions in the hands of someone who
| isn't a domain expert? I've used it for things I had no
| idea about, like Astro+ web dev. User ignorance was
| magnified spectacularly.
|
| This is going to have Jr Analysts dumping well formatted
| junk in email boxes within a month.
| gedy wrote:
| It's actually really cool. I will say that "spreadsheets"
| remain a bandaid over dysfunctional UIs, processes, etc.,
| and engineering spends a lot of time enabling these
| bandaids vs someone just saying "I need to see number X"
| and not "BI analytics data in a realtime spreadsheet!",
| etc.
| gadders wrote:
| Yeah, this could be a pretty big deal. Not everyone is an excel
| expert, but nearly everyone finds themselves having to work
| with data in excel at some time or other.
| hbarka wrote:
| What does scaffolding of spreadsheets mean? I see the term
| scaffolding frequently in the context of AI-related articles
| and not familiar with this method and I'm hesitant to ask an
| LLM.
| Rudybega wrote:
| Scaffolding typically just refers to a larger state machine
| style control flow governing an agent's behavior and the
| suite of external tools it has access to.
| behnamoh wrote:
| > How teams use Claude for Excel
|
| Who are these teams that can get value from Anthropic? One MCP
| and my context window is used up and Claude tells me to start a
| new chat.
| fragmede wrote:
| MCPs and context window sizing, putting the engineering into
| prompt engineering.
| BuildItBusk wrote:
| I have to admit that my first thought was "April's fool". But
| you are right. It makes a lot of sense (if they can get it to
| work well). Not only is Excel the world's biggest "programming
| language". It's probably also one of the most unintuitive ways
| to program.
| adastra22 wrote:
| Why unintuitive?
| baq wrote:
| If you exclude macros with IO it's actually the most popular
| purely functional programming language (no quotes) on the
| planet by far.
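The claim is easy to illustrate: a typical Excel formula is a pure function of its input cells, with no state or IO. A made-up example, written both ways:

```python
# Excel:  C1 = IF(A1 > B1, A1 - B1, 0)
# The same computation as an ordinary pure function: the output
# depends only on the inputs, which is what makes the "purely
# functional" description reasonable.

def c1(a1, b1):
    return a1 - b1 if a1 > b1 else 0

print(c1(10, 4), c1(3, 7))  # 6 0
```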
| tokai wrote:
| What's with claiming negativity when most of the comments here
| are positive?
| bartvk wrote:
| I have to remember this one. Waltz into the room and
| proclaim, why is everyone so negative? It's great because x,
| y and z. It looks pretty great.
| protonbob wrote:
| > but these jobs are going to be the first on the chopping
| block as these integrations mature.
|
| Perhaps this is part of the negativity? This is a bad thing for
| the middle class.
| jpadkins wrote:
| in the short run. In the long run, productivity gains
| benefit* all of us (in a functional market economy).
|
| *material benefit. In terms of spirit and purpose, the older
| I get the more I think maybe the Amish are on to something.
| Work gives our lives purpose, and the closer the work is to
| our core needs, the better it feels. Labor saving so that
| most of us are just entertaining each other on social
| networks may lead to a worse society (but hey, our material
| needs are met!)
| informal007 wrote:
| Agree with you, but it cannot be stopped. The development
| of technology always makes wealth distribution more
| centralized.
| bartvk wrote:
| I kind of get what you're saying but can you explain your
| reasoning or provide a source?
| Workaccount2 wrote:
| I think excel is a dead end. LLM agents will probably greatly
| prefer SQL, sqlite, and Python instead of bulky made-for-
| regular-folks excel.
|
| Versatility and efficiency explode while human usability tanks,
| but who cares at that point?
| informal007 wrote:
| Databases might be the future, but viable solutions on
| Excel are the evidence that proves it works.
| informal007 wrote:
| This will push the development of open source models.
|
| People think first of data privacy; local deployment of
| open source models is the first choice for them.
| threetonesun wrote:
| Probably because many people here are software developers, and
| wrapping spreadsheets in deterministic logic and a consistent
| UI covers... most software use cases.
| Scubabear68 wrote:
| Having wrangled many spreadsheets personally, and worked with
| CFOs who use them to run small-ish businesses, and all the way
| up to one of top 3 brokerage houses world-wide using them to
| model complex fixed income instruments... this is a disaster
| waiting to happen.
|
| Spreadsheet UI is already a nightmare. The formula editing and
| relationship visioning is not there at all. Mistakes are
| rampant in spreadsheets, even my own carefully curated ones.
|
| Claude is not going to improve this. It is going to make it
| far, far worse with subtle and not so subtle hallucinations
| happening left and right.
|
| The key is really this - all LLMs that I know of rely on
| entropy and randomness to emulate human creativity. This works
| pretty well for pretty pictures and creating fan fiction or
| emulating someone's voice.
|
| It is not a basis for getting correct spreadsheets that show
| what you want to show. I don't want my spreadsheet correctness
| to start from a random seed. I want it to spring from first
| principles.
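The "temperature" and "random seed" being alluded to can be sketched in a few lines (the scores and three-token vocabulary are made up; this is not any particular model's implementation): token scores become probabilities and are sampled, not simply argmax'd, which is exactly why the same prompt can yield different output.

```python
import math, random

def sample(scores, temperature, rng):
    # Temperature-scaled softmax, then a random draw over tokens.
    weights = [math.exp(s / temperature) for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(range(len(scores)), weights=probs)[0]

scores = [2.0, 1.5, 0.3]  # hypothetical scores for 3 candidate tokens

# Greedy decoding is deterministic: always the highest-scoring token.
greedy = max(range(len(scores)), key=lambda i: scores[i])

# Sampling is only reproducible if you pin the seed.
print(greedy, sample(scores, 1.0, random.Random(42)))
```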
| noosphr wrote:
| My first job out of uni was building a spreadsheet infra as
| code version control system after a Windows update made an
| eight-year-old spreadsheet go haywire and lose $10m in an
| afternoon.
|
| Spreadsheets are already a disaster.
| daveguy wrote:
| > Spreadsheets are already a disaster.
|
| Yeah, that's what OP said. Now add a bunch of random
| hallucinations hidden inside formulas inside cells.
|
| If they really have a good spreadsheet solution they've
| either fixed the spreadsheet UI issues or the LLM
| hallucination issues or both. My guess is neither.
| sally_glance wrote:
| Compared to what? Granted, Excel incidents are probably
| underreported and might produce "silent" consequential
| losses. But compared to that, for enterprise or custom
| software in general we have pretty scary estimates of the
| damages. Like Y2K (between 300-600bn) and the UK Post
| Office Horizon scandal (~1bn).
| array_key_first wrote:
| Excel spreadsheets ARE custom software, with custom
| requirements, calculations, and algorithms. They're just
| not typically written by programmers, have no version
| control or rollback abilities, are not audited, are not
| debuggable, and are typically not run through QA or QC.
| iambateman wrote:
| If I could teach managers one lesson, it would be this
| one.
| jackcviers3 wrote:
| I'll add to this - if you work on a software project to
| port an excel spreadsheet to real software that has all
| those properties, if the spreadsheet is sophisticated
| enough to warrant the process, the creators won't be able
| to remember enough details about how they created it to
| tell you the requirements necessary to produce the
| software. You may do all the calculations right, and
| because they've always had a rounding error that they've
| worked around somewhere else, your software shows
| calculations that have driven business decisions for
| decades were always wrong, and the business will insist
| that the new software is wrong instead of owning some
| mistake. It's never pretty, and it always governs
| something extremely important.
| calgoo wrote:
| Now, if we could give that excel file to an llm and it
| creates a design document that explains everything it
| does, then that would be a great use of an LLM.
| pjmlp wrote:
| Thing is, they are also the common workaround solution
| for savvy office workers that don't want to wait for
| IT department if it exists, or some outsourced
| consultancy, to finally deliver something that only does
| half the job they need.
|
| So far no one has managed to deliver an alternative to
| spreadsheets that fixes this issue; it doesn't matter if we
| can do much better in Python, Java, C# whatever, if it is
| always over budget and only covers half of the work.
|
| I know, I have taken part in such a project, and it ran
| over budget because there was always that little workflow
| super easy to do in Excel and they would refuse to adopt
| the tool if it didn't cover that workflow as well.
| gpderetta wrote:
| exactly. And Claude and other code assistants are more of
| the same, allowing non-programmers[1] to write code for
| their needs. And that's a good thing overall.
|
| [1] well, people that don't consider themselves
| programmers.
| sally_glance wrote:
| Agreed. The tradition has been continued by workflow
| engines, low code tools, platforms like Salesforce and
| lately AI-builders. The issue is generally not that these
| are bad, but because they don't _feel_ like software
| development everyone is comfortable skipping steps of the
| development process.
|
| To be fair, I've seen shops which actually apply good
| engineering practices to Excel sheets too. Just
| definitely not a majority...
| pjmlp wrote:
| Sometimes it isn't that folks are confortable skipping
| steps, rather they aren't even available.
|
| As so happens in the LLM age, I have been recently having
| to deal with such tools, and oh boy Smalltalk based image
| development in the 1990's with Smalltalk/V is so much
| better in regards to engineering practices than those
| "modern" tools.
|
| I cannot test code, if I want to backup to some version
| control system, I have to manually export/import a
| gigantic JSON file that represents the low-code workflow
| logic, no proper debugging tools, and so many other
| things I could rant about.
|
| But I guess this is the future, AI agents based workflow
| engines calling into SaaS products, deployed in a MACH
| architecture. Great buzzword bingo, right?
| p4ul wrote:
| It's interesting that you mention disaster; there is at
| least one annual conference dedicated to "spreadsheet risk
| management".[1]
|
| [1] https://eusprig.org/
| anitil wrote:
| I know you probably can't share the details, but if you can
| I (and I'm sure all of us) would love to hear them
| MattGaiser wrote:
| > Mistakes are rampant in spreadsheets
|
| To me, the case for LLMs is strongest not because LLMs are so
| unusually accurate and awesome, but because if human
| performance were put on trial in aggregate, it would be found
| wanting.
|
| Humans already do a mediocre job of spreadsheets, so I don't
| think it is a given that Claude will make more mistakes than
| humans do.
| lionkor wrote:
| But isn't this only fine as long as someone who knows what
| they are doing has oversight and can fix issues when they
| arise and Claude gets stuck?
|
| Once we all forget how to write SUM(A:A), will we just
| invent a new kind of spreadsheet once Claude gets stuck?
|
| Or in other words; what's the end game here? LLMs clearly
| cannot be left alone to do anything properly, so what's the
| end game of making people not learn anything anymore?
| solumunus wrote:
| Well, the end game with AI is AGI, of course. But
| realistically the best case scenario with LLMs is having
| fewer people with the required knowledge, leveraging
| LLMs to massively enhance productivity.
|
| We're already there to some degree. It is hard to put a
| number on my productivity gain, but as a small business
| owner with a growing software company it's clear to me
| already that I can reduce developer hiring going forward.
|
| When I read the skeptics I just have to conclude that
| they're either poor at context building and/or work on
| messy, inconsistent and poorly documented projects.
|
| My sense is that many weaker developers who can't learn
| these tools simply won't compete in the new environment.
| Those who can build well designed and documented projects
| with deep context easy for LLMs to digest will thrive.
|
| I assume all of this applies to spreadsheets.
| dns_snek wrote:
| Why isn't there a single study that would back up your
| observations? The only study with a representative
| experimental design that I know about is the METR study
| and it showed the opposite. Every study citing
| significant productivity improvements that I've seen is
| either:
|
| - relying on self-assessments from developers about how
| much time they think they saved, or
|
| - using useless metrics like lines of code produced or
| PRs opened, or
|
| - timing developers on toy programming assignments like
| implementing a basic HTTP server that aren't
| representative of the real world.
|
| Why is it that any time I ask people to provide examples
| of high quality software projects that were predominantly
| LLM-generated (with video evidence to document the
| process and allow us to judge the velocity), nobody ever
| answers the call? Would you like to change that?
|
| My sense is that weaker developers and especially weaker
| leaders are easily impressed and fascinated by
| substandard results :)
| nosianu wrote:
| Okay, and now you give those mediocre humans a tool that
| is both great and terrible. The problem is, unless they
| know their way around very well, they won't know which is
| which.
|
| Since my company uses Excel a lot, and I know the basics
| but don't want to become an expert, I use LLMs to ask
| intermediate questions, too hard to answer with the few
| formulas I know, not too hard for a short solution path.
|
| I have great success and definitely like what I can get
| with the Excel/LLM combo. But if my colleagues used it the
| same way, they would not get my good results, which is not
| their fault; they are not IT people but specialists, e.g.
| in logistics. The best use of LLMs is if you could already do
| the job without them, but it saves you time to ask them and
| then check if the result is actually acceptable.
|
| Sometimes I abandon the LLM session, because sometimes, and
| it's not always easy to predict, fixing the broken result
| would take more effort than just doing it the old way
| myself.
|
| A big problem is that the LLMs are so darn confident and
| always present a result. For example, I point it to a
| problem, it "thinks", and then it gives me new code,
| very confidently summarizing what the problem was
| (correctly) and assuring me it has now fixed the problem
| for sure. Only when I actually try it, the result has
| gotten worse than before. At that point I never try to get
| back to a working
| solution by continuing to try to "talk" to the AI, I just
| delete that session and do another, non-AI approach.
|
| But non-experts, and people who are very busy and just want
| to get some result to forward to someone waiting for it as
| quickly as possible will be tempted to accept the nice
| looking and confidently presented "solution" as-is. And you
| may not find a problem until half a year later somebody
| finds that prepayments, pro forma bills and the final
| invoices don't quite match in hard to follow ways.
|
| Not that these things don't happen now already, but adding
| a tool with erratic results might increase problems,
| depending on actual implementation of the process. Which
| most likely won't be well thought out, many will just cram
| in the new tool and think it works when it doesn't implode
| right away, and the first results, produced when people
| still pay a lot of attention and are careful, all look
| good.
|
| I am in awe of the accomplishments of this new tool, but it
| is way overhyped IMHO, still far too unpolished and random.
| Forcing all kinds of processes and people to use it is not
| a good match, I think.
| ryandrake wrote:
| This is a great point. LLMs make good developers better,
| but they make bad developers even worse. LLMs multiply
| instead of add value. So if you're a good developer, who
| is careful, pays attention, watches out for trouble, and
| is constantly reviewing and steering, the LLM is
| multiplying by a positive number and will make you
| better. However, if you're a mediocre/bad developer, who
| is not careful, who lacks attention to detail, and just
| barely gets things to compile / run, then the LLM is
| multiplying by a negative number and will make your
| output even worse.
| extr wrote:
| Is this just a feeling you have or is this downstream of
| actual use cases you've applied AI to observed and measured
| reliability on?
| lionkor wrote:
| Not OP but using LLMs in any professional setting, like
| programming, editing or writing technical specifications,
| OP is correct.
|
| Without extensive prompting and injecting my own knowledge
| and experience, LLMs generate absolutely unusable garbage
| (on average). Anyone who disagrees very likely is not someone
| who would produce good quality work by themselves (on
| average). That's not a clever quip; that's a very sad
| reality. SO MANY people cannot be bothered to learn
| anything if they can help it.
| extr wrote:
| I would completely disagree. I use LLMs daily for coding.
| They are quite far from AGI and it does not appear they
| are replacing Senior or Staff Engineers any time soon.
| But they are incredible machines that are perfectly
| capable of performing some economically valuable tasks in
| a fraction of the time it would have taken a human. If
| you deny this your head is in the sand.
| lionkor wrote:
| Capable, yeah, but not reliable, that's my point. They
| can one shot fantastic code, or they can one shot the
| code I then have to review and pull my hair out over for
| a week, because it's such crap (and the person who pushed
| it is my boss, for example, so I can't just tell him to
| try again).
|
| That's not consistent.
| wahnfrieden wrote:
| You can ask your boss to submit PRs using Codex's "try 5
| variations of the same task and select the one you like
| most", though.
| zxor wrote:
| Surely at that point they could write the code themselves
| faster than they can review 5 PRs.
|
| Producing more slop for someone else to work through is
| not the solution you think it is.
| extr wrote:
| Have you never used one to hunt down an obscure bug and
| found the answer quicker than you likely would have
| yourself?
| lionkor wrote:
| Actually, yeah, a couple of times, but that was a rubber-
| ducky approach; the AI said something utterly stupid, but
| while trying to explain things, I figured it out. I don't
| think an LLM has solved any difficult problem for me
| before. However, I think I'm likely an outlier because I
| do solve most issues myself anyways.
| chrisweekly wrote:
| Why do you frame the options as "one shot... or... one
| shot"?
| lionkor wrote:
| Because lazy people will use it like that, and we are all
| inherently lazy
| dns_snek wrote:
| It's not much better with planning either. The amount of
| time I spent planning, clarifying requirements, hand-
| holding implementation details always offset any
| potential savings.
| visarga wrote:
| The triad of LLM dependencies, in my view: initiation of
| tasks, experience-based feedback, and a consequence sink.
| They can do none of these; they all connect to the outer
| context, which sits with the user, not the model.
|
| You know what? This is also not unlike hiring a human:
| they need the hiring party to tell them what to do, give
| feedback, and assume the outcomes.
|
| It's all about context, which is non-fungible and
| distributed, and related not to intelligence but to the
| reasons we need intelligence in the first place.
| KronisLV wrote:
| > Anyone who disagrees very likely is not someone who
| would produce good quality work by themselves (on
| average).
|
| So for those producing slop and not knowing any better
| (or not caring), AI just improved the speed at which they
| work! Sounds like a great investment for them!
|
| For many mastering any given craft might not be the goal,
| but rather just pushing stuff out the door and paying
| bills. A case of mismatched incentives, one might say.
| mbesto wrote:
| Not the parent poster, but this is pretty much the
| foundation of LLMs. They are by their nature probabilistic,
| not deterministic. This is precisely what the parent is
| referring to.
| extr wrote:
| All processes in reality, everywhere, are probabilistic.
| The entire reason "engineering" is not the same as
| theoretical mathematics is about managing these
| probabilities to an acceptable level for the task you're
| trying to perform. You are getting a "probabilistic"
| output from a human too. Human beings are not
| guaranteeing theoretically optimal excel output when they
| send their boss Final_Final_v2.xlsx. You are using your
| mental model of their capabilities to inform how much you
| trust the result.
|
| Building a process to get a similar confidence in LLM
| output is part of the game.
| jbs789 wrote:
| Yup. It becomes clearer to me when I think about the
| existing validators. Can these be improved, for sure.
|
| It's when people leap to the multi-year endgame and, in
| their effort to monetise, build overconfidence in the
| product that I see the inherent conflict.
|
| It's going to be a slog... the detailed implementations.
| And if anyone is a bit more realistic about managing
| expectations I think Anthropic is doing it a little
| better.
| mbesto wrote:
| > All processes in reality, everywhere, are probabilistic.
|
| If we want to go in philosophy then sure, you're correct,
| but this not what we're saying.
|
| For example, an LLM is capable (and it's highly plausible
| for it to do so) of creating a reference to a non-
| existent source. Humans generally don't do that when
| their goal is clear and aligned (hence deterministic).
|
| > Building a process to get a similar confidence in LLM
| output is part of the game.
|
| Which is precisely my point. LLMs are supposed to be
| _better_ than humans. We're (currently) shoehorning the
| technology.
| extr wrote:
| > Humans generally don't do that when their goal is clear
| and aligned (hence deterministic).
|
| Look at the language you're using here. Humans
| "generally" make less of these kinds of errors.
| "Generally". That is literally an assessment of
| likelihood. It is completely possible for me to hire
| someone so stupid that they create a reference to a non-
| existent source. It's completely possible for my high IQ
| genius employee who is correct 99.99% of the time to have
| an off-day and accidentally fat finger something. It
| happens. Perhaps it happens at 1/100th of the rate that
| an LLM would do it. But that is simply an input to the
| model of the process or system I'm trying to build that I
| need to account for.
| spookie wrote:
| When humans make mistakes repeatedly in their job they
| get fired.
| Scubabear68 wrote:
| I have to disagree. There are many areas where things are
| extremely deterministic, regulated financial services
| being one of those areas. As one example of zillions,
| look at something like Bond Math. All of it is very well
| defined, all the way down to what calendar model you will
| use (30/360 or what have you), rounding, etc. It's
| all extremely well defined specifically so you can get
| apple to apple comparisons in the market place.
|
| The same applies to my checkbook, and many other areas of
| either calculating actuals or where future state is well
| defined by a model.
|
| That said, there _can_ be a statistical aspect to any
| spreadsheet model. Obviously. But not all spreadsheets
| are statistical, and therein lies the rub. If an LLM
| wants to hallucinate a 9,000 day yearly calendar because
| it confuses our notion of a year with one of the outer
| planets, that falls well within probability, but not
| within determinism following well-defined rules.
|
| The other side of the issue is LLMs trained on the
| Internet. What are the chances that Claude or whatever is
| going to make a change based on a widely prevalent but
| incorrect spreadsheet it found on some random corner of
| the Internet? Do I want Claude breaking my well-honed
| spreadsheet because Floyd in Nebraska counted sheep wrong
| in a spreadsheet he uploaded and forgot about 5 years
| ago, and Claude found it relevant?
| sothatsit wrote:
| I don't think tools like Claude are there yet, but I already
| trust GPT-5 Pro to be more diligent about catching bugs in
| software than me, even when I am trying to be very careful. I
| expect even just using these tools to help review existing
| Excel spreadsheets could lead to a significant boost in
| quality if software is any guide (and Excel spreadsheets seem
| even worse than software when it comes to errors).
|
| That said, Claude is still quite behind GPT-5 in its ability
| to review code, and so I'm not sure how much to expect from
| Sonnet 4.5 in this new domain. OpenAI could probably do
| better.
| admdly wrote:
| > That said, Claude is still quite behind GPT-5 in its
| ability to review code, and so I'm not sure how much to
| expect from Sonnet 4.5 in this new domain. OpenAI could
| probably do better.
|
| It's always interesting to see others opinions as it's
| still so variable and "vibe" based. Personally, for my use,
| the idea that any GPT-5 model is superior to Claude just
| doesn't resonate - and I use both regularly for similar
| tasks.
| sothatsit wrote:
| I also find the subjective nature of these models
| interesting, but in this case the difference in my
| experiences between Sonnet 4.5 and GPT-5 Codex, and
| especially GPT-5 Pro, for code review is pretty stark.
| GPT-5 is consistently much better at hard logic problems,
| which code review often involves.
|
| I have had GPT-5 point out dozens of complex bugs to me.
| Often in these cases I will try to see if other models
| can spot the same problems, and Gemini has occasionally
| but the Claude models never have (using Opus 4, 4.1, and
| Sonnet 4.5). These are bugs like complex race conditions
| or deadlocks that involve complex interactions between
| different parts of the codebase. GPT-5 and Gemini can
| spot these types of bugs with a decent accuracy, while
| I've never had Claude point out a bug like this.
|
| If you haven't tried it, I would try the codex /review
| feature and compare its results to asking Sonnet to do a
| review. For me, the difference is very clear for code
| review. For actual coding tasks, both models are much
| more varied, but for code review I've never had an
| instance where Claude pointed out a serious bug that
| GPT-5 missed. And I use these tools for code review all
| the time.
| bcrosby95 wrote:
| I've noticed something similar. I've been working on some
| concurrency libraries for Elixir, and Claude constantly
| gets things wrong, but GPT-5 can recognize the techniques
| I'm using and the tradeoffs.
| meowface wrote:
| Try the TypeScript codex CLI with the gpt-5-codex model
| with reasoning always set to high, or GPT-5 Pro with max
| reasoning. Both are currently undeniably better than
| Claude Opus 4.1 or Sonnet 4.5 (max reasoning or
| otherwise) for all code-related tasks. Much slower but
| more reliable and more intelligent.
|
| I've been a Claude Code fanboy for many months but OpenAI
| simply won this leg of the race, for now.
| typpilol wrote:
| Same. I switched from Sonnet 4 to Codex when it came out.
| Went back to try Sonnet 4.5, and it really hates to work
| for longer than like 5 minutes at a time.
|
| Codex, meanwhile, seems to be smarter and plugs away at a
| massive todo list for like 2 hours.
| scoot wrote:
| Or you could, you know, read the article before commenting to
| see the limited scope of this integration?
|
| Anyway, Google has already integrated Gemini into Sheets, and
| recently added direct spreadsheet editing capability, so your
| comment was disproven before you even wrote it.
| silenced_trope wrote:
| > The key is really this - all LLMs that I know of rely on
| entropy and randomness to emulate human creativity. This
| works pretty well for pretty pictures and creating fan
| fiction or emulating someone's voice.
|
| I think you need to turn down the temperature a little bit.
| This could be a beneficial change.
| scosman wrote:
| > all LLMs that I know of rely on entropy and randomness to
| emulate human creativity
|
| Those are tuneable parameters. Turn down the temperature and
| top_p if you don't want the creativity.
|
| > Claude is not going to improve this.
|
| We can measure models vs humans and figure this out.
|
| To your own point, humans already make "rampant" mistakes.
| With models, we can scale inference time compute to catch and
| eliminate mistakes, for example: run 6x independent
| validators using different methodologies.
|
| One-shot financial models are a bad idea, but properly
| designed systems can probably match or beat humans pretty
| quickly.
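One way to read "run 6x independent validators" above is a quorum scheme: accept an answer only when several independent checks agree on it, and escalate to a human otherwise. A minimal sketch, assuming the validators are plain callables standing in for separate model calls (a real system would give each one a different prompt or methodology):

```python
from collections import Counter

def majority_answer(validators, question, quorum=4):
    """Accept an answer only if at least `quorum` of the
    independent validators agree on it; return None to
    signal the result needs human review. Each validator
    here is a plain callable; in a real system each would
    be a separate model call with its own methodology."""
    votes = Counter(v(question) for v in validators)
    answer, count = votes.most_common(1)[0]
    return answer if count >= quorum else None

# Six stand-in "validators" (a real system would call an LLM).
validators = [lambda q: "4"] * 5 + [lambda q: "5"]
print(majority_answer(validators, "What is 2+2?"))  # prints "4"
```

The quorum threshold is the knob that trades throughput against the error rate the thread is arguing about: raise it and more answers get kicked to a human.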
| th0ma5 wrote:
| > Turn down the temperature and top_p if you don't want the
| creativity.
|
| This also reduces accuracy in real terms. The randomness is
| used to jump out of local minima.
| scosman wrote:
| That's at training time, not inference time. And
| temp/top_p aren't used to escape local minima, methods
| like SGD batch sampling, Adam, dropout, LR decay, and
| other techniques do that.
| hansmayer wrote:
| > Those are tuneable parameters. Turn down the temperature
| and top_p if you don't want the creativity.
|
| Ah yes, we'll tell Mary from the Payroll she could just
| tune them parameters if there is more than "like 2%" error
| in her spreadsheets
| scosman wrote:
| No one said it was a user setting. The person building
| the spreadsheet agent system would tune the hyper-
| parameters with a series of eval sets.
| sally_glance wrote:
| Having AI create the spreadsheet you want is totally
| possible, just like generating bash scripts works well. But
| to get good results, there needs to be some documentation
| describing all the hidden relationships and nasty workarounds
| first.
|
| Don't try to make LLMs generate results or numbers, that's
| bound to fail in any case. But they're okay to generate a
| starting point for automations (like Excel sheets with lots
| of formulas and macros), given they get access to the same
| context we have in our heads.
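To make the "starting point for automations" idea above concrete: an LLM-generated sheet skeleton is ultimately just data cells plus formula strings, which Excel evaluates when the file is opened. A hypothetical sketch using only the standard library (the column layout and formulas are invented for illustration):

```python
import csv
import io

# Hypothetical monthly summary: literal values plus Excel
# formula strings. Excel treats cells starting with "=" as
# formulas when it opens the CSV.
rows = [
    ["Month", "Revenue", "Costs", "Profit"],
    ["Jan", 1000, 400, "=B2-C2"],
    ["Feb", 1200, 500, "=B3-C3"],
    ["Total", "=SUM(B2:B3)", "=SUM(C2:C3)", "=SUM(D2:D3)"],
]

buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())
```

The human still owns the review step: the generated formulas are a scaffold to check against the hidden relationships the comment mentions, not a finished model.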
| bnug wrote:
| I like this take. There seems to be an over-focus on 'one-
| shot' results, but I've found that even the free tools are
| a significant productivity booster when you focus on
| generating smaller pieces of code that you can verify.
| Maybe I'm behind the power curve since I'm not leveraging
| the full capability of the advanced LLM's, but if the
| argument is disaster is right around the corner due to
| potential hallucinations, I think we should consider that
| you still have to check your work for mission critical
| systems. That said, I don't really build mission critical
| systems - I just work in Aerospace Engineering and like
| building small time saving scripts / macros for other
| engineers to use. For this use, free LLMs even have been
| huge for me. Maybe I'm in a very small minority, but I do
| use Excel & Python nearly every day.
| mountainriver wrote:
| You can do it cursor style
| hoistbypetard wrote:
| IMO people tend to over-trust both AI and Excel. Maybe this
| will recalibrate that after it leads to a catastrophic
| business failure or two.
| phatfish wrote:
| You would hope so. But how many companies have actually
| changed their IT policy of outsourcing everything to Tata
| Consultancy Services (or similar) where a sweaty office in
| Mumbai full of people who don't give a shit run critical
| infrastructure?
|
| Jaguar Land Rover had production stopped for over a month,
| I think, with a 100+ million impact on their business
| (including a trail of smaller suppliers put near
| bankruptcy). I'd bet
| Tata are still there and embedded even further in 5 years.
|
| If AI provides some day-to-day running cost reduction that
| looks good on quarterly financial statements it will be
| fully embraced, despite the odd "act of god".
| gpderetta wrote:
| to be clear, tata owns JLR.
| phatfish wrote:
| Indeed, that slipped my mind. However, the Marks and
| Spencer hack was also their fault. Just searching on it
| now, it seems there is a ray of hope, although I have a
| feeling the response won't be a well-trained
| onshore/internal IT department. It will be another
| offshore outsourcing jaunt, but with better compensation
| for incompetent staff on the outsourcer's side.
|
| "Marks & Spencer Cuts Ties With Tata Consultancy Services
| Amid £300m Cyber Attack Fallout" (ibtimes.co.uk)
| jbs789 wrote:
| I tend to agree that dropping the tool as it is into
| untrained hands is going to be catastrophic.
|
| I've had similar professional experiences as you and have
| been experimenting with Claude Code. I've found I really need
| to know what I'm doing and the detail in order to make
| effective (safe) use out of it. And that's been a learning
| curve.
|
| The one area I hope/think it's closest to (given comments
| above) is potentially as a "checker" or validator.
|
| But even then I'd consider the extent to which it leaks data,
| steers me the wrong way, or misses something.
|
| The other case may be mocking up a simple financial model for
| a test / to bounce ideas around. But without very detailed
| manual review (as a mitigating check), I wouldn't trust it.
|
| So yeah... that's the experience of someone who maybe bridges
| these worlds somewhat... And I think many out there see the
| tough (detailed) road ahead, while these companies are racing
| to monetize.
| stocksinsmocks wrote:
| My take is more optimistic. This could be an off ramp to stop
| putting critical business workflows in spreadsheets. If
| people start to learn that general purpose programming
| languages are actually easier than Excel (and with LLMs,
| there is no barrier), then maybe more robust workflows and
| automation will be the norm.
|
| I think the world would be a lot better off if excel weren't
| in it. For example, I work at a business with 50K+ employees
| where project management is done in a hellish spreadsheet
| literally one guy in Australia understands. Data entry errors
| can be anywhere and are incomprehensible. 3 or 4 versions are
| floating around to support old projects. A CRUD app with a
| web front end would solve it all. Yet it persists because
| Excel is erroneously seen as accessible whereas Rails,
| Django, or literally anything else is witchcraft.
| player1234 wrote:
| There was never a barrier to automating your office work
| with python unless you are a moron.
|
| Who fooled the world into thinking that scripting some
| known workflow of yours is fucking rocket science? It
| should be a requirement to even enter the fucking office
| building.
| xbmcuser wrote:
| In my opinion, the biggest use case for spreadsheets with
| LLMs is to ask them to build Python scripts to do whatever
| manipulations you want to do with the data. Once people
| learn to do this, workplace productivity will increase
| greatly. I have been using LLMs for years now to write
| Python scripts that automate different repeatable tasks.
| Want a PDF of this data to be overlaid on this file?
| Create a Python script with an LLM. Want the data exported
| out of this to be formatted and tallied? Create a script
| for that.
| calgoo wrote:
| Yesterday I had to pass a bunch of data to finance, as the
| person who usually did so had left the company. They wanted
| me to basically group by a few columns, so instead of
| spending an hour on this in Excel, I created 3 rows of fake
| data and gave it to the LLM, which created a Python script
| that I ran against the dataset. After manual verification
| of the results, it could be submitted to finance.
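The kind of script described above is small enough to sketch with only the standard library. This is a guess at its shape, with hypothetical column names; the point is that the logic is simple enough to verify by hand against a few fake rows before running it on the real dataset:

```python
import csv
import io
from collections import defaultdict

# Stand-in for the real export: a couple of fake rows with a
# pair of columns to group by and a numeric column to total.
data = io.StringIO(
    "department,cost_center,amount\n"
    "logistics,EU,100.50\n"
    "logistics,EU,49.50\n"
    "finance,US,200.00\n"
)

# Group by (department, cost_center) and sum the amounts.
totals = defaultdict(float)
for row in csv.DictReader(data):
    totals[(row["department"], row["cost_center"])] += float(row["amount"])

for (dept, cc), amount in sorted(totals.items()):
    print(f"{dept},{cc},{amount:.2f}")
```

Running it on the three fake rows first, as the comment describes, is exactly the manual verification step: the grouped totals are easy to check by eye before the script touches the real data.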
| jb1991 wrote:
| Congrats? But you are not likely a typical user.
| brabel wrote:
| That's exactly how it should be done if accuracy is
| important.
| xbmcuser wrote:
| Yeah I am not a programmer just more tech literate than
| most as I have always been fascinated by tech. I think
| people are missing the forest for the trees when it comes
| to LLMs. I have been using them to create simple bash,
| bat, and Python scripts which I would not have been able
| to put together before, even with weeks of googling. I say
| that because I used to try that unsuccessfully, but my
| success rate has gone through the roof with LLMs.
|
| Now I just ask an LLM to create the scripts and explain
| all the steps. If it is a complex script I would also ask
| it to add logging to the script so that I can feed the
| log back to the LLM and explain what is going wrong which
| allowed for a lot faster fixes. In the early days, the LLM
| and I would go around in circles until I hit the token
| limits and had to start from scratch again.
| player1234 wrote:
| Learn Python; the subscription for that knowledge won't
| be jacked up to $2,000/month when the VC money dries up.
| player1234 wrote:
| Just learn Python; what are you, a child?
| PatronBernard wrote:
| How will people without Python knowledge know that the
| script is 100% correct? You can say "Well they shouldn't
| use it for mission critical stuff" or "Yeah that's not a
| use case, it could be useful for qualitative analysis"
| etc., but you bet they will use it for everything. People
| use ChatGPT as a search engine and a therapist, which tells
| us enough.
| 010101010101 wrote:
| If you have a mechanism that can prove arbitrary program
| correctness with 100% accuracy you're sitting on
| something more valuable than LLMs.
| tonyhart7 wrote:
| so human powered LLM user ??
| freedomben wrote:
| For sure, I've never seen a human write a bug or make a
| mistake in programming
| tonyhart7 wrote:
| that's why we create LLM for that
| player1234 wrote:
| Basic python knowledge should be a requirement for any
| office job.
|
| Spending trillions on LLMs to automate what can be done
| with good old reliable scripting is absurd. We haven't
| automated shit yet.
| hansmayer wrote:
| Yeah, it's like that commercial for OpenAI (or was it
| Gemini?) where the guy says he lets the tool work on his
| complex financial spreadsheets, goes for a walk with his
| dog, and comes back to find it done with "like 98%
| accuracy". I cannot
| imagine what the 2% margin of error looks like for a company
| that moves around hundreds of billions of dollars...
| lacker wrote:
| It's like the negativity whenever a post talks about hiring or
| firing. A lot of people are afraid that they are going to lose
| their jobs to AI.
| pluc wrote:
| Anthropic now has all your company's data, and all you saved
| was the cost of one human minus however much they charge for
| this. The good news is it can't have your data _again_! So
| starting from the 163rd-165th person you fire, you start to see
| a good return, and all you've sacrificed is exactitude,
| precision, judgement, customer service and a little bit of
| public perception!
| mapt wrote:
| The vast majority of people in business and science are using
| spreadsheets for complex algorithmic things they weren't really
| designed for, and we find a metric fuckton of errors in the
| sheets when you actually bother auditing them, mistakes
| which are not at all obvious without troubleshooting by...
| manually checking each and every cell & cell relation, peering
| through parentheses, following references. It's a nightmare to
| troubleshoot.
|
| LLMs specialize in making up plausible things with a minimum of
| human effort, but their downside is that they're very good at
| making up plausible things which are covertly erroneous. It's a
| nightmare to troubleshoot.
|
| There is already an abject inability to provision the labor to
| verify Excel reasoning when it's composed by humans.
|
| I'm dead certain that Claude will be able to produce plausibly
| correct spreadsheets. How important is accuracy to you? How
| life-critical is the end result? What are your odds, with the
| current auditing workflow?
|
| Okay! Now! Half of the users just got laid off because
| management thinks Claude is Good Enough. How about now?
| practice9 wrote:
| LLMs are getting quite good at reviewing the results and
| implementations, though
| lionkor wrote:
| Not really, they're only as good as their context and they
| do miss and forget important things. It doesn't matter how
| often, because they do, and they will tell you with 100%
| confidence and with every synonym of "sure" that they
| caught it all. That's the issue.
| sothatsit wrote:
| I am very confident that these tools are better than the
| median programmer at code review now. They are certainly
| much more diligent. An actually useful standard to
| compare them to is human review, and for technical
| problems, they definitely pass it. That said, they're
| still not great at giving design feedback.
|
| But GPT-5 Pro, and to a certain extent GPT-5 Codex, can
| spot complex bugs like race conditions, or subtly
| incorrect logic like memory misuse in C, remarkably well.
| It is a shame GPT-5 Pro is locked behind a $200/month
| subscription, which means most people do not understand
| just how good the frontier models are at this type of
| task now.
| rchaud wrote:
| I'd say the vast majority of Excel users in business are
| working off of a CSV sent from their database/ERP team or
| exported from a self-serve analytics tool and using pivot
| tables to do the heavy lifting, where it's nearly impossible
| to get something wrong. Investment banks and trading desks
| are different, and usually have an in-house IT team building
| custom extensions into Excel or training staff to use bespoke
| software. That's still a very small minority of Excel users.
| atleastoptimal wrote:
| HN has a base of strong anti-AI bias, I assume is partially
| motivated by insecurity over being replaced, losing their jobs
| or having missed the boat on the AI.
| extr wrote:
| Based on the comments here, it's surprising that anything
| in society works at all. I didn't realize the bar was
| "everything perfect every time, perfectly flexible and
| adaptable". What a joy some of these folks must be to work
| with, answering every new technology with endless reasons why
| it's worthless and will never work.
| jay_kyburz wrote:
| I think perhaps you underestimate how antithetical the
| current batch of LLM AIs is to what most programmers
| strive for every day, and what we want from our tools.
| It's not about losing our jobs, it's about "correctness"
| (or, as said below, determinism).
|
| In a lot of jobs, particularly in creative industries, or
| marketing, media and writing, the definition of a job well
| done is a fairly grey area. I think AI will be mostly
| disruptive in these areas.
|
| But in programming there is a hard minimum of quality.
| Given a set of inputs, does the program return the correct
| answer or not? When you ask it what 2+2 is, do you get 4?
|
| When you ask AI anything, it might be right 50% of the
| time, or 70% of the time, but you can't blindly trust the
| answer. A lot of us just find that not very useful.
| Aeolun wrote:
| Most of the time when using AI, I have a lot more than one
| shot to ensure everything is correct.
| ytoawwhra92 wrote:
| > But in programming there is a hard minimum of quality.
| Given a set of inputs, does the program return the
| correct answer or not? When you ask it what 2+2, do you
| get 4?
|
| Whether something works or not matters less than whether
| someone will pay for it.
| extr wrote:
| I am a SWE myself and use LLMs to write ~100% of my code.
| That does not mean I fire and forget multiplexed codex
| instances. Many times I step through and approve every
| edit. Even if it was nothing but a glorified stenographer
| - there are substantial time savings in being able to
| prototype and validate ideas quickly.
| MattGaiser wrote:
| HN has an obsession with quality too, which has merit, but is
| often economically irrelevant.
|
| When US-East-1 failed, lots of people talked about how the
| lesson was cloud agnosticism and multi cloud architecture.
| The practical economic lesson for most is that if US-East-1
| fails, nobody will get mad at you. Cloud failure is viewed as
| an act of god.
| hypeatei wrote:
| > HN has a base of strong anti-AI bias
|
| Quite the opposite, actually. You can always find five
| stories on the front page about some AI product or feature.
| Meanwhile, you have people like yourself who convince
| themselves that any pushback is done by people who just don't
| see the true value of it yet and that they're about to miss
| out!! Some kind of attempt at spreading FOMO, I guess.
| lionkor wrote:
| I use AI every day. Without oversight, it does not work well.
|
| If it doesn't work well, I will do it myself, because I care
| that things are done well.
|
| None of this is me being scared of being replaced; quite the
| opposite. I'm one of the last generations of programmers who
| learned how to program and can debug and fix the mess your
| LLM leaves behind when you forgot to add "make sure it's a
| clean design and works" to the prompt.
|
| Okay, that's maybe hyperbole, but sadly only a little bit.
| LLMs make me better at my job, they don't replace me.
| sothatsit wrote:
| I really don't think this is accurate. I think the median
| opinion here is to be suspicious of claims made about AI, and
| I don't think that's necessarily a bad thing. But I also
| regularly see posts talking about AI positively (e.g.
| simonw), or talking about it negatively. I think this is a
| good thing, it is nice to have a diversity of opinions on a
| technology. It's a feature, not a bug.
| crote wrote:
| > HN has a base of strong anti-AI bias
|
| If anything, HN has a _pro-AI_ bias. I don't know of _any_
| other medium where discussions about AI consistently get this
| much frontpage time, this amount of discussion, and this many
| people reporting positive experiences with it. It's
| definitely true that HN isn't the raging pro-AI hypetrain it
| was two years ago, but that shouldn't be mistaken for "strong
| anti-AI bias".
|
| Outside of HN I am seeing, _at best_, an ambivalent
| reaction: plenty of people are interested, almost everyone
| has tried it, very few people genuinely like it. They are
| happy to use it when it is convenient, but couldn't care
| less if it disappeared tomorrow.
|
| There's also a small but vocal group which absolutely _hates_
| AI and will actively boycott any creative-related company
| stupid enough to admit to using it, but that crowd doesn't
| really seem to hang out on HN.
| impjohn wrote:
| >but couldn't care less if it disappeared tomorrow.
|
| Wonder how true that is. Some things incorporate themselves
| into your life so subtly that you only become aware of them
| when they're totally switched off.
| sph wrote:
| > There's also a small but vocal group which absolutely
| hates AI and will actively boycott any creative-related
| company stupid enough to admit to using it, but that crowd
| doesn't really seem to hang out on HN.
|
| I do, but I certainly feel in the minority in here.
| mr_toad wrote:
| > HN has a base of strong anti-AI bias
|
| HN constantly points out the flaws, gaps, and failings of AI.
| But the same is true of any technology discussed on HN. You
| could describe HN as having an anti-technology bias, because
| HN complains about the failings of tech all day every day.
| StarterPro wrote:
| Anti-AI bias is motivated by the waste of natural resources
| due to a handful of non-technical douchebag tech bros.
|
| Everything isn't about money; I know that status and power
| are all you AI narcissists dream about. But you'll never be
| Bill Gates, nor will you be Elon Musk.
|
| Once AI has gone the way of "Web3", "NFTs", "blockchain", "3D
| TVs", etc., you'll find a new grift to latch your life
| savings onto.
| A4ET8a8uTh0_v2 wrote:
| It is bad in a very specific sense, but I did not see any
| other comments express the bad parts instead of focusing
| merely on the accuracy part (which is an issue, but not the
| issue):
|
| - this opens up a ridiculous flood of data that would
| otherwise be semi-private to the one company providing this
| service
|
| - this works well on small data sets, but will choke on ones
| it needs to divvy up into chunks, inviting interesting (and
| as yet unknown) errors
|
| There is a real benefit to being able to 'talk to data', but
| anyone who has seen corporate culture up close and personal
| knows exactly where it will end.
|
| edit: and I'm saying all this as a person who actually likes
| LLMs.
| mceoin wrote:
| I second this. Spreadsheets are the primary tool used for 15%
| of the U.S. economy. Productivity improvements will affect
| hundreds of millions of users globally. Each increment in
| progress is a massive time save and value add.
|
| The criticisms broadly fall between "spreadsheets are bad" and
| "AI will cause more trouble than it solves".
|
| This release is a dot in a trend towards everyone having a
| Goldman-Sachs level analyst at their disposal 24/7. This is a
| huge deal for the average person or business. Our expectation
| (disclaimer: I work in this space) is that spreadsheet
| intelligence will soon be a solved problem. The "harder"
| problem is the instruction set and human <> machine prompting.
|
| For the "spreadsheets are bad" crowd -- sure, they have
| problems, but users have spoken and they are the preferred
| interface for analysis, project management and lightweight
| database work globally. All solutions to "the spreadsheet
| problem" come with their own UX and usability tradeoffs, so
| it's a balance.
|
| Congrats to the Claude team and looking forward to the next
| release!
| bonoboTP wrote:
| > Each increment in progress is a massive time save and value
| add.
|
| Based on the history of digitalization of businesses from the
| 1980s onwards, the spreadsheets will just balloon in number
| and size and there will be more rules and more procedures and
| more forms and reports to file until the efficiency gains are
| neutralized (or almost neutralized).
| mceoin wrote:
| We'll hit a new plateau somewhere, for sure. Still, I'm
| glad I'm not doing my spreadsheets on paper so net win so
| far!
| trollbridge wrote:
| The biggest problem with spreadsheets is that they tend to be
| accounts for the accumulation of technical debt, a kind of
| debt that AI tools are not yet very good at retiring, but
| very good at making additional withdrawals from.
| burnte wrote:
| > What is with the negativity in these comments?
|
| A lot of us have seen the effects of AI tools in the hands of
| people who don't understand how or why to use the tools. I've
| already seen AI use/misuse get two people fired. One was a
| line-of-business employee who relied on output without ever
| checking it, got herself into a pretty deep hole in 3 weeks.
| Another was a C suite person who tried to run an AI tool
| development project and wasted double their salary in 3 months,
| nothing to show for it but the bill, fired.
|
| In both cases the person did not understand the limits of the
| tools and kept replacing facts with their desires and their own
| misunderstanding of AI. The C suite person even tried to tell a
| vendor they were wrong about their own product because "I found
| out from AI".
|
| AI right now is fireworks. It's great when you know how to use
| it, but if you half-ass it you'll blow your fingers off very
| easily.
| liqilin1567 wrote:
| Yeah, the danger lies not in AI itself, but in inexperienced
| users treating it as a magic solution.
| topaz0 wrote:
| It's a bit much to blame the user for this when the product
| is crafted specifically to give the impression of being
| magical. Not to mention the marketing and media.
| II2II wrote:
| > but these jobs are going to be the first on the chopping
| block as these integrations mature.
|
| I'm not even sure that has to be true anymore. From my
| admittedly superficial impression of the page, this appears to
| be a tool for building tools. There are plenty of organizations
| that are resource-constrained, that are doing things the way
| they have always done things in Excel, simply because they
| cannot allocate someone to modify what is already in place to
| better suit their current needs. For them, this is more of a
| quality-of-life and quality-of-output improvement. This is not
| like traditional software development, where organizations are
| far more likely to purchase a product or service to do a job
| (and where the vendors of those products and services are going
| to do their best to eliminate developers).
| giancarlostoro wrote:
| Honestly, as a dev I hate Excel; it's a whole mess I don't
| understand. I will gladly use Claude for Excel. It will
| understand the business needs behind the data better than I,
| a mere developer just trying to get back to regular developer
| work.
| nelox wrote:
| Indeed. Take Health New Zealand as an example; it managed its
| entire NZD$28 billion budget (USD$16B) in a single Excel
| spreadsheet.
|
| https://www.theregister.com/2025/03/10/nz_health_excel_sprea...
|
| [edit: Added link]
| singleshot_ wrote:
| Can't speak for everyone, but the reason I'm negative in the
| context of this idea is that it's a stupid idea.
| timpieces wrote:
| Yes it's surprising to see so much cynicism for something that
| has a real possibility of making so many people so much more
| productive. My mental model of the average excel user is of
| someone who doesn't care about excel, but cares about their
| business. If Claude can help them use excel and learn about
| their business faster, then this should make the world more
| productive and we all get richer. Claude can make mistakes, but
| it's not clear to me why people think that the ratio of results
| to mistakes will get worse here. I think there are many
| possible reasons why this could not work out, but many of the
| comments here just seem like unfounded cynicism.
| meesles wrote:
| My theory: a lot of software we build is the supposed solve
| for a 'crappy spreadsheet'. a) That isn't much of a moat; b)
| you're watching the generalization of software happen in real
| time.
| impjohn wrote:
| Crappy spreadsheet is just the codification of business
| processes. Those are inherently messy and there's lots of
| assumptions, lots of edge cases. That's why spreadsheets tend
| towards crappy on a long enough timeline. It's a
| fundamentally messy problem.
|
| Spreadsheets are an abstraction over a messy reality, lossy.
| They were already generalizing reality.
|
| Now we generalize the generalization. It is this lossiness
| that people on HN are worried about with AI.
| fragmede wrote:
| > What is with the negativity in these comments?
|
| > these jobs are going to be the first on the chopping block as
| these integrations mature.
|
| Those two things are maybe related? So many of my friends don't
| enjoy the same privileges as I do, and have a more tenuous
| connection to being gainfully employed.
| eviks wrote:
| > offense to these people but Sonnet 4.5 is already at the
| level where it would be able to replicate or beat the level of
| analysis they typically provide.
|
| No offense, but this is pure fantasy. The level of analysis
| they typically provide doesn't suffer from the same high
| baseline rate of completely made-up numbers as your favorite
| LLM.
| rekabis wrote:
| > Even just basic automation/scaffolding of spreadsheets would
| be a big productivity boost for many employees.
|
| When most of it is wild hallucinations? Not really.
|
| For many employees leveraging Excel for manipulating important
| data, it could cripple careers.
|
| For spreadsheets that influence financial decisions or touch
| PPI/PII, it could lead to regulatory disasters and even
| bankruptcies.
|
| Purge hallucinations from LLMs, _then_ let them touch the
| important shite. Doing it in the reverse order is just
| begging for a FAFO apocalypse.
| UltraSane wrote:
| You would be far better off using an LLM to replace a complex
| spreadsheet with a Python script and SQLite.
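[Editor's note] As a rough illustration of that suggestion, here is a minimal sketch of the spreadsheet-to-script move using only the Python standard library. The table name, columns, and figures are invented; the point is that a pivot-table-style aggregation becomes one SQL query.

```python
import csv
import io
import sqlite3

# Invented ledger data that might otherwise live in a spreadsheet tab.
LEDGER_CSV = """item,region,price
widget,north,30
gadget,south,40
widget,south,20
"""

# Load the CSV rows into an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ledger (item TEXT, region TEXT, price REAL)")
rows = list(csv.DictReader(io.StringIO(LEDGER_CSV)))
conn.executemany("INSERT INTO ledger VALUES (:item, :region, :price)", rows)

# One query replaces a pivot table: total price per item.
totals = dict(conn.execute("SELECT item, SUM(price) FROM ledger GROUP BY item"))
print(totals)
```

Unlike a workbook, the query is versionable, diffable, and gives the same answer every run.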
| 3uler wrote:
| The lady doth protest too much. People see every AI
| limitation crystal clear, but have zero self-awareness of
| their own fallibility.
| vincnetas wrote:
| Non-reproducability is the biggest issue here. You deliver a
| report in 5 minutes to CFO, he comes back after lunch, gives
| you updated data to adjust a bit of a report and 5 minutes
| later gets a new report that has some non related to update
| number changed and asks why? what do you do?
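[Editor's note] One mechanical safeguard for exactly this failure mode is to diff the workbook's cell values before and after the AI pass and flag any change outside the cells the update was supposed to touch. A minimal sketch; the cell references and values here are invented:

```python
def diff_cells(before: dict, after: dict, expected: set) -> dict:
    """Return cells whose value changed outside the expected set.

    `before`/`after` map cell references (e.g. "B7") to values;
    `expected` lists the cells the update was supposed to touch.
    """
    changed = {
        ref: (before.get(ref), after.get(ref))
        for ref in set(before) | set(after)
        if before.get(ref) != after.get(ref)
    }
    # Anything changed but not expected is a red flag to investigate.
    return {ref: vals for ref, vals in changed.items() if ref not in expected}


before = {"B2": 100, "B3": 200, "B7": 300}
after = {"B2": 120, "B3": 200, "B7": 999}  # B7 changed unexpectedly
print(diff_cells(before, after, expected={"B2"}))
```

An empty result means the edit stayed in its lane; a non-empty one is the CFO's question, caught before lunch.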
| atwrk wrote:
| Because people will be deeply affected by this, and not in the
| positive way. We already had this with copilot:
| https://i.imgur.com/nguIAsv.jpeg
|
| Just as with copilot, this combines LLM's inability to
| repeatably do math correctly with peoples' overassurance in
| LLM's capabilities.
| hoppp wrote:
| I don't like to use excel so if I ever have to touch it I will
| use AI.
| lizardking wrote:
| First time at HN?
| ferguess_k wrote:
| > No offense to these people but Sonnet 4.5 is already at the
| level where it would be able to replicate or beat the level of
| analysis they typically provide.
|
| If this is true, then why would your wife be happy about it?
| I find that really hard to understand. Do you prefer your
| wife to be jobless while her employer happily cuts costs
| without impacting productivity? Even if it _just_ replaces
| the line workers, do you think your wife's job is going to be
| safe?
|
| I don't get it.
| slightwinder wrote:
| > What is with the negativity in these comments?
|
| Excel and AIs are huge clusterfucks on their own, where
| insane errors happen for various reasons. Combine them, and
| maybe we will see improvement, but surely we will also see
| catastrophic outcomes which could ruin not only the lives of
| ordinary people but whole companies and countries, as has
| already happened before...
| martinald wrote:
| This is going to be massive if it works as well as I suspect it
| might.
|
| I think many software engineers overlook how many companies have
| huge (billion dollar) processes run through Excel.
|
| It's much less about 'greenfield' new excel sheets and much more
| about fixing/improving existing ones. If it works as well as
| Claude Code works for code, then it will get pretty crazy
| adoption I suspect (unless Microsoft beats them to it).
| thewebguyd wrote:
| > This is going to be massive if it works as well as I suspect
| it might.
|
| Until Microsoft does its anti-competitive thing and finds a
| way to break this in the file format, because this is exactly
| what Copilot in Excel does.
|
| That said, Copilot in Excel is pretty much hot garbage still so
| anything will be better than that.
| NotMichaelBay wrote:
| What do you mean, what is copilot in excel doing exactly?
| lm28469 wrote:
| > I think many software engineers overlook how many companies
| have huge (billion dollar) processes run through Excel.
|
| So they can fire the two dudes that take care of it, lose 15
| years of in house knowledge to save 200k a year and cry in a
| few months when their magic tool shits the bed ?
|
| Massive win indeed
| bsenftner wrote:
| If the company is half baked, those "two dudes" will become
| indispensable beyond belief. They are the ones that
| understand how Excel works far deeper, and paired with Claude
| for Excel they become far far more valuable.
| Balgair wrote:
| At my org it's more that these AI tools finally allow the
| employees to get through things at all. The deadlines are
| getting met for the first time, maybe ever. We can at last
| get to the projects that will make the company money
| instead of chasing ghosts from 2021. The burn-down charts
| are warm now.
| brookst wrote:
| You think it's better for the company to have "two dudes"
| that are completely indispensable and whose work will be
| completely useless if they die / leave?
|
| I think you're making an argument _for_ LLMs, not against.
| lm28469 wrote:
| These two dudes can train the next generation, you know,
| like we've been doing since humans have existed... instead
| of relying on some centralised point of failure thousands of
| km away which might or might not break your company whenever
| they decide to update something.
|
| You're one of the people who saw nothing wrong with moving
| all our industries to Asia, right? "It's cheaper so it's
| obviously better", if you don't think about any of the
| externalities and long-term consequences, sure...
| blitzar wrote:
| Management have been executing this genius plan for decades
| without Ai.
| warthog wrote:
| Tough day to be an AI Excel add-in startup
| jonathanstrange wrote:
| That seems to be true for any startup that offers a wrapper to
| existing AIs rather than an AI on their own. The lucky ones
| might be bought but many if not most of them will perish trying
| to compete with companies that actually create AI models and
| companies large enough to integrate their own wrappers.
| warthog wrote:
| Actually just wrote about this:
| https://aimode.substack.com/p/openai-is-below-above-and-
| arou...
|
| not sure if it's binary like that, but as startups we will
| probably be left collecting the scraps, indeed.
| 8note wrote:
| It's a great time for your AI Excel add-in to start getting
| acquired by a Claude competitor, though.
| NotMichaelBay wrote:
| Not OpenAI, though, because they already gave $14M to an AI
| Excel add-in startup (Endex)
| mitjam wrote:
| Ask Rosie is actually shutting down right now:
| https://www.askrosie.ai/
|
| I would love to learn more about their challenges as I have
| been working on an Excel AI add-in for quite some time and have
| followed Ask Rosie from almost their start.
|
| That they have now gone through the whole cycle worries me
| that I'm too slow as a solo founder building on the side in
| these fast-paced times.
| intended wrote:
| As an inveterate Excel lover, I can just sense the blinding pain
| wafting off the legions of accountants, associates, seniors, and
| tech people who keep the machine spirits placated.
|
| lies, damn lies, statistics, and then Excel deciding cell data
| types.
| garyclarke27 wrote:
| I guess Claude maybe useful for finding errors in large Excel
| Workbooks. May also help beginners to learn the more complex
| Excel functions (which are still pretty easy). But if you are
| proficient at building Excel models I don't see any benefit.
| Excel already has a superb very efficient UI for entering
| formulas, ranges, tables, data sources etc I'm sceptical that a
| different UI especially a text based one can improve on this.
| proteal wrote:
| I understand the sentiment about a skilled user not needing
| this, but I think having a little buddy that I can use to
| offload some menial tasks would be helpful for me to iterate
| through my models more efficiently; even if the AI is not
| perfect. As a highly skilled excel user, I admit the software
| has terrible ergonomics. It would be a productivity boon for me
| if an AI can help me stay focused on model design vs model
| implementation.
| intended wrote:
| For some reason, I find that these tools are TERRIBLE at
| helping someone learn. I suspect because turning one on
| results in turning the problem-solving part of one's brain
| off.
|
| It's obviously not the same experience for everyone. (If you
| are one of those energized while working in a chat window,
| you might be in a minority, given what we see from the
| ongoing massacre of brains in education.)
|
| Paraphrasing something I read here: "people don't use ChatGPT
| to learn more, they use it to study less".
|
| Maybe some folk would be better off.
| mattas wrote:
| I'm not excited about having LLMs generate spreadsheets or
| formulas. But, I think LLMs could be particularly useful in
| helping me find inconsistent formulas or errors that are
| challenging to identify. Especially in larger, complex
| spreadsheets touched by multiple people over the course of
| months.
| thesuitonym wrote:
| For once in my life, I actually had a delightful interaction
| with an LLM last week. I was changing some text in an Excel
| sheet in a very programmatic way that could easily have been
| done with the regex functions in Excel. But I'm not really
| great with regex, and it was only 15 or so cells, so I was
| content to just do it manually. After three or four cells,
| Copilot figured out what I was doing and suggested the rest
| of the changes for me.
|
| This is what I want AI to do, not generate wrong answers and
| hallucinate girlfriends.
| klausnrooster wrote:
| Thanks for reminding me to check if the REGEXEXTRACT,
| REGEXREPLACE, and REGEXTEST functions had landed for me yet.
| They have! Good, because sometime in 2027 the library
| providing RegEx in VBA will be yanked.
| https://youtu.be/pGH9LdgkJio
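[Editor's note] The kind of mechanical cell rewrite described in the anecdote above is exactly what a regex expresses. As an invented example of such a transformation (normalizing "Last, First" names to "First Last"), in Python it is one pattern and one substitution:

```python
import re

# Invented cell values to normalize from "Last, First" to "First Last".
cells = ["Smith, Jane", "Doe, John", "Kahlo, Frida"]

# Capture the two name parts, then swap them in the replacement.
pattern = re.compile(r"^\s*(\S+),\s*(\S+)\s*$")
fixed = [pattern.sub(r"\2 \1", cell) for cell in cells]
print(fixed)
```

Excel's REGEXREPLACE takes the same pattern and replacement, so the transformation carries over directly once those functions are available.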
| bambax wrote:
| One approach is to produce read-only data in BI tools: users
| are free to export anything they want and make their own
| spreadsheets, but those are for their own use only. Reference
| data is produced every day by a central, controlled process and
| cannot in any circumstance be modified by the end user.
|
| I have implemented this a couple of times and not only does it
| work well, it tends to be fairly well accepted. People need
| spreadsheets to work on them, but generally they kind of hate
| sending those around via email. Having a reference source of
| data is welcomed.
| gedy wrote:
| Cool but now companies POs will be like "you must add the Excel
| export for all the user data!" and when asked why, will basically
| be "so I can do this roundabout query of data for some number in
| a spreadsheet using AI (instead of just putting the number or
| chart directly in the product with a simple db call)"
| racl101 wrote:
| This could be huge! Very exciting!
| michaelmarkell wrote:
| IMO, a real solution here has to be hybrid, not full LLM, because
| these sheets can be massive and have very complicated structures.
| You want to be able to use the LLM to identify / map column
| headers, while using non-LLM tool calling to run Excel operations
| like SUMIFs or VLOOKUPs. One of the most important traits in
| these systems is consistency with slight variation in file
| layout, as so much Excel work involves consolidating /
| reconciling between reports made on a quarterly basis or produced
| by a variety of sources, with different reporting structures.
|
| Disclosure: My company builds ingestion pipelines for large
| multi-tab Excel files, PDFs, and CSVs.
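[Editor's note] The hybrid split described above can be sketched in a few lines. The alias table, column names, and figures are invented: in a real system the LLM would supply the header mapping, and everything after that point is deterministic arithmetic.

```python
# Stand-in for the LLM's job: map messy header names to a canonical
# meaning. Only this lookup would come from the model.
HEADER_ALIASES = {"price": "price", "unit price ($)": "price", "amt": "price"}


def resolve_price_column(headers: list) -> str:
    """Pick the header that maps to the canonical 'price' column."""
    for header in headers:
        if HEADER_ALIASES.get(header.strip().lower()) == "price":
            return header
    raise KeyError("no price-like column found")


def total(rows: list) -> float:
    """Deterministic aggregate over the resolved column, no LLM involved."""
    column = resolve_price_column(list(rows[0]))
    return sum(float(row[column]) for row in rows)


rows = [{"Item": "abc", "Unit Price ($)": "30"},
        {"Item": "cde", "Unit Price ($)": "40"}]
print(total(rows))
```

The model's fuzziness is confined to name resolution, where a mistake is visible and auditable; the arithmetic itself cannot hallucinate.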
| dcre wrote:
| That's exactly what they're doing.
|
| https://www.anthropic.com/news/advancing-claude-for-financia...
| levocardia wrote:
| "This won't work because (something obvious that engineers at
| Anthropic clearly thought of already)"
| michaelmarkell wrote:
| Not really. Take for example:
|
| item, date, price
|
| abc, 01/01/2023, $30
|
| cde, 02/01/2023, $40
|
| ... 100k rows ...
|
| subtotal. $1000
|
| def, 03/01/2023, $20
|
| "Hey Claude, what's the total from this file? > grep for
| headers > "Ah, I see column 3 is the price value" >
| SUM(C2:C) -> $2020 > "Great! I found your total!"
|
| If you can find me an example of tech that can solve this
| at scale on large, diverse Excel formats, then I'll
| concede, but I haven't found something actually trustworthy
| for important data sets
| stevenhuang wrote:
| That's a basic tool call that current models already can
| do well. All the sql query generation LLMs can do this
| for example.
| sunnybeetroot wrote:
| So more or less like what AI has been doing for the last couple
| of years when it comes to writing code?
| pdyc wrote:
| I have just launched a product (easyanalytica.com) to create
| dashboards from spreadsheets, and Excel is on my to-do list of
| formats to be supported. However, I'm having second thoughts.
| Although, from the description, it seems like it would be
| more helpful on the modeling side than the presentation side.
| I guess I'll have to wait until it's publicly available.
| sunnybeetroot wrote:
| Why second thoughts?
| pdyc wrote:
| Everyone will use Claude if they support it, so why would
| they use my product? I will have to find some other angle to
| differentiate.
| causal wrote:
| Seems everyone is speculating features instead of just reading
| TFA which does in fact list features:
|
| - Get answers about any cell in seconds: Navigate complex models
| instantly. Ask Claude about specific formulas, entire worksheets,
| or calculation flows across tabs. Every explanation includes
| cell-level citations so you can verify the logic.
|
| - Test scenarios without breaking formulas: Update assumptions
| across your entire model while preserving all dependencies. Test
| different scenarios quickly--Claude highlights every change with
| explanations for full transparency.
|
| - Debug and fix errors: Trace #REF!, #VALUE!, and circular
| reference errors to their source in seconds. Claude explains what
| went wrong and how to fix it without disrupting the rest of your
| model.
|
| - Build models or fill existing templates: Create draft financial
| models from scratch based on your requirements. Or populate
| existing templates with fresh data while maintaining all formulas
| and structure.
| Balgair wrote:
| If this can reliably deal with the REF, VALUE, and NA problems,
| it'll be worth it for that alone.
|
| Oh and deal with dates before 1900.
|
| Excel is a gift from God if you stay in its lane. If you ever
| so slightly deviate, not even the Devil can help you.
|
| But maybe, juuuuust maybe, AI can?
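[Editor's note] The pre-1900 complaint is real: Excel's default 1900 date system cannot represent earlier dates, and it deliberately preserves Lotus 1-2-3's bug of treating 1900 as a leap year, so serial number 60 maps to the nonexistent February 29, 1900. A sketch of the conversion that accounts for the phantom day:

```python
from datetime import date, timedelta


def excel_serial_to_date(serial: int) -> date:
    """Convert an Excel 1900-system serial number to a date.

    Excel's 1900 system wrongly treats 1900 as a leap year, so
    serial 60 is the nonexistent Feb 29, 1900, and every serial
    above 60 must be shifted back by one day.
    """
    if serial < 1:
        raise ValueError("1900-system serials start at 1 (Jan 1, 1900)")
    if serial == 60:
        raise ValueError("serial 60 is the phantom Feb 29, 1900")
    offset = serial if serial < 60 else serial - 1
    return date(1899, 12, 31) + timedelta(days=offset)


print(excel_serial_to_date(1))   # 1900-01-01
print(excel_serial_to_date(61))  # 1900-03-01
```

Dates before January 1, 1900 simply have no serial at all, which is why genealogy and historical datasets end up storing dates as text and losing all date arithmetic.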
| libraryatnight wrote:
| "not even Devil can help you.
|
| But maybe, juuuuust maybe, AI can?"
|
| Bold assumption that the devil and AI aren't aligned ;)
| lavishlibra0810 wrote:
| The greatest trick the devil ever pulled was convincing the
| world he didn't exist
| ACCount37 wrote:
| Nah, the greatest trick the devil ever pulled was
| convincing the world that Machine Learning is a
| legitimate field of study, and not just thinly veiled
| demon summoning.
| globular-toast wrote:
| I feel similarly about MS Word. It can actually produce
| decent documents _if_ you learn how to use it, in
| particular if you use styles consistently and never, ever
| touch the bold, italic, colours etc. (outside of defining
| said styles, although the defaults are probably all most
| people need). Unfortunately I think the appeal of Word is
| that you don't have to learn this and it will just do what
| you want. Is AI the panacea that will both do what you want
| _and_ give you the right answers every time?
| beefnugs wrote:
| Also, people complaining about AI inaccuracy are just
| technical people who like precision. The vast majority of the
| world is people who don't give a damn about accuracy or even
| correctness. They just want to appear not completely useless
| to people who could potentially affect their salary.
| lionkor wrote:
| "just" technical people who like precision are the reason we
| are here, typing this, and why lots of parts of our world is
| pretty cool and comfortable. I wouldn't say that's useless
| and "just" some people when it clearly is generating
| unmistakable value
| Yizahi wrote:
| I can pretty reliably guess that approximately 100% of all
| companies in the world use Excel tables for financial data
| and for processes. Ok, that was a joke. It's actually 99.99%
| of all companies. One would think that financial data,
| inventory, and stuff like that should be damn precise. No?
| fragmede wrote:
| How precise do they really need to be? If there's 3 of a
| widget on the shelf in the factory, and the factory uses
| 1000 per day, is it crucial to know that there's 3 of them,
| and not 0 or 50? Either way, the factory ain't running
| today or tomorrow or until more of those things come in.
| Similarly, what's $3 missing from an internal spreadsheet
| when the company costs $5,000 an hour to operate (or $10
| million a year)? Obviously errors accumulate, so the books
| need to be reconciled, but all that stuff only needs to be
| sufficiently directionally accurate, with enough precision.
| If precision is free, then sure, but if a good enough job
| is cheaper? We all make that call every day.
| Yizahi wrote:
| If you have 2000 hectares of land you need to buy the
| exact amount of seeds to sow them. If you buy less you
| are losing money, if you buy more it is useless and you
| are losing money. If you have trucks or other machinery
| in the company you need to report exact amount of fuel
| needed/used, or either they won't run or you lose money
| on machinery missing fuel. If you need to tax a company,
| it is pretty important if there are 100 tons of steel
| used or 1000 tons. Or if the company has 5 factories to
| be taxed or 15. Etc.
|
| You are anthropomorphizing LLM programs. You assume that
| if a number in a spreadsheet is big, then the program can
| somehow understand that it is a big number, and that if
| it makes an error it will be a small-order error like a
| human would make. Human process: "hmm, here is a
| calculation where we divide our imports by the number of
| subsidiaries, let me estimate this in my head; ok, looks
| like 7320" (the actual correct answer was 7340, but the
| human made a small, typical mistake in the math). An LLM
| arrives at each particular character in a row through
| probabilities and randomization. So it may be 7340, or it
| may be 8745632, or 1320, or whatever. There is a comment
| at the top here, from another user, who queried an LLM to
| change a value in a document and it did so correctly. But
| at the same time it replaced a bank account number with a
| different bank account number. Because to the LLM it is
| the same: sixteen digits in a field, or another sixteen
| digits in the field, it is all the same. Because it is
| not AI and doesn't "understand" what it does.
| fragmede wrote:
| If you have 2000 hectares of land, there is no way you're
| buying the exact right amount of seeds. You overbuy seeds
| by as little as you can, but seeds get loaded via tractor
| bucket, which is fairly messy. You're going to lose a
| decent amount of seeds. Thus, a pound or kilo of seeds, or
| < 1% in the scheme of things, isn't even going to be
| noticed, much less cause the demise of your farm.
|
| For fuel, similarly, you're going to lose milliliters to
| evaporation on a hot day, so being off by a few ml isn't
| material.
|
| If you tax a company, fine, sure, the company is going to
| want it to be right, but 1 or two tons in a 10,000 ton
| order is again, < 1%. There is some threshold below which
| precision is extra unnecessary work, though if you have
| problems with thieves and corruption, you're going to
| want additional precision that isn't necessary elsewhere.
|
| As to where in my comment I'm anthropomorphizing LLMs,
| you're going to have to point out where I did that, as
| the word LLM doesn't appear anywhere in my comment. It
| feels like you're projecting claims my comment does not
| make; it is LLM-neutral and merely points out that 100%
| exact precision doesn't come without a cost.
| serf wrote:
| Anthropic is in a weird place for me right now. They're growing
| fast, creating little projects that I'd love to try, but their
| customer service was so bad for me as a Max subscriber that I
| set an ethical boundary for myself to avoid their services
| until it appears that they care about their customers at all.
|
| I keep searching for a sign, but everyone I talk to has horror
| stories. It sucks as a technologist that just wants to play with
| the thing; oh well.
| cmrdporcupine wrote:
| Best way to think of it is this: Right now you are not the
| customer. Investors are.
|
| The money people pay in monthly fees to Anthropic for even the
| top Max sub likely doesn't come close to covering the energy &
| infrastructure costs of running the system.
|
| You can prove this to yourself by just trying to cost out what
| it takes to build the hardware capable of running a model of
| this size at this speed and running it locally. It's tens of
| thousands of dollars just to build the hardware, not even
| considering the energy bills.
|
| So I imagine the goal right now is to pull in a mass audience
| and prove the model, to get people hooked, to get management
| and talent at software firms pushing these tools.
|
| And I guess there's some in management and the investment
| community that thinks this will come with huge labour cost
| reductions but I think they may be dreaming.
|
| ... And then.. I guess... jack the price up? Or wait for
| Moore's Law?
|
| So it's not a surprise to me they're not jumping to try and
| service individual subscribers who are paying probably a
| fraction of what it costs them to the run the service.
|
| I dunno, I got sick of paying the price for Max and I now use
| the Claude Code tool but redirect it to DeepSeek's API and use
| their (inferior but still tolerable) model via API. It's
| probably 1/4 the cost for about 3/4 the product. It's actually
| amazing how much of the intelligence is built into the tool
| itself instead of just the model. It's often incredibly hard
| to tell the difference between DeepSeek output and what I got
| from Sonnet 4 or Sonnet 4.5.
| kridsdale1 wrote:
| You are bang on.
|
| Every AI company right now (except Google, Meta, and Microsoft)
| has their valuations based on the expectation of a future
| monopoly on AGI. None of their business models today or in
| the foreseeable horizon are even positive let alone world-
| dominating. The continued funding rounds are all apparently
| based on expectation of becoming the sole player.
|
| The continuing advancement of open source / open weights
| models keeps me from being a believer.
|
| I've placed my bet and feel secure where it is.
| Wowfunhappy wrote:
| I've been playing around with local LLMs in Ollama, just for
| fun. I have an RTX 4080 Super, a Ryzen 5950X with 32 threads,
| and 64 GB of system memory. A very good computer, but
| decidedly consumer-level hardware.
|
| I have primarily been using the 120b gpt-oss model. It's
| definitely worse than Claude and GPT-5, but not by, like, an
| order of magnitude or anything. It's also clearly better than
| ChatGPT was when it first came out. Text generates a bit
| slowly, but it's perfectly usable.
|
| So it doesn't seem so unreasonable to me that costs could
| come down in a few years?
| cmrdporcupine wrote:
| It's possible. Systems like the AMD AI Max 395+ with 128GB
| RAM thing get close to being able to run good coding models
| at reasonable speeds from what I hear. But, no, I'm given
| to understand they couldn't run e.e. the DeepSeek 3.2 model
| full size because there simply isn't enough GPU RAM still.
|
| To build out a system that can, I'd imagine you're looking
| at what... $20k, $30k? And then that's a machine that is
| basically _for one customer_ -- meanwhile a Claude Code Max
| or Codex Pro is $200 USD a month.
|
| The math doesn't add up.
|
| And once it _does_ add up, and these models can be
| reasonably run on lower-end hardware... then the moat
| ceases to exist and there'll be dozens of providers. So
| the valuation of e.g. Anthropic makes little sense to me.
|
| Like I said, I'm using the Claude Code tool/front-end
| pointing at the pay-per-use DeepSeek platform API; it
| costs a fraction of what Anthropic is charging, and feels
| to me like the quality is about 80% there... So ...
| Wowfunhappy wrote:
| > But, no, I'm given to understand they couldn't run e.g.
| the DeepSeek 3.2 model full size because there simply
| isn't enough GPU RAM still.
|
| My RTX 4080 only has 16 GB of VRAM, and gpt-oss 120b is
| 4x that size. It looks like Ollama is actually running
| ~80% of the model off of the CPU. I was made to believe
| this would be unbearably slow, but it's really not, at
| least with my CPU.
|
| I can't run the full sized DeepSeek model because I don't
| have enough system memory. That would be relatively easy
| to rectify.
|
| > And once it does add up, and these models can be
| reasonable run on lower end hardware... then the moat
| ceases to exist and there'll be dozens of providers.
|
| This is a good point and perhaps the bigger problem.
| consumer451 wrote:
| > I keep searching for a sign, but everyone I talk to has
| horror stories. It sucks as a technologist that just wants to
| play with the thing; oh well.
|
| The reason that Claude Code doesn't have an IDE is because ~"we
| think the IDE will be obsolete in a year, so it seemed like a
| waste of time to create one."
|
| Noam Shazeer said on a Dwarkesh podcast that he stopped
| cleaning his garage, because a robot will be able to do it very
| soon.
|
| If you are operating under the beliefs these folks have, then
| things like IDEs, cleaning up, and customer service are stupid
| annoyances that will become obsolete very soon.
|
| _To be clear, I have huge respect for everyone mentioned
| above, especially Noam._
| chairmansteve wrote:
| "Noam Shazeer said on a Dwarkesh podcast that he stopped
| cleaning his garage, because a robot will be able to do it
| very soon".
|
| How much is the robot going to cost in a year? 100k? 200k?
| Not mass market pricing for sure.
|
| Meanwhile, today he could pay someone $1000 to clean his
| garage.
| consumer451 wrote:
| I would do it for free, just to answer the question of what
| does a genius of his caliber have in his garage? Probably
| the same stuff most people do, but it would still be
| interesting.
|
| I don't think the point was about having a clean space, it
| was in response to a question along the lines of: when do
| you think we will achieve AGI?
| y-curious wrote:
| Trust me, I'm a genius of his caliber. Want to clean my
| garage? You free next week?
| Thrymr wrote:
| > Noam Shazeer said on a Dwarkesh podcast that he stopped
| cleaning his garage, because a robot will be able to do it
| very soon.
|
| We all come up with excuses for why we haven't done a chore,
| but some of us need to sound a bit more plausible to other
| members of the household than that.
|
| It would get about the same reaction as "I'm not going to
| wash the dishes tonight, the rapture is tomorrow."
| consumer451 wrote:
| I want to make it very clear that this was a lighthearted
| response from Noam to the "AGI timeline" question.
|
| Noam does not do a lot of interviews, and I really hope
| that stuff like my dumb comment does not prevent him from
| doing more in the future. We could all learn a lot from
| him. I am not sure that everyone understands everything
| that this man has given us.
| redhale wrote:
| What happened? I'm a Max subscriber and I'd like to know what
| to look out for!
| informal007 wrote:
| Bad customer service comes from low priority. I think
| Anthropic prioritizes new growth over feedback from a small
| number of customers; that's why they publish new products
| and features so frequently. There are so many potential
| opportunities for them to focus on.
| Yizahi wrote:
| Customer service at B2C companies can only go downhill or stay
| level. See Google, Apple, Microsoft etc. At B2B it maaaybe can
| improve, but only when a ten times bigger customer strongarms a
| company into doing it.
| empiko wrote:
| There is this homogenization happening in AI. No matter what
| their original mission was, all the AI companies are now
| building AI-powered gimmicks hoping to stumble upon something
| profitable. The investors are waiting...
| vjvjvjvjghv wrote:
| Hope it's better than what MS is currently shipping as AI.
| Every time I try to do something, the response is "sorry, I
| can't do this".
| smithkl42 wrote:
| Copilot is getting better - I'm getting fewer of those than I
| used to - but it's still significantly more stupid than other
| agents, even when in theory it's using the same model.
| throawayonthe wrote:
| R.I.P. global economy
| fudged71 wrote:
| Interesting their X post mentions "pre-built Agent Skills" but
| it's not on the webpage. I wonder if they will give you the
| ability to edit/add/delete Skills, that would be phenomenal.
|
| Edit: found it on their other blog post
| https://www.anthropic.com/news/advancing-claude-for-financia...
| luccasiau wrote:
| You can add and customize skills in claude.ai and other
| surfaces
| Havoc wrote:
| They can try, but doubt anyone serious will adopt it.
|
| Tried integrating ChatGPT into my finance job to see how far I
| could get. Mega yikes... millions of dollars of hallucinated
| mistakes.
|
| Worse, you don't have the same tight feedback loop you've got
| in programming that'll tell you when something is wrong:
| compile errors, unit tests, etc. You basically need to walk through
| everything it did to figure out what's real and what's
| hallucinations. Basically fails silently. If they roll that out
| at scale in the financial system...interesting times ahead.
|
| Still presumably there is something around spreadsheets it'll be
| able to do - the spreadsheet equivalent of boilerplate code
| whatever that may be
| AppleBananaPie wrote:
| I'm bad with spreadsheets, so maybe this is trivial, but
| having an LLM tell me how to connect my sheet to whatever
| data I'm using at the moment (it comes up with a link, a SQL
| query, or both) has let me quickly pull in data where I'd
| normally eyeball it and move on, or at worst do it partially
| manually if it's really important.
|
| It's like one off scripts in a sense? I'm not doing complex
| formulas I just need to know how I can pull data into a sheet
| and then I'll bucketize or graph it myself.
|
| Again probably because I'm not the most adept user but it has
| definitely been a positive use case for me.
|
| I suspect my use case is pretty boilerplatey :)
| Havoc wrote:
| Good to know that it works well for that.
|
| >I'm not doing complex formulas
|
| Neither am I frankly. Finance stuff can get conceptually
| complicated even with simple addition & multiplication
| though. e.g. I deal with a lot of offshore stuff, so the
| average spreadsheet is a mix of currencies, jurisdictions and
| companies that are interlinked. I could probably talk you
| through it high level in an hour with a pen & paper, but the
| LLMs just can't see the forest for all the trees in the raw
| sheet.
| Culonavirus wrote:
| AI slop eaters will still eat it up and ask for seconds. Pigs
| in oats seeing dollar signs.
| humanfromearth9 wrote:
| This could be invaluable for reverse engineering complex
| workbooks with multiple data sources and hundreds or thousands of
| formulas.
| pumnikol wrote:
| If it has a concept of data sources and can digest them, sure.
| Anecdotally, most issues with Excel at my job are caused by
| data sources being renamed, moved or reformatted, by broken
| logins, or by insufficient access rights.
| keernan wrote:
| If AI turns out to be the powerhouse it is claimed to be, its
| impact will be corporations replacing their dependence on
| 'Excel projects' created by self-taught assistants to
| department managers.
| travisgriggs wrote:
| As I was reading through the post, and the comments here, and
| pondering my own many hours with these tools, I was suddenly
| reminded of one of my favorite studio C sketches: An Unfortunate
| Fortune
|
| https://www.youtube.com/watch?v=SF-psoWdSpo
|
| Curious, if others see the connection. :D
| davidpolberger wrote:
| I'm a co-founder of Calcapp, an app builder for formula-driven
| apps using Excel-like formulas. I spent a couple of days using
| Claude Code to build 20 new templates for us, and I was blown
| away. It was able to one-shot most apps, generating competent,
| intricate apps from having looked at a sample JSON file I put
| together. I briefly told it about extensions we had made to Excel
| functions (including lambdas for FILTER, named sort type enums
| for XMATCH, etc), and it picked those up immediately.
|
| At one point, it generated a verbose formula and mentioned,
| off-handedly, that it would have been prettier had Calcapp
| supported LET. "It does!", I replied, "and as an extension,
| you can use := instead of , to separate names and values!"
| It promptly rewrote it using our extended syntax, producing
| a sleek formula.
|
| These templates were for various verticals, like real estate,
| financial planning and retail, and I would have been hard-pressed
| to produce them without Claude's domain knowledge. And I did it
| in a weekend! Well, "we" did it in a weekend.
|
| So this development doesn't really surprise me. I'm sure that
| Claude will be right at home in Excel, and I have already thought
| about how great it would be if Claude Code found a permanent home
| in our app designer. I'm concerned about the cost, though, so I'm
| holding off for now. But it does seem unfair that I get to use
| Claude to write apps with Calcapp, while our customers don't get
| that privilege.
|
| (I wrote more about integrating Claude Code here:
| https://news.ycombinator.com/item?id=45662229)
| unshavedyak wrote:
| Dumb question, but is this Claude for Excel the.. app? The
| webapp? Does it work on Google sheets? etc
|
| There are quite a few spreadsheet apps out there, just curious
| what their implementation is or how it's implemented to work with
| multiple apps.
|
| I always find Excel (and the Office ecosystem) confusing heh.
| p_ing wrote:
| Modern Excel add-ins work in desktop Windows, macOS, and the
| web. They're just a bit of XML that Excel reads to call
| whatever web endpoint is defined in the XML.
| rahimnathwani wrote:
| How is this different from the existing Claude skill, that uses a
| prompt and pandas to edit an Excel file?
|
| https://github.com/anthropics/skills/blob/main/document-skil...
| shooker435 wrote:
| This isn't built for Excel users who use Github and Claude
| Skills, it's built for Excel users who would run away from Git
| commands.
| rahimnathwani wrote:
| The Claude skill I linked to is built into the Claude desktop
| client. You just attach an Excel file to your chat and ask
| away.
|
| I linked to the skill prompt just to more clearly explain the
| approach that's currently available to all Claude users.
|
| It requires zero familiarity with git or command line.
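A rough illustration of the approach behind that skill: the prompt yields a short pandas script that edits the attached workbook. The file name, column names, and requested edit below are all hypothetical.

```python
# Sketch of prompt-driven workbook editing via pandas.
# "sales.xlsx" and its columns are made-up examples.
import pandas as pd

# Stand-in for the user's attached file.
pd.DataFrame({"region": ["EU", "US"], "revenue": [100, 250]}).to_excel(
    "sales.xlsx", index=False)

# e.g. the user asks: "add a column with revenue doubled"
df = pd.read_excel("sales.xlsx")
df["doubled"] = df["revenue"] * 2
df.to_excel("sales.xlsx", index=False)

print(pd.read_excel("sales.xlsx")["doubled"].tolist())  # [200, 500]
```

The point is that the model never touches binary xlsx directly; it writes code against a well-tested library and the file round-trips through it.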
| mamonster wrote:
| On the one hand, most financial companies have a lot of processes
| in Excel that could be made better with something like Claude.
|
| On the other hand: banking secrecy laws + customer-identifying
| data + AI tool = no bueno.
| grim_io wrote:
| If this works well and reliably, it might not kill
| programming as such, but it might put a lot of small
| businesses that do custom software for other small
| businesses out of work.
|
| The HN bubble might not realize the implications.
| surume wrote:
| Checkmate, Altman
| kaspermarstal wrote:
| So cool, I hope they pull it off. So many people use Excel.
| Although, I always thought the power of AI in Excel would come
| from the ability to use AI _as_ a formula. For example,
| =PROMPT("Classify user feedback as positive, neutral or
| negative", A1). This would enable normal people (non-programmers)
| to fire off thousands of prompts at once and automate workflows
| like programmers do (disclaimer: I am the author of Cellm that
| does exactly this). Combined with Excel's built-in functions for
| deterministic work, Claude could really kill the whole copy-
| pasting data in and out of chat windows for bulk-processing data.
| starik36 wrote:
| I can't wait until someone does this, then autofills 50k rows
| down, then gets a $50k bill for all the tokens.
|
| Reminds me of when our CIO insisted on moving to the cloud
| (back when AWS was just getting started) and then was super
| pissed when he got a $60k bill because no one knew to shutdown
| their VMs when leaving for the day.
| kaspermarstal wrote:
| If someone is processing 50k rows, that means they found real
| value and the UX is working. That's the whole point.
|
| Also, 50k rows wouldn't cost $50k. More like $100 with Sonnet
| 4.5 pricing and typical numbers of input/output tokens.
| Imagine the time needed to go through 50k rows manually; the
| math doesn't really work for a horror story.
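The arithmetic behind that estimate is easy to check. The per-row token counts below are assumptions; $3 and $15 per million input/output tokens are Sonnet 4.5's published list prices.

```python
# Back-of-envelope check of the "~$100, not $50k" claim, using
# assumed per-row token counts and Sonnet 4.5 list pricing
# ($3 / million input tokens, $15 / million output tokens).
rows = 50_000
input_tokens_per_row = 500    # short prompt + one cell of data (assumption)
output_tokens_per_row = 50    # a one-word-ish classification (assumption)

input_cost = rows * input_tokens_per_row / 1_000_000 * 3.00
output_cost = rows * output_tokens_per_row / 1_000_000 * 15.00
total = input_cost + output_cost
print(f"${total:.2f}")  # $112.50 under these assumptions
```

Even tripling the assumed token counts keeps the bill in the hundreds of dollars, not tens of thousands.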
| NotMichaelBay wrote:
| You may already be aware but Microsoft recently released a
| COPILOT() function that does this:
| https://support.microsoft.com/en-us/office/copilot-function-...
| kaspermarstal wrote:
| Thanks, appreciate it. Indeed, and Anthropic did something
| similar for Google Sheets a year ago. I am dying to know why
| they decided this should not be part of their Excel effort.
| They obviously put a lot of work and thought into Claude for
| Excel, so it must be intentional.
|
| Anyone from Anthropic here who would like to elaborate?
| btown wrote:
| From the signup form mentioning Private Equity / Venture Capital,
| Hedge Fund, Investment Banking... this seems squarely aimed at
| financial modeling. Which is really, really cool.
|
| I've worked alongside sell-side investment bankers in a prior
| startup, and so much of the work is in taking a messy set of
| statements from a company, understanding the underlying
| assumptions, and building, and rebuilding, and rebuilding,
| 3-statement models that not only adhere to standard conventions
| (perhaps best introed by
| https://www.wallstreetprep.com/knowledge/build-integrated-3-... )
| but also are highly customized for different assumptions that can
| range from seasonality to sensitivity to creative deal
| structures.
|
| It is quite common for people to pull many, many all-nighters to
| try to tweak these models in response to a senior banker or a
| client having an idea! And one might argue there are way too many
| similar-looking numbers to keep a human banker from
| "hallucinating," much less an LLM.
|
| But fundamentally, a 3-statement model and all its build-sheets
| are a dependency graph with loosely connected human-readable
| labels, and that means you can write tools that let an LLM crawl
| that dependency graph in a reliable and semantically meaningful
| way. And _that_ lets you build really cool things, really fast.
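A toy sketch of that dependency-graph idea: cell references can be pulled out of formula text with a regex, which is enough to let a tool (or an LLM tool-call) walk a cell's precedents. The miniature "sheet" and its cell contents below are invented for illustration.

```python
# Toy dependency-graph walk over spreadsheet formulas: extract
# cell references from each formula and recurse through
# precedents. The mini "sheet" stands in for a real model.
import re

sheet = {
    "B1": "=B2+B3",      # toy "summary" cell
    "B2": "=SUM(C1:C2)",
    "B3": "-40",
    "C1": "100",
    "C2": "25",
}

REF = re.compile(r"\b([A-Z]{1,3}[0-9]+)\b")

def precedents(cell: str) -> list[str]:
    """Direct cell references used by a cell's formula (ranges expanded naively)."""
    formula = sheet.get(cell, "")
    if not formula.startswith("="):
        return []
    return [r for r in REF.findall(formula) if r in sheet]

def trace(cell: str, depth: int = 0) -> None:
    """Print the precedent tree rooted at a cell."""
    print("  " * depth + f"{cell}: {sheet[cell]}")
    for ref in precedents(cell):
        trace(ref, depth + 1)

trace("B1")
```

A real implementation would parse ranges and cross-sheet references properly, but the graph structure is the same, and it is exactly the kind of deterministic tool an LLM can query instead of "reading" the whole grid.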
|
| I'm of the opinion that giving small companies the ability to
| present their finances to investors, the same way Fortune 500
| companies hire _armies_ of bankers to do, is vital to a healthy
| economy, and to giving Main Street the best possible chance to
| succeed and grow. This is a massive step in the right direction.
| JonChesterfield wrote:
| Presenting your finances to investors via a tool designed for
| generation of plausible looking data is fraud.
| ceh123 wrote:
| Presenting false data to investors is fraud, doesn't matter
| how it was generated. In fact, humans are quite good at
| "generating plausible looking data", doesn't mean human
| generated spreadsheets are fraud.
|
| On the other hand, presenting truthful data to investors is
| distinctly not fraud, and this again does not depend on the
| generation method.
| alfalfasprout wrote:
| If humans "generate plausible looking data" despite any
| processes to ensure data quality they've likely engaged in
| willful fraud.
|
| An LLM doing so needn't even be willful from the author's
| part. We're going to see issues with forecasts/slide decks
| full of inaccuracies that are hard to review.
| ceh123 wrote:
| I think my main point is just because an LLM can lie,
| doesn't necessarily mean an LLM generated slide is fraud.
| It could very easily be correct and verified/certified by
| the accountant and not fraud. Just cuz the text was
| generated first by an LLM doesn't mean fraud.
|
| That being said, oh for sure this will lead to more
| incidental fraud (and deliberate fraud) and I'm sure it
| already has. Would be curious to see the prevalence of
| em-dash's in 10k's over the years.
| lionkor wrote:
| > doesn't matter how it was generated
|
| is there precedent for this supposed ruling?
| ceh123 wrote:
| US v Simon 1969, see [0] for a review.
|
| Establishes that accountants who certify financials are
| liable if they are incorrect. In particular, if they have
| a reason to believe they might not be accurate and they
| certify anyway they are liable. And at this stage of
| development it's pretty clear that you need to double
| check LLM generated numbers.
|
| Obviously no clue if this would hold up with today's
| court, but I also wasn't making a legal statement before.
| I'm not a lawyer and I'm not trying to pretend to be one.
|
| [0] https://scholarship.law.stjohns.edu/cgi/viewcontent.c
| gi?arti...
| lionkor wrote:
| Fascinating thank you for the link
| Kydlaw wrote:
| You might have accidentally described what accounting is.
| btown wrote:
| Completely understand the sentiment, but it doesn't apply
| here, because what's being generated are formulas!
|
| Standardized 3-statement models in Excel are designed to be
| auditable, with or without AI, because (to only slightly
| simplify) every cell is either a blue input (which must come
| from standard exports of the company's accounting books,
| other auditable inventory/CRM/etc. data, or a visible
| hardcoded constant), or a black formula that cannot have
| hardcoded values, and must be simple.
|
| If every buyer can audit, with tools like this, that the
| formulas match the verbal semantics of the model, there's
| even less incentive than there is now to fudge the formula
| level. (And with Wall Street conventions, there's nowhere to
| hide a prompt injection, because you're supposed to keep
| every formula to only a few characters, and use breakout
| "build" rows that can themselves be visually audited.)
|
| And sure, you could conceivably use any AI tool to generate a
| plausible list of numbers at the input level, but that was
| equally easy, and equally dependent on context to be
| fraudulent or not, ever since that famous Excel 1990 elevator
| commercial: https://www.youtube.com/watch?v=kOO31qFmi9A&t=61s
|
| At the end of the day, the difference between "they want to
| see this growth, let's fudge it" and "they want to see this
| growth, let's calculate the exact metrics we need to hit to
| make that happen, and be transparent about how that's
| feasible" has always been a matter of trust, not technology.
|
| Tech like this means that people who want to do things the
| right way can do it as quickly as people who wanted to play
| loose with the numbers, and that's an equalizer that's on the
| right side of history.
| ed_elliott_asc wrote:
| I use excel but not for financial modelling, I'll use it
| mainecoder wrote:
| Yeah now tell the Auditors that the financial spreadsheet we have
| here has AI touching it left and right. "I did not cook the books
| I promise it is the AI that made our financials seem better than
| they actually are trust me bro!", said Joe from Accounting.
| JonChesterfield wrote:
| The thing really missing from multi-megabyte Excel sheets of
| business-critical carnage was a non-deterministic rewrite tool.
| It'll interact excitingly with the industry standard of no
| automated testing whatsoever.
|
| I 100% believe generative AI can change a spreadsheet. Turn the
| xlsx into text, mutate that, turn it back into an xlsx, throw it
| away if it didn't parse at all. The result will look pretty
| similar to the original too, since spreadsheets are great at
| showing immediately local context and nothing else.
|
| Also, we've done a pretty good job of training people that
| chatgpt works great, so there's good reason for them to expect
| claude for excel to work great too.
|
| I'd really like the results of this to be considered negligence
| with non-survivable fines for the reckless stupidity, but more
| likely, it'll be seen as an act of god. Like all the other broken
| shit in the IT world.
| patife wrote:
| Damn, Rows is at least 3x better.
| supermalvo wrote:
| 100%
| gwbas1c wrote:
| I wonder if this will be more/less useful than what we have with
| AI in software development.
|
| There's a lot less to understand than a whole codebase.
|
| I don't do spreadsheets very often, but I can empathize with
| tracking down "Trace #REF!, #VALUE!, and circular reference
| errors to their source in seconds." I once hit something like
| that, and I found it a lot harder to trace than a typical
| compiler error.
| wonderwonder wrote:
| Been working with Claude Code lately and been pretty impressed.
| If this works as well could be a nice add on. Its probably a
| smart market to enter as Excel is essentially everywhere.
|
| Just like Claude Code allows 1 dev to potentially do the work of
| 2 or 3, I could see this allowing 1 accountant or operations
| person to do the work of 2 or 3. Financial savings but human cost
| NumberCruncher wrote:
| At first glance this seems to be a very bad idea. But re-
| reading this:
|
| > Get answers about any cell in seconds: Navigate complex models
| instantly. Ask Claude about specific formulas, entire worksheets,
| or calculation flows across tabs. Every explanation includes
| cell-level citations so you can verify the logic.
|
| this might just be an excellent tool for refactoring Excel sheets
| into something more robust and maintainable. And making a bunch
| of suits redundant.
| lionkor wrote:
| There's already a language for this, or multiple, that isn't
| English. Not having to use this language is NOT going to make
| anything better.
|
| It will, however, make people resort more quickly to "I guess
| it's just not possible if Claude can't figure it out".
| teddyh wrote:
| "Copilot in Excel is a global financial crisis waiting to
| happen."
|
| -- Zack Korman,
| <https://x.com/ZackKorman/status/1974828240679166396>
| ada1981 wrote:
| Can we get it in Sheets?
| frankacter wrote:
| For Sheets, Gemini already exists natively.
|
| Alternatively, Perplexity Comet browser (or OpenAI Atlas) would
| presumably provide sidebar functionality to act within your
| spreadsheets.
| alex43578 wrote:
| On a related note, has anyone found a good local LLM option for
| working with Excel files?
|
| Here's my use case: I have a set of responses from a survey and
| want to perform sentiment analysis on them, classify them, etc.
| Ideally, I'd like to feed them one at a time to a local LLM with
| a prompt like: "Classify this survey response as positive,
| negative, or off-topic...etc".
|
| If I dump the whole spreadsheet into ChatGPT, I found that
| because of the context window, it can get "lazy"; while with a
| local LLM, I could just literally prompt it one row at a time to
| accomplish my goal, even if it takes a little longer in terms of
| GPU and wall-clock time.
|
| However, I can't find _anything_ that works off the shelf like
| this. It seems like a prime use case for local models.
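A minimal sketch of that one-row-at-a-time loop against a local model, assuming a default Ollama server at localhost:11434 and a hypothetical model name; only the prompt-building part runs without a server.

```python
# Row-at-a-time classification against a local model via Ollama's
# HTTP API, so no single request ever overflows the context window.
# The endpoint is Ollama's default; the model name is an assumption.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_prompt(text: str) -> str:
    return ("Classify this survey response as positive, negative, "
            f"or off-topic. Reply with one word.\n\n{text}")

def classify(text: str, model: str = "gpt-oss:20b") -> str:
    """POST one row's prompt to the local model and return its label."""
    payload = {"model": model, "prompt": build_prompt(text), "stream": False}
    req = urllib.request.Request(
        OLLAMA_URL, data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"].strip()

# One row at a time, e.g.:
#   labels = [classify(row) for row in survey_rows]
print(build_prompt("Great product!"))
```

Reading the rows out of the spreadsheet (csv module, pandas, etc.) and writing labels back is the easy half; the loop above is the part no off-the-shelf tool seems to package up.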
| santadays wrote:
| Don't know about Excel, but for Google Sheets you can ask
| ChatGPT to write you an Apps Script custom function, e.g.
| CALL_OPENAI. Then you can pass variables into it:
| =CALL_OPENAI("Classify this survey response as positive,
| negative, or off-topic: "&A1)
| thisguy47 wrote:
| Sheets also has an `AI` formula now that you can use to
| invoke Gemini models directly.
| santadays wrote:
| When I tried the Gemini/AI formula it didn't work very
| well. gpt-5 mini or nano are cheap and generally do what
| you want if you are asking something straightforward about
| a piece of content you give them. You can also give them a
| JSON schema to make the results more deterministic.
| sexy_seedbox wrote:
| Cellm + Ollama?
|
| https://docs.getcellm.com/models/local-models
| alex43578 wrote:
| That looks like a great fit! Not sure how I missed it, but I
| appreciate the link.
| dosnem wrote:
| Anyone understand how this could work? My mental model for llm is
| predictive text but here how can it understand cell A1 which has
| a string is the "header" for all values under it? How does it
| learn to understand table data like that?
| bonsai_spool wrote:
| > Anyone understand how this could work? My mental model for
| llm is predictive text but here how can it understand cell A1
| which has a string is the "header" for all values under it? How
| does it learn to understand table data like that?
|
| I imagine it uses the new Agent Skills features
|
| https://www.anthropic.com/news/skills
| brookst wrote:
| LLMs already understand table data. "Predictive text" is
| somewhat true but so reductive that it leads to that kind of
| misconception.
|
| HN is going to mangle this but here's a quick table:
|
|       | Type of Horse | Average Height | Typical Color   |
|       |---------------|----------------|-----------------|
|       | Arabian       | 15 hh          | Bay, Gray       |
|       | Thoroughbred  | 16 hh          | Chestnut, Bay   |
|       | Clydesdale    | 17.5 hh        | Bay with White  |
|       | Shetland Pony | 10.5 hh        | Black, Chestnut |
|
| And after a prompt "pivot the table so rows are colors":
|
|       | Typical Color  | Type of Horse                     | Average Height        |
|       |----------------|-----------------------------------|-----------------------|
|       | Bay            | Arabian, Thoroughbred, Clydesdale | 15 hh, 16 hh, 17.5 hh |
|       | Gray           | Arabian                           | 15 hh                 |
|       | Chestnut       | Thoroughbred, Shetland Pony       | 16 hh, 10.5 hh        |
|       | Bay with White | Clydesdale                        | 17.5 hh               |
|       | Black          | Shetland Pony                     | 10.5 hh               |
| flowingfocus wrote:
| Version control and meaningful diffs for .xlsx will be in high
| demand in a few months
| andyferris wrote:
| Honestly those things are well past due - if this tips the
| scales then I hope we can all benefit.
| jprd wrote:
| If Claude is going to work on the underpinning technology of
| every business in the Capitalist world, We should let Claude
| loose on the COBOL code out there too, I can't imagine anything
| going wrong.
| StarterPro wrote:
| HA!
|
| I've worked at MULTIPLE million-dollar firms whose entire
| business relies on 10 Excel workbooks created 30 years ago by
| a person who has either passed away or retired.
|
| Give AI to users who aren't intimately familiar with their
| source material, and you're asking for trouble.
|
| The undo function has a history limit.
|
| The real issue is: at what point are we going to stop chasing
| efficiency and profit at the expense of humanity?
|
| Claude and OpenAI are built on stretched truths, stolen
| creativity and what-if statements.
| voidmain0001 wrote:
| From their FAQ "Claude doesn't have advanced Excel capabilities
| including pivot tables, conditional formatting, data validation,
| data tables, macros, and VBA. We're actively working on these
| features."
| 6thbit wrote:
| Last week OpenAI hired ex-investment bankers to train a model
| to build financial models; now Anthropic is coming for Excel.
|
| Sounds like there's some sort of AI race for finance people
| and businesses?
| SteveLauC wrote:
| I really hope that all these kinds of integrations:
|
| * Claude for Chrome
| * Gemini for Chrome
| * ChatGPT Atlas
| * ...
|
| will be built on top of the ACP protocol, so that these "AI
| extensions" to everything can become standardized
| xouse wrote:
| I'm decent at excel, but not amazing. I've tried again and again
| to use LLMs including Claude to solve specific, small, well
| defined problems in excel with a 0% success rate. My experience
| so far has been if I can't do it LLMs can't either.
|
| If LLMs are a 6/10 right now at basic coding then they're a 3/10
| at excel from my experience.
| NotMichaelBay wrote:
| What kinds of problems in Excel are you trying to solve? Just
| curious as I'm also building an AI Excel addin, as a side
| project. :)
| klausnrooster wrote:
| I'd like to see it compete in the Financial Modeling World Cup,
| say in Las Vegas this December. https://excel-esports.com
| sherinjosephroy wrote:
| Pretty cool idea -- AI inside spreadsheets makes sense since most
| of our work already lives there. But I'm a bit cautious too --
| spreadsheets are messy enough, and adding probabilistic AI could
| make mistakes harder to spot. Useful if done right, risky if not.
| user3939382 wrote:
| Anthropic knows almost nothing about their own products and you
| guys know even less.
| anshulbhide wrote:
| Just spent an hour trying to figure out how to create a
| waterfall chart. ChatGPT's Python interpreter failed.
|
| If this works right, this could be a game changer.
| fragmede wrote:
| https://chatgpt.com/share/69005eec-6ee0-8009-a8d3-ebb1c30e72...
|
| took me four prompts to generate _a_ waterfall chart using
| d3.js because it didn't want to run it. Obviously, with real
| numbers rather than generated data, you'd need to check the
| results thoroughly.
| scrappyjoe wrote:
| Maybe this is how we get code versioning for Excel.
|
| Git LFS for the workbook + the following prompt:
|
| "Create a commit that explains what has changed in the workbook
| since the last commit. Be brief, but explain the change in
| business terms as well as code-change terms."
| bugsense wrote:
| That's it. 1T EV added.
| d4rkp4ttern wrote:
| Weird to see so much discussion when it's still behind a
| waitlist. And it seems aimed at "enterprise" only.
| theshrike79 wrote:
| The best thing that can come from this is unit tests for Excel.
|
| LLMs work best when they can call tools (edit the sheet) and test
| their results in a loop.
|
| It's like the "Goal Seek" feature Excel has had since forever:
| "adjust this value until this cell is X".
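As a sketch of what Goal Seek does under the hood: treat the formula cell as a function of one input and search for the input value that hits the target. This bisection version is a hypothetical illustration (Excel's actual solver is different), assuming the formula is monotonic on the search interval:

```python
# Bisection sketch of Excel's Goal Seek ("adjust this input until
# that cell equals X"), assuming f is monotonic on [lo, hi].

def goal_seek(f, target, lo, hi, tol=1e-9):
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if abs(f(mid) - target) < tol:
            return mid
        # keep the half of the interval that still brackets the target
        if (f(lo) - target) * (f(mid) - target) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

# Which input makes "x^2 + 1" equal 10?
print(round(goal_seek(lambda x: x * x + 1, 10, 0, 100), 6))  # prints 3.0
```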
|
| Excel doesn't have any way to verify that every formula in that
| 60k-line sheet is correct and that someone hasn't, for example,
| accidentally replaced one with a static number.
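A verification pass like that is straightforward to sketch. The helper below is hypothetical (no real Excel API involved); it models a column of cells as plain strings, where a leading "=" marks a formula, and flags cells that break a run of formulas:

```python
# Hypothetical sketch: flag cells holding a static value in a column
# that is otherwise made of formulas (a likely hardcoded override).

def find_static_overrides(column):
    """Return indices of non-formula cells in a mostly-formula column.

    `column` is a list of cell contents as strings; cells starting
    with "=" are formulas. If formulas aren't the majority, nothing
    is flagged (the column is presumably meant to hold values).
    """
    is_formula = [str(c).startswith("=") for c in column]
    if sum(is_formula) <= len(column) // 2:
        return []
    return [i for i, f in enumerate(is_formula) if not f]

cells = ["=A1*2", "=A2*2", "42", "=A4*2"]  # row 3 was hardcoded
print(find_static_overrides(cells))  # prints [2]
```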
| filearts wrote:
| In a previous professional life, I did financial modelling for
| a big 4 accounting firm. We had tooling that allowed us to
| visualize contiguous ranges of identical formulas (if you
| convert formulas to R1C1 addressing, similar formulas have the
| same representation). This allowed for overrides to stick out
| like a sore thumb.
|
| I suspect similar tools could be made for Claude and other LLMs
| except that it wouldn't be plagued by the mind-numbing tedium
| of doing this sort of audit.
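The normalization trick described above can be sketched in a few lines. This is an illustrative converter, not the firm's actual tooling: it handles only simple relative A1 references (no `$` anchors, no sheet names, and it would misread function names that end in digits, such as LOG10):

```python
import re

def a1_to_r1c1(formula, row, col):
    """Rewrite relative A1 references in `formula` as R1C1 offsets,
    given the formula's own (row, col), both 1-based. Formulas that
    are "the same fill" then produce identical text, so overrides
    stand out as the odd string in a range."""
    def col_to_num(letters):
        n = 0
        for ch in letters:
            n = n * 26 + (ord(ch) - ord("A") + 1)
        return n

    def repl(m):
        dr = int(m.group(2)) - row          # row offset from this cell
        dc = col_to_num(m.group(1)) - col   # column offset from this cell
        r = f"R[{dr}]" if dr else "R"
        c = f"C[{dc}]" if dc else "C"
        return r + c

    return re.sub(r"\b([A-Z]{1,3})(\d+)\b", repl, formula)

# =A1*2 in B1 and =A2*2 in B2 normalize to the same R1C1 text:
print(a1_to_r1c1("=A1*2", 1, 2))  # prints =RC[-1]*2
print(a1_to_r1c1("=A2*2", 2, 2))  # prints =RC[-1]*2
```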
| hufdr wrote:
| AI can definitely save time, but sometimes it hides the real
| problems. Most spreadsheet issues aren't math errors; they're
| logic messes. Claude can fix your sheet, but it can't fix your
| company culture.
___________________________________________________________________
(page generated 2025-10-28 23:02 UTC)