[HN Gopher] SpreadsheetLLM: Encoding Spreadsheets for Large Lang...
___________________________________________________________________
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models
Author : goplayoutside
Score : 175 points
Date : 2024-07-17 12:16 UTC (2 days ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| goplayoutside wrote:
| https://venturebeat.com/ai/microsofts-new-ai-system-spreadsh...
| pavel_lishin wrote:
| Ah yes, Excel, the piece of software that already famously
| mangles data, is now going to be glued to software that also
| famously mangles data.
|
| Honestly, though, I kind of kid - I _love_ spreadsheets, and if
| this _actually works_ , it could be interesting. God help whoever
| needs to troubleshoot the hallucinated results - it's already
| hard enough to figure out what byzantine knotwork someone created
| using existing Excel functions, but now we'll have to also guess
| and second-guess layers of prompts that were used to either
| generate those same functions, or just generate output that got
| mulched through some AI black-box.
| Obscurity4340 wrote:
| Spreadsheets have such a way of making chaotic things more
| clear. I wonder if there's any work on spreadsheets as a
| multidimensional thinking tool
| laborcontract wrote:
| Indeed. I accidentally taught myself linear algebra through
| spending a ridiculous amount of time in excel. I only
| realized that after taking a linear algebra class and feeling
| helpless.. until I mentally remapped the concepts into excel
| space, after which it all became easy.
| mettamage wrote:
| Could you give an example?
| laborcontract wrote:
| Sure. So back in an old finance job I was given a whole
| bunch of portfolio modeling spreadsheets that were a huge
| mess of ad hoc columns, which drove me nuts, so
| everything started with me learning how to use arrays,
| which significantly reduced the complexity of basic data
| transformations.
|
| But then I wanted to analyze all our portfolio data over
| time, so I then had to figure out how to handle
| multidimensionality in my spreadsheets. Then I figured out how
| to integrate and transform and reduce portfolio
| characteristics into sensible components for risk
| management and portfolio optimizations across different
| asset classes.
|
| I figured out how to do some absolutely ridiculous stuff
| in excel; it's tough for me to think of a tool that is
| nearly as good at helping you work through problems.
| surfingdino wrote:
| I am calmly waiting for the SEC to rip them a new hole the size
| of Manhattan when hallucinated spreadsheets inevitably make their
| way into listed companies' reports.
| bongodongobob wrote:
| Well you'll be waiting a very very long time.
| rsynnott wrote:
| That would be on the companies (or their auditors), not
| Microsoft, in general. Clearly, no-one should ever _use_ this,
| should it ever make it out of research-land, but there's not
| that much obvious risk to _making_ it as long as they're honest
| about the risks.
| SkyBelow wrote:
| If it could spit out the analysis as spreadsheets that used
| standard formulas and only use the LLM to generate the
| formulas, it could be verified. Errors would slip through,
| but no worse than people applying the wrong formula based on
| a quick internet search that calculates a close but incorrect
| answer.
| KoolKat23 wrote:
| Agreed. The workflow could also include a person checking
| it before submitting, just skimming through for errors.
| RodgerTheGreat wrote:
| There may not be much risk from a legal culpability
| perspective if they make the appropriate disclosures
| somewhere in the depths of a EULA, but even so it is a
| failure of professional ethics to build tools which are
| "dangerous at any speed" and inflict them upon the world.
| surfingdino wrote:
| They want to make back the money they invested in OpenAI.
| We'll be seeing similar "research" (repackaging ChatGPT)
| from Microsoft for a while.
| rsynnott wrote:
| Oh, I don't disagree, and I don't think burying it in the
| EULA would necessarily be sufficient (especially in Europe,
| where the courts and regulators have tended to take a dim
| view of "but we told you, in three-point type on page 473
| in the middle of the trademark acknowledgements"). But
| ultimately the blame for using known-unreliable tools is
| largely on the user.
| michaelmior wrote:
| > Clearly, no-one should ever _use_ this
|
| ...without validating the results. Otherwise why should we
| ever use LLMs for anything?
| surfingdino wrote:
| That's not the marketing message at the moment. I see ads
| for AI (LLM) powered services and they all say the same
| thing, "Stressed? Not enough time? Let AI do it faster so
| you can do more." AI is sold as a tool that can do things
| faster than a human can and since LLMs do not provide
| reference information, there is no telling where they got
| the data from and no way to verify it.
| michaelmior wrote:
| > LLMs do not provide reference information
|
| Many do, although it's true that's often not the case.
| rsynnott wrote:
| Validating that a complex spreadsheet is correct is
| notoriously extremely difficult at the best of times;
| unfortunately they are about the closest thing to a write-
| only language in common use, and you really have to front
| load a lot more care than you do in conventional modern
| languages. The usual safeguards of testing and code review
| are essentially absent.
|
| I'm sceptical that anyone really _should_ be using
| generative AI for anything where correctness matters at
| all, but spreadsheets in particular seem close to a worst-
| case scenario.
| bsenftner wrote:
| I've found that all the top foundation models already understand
| spreadsheets very well, as well as all the functions, as well as
| all the common spreadsheet problems people run into using them.
| The Internet is chock full of spreadsheet support forums and
| tutorials, and the foundation models have all been trained on
| this data.
|
| With not very much effort, one can explain to an LLM "here is a
| spreadsheet, formatted as..." which takes about 150 word tokens,
| and then not much more mental effort in your favorite language to
| translate an arbitrary spreadsheet into that format, and one gets
| a very capable LLM interface that can help explain complex
| arbitrary spreadsheets as well as generate them on request.
|
| I've got finance professionals and attorneys using a tool I wrote
| doing this to help them understand and debug complex spreadsheets
| given to them by peers and clients.
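The "here is a spreadsheet, formatted as..." step above can be sketched roughly as follows. This is a minimal illustration, not the commenter's actual tool: the `encode_sheet` and `column_letter` names and the `A1: value` line format are assumptions made up for this example.

```python
from typing import Dict, Tuple


def column_letter(index: int) -> str:
    """Convert a 0-based column index to a column name (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def encode_sheet(cells: Dict[Tuple[int, int], str]) -> str:
    """Serialize {(row, col): text} into one 'ADDRESS: content' line per cell.

    Hypothetical prompt format: formulas are kept verbatim so the model can
    reason about them, and empty cells are skipped to save tokens.
    """
    lines = []
    for row, col in sorted(cells):  # row-major order
        value = cells[(row, col)]
        if value == "":
            continue
        lines.append(f"{column_letter(col)}{row + 1}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    sheet = {
        (0, 0): "Item", (0, 1): "Price",
        (1, 0): "Widget", (1, 1): "4.5",
        (2, 1): "=SUM(B2:B2)",
    }
    print(encode_sheet(sheet))
```

The resulting text would be pasted into the prompt after a short description of the format; how well any given model handles it is something to check rather than assume.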
| ec109685 wrote:
| The issue was that before, large spreadsheets would overflow
| the context so this "compression" technique helps the LLM do
| more from the same data.
| bsenftner wrote:
| Which strikes me as an ingenious method of locking in their
| customers with a proprietary compressed format only their
| finetuned LLMs can parse.
| fimdomeio wrote:
| Congratulations everyone, we can now automate the next global
| financial crisis.
| lainga wrote:
| Now imagine an LLM trained on LLM web content. We call that an
| AIslop-squared.
| victor9000 wrote:
| Why on earth would you task a non-deterministic technology with
| data persistence?
| bubblyworld wrote:
| The universe is fundamentally a non-deterministic technology,
| friend. We do what we can =D
| ffhhj wrote:
| Waiting for the hallucinate formula:
|
| =HAL(9000)
| chatmasta wrote:
| At Databricks summit there was a nice presentation [0] by the CEO
| of V7 labs who showed a demo of their LLM + Spreadsheet product.
|
| The kneejerk reaction of "ugh, LLM and spreadsheet?!" is
| understandable, but I encourage you to watch that demo. It makes
| clear some obvious potentials of LLMs in spreadsheets. They can
| basically be an advanced autofill. If you've used CoPilot in
| VSCode, you understand the satisfaction of feeling like an LLM is
| thinking one step ahead of you. This should be achievable in
| spreadsheets as well.
|
| [0] https://youtube.com/watch?v=0SVilfbn-HY&t=1251 (queued to
| demo at 20:51)
| Kiro wrote:
| Thank you. Tired of the usual jokers in threads like this.
| Right now the majority of comments are all sarcastic snark.
| nforgerit wrote:
| We jokers are tired as well
| grape_surgeon wrote:
| I'm new here; Hacker News is supposed to avoid the modern
| Reddit trap but I feel it often falls into it. The topics are
| more relevant to me but the comments are often unbearably
| cynical and excessively dismissive
| bubblyworld wrote:
| Yeah, I feel like there's a real culture problem on HN
| right now, especially for topics that have received any
| degree of hype (AI and crypto, mainly). People can be
| excessively rude for no reason if you express an outsider
| view. Gets to the point where I can't trust anyone here to
| engage with me in good faith (the exceptions are a welcome
| blessing).
|
| I've come close to blocking the site on my network many
| times but it's an absolute goldmine of interesting info
| too... I'm not really sure if there's a solution, other
| than to practice emotionally disengaging from internet
| discussions.
| delusional wrote:
| I don't think I understand that demo. It shows him using some
| built-in workflow thing (which isn't generally considered a
| core part of a spreadsheet) and then asks some LLM about the
| total price (I guess asking it to do math, which LLMs are
| notoriously bad at), but instead it looks like he gets some
| responses telling him what the term "total price" means, in
| prose that doesn't fit in the cells.
|
| What was I supposed to take away from that demo?
| jemmyw wrote:
| The LLM doesn't do the math. It outputs something the app
| then interprets as a cell configuration with sums filled
| in. This is an area where LLMs can be quite good: you type
| out how you want to report the data like "give me subtotals
| of column F at every month of the date column E and a grand
| total of F at the bottom"
|
| Except sometimes you can't seem to stop the prose.
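To make the request in the comment above concrete, here is a small sketch of the report it describes: subtotals of an amount column for each month of a date column, plus a grand total. The `monthly_subtotals` name and the row layout are assumptions for illustration; a real app would more likely emit spreadsheet formulas than compute the numbers itself.

```python
from datetime import date
from typing import Dict, List, Tuple


def monthly_subtotals(rows: List[Tuple[date, float]]) -> List[Tuple[str, float]]:
    """Return (label, total) pairs: one per month in date order, then a grand total."""
    subtotals: Dict[str, float] = {}
    for day, amount in sorted(rows, key=lambda r: r[0]):  # sort by date
        key = f"{day.year:04d}-{day.month:02d}"
        subtotals[key] = subtotals.get(key, 0.0) + amount
    report = list(subtotals.items())  # dicts keep insertion order
    report.append(("Grand total", sum(subtotals.values())))
    return report


if __name__ == "__main__":
    data = [
        (date(2024, 1, 5), 10.0),
        (date(2024, 2, 1), 5.0),
        (date(2024, 1, 20), 2.5),
    ]
    for label, total in monthly_subtotals(data):
        print(label, total)
```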
| bongodongobob wrote:
| The comments here are absolute existential crisis. "Only I do
| spreadsheets good!" I agree, this looks really neat.
| ssl-3 wrote:
| Seems like a reasonably-cromulent use-case -- or at least, it
| fits in with my own uses of LLMs.
|
| I suck at spreadsheets. I know they can do both useful and
| amazing things, but my daily life does not revolve around
| spreadsheets and I simply do not understand most of the syntax
| and operations required to make even fairly basic things work.
| It requires a lot of time and effort for me to get simple
| things done with a spreadsheet on the rare occasion that I need
| to manipulate one.
|
| There are things in life that I am very good at; spreadsheets
| are simply not amongst them.
|
| But I _do_ know what I want, and I generally even have a
| ballpark idea of what the results should look like, and how to
| calculate it by hand [ _horror_ ]. I just don't always know how
| to articulate it in a way that LibreOffice or Google Sheets or
| whatever can understand.
|
| LLMs have helped to bridge that gap for me, but it's a pain in
| the ass: I have to be very careful with the context that I give
| the LLM (because garbage in is garbage out).
|
| But in the demo, the LLM has the context already. This skips a
| ton of preamble setup steps to get the LLM ready to provide
| potentially-useful work, and moves closer to just making a
| request and getting the desired output.
|
| Having one unified interface saves even more steps.
|
| (And no, this isn't for everyone.)
| __loam wrote:
| > you understand the satisfaction of feeling like an LLM is
| thinking one step ahead of you
|
| Yes, "satisfaction"
| userbinator wrote:
| _If you've used CoPilot in VSCode, you understand the
| satisfaction of feeling like an LLM is thinking one step ahead
| of you._
|
| Tried it once. Didn't get "satisfaction"; instead felt deeply
| irritated by the "backseat driver". Maybe it works better if
| you're just churning out mediocre boilerplate.
| Closi wrote:
| I'm churning out mediocre boilerplate here and it's working
| great! Managing to build things crazy fast.
| rsynnott wrote:
| Yeah, for me it feels, as a proposition, and having used it
| briefly before turning it off, like "what if you could pair
| program with an ultra-confident, yet dangerously incompetent,
| intern, forever"?
| dr_dshiv wrote:
| ... with unlimited stamina, patience and capacity for
| negative feedback. If it was forever, you'd probably learn
| to take advantage of that resource!
| thomashop wrote:
| Like any tool, you need to learn how to use it and iterate
| with it. Trying it once is not enough.
| mwadhera wrote:
| See also: https://matrices.app
| usrbinbash wrote:
| > If you've used CoPilot in VSCode, you understand the
| satisfaction of feeling like an LLM is thinking one step ahead
| of you
|
| That "satisfaction" vanished pretty damn quickly, once I
| realised that I often have more work correcting the stuff so
| generated than I would have had writing it myself in the first
| place.
|
| LLMs in programming absolutely have their uses, lots of them
| actually, and I don't wanna miss them. But they are not
| "thinking ahead" of the code I write, not by a long shot.
| nunodonato wrote:
| They are with me. And with many other people. Perhaps it's
| the quality of your code that is preventing better
| completions. (Or the lang you use?)
|
| There are a few things that really help the AI to understand
| what you want to do, otherwise it might struggle and come up
| with not so good code.
|
| Not to say it gets it right every time, but definitely often
| enough for me not to even consider turning it off. The time
| saved has been tremendous.
| EGreg wrote:
| I can see vaguely blaming the person becoming more the norm
| when LLMs are responsible for precrime and restrictions
| etc.
|
| "Oh you couldn't take a train to work? Must have been
| something you did, the Palantir is usually great and helps
| our society. It always works great for me and my friends."
| bongodongobob wrote:
| Nah, that's not it. It's more like complaining that
| someone has to drive the train and therefore "is
| completely useless to me, it can't read my mind so it's
| trash".
| ramraj07 wrote:
| I really don't know what the detractors of Copilot are
| writing, the next StuxNet? Whether I'm doing stupid EDA or
| writing some fairly original frameworks Copilot has always
| been useful to me writing both boilerplate code as well as
| completing more esoteric logic. There's definitely a slight
| modification I have made in how I type (making variable names
| obvious, stopping at the right moment knowing copilot will
| complete the next etc) but if anything it has made me a
| cleaner programmer who writes 50% fewer characters at the
| minimum.
| fhd2 wrote:
| While it could be that you and them work on different kinds
| of code, I believe it's just as likely that you're just
| different people with different experience and
| expectations.
|
| A "wow, that's a great start" to one could be a "damn
| there's an issue I need to fix with this" to another. To
| some, that great start really makes them more productive.
| To others that 80% solution slows them down.
|
| For some reason, programmers just love to be zealots and
| run flamewars to promote their tool of choice. Probably
| because they genuinely experience it's fantastic for them,
| and the other guy's tool wasn't, and they want them to see
| the light, too.
|
| I prefer to judge people on the quality of their output,
| not the tools they use to produce it. There's evidently
| great code being written with uEmacs (Linux, Git), and I
| assume that, all the way on the other end of the spectrum,
| there's probably great code being written with VSCode and
| Copilot.
| hlfshell wrote:
| In my experience using LLMs like CoPilot:
|
| Web server work in Go, Python, and front end work in
| JavaScript - it's pretty good. It's only when I try to do
| something truly application specific that it starts to get
| tripped up.
|
| Multi threading python work - not bad, but occasionally
| makes some mistakes in understanding scope or appropriate
| safe memory access, which can be show stopping.
|
| Deep learning, computer vision work - it gets some common
| pytorch patterns down pat, and basic computer vision work
| that you'd typically find tutorials for but struggles on
| any unique task.
|
| Reinforcement learning for simulated robotics environments
| - it really struggles to keep up.
|
| ROS2 - fantastic for most of the framework code for
| robotics projects, really great and recommended for someone
| getting used to ROS.
|
| C++ work - REALLY struggles with anything beyond basic
| stuff. I was working with threading the other day and
| turned it off as all of its suggestions would never compile
| let alone do anything sensible.
| Kiro wrote:
| That's not my experience at all. I very seldom need to
| correct anything Copilot outputs.
| solumunus wrote:
| That's because you were using CoPilot. Try a much better
| option such as Supermaven. I unsubscribed from CoPilot for
| similar reasons but after using Supermaven for 3-4 months I
| will never cancel this subscription unless something better
| comes along. It's way more accurate and way faster.
| pydry wrote:
| >LLMs in programming absolutely have their uses
|
| Absolutely. LLMs let you make more programming mistakes
| faster than any other invention with the possible exceptions
| of handguns and Tequila.
|
| To be fair, it is also really good at spewing out industrial
| levels of boilerplate. As we all know, 99% of the effort in
| coding is the writing of code and the more boilerplate in
| your code base the better. /s
| Havoc wrote:
| It has some challenges ahead still:
|
| https://i.redd.it/xr8uxqayv68d1.jpeg
|
| For demos sure, but I'm not super hopeful on this frankly. LLMs
| are inherently about generating next char in sequential order.
| Nothing about real world spreadsheets is linear like that -
| they're all interlinked chaos.
| Kiro wrote:
| As the comments correctly point out that image is like 10
| years old and has nothing to do with LLMs.
| rsynnott wrote:
| > If you've used CoPilot in VSCode, you understand the
| satisfaction of feeling like an LLM is thinking one step ahead
| of you.
|
| I'm not sure it's so much 'satisfaction'; it felt more like I
| was having a stroke until I turned it off. Its suggestions
| were, like, plausibly code, but completely contextually
| nonsensical in general; frankly IntelliJ's old autocomplete-
| with-guessing functionality was better, as it at least _knows_
| a certain amount about the codebase. Now, this was on a very
| large old codebase; no doubt it's better if writing trivial new
| things.
| dimal wrote:
| > If you've used CoPilot in VSCode, you understand the
| satisfaction of feeling like an LLM is thinking one step ahead
| of you
|
| I did not get that feeling from CoPilot. I usually got the
| feeling that it was interrupting me to complete my thought but
| getting it wrong. It was incredibly annoying and distracting.
| Instead of helping me to think it was making it harder to
| think. Pair programming with an LLM has been great. Better than
| with most humans. But autocomplete sucks for me.
| christianqchung wrote:
| Goodness, I've been making a joke that AI companies are going to
| spend 500 billion dollars to make spreadsheet generators since
| 2023, and now it's becoming real. Gemini has a limited form of
| this too.
| galaxyLogic wrote:
| How will it work?
|
| I open an Excel spreadsheet and also the AI Copilot. Then
| whenever I want to do something with Excel like "Show me which
| cells have formulas" CoPilot will interact with Excel and issue
| some command I cannot remember to do that for me?
|
| Menus are good but often hard to navigate and find. So the
| CoPilot can give me a whole new (prompt-based) user-interface to
| any MS-application? Is that how it works?
| skywhopper wrote:
| Uhh, this is a paper about how to compress spreadsheet data to
| fit inside an LLM's token limits, including such novel approaches
| as ignoring exact values of numbers, meaning of data types, and
| any context outside of a detected table of values.
|
| The paper doesn't speak at all to actual uses of this approach,
| but that doesn't stop the article writer from assuming this is
| probably a big step towards automated tools that analyze
| spreadsheet data for non-numerically inclined users.
|
| This is not that.
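As a toy illustration of the kind of lossy compression described in the comment above (and explicitly not the paper's algorithm), the sketch below drops empty cells, replaces every numeric value with a placeholder, and groups cell addresses by the resulting token. The `compress` name and the `<number>` placeholder are invented for this example.

```python
from typing import Dict, List


def is_number(text: str) -> bool:
    """True if the cell text parses as a float (a crude stand-in for type detection)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def compress(cells: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each distinct token to the addresses holding it.

    Lossy on purpose: exact numeric values are discarded, which is the
    trade-off the comment above is pointing at.
    """
    index: Dict[str, List[str]] = {}
    for address, value in cells.items():
        if value == "":
            continue  # empty cells cost tokens and carry no content
        token = "<number>" if is_number(value) else value
        index.setdefault(token, []).append(address)
    return index


if __name__ == "__main__":
    sheet = {"A1": "Year", "B1": "Sales", "A2": "2023", "B2": "10.5", "A3": "2024", "B3": ""}
    print(compress(sheet))
```

The output is much shorter than a cell-by-cell dump for large numeric tables, but it cannot answer any question that depends on the actual numbers.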
| dang wrote:
| "The article writer" refers to
| https://news.ycombinator.com/item?id=40996429 - we merged that
| thread hither.
|
| _supporting correct context for comments everywhere_
| jimkoen wrote:
| @ludicity 's head is going to explode.
| cyanydeez wrote:
| DNA researchers had to stop using their preferred letter
| sequences because Excel would autocorrect their types.
|
| This will not end well.
| ilaksh wrote:
| Is there a github?
| janpmz wrote:
| It should become a standard for papers to publish their code.
| After all it's a publication.
| IanCal wrote:
| I love the deep technical discussions on HN, and I'm disappointed
| to see anything AI related start to just resemble Reddit threads
| of people with knee jerk reactions to the title.
|
| This is interesting, it's about how you can represent
| spreadsheets to llms.
| nunodonato wrote:
| Yes, for some reason we really have an established hate club
| around here. And the comments are usually the same thing
| every time
| mitjam wrote:
| Spreadsheets can fill the gap between ad-hoc prompting/prompt
| workbooks and custom software for special business tasks.
|
| By using a prompt function like LABS.GENERATIVEAI in Excel you
| can create solutions that combine calculations, data, and
| Generative AI. In my experience, transforming data to and from
| CSV works best for prompting in spreadsheets. Getting data to and
| from CSV format can be done with other spreadsheet functions.
|
| I've created a book and course
| (https://mitjamartini.com/resources/ai-engineering/ebooks/han...)
| that teaches how to do this (both more beginner level). Just
| working through the examples or the examples provided by
| Anthropic for Claude for Sheets should be enough to get going.
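The comment above does the CSV conversion with spreadsheet functions. As a rough equivalent outside the spreadsheet, the sketch below round-trips a block of cells through CSV text with Python's standard `csv` module; the helper names `rows_to_csv` and `csv_to_rows` are made up for this example.

```python
import csv
import io
from typing import List


def rows_to_csv(rows: List[List[str]]) -> str:
    """Serialize rows to CSV text suitable for pasting into a prompt."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def csv_to_rows(text: str) -> List[List[str]]:
    """Parse CSV text (for example, a model's reply) back into rows."""
    return [row for row in csv.reader(io.StringIO(text))]


if __name__ == "__main__":
    table = [["name", "note"], ["Widget", "small, blue"]]
    text = rows_to_csv(table)
    print(text)
    print(csv_to_rows(text) == table)
```

Using a real CSV parser instead of splitting on commas matters here because cell text (and model output) often contains commas and quotes.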
| blueyes wrote:
| The real trick would be for LLMs, which currently do math very
| poorly, to simply send "math to be done" into a spreadsheet, and
| retrieve the results... (If anyone is aware of an LLM that's
| great at math and physics, pls lmk!!)
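The "send the math elsewhere" idea above can be sketched without a spreadsheet at all: the model emits an arithmetic expression as text and a small, restricted evaluator computes it. This is only an assumed stand-in for a spreadsheet engine; the `evaluate` function is hypothetical and supports just `+`, `-`, `*`, `/` and unary minus.

```python
import ast
import operator

# Whitelisted binary operators; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression: str) -> float:
    """Evaluate a plain arithmetic expression without using eval()."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval"))


if __name__ == "__main__":
    # Imagine this string came back from the model instead of a final number.
    print(evaluate("2 + 3 * 4"))
```

The design choice is that the model only has to get the expression right, and the arithmetic itself is done deterministically.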
| nickpinkston wrote:
| I'm now so reliant on ChatGPT for gSheets, that I'd be almost
| unable to maintain my sheets' absurd formulas without it.
|
| It's also really accelerated my knowledge / skills of the
| specifics of the excel language.
|
| Having an LLM able to directly read/write at the sheet
| level, instead of just generating formulas for one cell, would be
| amazing.
___________________________________________________________________
(page generated 2024-07-19 23:09 UTC)