[HN Gopher] SpreadsheetLLM: Encoding Spreadsheets for Large Lang...
       ___________________________________________________________________
        
       SpreadsheetLLM: Encoding Spreadsheets for Large Language Models
        
       Author : goplayoutside
       Score  : 175 points
       Date   : 2024-07-17 12:16 UTC (2 days ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | goplayoutside wrote:
       | https://venturebeat.com/ai/microsofts-new-ai-system-spreadsh...
        
       | pavel_lishin wrote:
       | Ah yes, Excel, the piece of software that already famously
       | mangles data, is now going to be glued to software that also
       | famously mangles data.
       | 
       | Honestly, though, I kind of kid - I _love_ spreadsheets, and if
       | this _actually works_, it could be interesting. God help whoever
       | needs to troubleshoot the hallucinated results - it's already
       | hard enough to figure out what byzantine knotwork someone created
       | using existing Excel functions, but now we'll have to also guess
       | and second-guess layers of prompts that were used to either
       | generate those same functions, or just generate output that got
       | mulched through some AI black-box.
        
         | Obscurity4340 wrote:
         | Spreadsheets have such a way of making chaotic things more
         | clear. I wonder if there's any work on spreadsheets as a
         | multidimensional thinking tool.
        
           | laborcontract wrote:
           | Indeed. I accidentally taught myself linear algebra through
           | spending a ridiculous amount of time in Excel. I only
           | realized that after taking a linear algebra class and feeling
           | helpless... until I mentally remapped the concepts into Excel
           | space, after which it all became easy.
        
             | mettamage wrote:
             | Could you give an example?
        
               | laborcontract wrote:
               | Sure. So back in an old finance job I was given a whole
               | bunch of portfolio modeling spreadsheets that were a huge
               | mess of ad hoc columns, which drove me nuts, so
               | everything started with me learning how to use arrays,
               | which significantly reduced the complexity of basic data
               | transformations.
               | 
               | But then I wanted to analyze all our portfolio data over
               | time, so I then had to figure out how to handle
               | multidimensionality in my spreadsheets. Then I figured out how
               | to integrate and transform and reduce portfolio
               | characteristics into sensible components for risk
               | management and portfolio optimizations across different
               | asset classes.
               | 
               | I figured out how to do some absolutely ridiculous stuff
               | in Excel; it's tough for me to think of a tool that is
               | nearly as good at helping me work through a problem.
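Excel's MMULT and TRANSPOSE array functions are one place where spreadsheet work maps directly onto linear algebra. For example, with portfolio weights in a column range w and a covariance matrix in a range cov, portfolio variance (w^T * cov * w) can be written as =MMULT(TRANSPOSE(w),MMULT(cov,w)). Below is a minimal plain-Python sketch of the same computation; the two-asset numbers are made up for illustration and this is not the commenter's actual model.

```python
def mmult(a, b):
    """Matrix product of two lists-of-rows, analogous to Excel's MMULT."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    """Swap rows and columns, analogous to Excel's TRANSPOSE."""
    return [list(col) for col in zip(*m)]

# Hypothetical two-asset example: weights as a column vector (n x 1)
weights = [[0.6], [0.4]]
cov = [[0.04, 0.006],
       [0.006, 0.09]]

# w^T * (cov * w) is a 1x1 matrix holding the portfolio variance
variance = mmult(transpose(weights), mmult(cov, weights))[0][0]
print(round(variance, 6))
```

Keeping weights as an explicit column vector mirrors how the ranges would be laid out in a sheet, so each step corresponds to one array formula.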
        
       | surfingdino wrote:
       | I am calmly waiting for the SEC to rip them a new hole the size
       | of Manhattan when hallucinated spreadsheets inevitably make their
       | way into listed companies' reports.
        
         | bongodongobob wrote:
         | Well you'll be waiting a very very long time.
        
         | rsynnott wrote:
         | That would be on the companies (or their auditors), not
         | Microsoft, in general. Clearly, no-one should ever _use_ this,
         | should it ever make it out of research-land, but there's not
         | that much obvious risk to _making_ it as long as they're honest
         | about the risks.
        
           | SkyBelow wrote:
           | If it could spit out the analysis as spreadsheets that used
           | standard formulas and only used the LLM to generate the
           | formulas, it could be verified. Errors would slip through,
           | but no worse than people applying the wrong formula based on
           | a quick internet search that calculates a close but incorrect
           | answer.
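A minimal sketch of the check SkyBelow describes: keep the LLM's output as a plain formula, then evaluate it independently and compare it with an expectation computed by hand. The tiny evaluator below is hypothetical and only understands =SUM(range) and =AVERAGE(range) over a dict of cell values; like Excel, it skips blank cells.

```python
import re

def col_to_index(col):
    """'A' -> 1, 'Z' -> 26, 'AB' -> 28."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def index_to_col(n):
    """Inverse of col_to_index: 28 -> 'AB'."""
    s = ""
    while n:
        n, r = divmod(n - 1, 26)
        s = chr(ord("A") + r) + s
    return s

def expand_range(ref):
    """Expand 'A1:B2' into ['A1', 'A2', 'B1', 'B2'] (column by column)."""
    m = re.fullmatch(r"([A-Z]+)(\d+):([A-Z]+)(\d+)", ref)
    if not m:
        raise ValueError(f"not a range: {ref}")
    c1, r1 = col_to_index(m[1]), int(m[2])
    c2, r2 = col_to_index(m[3]), int(m[4])
    return [f"{index_to_col(c)}{r}"
            for c in range(c1, c2 + 1) for r in range(r1, r2 + 1)]

FUNCS = {"SUM": sum, "AVERAGE": lambda xs: sum(xs) / len(xs)}

def evaluate(formula, sheet):
    """Evaluate a single =SUM(...) or =AVERAGE(...) formula against `sheet`."""
    m = re.fullmatch(r"=(SUM|AVERAGE)\(([A-Z]+\d+:[A-Z]+\d+)\)",
                     formula.replace(" ", "").upper())
    if not m:
        raise ValueError(f"unsupported formula: {formula}")
    # Blank (missing) cells are skipped, as Excel's SUM and AVERAGE do
    values = [sheet[a] for a in expand_range(m[2]) if a in sheet]
    return FUNCS[m[1]](values)

sheet = {"A1": 10, "A2": 20, "A3": 30}
expected = 10 + 20 + 30               # computed by hand, not by the model
llm_formula = "=SUM(A1:A2)"           # hypothetical close-but-wrong suggestion
print(evaluate(llm_formula, sheet) == expected)  # False: the range is off by one
```

The point is that a formula, unlike free-form model output, can be re-run and cross-checked; the evaluator here is deliberately tiny so it is easy to trust.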
        
             | KoolKat23 wrote:
             | Agreed. The workflow could also include a person checking
             | it before submitting, just skimming through for errors.
        
           | RodgerTheGreat wrote:
           | There may not be much risk from a legal culpability
           | perspective if they make the appropriate disclosures
           | somewhere in the depths of a EULA, but even so it is a
           | failure of professional ethics to build tools which are
           | "dangerous at any speed" and inflict them upon the world.
        
             | surfingdino wrote:
             | They want to make back the money they invested in OpenAI.
             | We'll be seeing similar "research" (repackaging ChatGPT)
             | from Microsoft for a while.
        
             | rsynnott wrote:
             | Oh, I don't disagree, and I don't think burying it in the
             | EULA would necessarily be sufficient (especially in Europe,
             | where the courts and regulators have tended to take a dim
             | view of "but we told you, in three-point type on page 473
             | in the middle of the trademark acknowledgements"). But
             | ultimately the blame for using known-unreliable tools is
             | largely on the user.
        
           | michaelmior wrote:
           | > Clearly, no-one should ever _use_ this
           | 
           | ...without validating the results. Otherwise why should we
           | ever use LLMs for anything?
        
             | surfingdino wrote:
             | That's not the marketing message at the moment. I see ads
             | for AI (LLM) powered services and they all say the same
             | thing, "Stressed? Not enough time? Let AI do it faster so
             | you can do more." AI is sold as a tool that can do things
             | faster than a human can and since LLMs do not provide
             | reference information, there is no telling where they got
             | the data from and no way to verify it.
        
               | michaelmior wrote:
               | > LLMs do not provide reference information
               | 
               | Many do, although it's true that's often not the case.
        
             | rsynnott wrote:
             | Validating that a complex spreadsheet is correct is
             | notoriously extremely difficult at the best of times;
             | unfortunately they are about the closest thing to a write-
             | only language in common use, and you really have to front
             | load a lot more care than you do in conventional modern
             | languages. The usual safeguards of testing and code review
             | are essentially absent.
             | 
             | I'm sceptical that anyone really _should_ be using
             | generative AI for anything where correctness matters at
             | all, but spreadsheets in particular seem close to a worst-
             | case scenario.
        
       | bsenftner wrote:
       | I've found that all the top foundation models already understand
       | spreadsheets very well, as well as all the functions, as well as
       | all the common spreadsheet problems people run into using them.
       | The Internet is chock full of spreadsheet support forums and
       | tutorials, and the foundation models have all been trained on
       | this data.
       | 
       | With not very much effort, one can explain to an LLM "here is a
       | spreadsheet, formatted as..." which takes about 150 word tokens,
       | and then not much more mental effort in your favorite language to
       | translate an arbitrary spreadsheet into that format, and one gets
       | a very capable LLM interface that can help explain complex
       | arbitrary spreadsheets as well as generate them on request.
       | 
       | I've got finance professionals and attorneys using a tool I wrote
       | doing this to help them understand and debug complex spreadsheets
       | given to them by peers and clients.
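A minimal sketch of the translation step bsenftner describes, assuming the sheet has already been read into a dict mapping A1-style addresses to values or formula strings. The line format and the preamble wording are hypothetical; the idea is that a short fixed description of the format plus one line per non-empty cell gives the model both the values and the formulas.

```python
import re

def addr_key(addr):
    """Sort key for 'B12'-style addresses: by row, then by column."""
    m = re.fullmatch(r"([A-Z]+)(\d+)", addr)
    col = 0
    for ch in m[1]:
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return (int(m[2]), col)

def serialize_sheet(cells, name="Sheet1"):
    """Render a dict of address -> content as a compact, prompt-friendly text block."""
    lines = [
        f"Sheet: {name}",
        "Format: one cell per line as ADDRESS | CONTENT. "
        "Content starting with '=' is a formula. Empty cells are omitted.",
    ]
    for addr in sorted(cells, key=addr_key):
        content = cells[addr]
        if content is None or content == "":
            continue  # skip empty cells to save tokens
        lines.append(f"{addr} | {content}")
    return "\n".join(lines)

print(serialize_sheet({"B1": "Total", "A1": "Item", "A2": "Pen", "B2": "=3*2", "C2": ""}))
```

Row-major ordering keeps each table row together, which tends to read more naturally than column-major when the text is pasted into a prompt.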
        
         | ec109685 wrote:
         | The issue was that before, large spreadsheets would overflow
         | the context so this "compression" technique helps the LLM do
         | more from the same data.
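As I understand the paper, one of its compression ideas is an inverted-index style translation: instead of listing every cell, list each distinct value once with the cells that hold it, and drop empty cells. The sketch below illustrates that general idea only; the output shape and the merging of vertically contiguous cells into ranges are my own assumptions, not the paper's encoding.

```python
import re
from collections import defaultdict

def split_addr(addr):
    """'B12' -> ('B', 12)."""
    m = re.fullmatch(r"([A-Z]+)(\d+)", addr)
    return m[1], int(m[2])

def compress_runs(addrs):
    """Collapse vertically contiguous cells in one column: A2, A3, A4 -> A2:A4."""
    by_col = defaultdict(list)
    for a in addrs:
        col, row = split_addr(a)
        by_col[col].append(row)
    out = []
    for col in sorted(by_col, key=lambda c: (len(c), c)):  # 'B' sorts before 'AA'
        rows = sorted(by_col[col])
        start = prev = rows[0]
        for r in rows[1:] + [None]:  # None is a sentinel that flushes the last run
            if r is not None and r == prev + 1:
                prev = r
                continue
            out.append(f"{col}{start}" if start == prev else f"{col}{start}:{col}{prev}")
            if r is not None:
                start = prev = r
    return out

def inverted_index(cells):
    """Map each distinct non-empty value to the (compressed) cells holding it."""
    index = defaultdict(list)
    for addr, value in cells.items():
        if value is not None and value != "":
            index[str(value)].append(addr)
    return {v: compress_runs(a) for v, a in index.items()}

cells = {"A1": "Region", "A2": "North", "A3": "North", "A4": "North",
         "B1": "Sales", "B2": "", "B3": "North"}
print(inverted_index(cells))
```

Repeated values and blank regions are exactly what make naive cell-by-cell encodings large, so both shrink under this representation.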
        
           | bsenftner wrote:
           | Which strikes me as an ingenious method of locking in their
           | customers with a proprietary compressed format only their
           | finetuned LLMs can parse.
        
       | fimdomeio wrote:
       | Congratulations everyone, we can now automate the next global
       | financial crisis.
        
         | lainga wrote:
         | Now imagine an LLM trained on LLM web content. We call that an
         | AIslop-squared.
        
       | victor9000 wrote:
       | Why on earth would you task a non-deterministic technology with
       | data persistence?
        
         | bubblyworld wrote:
         | The universe is fundamentally a non-deterministic technology,
         | friend. We do what we can =D
        
       | ffhhj wrote:
       | Waiting for the hallucinate formula:
       | 
       | =HAL(9000)
        
       | chatmasta wrote:
       | At Databricks summit there was a nice presentation [0] by the CEO
       | of V7 labs who showed a demo of their LLM + Spreadsheet product.
       | 
       | The kneejerk reaction of "ugh, LLM and spreadsheet?!" is
       | understandable, but I encourage you to watch that demo. It makes
       | clear some obvious potentials of LLMs in spreadsheets. They can
       | basically be an advanced autofill. If you've used CoPilot in
       | VSCode, you understand the satisfaction of feeling like an LLM is
       | thinking one step ahead of you. This should be achievable in
       | spreadsheets as well.
       | 
       | [0] https://youtube.com/watch?v=0SVilfbn-HY&t=1251 (queued to
       | demo at 20:51)
        
         | Kiro wrote:
         | Thank you. Tired of the usual jokers in threads like this.
         | Right now the majority of comments are all sarcastic snark.
        
           | nforgerit wrote:
           | We jokers are tired as well
        
           | grape_surgeon wrote:
           | I'm new here; Hacker News is supposed to avoid the modern
           | Reddit trap, but I feel it often falls into it. The topics are
           | more relevant to me, but the comments are often unbearably
           | cynical and excessively dismissive.
        
             | bubblyworld wrote:
             | Yeah, I feel like there's a real culture problem on HN
             | right now, especially for topics that have received any
             | degree of hype (AI and crypto, mainly). People can be
             | excessively rude for no reason if you express an outsider
             | view. Gets to the point where I can't trust anyone here to
             | engage with me in good faith (the exceptions are a welcome
             | blessing).
             | 
             | I've come close to blocking the site on my network many
             | times but it's an absolute goldmine of interesting info
             | too... I'm not really sure if there's a solution, other
             | than to practice emotionally disengaging from internet
             | discussions.
        
         | delusional wrote:
         | I don't think I understand that demo. It shows him using some
         | built-in workflow thing (which isn't generally considered a
         | core part of a spreadsheet) and then asks some LLM about the
         | total price (I guess asking it to do math, which LLMs are
         | notoriously bad at), but instead it looks like he gets some
         | responses telling him what the term "total price" means, in
         | prose that doesn't fit in the cells.
         | 
         | What was I supposed to take away from that demo?
        
           | jemmyw wrote:
           | The LLM doesn't do the math. It outputs something the app
           | then interprets into a cell configuration with sums filled
           | in. This is an area where LLMs can be quite good: you type
           | out how you want to report the data, like "give me subtotals
           | of column F at every month of the date column E and a grand
           | total of F at the bottom"
           | 
           | Except sometimes you can't seem to stop the prose.
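In the flow jemmyw describes, the model only has to produce a structured plan and the app turns it into ordinary formulas, so the arithmetic is done by the spreadsheet. A minimal sketch of that second half, assuming a hypothetical plan made of a sum column, a date column, a data row range and a list of (year, month) pairs; it emits standard Excel SUMIFS and SUM formulas.

```python
def month_subtotal_formulas(sum_col, date_col, first_row, last_row, months):
    """Return (label, formula) pairs: one SUMIFS per month plus a grand total.

    months: list of (year, month) tuples. The plan format is hypothetical.
    """
    sum_rng = f"${sum_col}${first_row}:${sum_col}${last_row}"
    date_rng = f"${date_col}${first_row}:${date_col}${last_row}"
    rows = []
    for year, month in months:
        # The upper bound is the first day of the following month
        ny, nm = (year + 1, 1) if month == 12 else (year, month + 1)
        formula = (f'=SUMIFS({sum_rng},{date_rng},">="&DATE({year},{month},1),'
                   f'{date_rng},"<"&DATE({ny},{nm},1))')
        rows.append((f"{year}-{month:02d}", formula))
    rows.append(("Grand total", f"=SUM({sum_rng})"))
    return rows

for label, formula in month_subtotal_formulas("F", "E", 2, 100, [(2024, 11), (2024, 12)]):
    print(label, formula)
```

Because the output is formulas rather than numbers, a reviewer can click into any subtotal and see exactly what it sums.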
        
         | bongodongobob wrote:
         | The comments here are absolute existential crisis. "Only I do
         | spreadsheets good!" I agree, this looks really neat.
        
         | ssl-3 wrote:
         | Seems like a reasonably-cromulent use-case -- or at least, it
         | fits in with my own uses of LLMs.
         | 
         | I suck at spreadsheets. I know they can do both useful and
         | amazing things, but my daily life does not revolve around
         | spreadsheets and I simply do not understand most of the syntax
         | and operations required to make even fairly basic things work.
         | It requires a lot of time and effort for me to get simple
         | things done with a spreadsheet on the rare occasion that I need
         | to manipulate one.
         | 
         | There are things in life that I am very good at; spreadsheets
         | are simply not amongst them.
         | 
         | But _do_ I know what I want, and I generally even have a
         | ballpark idea of what the results should look like, and how to
         | calculate it by hand [ _horror_ ]. I just don't always know how
         | to articulate it in a way that LibreOffice or Google Sheets or
         | whatever can understand.
         | 
         | LLMs have helped to bridge that gap for me, but it's a pain in
         | the ass: I have to be very careful with the context that I give
         | the LLM (because garbage in is garbage out).
         | 
         | But in the demo, the LLM has the context already. This skips a
         | ton of preamble setup steps to get the LLM ready to provide
         | potentially-useful work, and moves closer to just making a
         | request and getting the desired output.
         | 
         | Having one unified interface saves even more steps.
         | 
         | (And no, this isn't for everyone.)
        
         | __loam wrote:
         | > you understand the satisfaction of feeling like an LLM is
         | thinking one step ahead of you
         | 
         | Yes, "satisfaction"
        
         | userbinator wrote:
         | _If you've used CoPilot in VSCode, you understand the
         | satisfaction of feeling like an LLM is thinking one step ahead
         | of you._
         | 
         | Tried it once. Didn't get "satisfaction"; instead felt deeply
         | irritated by the "backseat driver". Maybe it works better if
         | you're just churning out mediocre boilerplate.
        
           | Closi wrote:
           | I'm churning out mediocre boilerplate here and it's working
           | great! Managing to build things crazy fast.
        
           | rsynnott wrote:
           | Yeah, for me it feels, as a proposition, and having used it
           | briefly before turning it off, like "what if you could pair
           | program with an ultra-confident, yet dangerously incompetent,
           | intern, forever"?
        
             | dr_dshiv wrote:
             | ... with unlimited stamina, patience and capacity for
             | negative feedback. If it was forever, you'd probably learn
             | to take advantage of that resource!
        
           | thomashop wrote:
           | Like any tool, you need to learn how to use it and iterate
           | with it. Trying it once is not enough.
        
         | mwadhera wrote:
         | See also: https://matrices.app
        
         | usrbinbash wrote:
         | > If you've used CoPilot in VSCode, you understand the
         | satisfaction of feeling like an LLM is thinking one step ahead
         | of you
         | 
         | That "satisfaction" vanished pretty damn quickly, once I
         | realised that I often have more work correcting the stuff so
         | generated than I would have had writing it myself in the first
         | place.
         | 
         | LLMs in programming absolutely have their uses. Lots of them,
         | actually, and I don't wanna miss them. But they are not
         | "thinking ahead" of the code I write, not by a long shot.
        
           | nunodonato wrote:
           | They are with me. And with many other people. Perhaps it's
           | the quality of your code that is preventing better
           | completions (or the language you use?).
           | 
           | There are a few things that really help the AI to understand
           | what you want to do, otherwise it might struggle and come up
           | with not so good code.
           | 
           | Not to say it gets it right every time, but definitely often
           | enough for me not to even consider turning it off. The time
           | save has been tremendous.
        
             | EGreg wrote:
             | I can see vaguely blaming the person becoming more of the
             | norm when LLMs are responsible for precrime and
             | restrictions etc.
             | 
             | "Oh you couldn't take a train to work? Must have been
             | something you did, the Palantir is usually great and helps
             | our society. It always works great for me and my friends."
        
               | bongodongobob wrote:
               | Nah, that's not it. It's more like complaining that
               | someone has to drive the train and therefore "is
               | completely useless to me, it can't read my mind so it's
               | trash".
        
           | ramraj07 wrote:
           | I really don't know what the detractors of Copilot are
           | writing, the next Stuxnet? Whether I'm doing stupid EDA or
           | writing some fairly original frameworks Copilot has always
           | been useful to me writing both boilerplate code as well as
           | completing more esoteric logic. There's definitely a slight
           | modification I have made in how I type (making variable names
           | obvious, stopping at the right moment knowing copilot will
           | complete the next etc) but if anything it has made me a
           | cleaner programmer who writes 50% fewer characters at the
           | minimum.
        
             | fhd2 wrote:
             | While it could be that you and them work on different kinds
             | of code, I believe it's just as likely that you're just
             | different people with different experience and
             | expectations.
             | 
             | A "wow, that's a great start" to one could be a "damn
             | there's an issue I need to fix with this" to another. To
             | some, that great start really makes them more productive.
             | To others that 80% solution slows them down.
             | 
             | For some reason, programmers just love to be zealots and
             | run flamewars to promote their tool of choice. Probably
             | because they genuinely experience it's fantastic for them,
             | and the other guy's tool wasn't, and they want them to see
             | the light, too.
             | 
             | I prefer to judge people on the quality of their output,
             | not the tools they use to produce it. There's evidently
             | great code being written with uEmacs (Linux, Git), and I
             | assume that, all the way on the other end of the spectrum,
             | there's probably great code being written with VSCode and
             | Copilot.
        
             | hlfshell wrote:
             | In my experience using LLMs like CoPilot:
             | 
             | Web server work in Go, Python, and front end work in
             | JavaScript - it's pretty good. It's only when I try to do
             | something truly application-specific that it starts to get
             | tripped up.
             | 
             | Multithreading Python work - not bad, but occasionally
             | makes some mistakes in understanding scope or appropriate
             | safe memory access, which can be show stopping.
             | 
             | Deep learning, computer vision work - it gets some common
             | PyTorch patterns down pat, and basic computer vision work
             | that you'd typically find tutorials for but struggles on
             | any unique task.
             | 
             | Reinforcement learning for simulated robotics environments
             | - it really struggles to keep up.
             | 
             | ROS2 - fantastic for most of the framework code for
             | robotics projects, really great and recommended for someone
             | getting used to ROS.
             | 
             | C++ work - REALLY struggles with anything beyond basic
             | stuff. I was working with threading the other day and
             | turned it off as all of its suggestions would never compile
             | let alone do anything sensible.
        
           | Kiro wrote:
           | That's not my experience at all. I very seldom need to
           | correct anything Copilot outputs.
        
           | solumunus wrote:
           | That's because you were using CoPilot. Try a much better
           | option such as Supermaven. I unsubscribed from CoPilot for
           | similar reasons but after using Supermaven for 3-4 months I
           | will never cancel this subscription unless something better
           | comes along. It's way more accurate and way faster.
        
           | pydry wrote:
           | >LLMs in programming absolutely have their uses
           | 
           | Absolutely. LLMs let you make more programming mistakes
           | faster than any other invention with the possible exceptions
           | of handguns and Tequila.
           | 
           | To be fair, it is also really good at spewing out industrial
           | levels of boilerplate. As we all know, 99% of the effort in
           | coding is the writing of code and the more boilerplate in
           | your code base the better. /s
        
         | Havoc wrote:
         | It has some challenges ahead still:
         | 
         | https://i.redd.it/xr8uxqayv68d1.jpeg
         | 
         | For demos, sure, but frankly I'm not super hopeful on this. LLMs
         | are inherently about generating the next token in sequential order.
         | Nothing about real world spreadsheets is linear like that -
         | they're all interlinked chaos.
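To make "interlinked" concrete: formulas reference other cells, so a sheet is a dependency graph rather than a sequence, and any linear order (for an evaluator or for a model reading it) has to respect that graph. A minimal sketch, assuming single-cell A1-style references only (no ranges, sheet names, or $ anchors), that extracts references and produces an evaluation order with the standard library's graphlib (Python 3.9+).

```python
import re
from graphlib import TopologicalSorter, CycleError

# Simplified reference pattern; it can misread names like LOG10 as cell refs
REF = re.compile(r"\b([A-Z]{1,3}[0-9]+)\b")

def evaluation_order(cells):
    """Return cell addresses ordered so every cell comes after the cells it uses."""
    graph = {}
    for addr, content in cells.items():
        deps = set()
        if isinstance(content, str) and content.startswith("="):
            deps = set(REF.findall(content[1:]))
        graph[addr] = deps
    # static_order() raises CycleError on circular references
    return list(TopologicalSorter(graph).static_order())

print(evaluation_order({"A1": 5, "A2": "=A1*2", "A3": "=A1+A2"}))

try:
    evaluation_order({"A1": "=B1", "B1": "=A1"})
except CycleError:
    print("circular reference")
```

A cycle here plays the same role as Excel's circular-reference warning: there is no valid linear order to evaluate the cells in.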
        
           | Kiro wrote:
           | As the comments correctly point out, that image is like 10
           | years old and has nothing to do with LLMs.
        
         | rsynnott wrote:
         | > If you've used CoPilot in VSCode, you understand the
         | satisfaction of feeling like an LLM is thinking one step ahead
         | of you.
         | 
         | I'm not sure it's so much 'satisfaction'; it felt more like I
         | was having a stroke until I turned it off. Its suggestions
         | were, like, plausibly code, but completely contextually
         | nonsensical in general; frankly IntelliJ's old autocomplete-
         | with-guessing functionality was better, as it at least _knows_
         | a certain amount about the codebase. Now, this was on a very
         | large old codebase; no doubt it's better if writing trivial new
         | things.
        
         | dimal wrote:
         | > If you've used CoPilot in VSCode, you understand the
         | satisfaction of feeling like an LLM is thinking one step ahead
         | of you
         | 
         | I did not get that feeling from CoPilot. I usually got the
         | feeling that it was interrupting me to complete my thought but
         | getting it wrong. It was incredibly annoying and distracting.
         | Instead of helping me to think it was making it harder to
         | think. Pair programming with an LLM has been great. Better than
         | with most humans. But autocomplete sucks for me.
        
       | christianqchung wrote:
       | Goodness, since 2023 I've been making a joke that AI companies
       | are going to spend 500 billion dollars to make spreadsheet
       | generators, and now it's becoming real. Gemini has a limited form of
       | this too.
        
       | galaxyLogic wrote:
       | How will it work?
       | 
       | I open an Excel spreadsheet and also the AI Copilot. Then
       | whenever I want to do something with Excel like "Show me which
       | cells have formulas" CoPilot will interact with Excel and issue
       | some command I cannot remember to do that for me?
       | 
       | Menus are good but often hard to navigate and find. So the
       | CoPilot can give me a whole new (prompt-based) user-interface to
       | any MS-application? Is that how it works?
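For that particular example, Excel already has the answer behind a menu (Home > Find & Select > Formulas) and in the ISFORMULA worksheet function; an assistant would mostly be mapping the plain-language request onto something like that. A minimal sketch of the underlying check, assuming the sheet has been read into a plain dict of address to stored content (a hypothetical representation; a real add-in would go through the application's own object model).

```python
def cells_with_formulas(cells):
    """Return {address: formula} for cells whose stored content is a formula."""
    return {addr: content for addr, content in cells.items()
            if isinstance(content, str) and content.startswith("=")}

sheet = {"A1": 100, "A2": 250, "A3": "=SUM(A1:A2)", "B1": "Note", "B3": "=A3*1.2"}
for addr, formula in cells_with_formulas(sheet).items():
    print(addr, formula)
```

The operation itself is trivial; the value of a prompt-based interface would be in not having to remember where it lives.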
        
       | skywhopper wrote:
       | Uhh, this is a paper about how to compress spreadsheet data to
       | fit inside an LLM's token limits, including such novel approaches
       | as ignoring exact values of numbers, meaning of data types, and
       | any context outside of a detected table of values.
       | 
       | The paper doesn't speak at all to actual uses of this approach,
       | but that doesn't stop the article writer from assuming this is
       | probably a big step towards automated tools that analyze
       | spreadsheet data for non-numerically inclined users.
       | 
       | This is not that.
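The "ignoring exact values of numbers" part amounts to replacing runs of similar cells with a coarse type label, so a column of hundreds of prices collapses to a single entry. A rough sketch of that kind of aggregation, using simplified type-inference rules of my own (the paper's actual format categories and rules are not reproduced here); text cells keep their literal value.

```python
import re

def infer_type(value):
    """Coarse type label for a cell value (simplified, hypothetical rules)."""
    if value is None or value == "":
        return "Empty"
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "Boolean"
    if isinstance(value, int):
        return "Integer"
    if isinstance(value, float):
        return "Float"
    s = str(value).strip()
    if re.fullmatch(r"-?\d+(\.\d+)?%", s):
        return "Percentage"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", s):
        return "Date"
    if re.fullmatch(r"-?\d+", s):
        return "Integer"
    if re.fullmatch(r"-?\d*\.\d+", s):
        return "Float"
    return "Text"

def token(value):
    """Text keeps its literal value; everything else is reduced to its type label."""
    kind = infer_type(value)
    return str(value) if kind == "Text" else kind

def aggregate_column(col, values, first_row=1):
    """Collapse runs of identical tokens in one column, e.g. 'B2:B4 Float'."""
    out, start = [], first_row
    for i in range(1, len(values) + 1):
        if i == len(values) or token(values[i]) != token(values[i - 1]):
            end = first_row + i - 1
            rng = f"{col}{start}" if start == end else f"{col}{start}:{col}{end}"
            out.append(f"{rng} {token(values[i - 1])}")
            start = first_row + i
    return out

print(aggregate_column("B", ["Price", 1.5, 2.25, 3.0, None]))
```

This is where skywhopper's objection bites: after aggregation the model knows B2:B4 holds floats, but no longer knows which floats.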
        
         | dang wrote:
         | "The article writer" refers to
         | https://news.ycombinator.com/item?id=40996429 - we merged that
         | thread hither.
         | 
         |  _supporting correct context for comments everywhere_
        
       | jimkoen wrote:
       | @ludicity's head is going to explode.
        
       | cyanydeez wrote:
       | DNA researchers had to stop using their preferred letter
       | sequences because Excel would autocorrect their types.
       | 
       | This will not end well.
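The case being referred to is gene symbols such as SEPT2 or MARCH1 being silently converted to dates when entered into or opened in Excel. A rough pre-flight check could flag symbols that look like a month name followed by a day number before the data ever reaches a spreadsheet. The rule below is a simplified assumption, not Excel's actual parsing logic, which depends on locale and accepts more forms.

```python
import re

# Simplified, hypothetical rule: month name or abbreviation, optional hyphen, 1-2 digits
MONTHS = ("JANUARY|JAN|FEBRUARY|FEB|MARCH|MAR|APRIL|APR|MAY|JUNE|JUN|JULY|JUL|"
          "AUGUST|AUG|SEPTEMBER|SEPT|SEP|OCTOBER|OCT|NOVEMBER|NOV|DECEMBER|DEC")
DATE_LIKE = re.compile(rf"(?:{MONTHS})-?(\d{{1,2}})", re.IGNORECASE)

def looks_like_excel_date(symbol):
    """True if the whole symbol matches the month+day pattern with a plausible day."""
    m = DATE_LIKE.fullmatch(symbol.strip())
    return bool(m) and 1 <= int(m.group(1)) <= 31

def flag_symbols(symbols):
    return [s for s in symbols if looks_like_excel_date(s)]

print(flag_symbols(["TP53", "SEPT2", "MARCH1", "BRCA1", "DEC1"]))
```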
        
       | ilaksh wrote:
       | Is there a github?
        
         | janpmz wrote:
         | It should become a standard for papers to publish their code.
         | After all it's a publication.
        
       | IanCal wrote:
       | I love the deep technical discussions on HN, and I'm disappointed
       | to see anything AI related start to just resemble Reddit threads
       | of people with knee jerk reactions to the title.
       | 
       | This is interesting, it's about how you can represent
       | spreadsheets to LLMs.
        
         | nunodonato wrote:
         | Yes, for some reason we really have an established hate club
         | around here. And the comments are usually the same thing
         | every time.
        
       | mitjam wrote:
       | Spreadsheets can fill the gap between ad-hoc prompting/prompt
       | workbooks and custom software for special business tasks.
       | 
       | By using a prompt function like LABS.GENERATIVEAI in Excel you
       | can create solutions that combine calculations, data, and
       | Generative AI. In my experience, transforming data to and from
       | CSV works best for prompting in spreadsheets. Getting data to and
       | from CSV format can be done with other spreadsheet functions.
       | 
       | I've created a book and course
       | (https://mitjamartini.com/resources/ai-engineering/ebooks/han...)
       | that teaches how to do this (both more beginner level). Just
       | working through the examples or the examples provided by
       | Anthropic for Claude for Sheets should be enough to get going.
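The CSV round-trip mitjam mentions can be sketched outside Excel with Python's csv module: serialize a range to CSV for the prompt, then parse the model's CSV reply back into rows. The prompt wording and the assumption that the model replies with bare CSV (possibly wrapped in a code fence) are hypothetical, and no model call is made here.

```python
import csv
import io

def rows_to_csv(rows):
    """Serialize rows (lists of values) to CSV text with proper quoting."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()

def csv_to_rows(text):
    """Parse a CSV reply; drops code-fence lines. Assumes no newlines inside fields."""
    lines = [ln for ln in text.strip().splitlines() if not ln.startswith("```")]
    return list(csv.reader(lines))

def build_prompt(rows, instruction):
    return f"{instruction}\nReply with CSV only, same header row.\n\n{rows_to_csv(rows)}"

rows = [["Name", "Note"], ["Acme, Inc.", 'said "hi"']]
print(build_prompt(rows, "Classify each note as positive or negative."))
```

Letting the csv module handle quoting matters because commas and quotes inside cells are exactly what break hand-built comma-joined strings.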
        
       | blueyes wrote:
       | The real trick would be for LLMs, which currently do math very
       | poorly, to simply send "math to be done" into a spreadsheet, and
       | retrieve the results... (If anyone is aware of an LLM that's
       | great at math and physics, pls lmk!!)
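This is essentially the "tool use" pattern: the model emits a request for a calculation and the host computes it. A minimal sketch, assuming a hypothetical JSON tool-call format with a "calculate" tool; the host evaluates the expression with a whitelist of arithmetic operators instead of calling eval(), so arbitrary code in the model's output is rejected.

```python
import ast
import json
import operator

# Allowed operators only; ** is left out so huge exponents can't hang the host
OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
       ast.Div: operator.truediv, ast.USub: operator.neg, ast.UAdd: operator.pos}

def safe_eval(expr):
    """Evaluate a plain arithmetic expression; raise ValueError on anything else."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError(f"unsupported expression: {ast.dump(node)}")
    return ev(ast.parse(expr, mode="eval"))

def handle_tool_call(message):
    """Handle a hypothetical {"tool": "calculate", "expression": ...} request."""
    call = json.loads(message)
    if call.get("tool") != "calculate":
        raise ValueError("unknown tool")
    return json.dumps({"result": safe_eval(call["expression"])})

print(handle_tool_call('{"tool": "calculate", "expression": "2 + 3 * 4"}'))
```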
        
       | nickpinkston wrote:
       | I'm now so reliant on ChatGPT for gSheets, that I'd be almost
       | unable to maintain my sheets' absurd formulas without it.
       | 
       | It's also really accelerated my knowledge / skills of the
       | specifics of the excel language.
       | 
       | Having an LLM being able to directly read/write at the sheet
       | level, instead of just generating formulas for one cell, would be
       | amazing.
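One way to let a model act on the whole sheet rather than a single cell is to have it return a batch of cell edits that the host validates, shows as a diff, and applies. A sketch assuming a hypothetical JSON edit format of {"cell": ..., "value": ...} objects; it is not any product's actual API.

```python
import json
import re

# Up to three column letters (Excel's last column is XFD), then a row number >= 1
ADDR = re.compile(r"[A-Z]{1,3}[1-9][0-9]*")

def apply_edits(sheet, edits_json):
    """Validate and apply a batch of edits; returns a new dict, input is untouched."""
    edits = json.loads(edits_json)
    updated = dict(sheet)
    for edit in edits:
        cell = edit["cell"]
        if not ADDR.fullmatch(cell):
            raise ValueError(f"bad cell address: {cell!r}")
        updated[cell] = edit["value"]
    return updated

def diff(old, new):
    """Cells whose content changed, as {address: (old, new)}, for human review."""
    return {k: (old.get(k), new.get(k))
            for k in new.keys() | old.keys() if old.get(k) != new.get(k)}

sheet = {"A1": "Qty", "A2": 3}
new = apply_edits(sheet, '[{"cell": "B1", "value": "Total"}, {"cell": "B2", "value": "=A2*2"}]')
print(diff(sheet, new))
```

Building a new dict and raising on the first bad address makes each batch all-or-nothing, and the diff gives the user something concrete to approve before anything lands in the sheet.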
        
       ___________________________________________________________________
       (page generated 2024-07-19 23:09 UTC)