[HN Gopher] Regular expression functions in Excel
       ___________________________________________________________________
        
       Regular expression functions in Excel
        
       Author : thunderbong
       Score  : 122 points
       Date   : 2024-05-21 10:57 UTC (3 days ago)
        
 (HTM) web link (insider.microsoft365.com)
 (TXT) w3m dump (insider.microsoft365.com)
        
       | leshenka wrote:
       | truly amazing times: Excel gets regex support (search-in-
       | selected-cells is not available though, as well as search by
       | regex)
        
       | GregoireDev wrote:
       | I am confused between all these AI announcements from Microsoft
       | and the exciting new regex support in "preview" (aka soon to be
       | dropped)
        
       | intended wrote:
       | Good gravy.
       | 
       | Things im terrified of:
       | 
       | - making an error in a financial model - checking a model -
       | making an error in a regexes - checking a regex
       | 
       | Thins I love; - dumb excel stuff
       | 
       | This is a great time to be alive, and at arms length from
       | finance.
       | 
       | To all the people who will deal with the fallout- I salute you.
       | 
       | This is awesome.
        
         | cjk2 wrote:
         | I am terrified of the same things. I got given a _validated_
         | financial model for something ages ago and found a major Excel
         | fuck up in it. When I say validated, it was run through an
         | expensive validation company who charged a lot of money and
         | they didn 't even notice it.
        
           | intended wrote:
           | Oh I know that fear. You can't trust anything anymore.
           | 
           | It's so easy to just miss something. A date function which is
           | off.
           | 
           | Missing something in excel is a big deal for a firm which is
           | mean to do it - and it will always happen. And that's the
           | experts.
           | 
           | I'm betting Someone was too tired, and uninterested and just
           | followed the script.
           | 
           | Hopefully that experience put the fear of God into some
           | analyst and associate.
           | 
           | Pretty much the only thing that I can rely on when it comes
           | to modeling.
           | 
           | Knowing that someone is as terrified as I am and has put the
           | work in to not be embarrassed.
           | 
           | And let's agree - it is work.
           | 
           | I remember seeing a date function used by another firm for
           | the first time.
           | 
           | It too me at least an hour to decipher it.
           | 
           | It was cool, I learnt a lot. But it takes time.
           | 
           | Regexes are cool, and I guess people will be learning a lot.
        
             | BitwiseFool wrote:
             | Almost a decade ago I worked for a bank and in our little
             | corner of the organization I was rapidly becoming the
             | "Excel Guy". While I was happy to help people, the prospect
             | of being pigeonholed into such a role terrified me and I
             | switched jobs. That wasn't the only reason, but it was a
             | major factor.
        
               | bongodongobob wrote:
               | This is why I pretend to not be good at Excel. Once word
               | gets out, it becomes your job.
        
           | Chris2048 wrote:
           | Just to be clear, was the POC financial model implemented in
           | Excel?
           | 
           | If so, did it at least have embedded VB or was it all cell
           | logic?
        
             | cjk2 wrote:
             | There was VBA and cell logic. The sheet was validated by a
             | third party then someone ported it to a proper language
             | (incorrectly). That model was not validated.
        
               | listenallyall wrote:
               | Why do you call it "a major Excel fuck up" when the error
               | arose from the language port?
        
               | Chris2048 wrote:
               | Did they not compare data input/output of the excel
               | original versus the port?
        
               | cjk2 wrote:
               | Nope
        
       | jeisc wrote:
       | never too late to do something that should have been since day
       | one
        
       | blitzar wrote:
       | Yet another meta breaker - this is another move aimed at the
       | casual audience to bring in the party gamers and make the program
       | an absolute mess competitively.
        
         | curiousgal wrote:
         | Solid reference!
        
       | eviks wrote:
       | Could we get some easy aliasing of REGEXREPLACE to reRepl and
       | picking a regex engine that matches the syntax rules you're used
       | to in a the next decade or so?
       | 
       | > Try asking Bing Copilot for regex patterns!
       | 
       | Or maybe embed a cheaper and more reliable solution like
       | https://regex101.com?
        
         | cjblomqvist wrote:
         | You could quite easily store a lambda formula (that just passes
         | on the arguments) as a named function. Very neat trick for
         | organizing and re-using formulas in excel.
        
       | underlines wrote:
       | Coincidentally using the same function names as Google sheets did
       | for years. :D
        
         | filcuk wrote:
         | Why would spreadsheet software try to differ in formula names?
         | You want users to be able to switch to your platform with
         | little friction.
        
           | hbossy wrote:
           | It used to be the other way around, with Excel leading the
           | way.
        
             | brewdad wrote:
             | This is one of those fringe features that 1% of users are
             | clamoring for and 99% of users should never ever use. I'm
             | not surprised it was never at the top of the Excel team's
             | TODO list.
        
               | IshKebab wrote:
               | Yeah the problem is that those 1% of users are probably
               | writing the most critical spreadsheets, and when
               | companies using GSuite think about moving to Office365
               | the more pain it is to migrate spreadsheets the worse for
               | Microsoft.
        
               | eb123 wrote:
               | So why are only 2/3 of the formulas named the same?
               | Excel's new REGEXTEST [1] is REGEXMATCH in Sheets [2]
               | 
               | [1] https://support.microsoft.com/en-us/office/regextest-
               | functio...
               | 
               | [2]https://support.google.com/docs/answer/3098292?hl=en
        
             | eb123 wrote:
             | I've been using Sheets's regex functions for over a year,
             | they're really helpful for basic data transformation.
             | 
             | Excel also got a "split" function much later than Sheets
             | did.
             | 
             | I was excited to read on the Sheets blog that Sheets
             | finally will have a table functionality, which Excel has
             | always had:
             | https://workspaceupdates.googleblog.com/2024/05/tables-in-
             | go...
        
             | jgalt212 wrote:
             | Or Excel mimicking the broken calendar of Lotus 123.
        
               | mnau wrote:
               | And it was an excellent decision, see rationale:
               | https://www.joelonsoftware.com/2000/06/03/strategy-
               | letter-ii...
               | 
               | TBH, they should have dumped during switch to OOXML, that
               | was a wasted opportunity. On the other hand, that's hard
               | sell to business.
        
             | _factor wrote:
             | Good ideas should have a provisional naming period where a
             | standardize name can be agreed on. Anything else leads to
             | fragmentation and lock in due to domain specific
             | terminology.
        
           | Akronymus wrote:
           | Well, excel translates the formula names... (One of the many
           | reasons I switched over to using windows in english, rather
           | than german)
        
       | Shorel wrote:
       | I am waiting for the day I can use Excel as a REST API client, so
       | I write the ID of some record in a cell, and Excel does all the
       | work of calling the API repeatedly and fills the rows and sheets
       | with the required information.
       | 
       | This will kill a few Python jobs and make it a very popular REST
       | client =)
        
         | actionfromafar wrote:
         | How about Excel as an API server, too?
        
           | jodrellblank wrote:
           | You've been able to do that for ages, a client automating
           | Excel through COM and listening for events in Excel from
           | script: https://news.ycombinator.com/item?id=37229850
           | 
           | Microsoft's advice page for wrapping Office in a web
           | application fronted by ASP/ASP.Net:
           | https://support.microsoft.com/en-us/topic/considerations-
           | for...
           | 
           | And that has links to documentation about Excel Web Services
           | of old (on-premises SharePoint days, not sure if it
           | translates to today's cloud), which is SOAP/HTTP:
           | https://learn.microsoft.com/en-us/sharepoint/dev/general-
           | dev...
        
             | WorldMaker wrote:
             | The "new hotness" for today's cloud are all the Excel APIs
             | in the Microsoft Graph:
             | 
             | https://learn.microsoft.com/en-
             | us/graph/api/resources/excel?...
        
           | kevin_thibedeau wrote:
           | MS has had ODBC integration for ages. That is a more rational
           | way to connect Excel with data sources.
        
           | nhatcher wrote:
           | I'm getting closer! https://github.com/ironcalc/IronCalc
        
         | xnx wrote:
         | Can you do this with the =WEBSERVICE() function?
         | https://support.microsoft.com/en-us/office/webservice-functi...
        
           | ComodoHacker wrote:
           | Can I parse response (JSON, XML, whatever) into cells without
           | VBA?
        
             | xnx wrote:
             | With some effort, probably: =FILTERXML()
             | https://support.microsoft.com/en-us/office/filterxml-
             | functio...
             | 
             | Could also do some crude/fragile parsing with
             | aforementioned regex functions.
        
         | jmkni wrote:
         | Knock yourself out!
         | 
         | https://stackoverflow.com/a/158657/969613
        
         | voidUpdate wrote:
         | Until they add python into excel, then there will be python
         | jobs again :P
         | 
         | https://support.microsoft.com/en-gb/office/get-started-with-...
        
           | personalityson wrote:
           | Every self-respecting corporate org will block execution in
           | the cloud + no IDE (you edit the code inside the cells), its
           | dead on arrival
        
         | drivers99 wrote:
         | I've actually done that. I needed to create a lot of json
         | documents and POST them somewhere (it was metadata for some
         | Chef cookbooks), and a lot of the information was similar with
         | a few variations. Excel lets me create what I call a
         | "conductor's score" of that information. (In music, the
         | conductor's score has all the instruments on one page. Each row
         | is a different instrument, and each column is the same measure
         | [point in time] in the music.) In other words, I can easily
         | look for commonalities, differences, and presence of the data
         | as I update it.
         | 
         | I was just going to copy/paste the fields into JSON which is
         | repetitive and error prone, and realized I could automate it.
         | So I wrote VBA to output a json version of that. (The annoying
         | part of writing that in VBA is that JSON data needed a lot of
         | double-quotes, but then you have to use escape sequences for
         | every double-quote.)
         | 
         | Once I had the JSON data generated, I was going to post each
         | one manually, so again, I looked and realized you can POST the
         | data directly from VBA. Added a button to the page and you
         | could update the data in Excel and click a button to POST it.
         | 
         | Of course, once I turned it over to someone else, they were
         | like "what the hell" and started doing it a different way. lol
        
         | deathanatos wrote:
         | I've done this in Google Sheets, with "Google Apps Script", the
         | most horridly named thing I think I've ever worked with.
         | 
         | It's just (utterly ancient) Javascript (I think ECMAScript,
         | technically, since there's no DOM, no browser, etc.), bolted
         | on. I wrote in modern JS and used Babel to transpile the source
         | back into the stone age.
         | 
         | ... I'd _much_ rather write Python.
        
       | marcus0x62 wrote:
       | What's the joke for this - now you have three problems?
        
       | 6c696e7578 wrote:
       | LibreOffice has had regex for donkeys years.
        
         | dylan604 wrote:
         | I read the title, and was actually surprised how in 2024 this
         | is a headline. I'm surprised that this has not been a feature
         | since forever as well.
        
         | alwillis wrote:
         | Apple's Numbers spreadsheet has had regex support for a while
         | as well.
        
           | airstrike wrote:
           | which is probably the one good thing you can say about
           | Apple's Numbers
        
             | animal_spirits wrote:
             | I can say the UX of Apple Numbers is the best, much above
             | the usability I've found of Excel. Much less powerful, but
             | it is much easier to use Apple Numbers
        
               | airstrike wrote:
               | easier to use for power users or for beginners?
        
               | perryprog wrote:
               | As someone who regularly flits between Google Sheets,
               | LibreOffice Calc, and Numbers regularly (with my most
               | time in Google Sheets and Numbers), and as someone who's
               | made some extremely complicated spreadsheets in all
               | three, I have to say I vastly prefer Numbers if I can get
               | away with it. It has a lot of issues that can make some
               | stuff hard fast (no array formulas is a big one), but I
               | find it significantly easier and faster to prototype in
               | Numbers regardless.
               | 
               | I think the main bit I love so much about it is having
               | actual tables instead of the Infinite Grid that most
               | spreadsheet software uses. You get named ranges for free,
               | and it makes semantical sense too, among a good number of
               | other benefits (sheet organization, refactoring, simpler
               | styling...).
               | 
               | There are some really nice things that Google Sheets
               | does, and I've done a few fancy things with App Script
               | which isn't too bad, and I do really like QUERY though I
               | wish it was a bit higher power. I just always find myself
               | missing the UX of Numbers, though.
        
       | zorrn wrote:
       | Am I the only one who distrusts domains with numbers in it?
        
       | curiousgal wrote:
       | No lookahead/lookbehind? Boo
        
       | neallindsay wrote:
       | Kudos using "regex" instead of "regexp", as it is much nicer to
       | say out loud.
        
       | aronhegedus wrote:
       | huh, TIL that regex wasn't a thing in vanilla excel. Thought
       | that'd be a basic thing that's included.
        
         | appleiigs wrote:
         | I realized that long ago... it's incredible that it took this
         | long given how common it is in programming.
        
           | axus wrote:
           | The strategy of freezing VBA and BAT scripting, offering
           | better alternatives, and hoping they'd go away on their own
           | hasn't worked.
        
         | wruza wrote:
         | You never expect basic ergonomics from an ALL CAPS language.
         | They all suck.
        
         | layer8 wrote:
         | You can using VBA functions and _VBScript.RegExp_. This is
         | restricted to Windows though. Microsoft wants to get rid of
         | VBScript, so maybe they are going through the most common use
         | cases in Excel.
        
           | blowski wrote:
           | Have they officially said they want to get rid of VBScript?
        
             | layer8 wrote:
             | Pretty much:
             | https://techcommunity.microsoft.com/t5/windows-it-pro-
             | blog/v...
        
       | against_entropy wrote:
       | Hope the office copilot can do this for me
        
       | louthy wrote:
       | Now you've got two problems
        
       | smartmic wrote:
       | Any information on which standard they have implemented (POSIX
       | BRE, ERE, PCRE, ...)? Since they are Microsoft, I would not be
       | surprised if there is none.
        
         | shawn_w wrote:
         | According to the documentation links in the article, it's using
         | PCRE2.
        
         | sebsebmc wrote:
         | > All regular expressions for this function, as well as
         | REGEXEXTRACT and REGEXREPLACE use the PCRE2 'flavor' of regex.
        
           | smartmic wrote:
           | Okay, thanks! Now next question which comes into my mind: is
           | there info about the regex engine they are using? I would
           | expect there is some (proprietary?) C++ library also used in
           | other MS products or are they even using a FOSS licensed one?
        
             | jandrese wrote:
             | Seems likely that if they explicitly say they're supporting
             | the PCRE2 syntax it is because they are using the BSD
             | licensed libpcre.
             | 
             | Reinventing a regular expression system is very far down on
             | the list of things I'd ever want to do. Those things are
             | filled with dragons and require years of refinement to get
             | the bugs out.
        
       | stockhorn wrote:
       | Now I can finally parse (X)HTML with Excel ;)
        
         | layer8 wrote:
         | Now I need to build a regex for zalgoifying text in Excel.
        
       | mindaslab wrote:
       | Use Libre Office if you value your freedom.
        
         | Tronickle wrote:
         | I love freedom and I love open source, but the problem is that
         | Libre Office is too far from MS Office in terms of UX and
         | functionality.
         | 
         | If your usecases are not complicated, Libre office could work
         | well. Otherwise your efficiency will be behind those that you
         | can achieve with MS Office.
        
         | resource_waste wrote:
         | Oh man Libre Office is trash and so are the
         | leadership/maintainers. No tears.
         | 
         | I once got banned for asking to put 'text size' on the main
         | screen for the powerpoint knockoff.
         | 
         | Text size seems pretty important to have. You shouldn't have to
         | google how to change the text size.
         | 
         | I'm deeply convinced there is a Microsoft plant undermining
         | everything.
        
           | zamadatix wrote:
           | Currently at least text properties and size is on the main
           | screen in the current defaults for Impress and, just like
           | PowerPoint, only made visible and editable when you click
           | something which has a text size. Maybe it was different in
           | the timeframe you were talking about (barring any actual
           | information on the request) but this combined with how
           | abrasive your comment is leaves little reasoning for why it's
           | horrible and casts serious doubts the ban was actually
           | related to asking about interface enhancements even if that's
           | when you were banned.
           | 
           | FWIW I prefer PowerPoint over Impress as well (particularly
           | when it comes to the browser side of things on the go), using
           | one or the other is just not a vendetta of mine.
        
           | pessimizer wrote:
           | > I'm deeply convinced there is a Microsoft plant undermining
           | everything.
           | 
           | It's conspiracy theory thinking, but I'm getting there, too.
           | These things have been too close to being complete
           | replacements for commercial products for so long, but somehow
           | still fall over on problems that have been complained about
           | for a decade or more.
           | 
           | There's not one of them that some massive corporation with
           | sights set on adding a letter to faang couldn't pick up and
           | turn into a legitimate competitor in a month or six, khtml-
           | style. Instead they often sit on moribund subpages of some
           | larger project website, with a blog updated once every year
           | or two.
           | 
           | These projects _have to be_ targets for sabotage by their
           | commercial competitors, just as government initiatives are,
           | just cheaper. For e.g. Adobe it 's less than a rounding error
           | to e.g. spend 500K/year supplying a developer to e.g.
           | Scribus[0] to make sure that it remains difficult to
           | contribute to, or makes bad architecture choices, and that
           | hypothetical developer could be one of its biggest actual
           | contributors.
           | 
           | Maybe it's just because they have to spend a lot of time
           | chasing Gtk, which makes it another redhat problem?
           | 
           | edit: The 95% state of all a lot of these FOSS packages is
           | also evidence that there are zero tech billionaire
           | philanthropists. It would take a total of _one_ of them to
           | grab all of these projects and wrangle them into good form.
           | 
           | [0] _This is an actual hypothetical, I 'm not making an
           | accusation about Scribus._ Between the GIMP and Inkscape,
           | they literally are the only people who made a real effort
           | with color for years. Inkscape openly said that their
           | software was only for making images for the web (as opposed
           | to print), as if there were a rational reason to cripple
           | their product and narrow people's interest in it. Now that
           | deviantart is gone, will anyone care about Inkscape anymore?
           | Will the dopamine hit hobbyists get from sharing generative
           | art cause Inkscape to be totally left behind? Why is it hard
           | to design a form for your office's paperwork in Inkscape, a
           | vector drawing program, if forms are just straight horizontal
           | lines and text? Why aren't they trying to merge with Scribus
           | (and LibreOffice Impress/Draw) and create a complete pdf
           | solution? I have no idea.
           | 
           |  _I 'm being very ranty here, and I do want to say that I
           | very much appreciate the work that people are doing for free,
           | in their spare time, for others._
        
       | personalityson wrote:
       | The reason is MS is gradually deprecating VBScript. Today it's
       | the VBScript.RegExp reference that is being used in VBA.
        
       | nickpeterson wrote:
       | Does excel have gpu accelerated calculations, seems an obvious
       | thing to add if not?
        
         | layer8 wrote:
         | It may appear obvious, but only a very limited subset of real-
         | life spreadsheets could meaningfully take advantage of it.
         | Excel does not support GPU-accelerated calculations.
        
           | nickpeterson wrote:
           | doesnt the back-end of excel have a columnar data model that
           | can store millions of records in a single excel file?
        
             | recursive wrote:
             | The front-end of excel doesn't allow the creation of more
             | than 1048576 rows in a sheet. (1<<20)
        
       | smokefoot wrote:
       | It's interesting to see the give and take between google sheets
       | and excel. Google sheets came on the scene shooting for total
       | backwards compatibility and then proceeded to develop some really
       | interesting innovations. Now we see new features emerging on both
       | sides that are quickly replicated by the other player. Notably
       | off the top of my mind:
       | 
       | * spill formulas - google first, now supported in MSFT
       | 
       | * # notation - MSFT enhancement to spill formulas not yet adopted
       | in google
       | 
       | * regular expressions - google first, now in Excel
       | 
       | * check boxes - google first, now supported in excel
       | 
       | There must be others. I would expect competitive dynamics where
       | each side tries to build extensions that can't be replicated on
       | the other side
        
       | JoeyBananas wrote:
       | Congrats on catching up to Perl! Maybe it took so long because
       | Bill was on vacation on Epstein island.
        
         | recursive wrote:
         | Perl's UI still has some work to do if we're making
         | incomprehensible comparisons.
        
       | tezza wrote:
       | At last the production reliability of Excel meets the clarity of
       | regular expressions and the simplicity of matrix slices
        
       | jupin wrote:
       | Finally, an excel formula I will actually remember
        
       ___________________________________________________________________
       (page generated 2024-05-24 23:01 UTC)