[HN Gopher] Excel: Getting rid of everything except numbers
       ___________________________________________________________________
        
       Excel: Getting rid of everything except numbers
        
       Author : damir
       Score  : 37 points
       Date   : 2022-07-16 08:01 UTC (1 days ago)
        
 (HTM) web link (excel.tips.net)
 (TXT) w3m dump (excel.tips.net)
        
       | blacksqr wrote:
       | Numbers?! You kids are spoiled with your ints and your floats.
       | 
       | When I was young, we had nothing but ones and zeroes!
       | 
       | Sometimes we only had zeroes!
       | 
       | I once wrote a whole database program using nothing but zeroes.
        
       | yosito wrote:
       | As an ONLYOFFICE user, I would just use the JavaScript API to
       | write a macro that iterates through the cells and calls something
       | like Regex.Replace(val, "[^0-9.]", "") on all the values.
        
         | Karellen wrote:
         | In LibreOffice, you can just use the formula
         | =IF(ISNUMBER(A1), A1, NUMBERVALUE(REGEX(A1, "[^0-9]", "",
         | "g")))
         | 
         | to get the equivalent of the column C to paste special from.
         | 
         | No need to invoke any other languages or APIs. Just use the
         | native cell functions.
        
       | coderintherye wrote:
       | Reminds me of one thing I hate about Excel (and Google Sheets):
       | 
       | There is no way to permanently disable scientific notation. I
       | _never_ want scientific notation. It 's the 21st century and we
       | use computers, we don't write numbers on paper and so don't need
       | to shorten our numbers and even for people that do it should be
       | opt-in or at the very least something that can be permanently
       | turned off for users who will never use them.
        
         | mNovak wrote:
         | I guess I'm confused what alternative you want, to display long
         | numbers? Even more so than a sheet of paper, an excel doc has
         | tiny windows to display a huge potential range of values, and
         | something like 0.00000567 simply might not fit.
         | 
         | Seems like if it bugs you, just ctrl+a then set the number
         | format, same way you might set the font and size at the start
         | of a word doc.
        
       | kjellsbells wrote:
       | So the task is to remove all alpha chars from a column containing
       | strings of alphanumerics?
       | 
       | I applaud the author's inventiveness to writing complex formulas
       | and cranking out some VBA, but this is job for sed: copy the
       | column to a text file, sed to strip out alphas, copy back, done.
        
         | dmitriid wrote:
         | Parapgraph two. I advise you to read it:
         | 
         | --- start quote ---
         | 
         | There are a few ways you can approach this problem. Before
         | proceeding with any solution, however, you should make sure
         | that you aren't trying to change something that isn't really
         | broken. For instance, you'll want to make sure that the "E"
         | that appears in the number isn't part of the format of the
         | number--in other words, a designation of exponentiation
         | 
         | --- end quote ---
        
         | emehex wrote:
         | I think the venn diagram of excel and sed users is smaller than
         | imagined...
        
           | ALittleLight wrote:
           | I think this is only true because the circle of sed users is
           | pretty small. Excel is pretty useful for many things. I'm
           | constantly throwing data in Excel to reason with it and graph
           | it.
           | 
           | Assuming that there are many columns and I only wanted to
           | remove the data from one I personally would find it more
           | difficult and awkward to write a sed command to do this than
           | the three or four lines of Python it would take (open the
           | file in pandas, insert a new column where the values are the
           | numbers from the old column, save as csv/xlsx).
        
         | CamperBob2 wrote:
         | Now watch somebody earn a $8B market cap on sed As A Service...
        
       | CraigJPerry wrote:
       | A chance to use my favourite feature of excel, flash fill:
       | 
       | Given:                   A       B         a1b2c3  123
       | 2g34         34f5         4l5p6         1.23E+06
       | 
       | Selecting column B with the one example cell and hitting flash
       | fill yields:                   A         B         a1b2c3    123
       | 2g34      234         34f5      345         4l5p6     456
       | 1.23E+06  12306
       | 
       | I'd still absolutely love to see the code behind this feature.
       | 
       | EDIT: i just noticed an error in the last row, should be 1230000
       | not 12306. Ahh well.
        
         | jodrellblank wrote:
         | > " _I 'd still absolutely love to see the code behind this
         | feature._"
         | 
         | Windows PowerShell has a cmdlet ConvertFrom-String which
         | "supports automatically-generated, example-driven parsing based
         | on the FlashExtract, research work by Microsoft Research."[1].
         | Then when they open-sourced Powershell, that was one of the
         | cmdlets which did not come over and is no longer supported, so
         | no way to see the source code there, sadly.
         | 
         | (Video [2] shows that FlashFill and FlashExtract and Excel /
         | PowerShell uses are related, and he says it generates many
         | possible programs which fit the given examples, and uses
         | 'machine learning based ranking techniques' to choose which one
         | fits best).
         | 
         | [1] https://docs.microsoft.com/en-
         | us/powershell/module/microsoft...
         | 
         | [2] https://www.youtube.com/watch?v=w-k9WjRJvIY
        
           | tadkar wrote:
           | I think this is the paper https://arxiv.org/abs/1204.6079
        
         | drumhead wrote:
         | Ah yes, Flash fill. It always falls at the last hurdle. Nice
         | idea, never quite gets it right.
        
         | [deleted]
        
       | drumhead wrote:
       | Imagine putting that in a spreadsheet that does a key bit of
       | work, then watching as the orginal author leaves and subsequent
       | users have no idea how the magic works, just that it does, until
       | it goes wrong. Which then throws a part of a organisation into
       | complete chaos as they try to figure out a. whats going wrong and
       | b. how to fix it.
       | 
       | They then call in IT who tell them nothing to do with us guv, we
       | dont support your spreadsheets. Which then leads to hastily
       | calling in a "consultant" who'll charge whatever they want to
       | sort of fix the problem, documenting it all of course so that it
       | can be fixed in his absence. However, people leave again, the
       | documentation gets lost and the cycle of dealing with complex
       | Excel spreadsheets starts again.
        
         | extr wrote:
         | I don't think that's quite fair. You run into that sort of
         | problem with traditional software solutions as well, except
         | sometimes even worse because business logic is deep in some
         | esoteric codebase/language with all sorts of unclear
         | dependencies. And instead of being able to ask your buddy in
         | the accounting department to take a look, you need someone who
         | knows Java or Python or whatever, you need a literal software
         | engineer.
         | 
         | I've never been at a company that didn't have at least one
         | Excel guru (well okay, I have. It was a tech startup!). The
         | nice thing about Excel is that it's the same everywhere, widely
         | used, and Well Understood, even if your particular spreadsheet
         | isn't. Personally I would rather be handed a messy spreadsheet
         | than a messy 5k LoC Python codebase.
        
           | drumhead wrote:
           | You tend to find these sorts of spreadsheets everywhere
           | though, not just massive companies but small businesses,
           | charities, non-profits. They dont have the manpower or
           | resources to deal effectively with complexity. Its best not
           | to add anything too complex to to their spreadsheets, but it
           | still happens and organisations grow to reliant on them.
        
             | extr wrote:
             | Very true. It's kind of a lesser evils situation as to
             | where you want to put the complexity. Personally I'd feel
             | more comfortable putting it in Excel than a traditional
             | programming language, for the above reasons. But there are
             | situations where that doesn't make sense either.
        
       | clircle wrote:
       | I would have loaded the spreadsheet into r and extracted the
       | matches to a regex.
        
       ___________________________________________________________________
       (page generated 2022-07-17 23:01 UTC)