[HN Gopher] Microsoft.Recognizers.Text: numbers, units, and date...
       ___________________________________________________________________
        
       Microsoft.Recognizers.Text: numbers, units, and date/time in
       multiple languages
        
       Author : nailer
       Score  : 84 points
       Date   : 2023-01-03 17:30 UTC (5 hours ago)
        
       | revskill wrote:
       | There's no Ruby yet.
        
       | s4i wrote:
       | Can someone be awesome and ELI5 this? Not sure if I'm dumb or the
       | people behind this have a very focused target audience in mind.
        
         | marcosdumay wrote:
         | From what I understood, it is a library that you can send strings
         | containing "twenty thousand ninety eight" to, and it tells you
         | that the number 20098 is there.
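
To make the idea above concrete, here is a minimal sketch of turning English number words into an integer. It is a toy written for illustration, not the library's implementation or API: the names `words_to_int` and `find_numbers` are hypothetical, and it only handles simple English phrases.

```python
import re

# Hypothetical lookup tables for this sketch (English only).
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
MULTIPLIERS = {"thousand": 1_000, "million": 1_000_000}
NUMBER_WORDS = set(UNITS) | set(TENS) | set(MULTIPLIERS) | {"hundred"}


def words_to_int(phrase: str) -> int:
    """Convert a phrase such as 'twenty thousand ninety eight' to an int."""
    total = 0      # value of completed thousand/million groups
    current = 0    # value of the group being built
    for word in re.findall(r"[a-z]+", phrase.lower()):
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in MULTIPLIERS:
            total += max(current, 1) * MULTIPLIERS[word]
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


def find_numbers(text: str) -> list[tuple[str, int]]:
    """Return (matched phrase, value) for each run of number words in text."""
    results = []
    run = []
    # The trailing "" flushes a run that ends at the end of the text.
    for word in re.findall(r"[A-Za-z]+", text) + [""]:
        if word.lower() in NUMBER_WORDS:
            run.append(word)
        elif run:
            phrase = " ".join(run)
            results.append((phrase, words_to_int(phrase)))
            run = []
    return results


print(find_numbers("I owe you twenty thousand ninety eight dollars"))
# [('twenty thousand ninety eight', 20098)]
```

The sketch ignores punctuation between words, ordinals, decimals, and every language other than English, which is the kind of detail a real recognizer has to cover.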
        
         | trilobyte wrote:
         | +1 for this. I get some sense of it from reading the page but
         | would love an ELI5.
        
           | kapp_in_life wrote:
           | It seems like it's meant for standardizing/parsing freeform
           | input, which I guess could be useful for things like
           | chatbots?
           | 
           | For example, you send the user localized text, "Did you
           | receive your package?" then parse the yes/no and continue
           | your decision tree with something like
           | https://www.nuget.org/packages/Microsoft.Recognizers.Text.Ch...
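
As a rough illustration of the yes/no step described above, the sketch below keeps one small set of regular expressions per language and maps a freeform reply to True, False, or None. Everything here is an assumption made for the example: the `PATTERNS` table, the word lists, the culture keys, and the `recognize_boolean` name are hypothetical and are not taken from the linked package.

```python
import re
from typing import Optional

# Hypothetical per-language patterns; a real recognizer would cover far more.
PATTERNS = {
    "en": {True: r"\b(yes|yeah|yep|sure)\b", False: r"\b(no|nope|nah)\b"},
    "es": {True: r"\b(sí|claro)\b", False: r"\b(no)\b"},
}


def recognize_boolean(text: str, culture: str) -> Optional[bool]:
    """Return True/False if exactly one polarity matches, else None."""
    found = {
        value
        for value, pattern in PATTERNS[culture].items()
        if re.search(pattern, text, re.IGNORECASE)
    }
    # Ambiguous ("yes and no") or unmatched replies are left undecided.
    return found.pop() if len(found) == 1 else None


reply = recognize_boolean("Yep, it arrived this morning", "en")
if reply is True:
    print("Great, closing the ticket.")
elif reply is False:
    print("Sorry to hear that, opening a claim.")
else:
    print("Could you answer yes or no?")
```

Returning None for ambiguous input lets the decision tree ask again instead of guessing.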
        
       | UglyToad wrote:
       | Used this for a quick proof of concept I was putting together
       | where I needed to parse the output of an OCRed date and number
       | (price) string (with known country). Seems to work very nicely on
       | the sample size of 3 I tried it with - so I can't speak to how well
       | it works on a more diverse sample set - but it's great to have
       | something like this available for free.
       | 
       | In my second job we maintained our own, also regex-based, logic
       | for this and it was a nightmare so having a library to do it is
       | quite the timesaver!
        
       | [deleted]
        
       | KRAKRISMOTT wrote:
       | Are these done via NER or old school heuristics?
        
         | Closi wrote:
         | It's all Regex.
         | 
         | > The Patterns folder contains all the regular expressions that
         | fulfill the recognizers logic. It's divided by supported
         | language.
        
           | sergiotapia wrote:
           | jesus! https://github.com/microsoft/Recognizers-Text/blob/master/Ja...
        
             | shadowgovt wrote:
             | Yep. Regular expressions are great, but they rapidly fall
             | over into write-only code for nontrivial applications.
        
             | Closi wrote:
             | Worth noting that these are machine-generated from a more
             | verbose and clear codebase.
        
         | meindnoch wrote:
         | https://github.com/microsoft/Recognizers-Text/blob/master/Pa...
        
       | binarymax wrote:
       | This is awesome! A nice replacement for the Duckling service if
       | you need fast regex-based NER in your code.
        
         | sc4les wrote:
         | Finally! Maintaining Duckling as a native Python library isn't
         | trivial, unfortunately.
        
       | stuaxo wrote:
       | It could do with a link early on to some docs about what a
       | recogniser is.
        
       | xnx wrote:
       | Now if only Microsoft would put some of their recognizer smarts
       | into Excel so that ZIP codes like "02201" don't get converted
       | into "2,201".
        
         | [deleted]
        
         | hnlmorg wrote:
         | You can already do that. It doesn't detect it as a ZIP code
         | specifically, but you can format the cells so that values
         | that look like numbers aren't automatically converted into
         | numbers.
         | 
         | Excel does have its quirks (CSV handling is one of my pet
         | peeves) but it does also have a surprising amount of
         | sophisticated adjustable logic that is often hidden in plain
         | sight.
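
The leading-zero problem discussed above is easy to reproduce outside Excel. The sketch below is only an analogy in Python, not Excel's actual logic: the `coerce` helper and its `as_text` flag are hypothetical stand-ins for "auto-convert numeric-looking values" versus "treat this cell as text".

```python
def coerce(value: str, as_text: bool = False):
    """Mimic automatic number detection unless the field is marked as text."""
    if as_text:
        return value          # keep the characters exactly as entered
    try:
        return int(value)     # numeric-looking input loses its leading zeros
    except ValueError:
        return value


zip_code = "02201"
print(f"{coerce(zip_code):,}")          # 2,201
print(coerce(zip_code, as_text=True))   # 02201
```

The point is that the information is lost at conversion time, so the choice to keep a value as text has to be made before the value is parsed.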
        
           | xnx wrote:
           | Definitely. I wish there was a global/permanent setting to
           | not strip leading zeros. I imagine the design decision might
           | have been made decades ago for Lotus 1-2-3 compatibility or
           | something, but it's almost certainly cost much more time and
           | mistakes than it has saved.
        
             | trekkie1024 wrote:
             | This might be the setting you're looking for?
             | https://www.howtogeek.com/816620/microsoft-is-finally-fixing...
        
         | midasuni wrote:
         | That would break the business logic of millions of small
         | businesses globally who rely on write-only Excel spreadsheets
         | to encode their business logic.
        
       | gardenfelder wrote:
       | It's way bigger than that: it covers not just JS but Java, C#, and
       | more, and many languages.
       | 
       | https://github.com/microsoft/Recognizers-Text
        
         | dang wrote:
         | Ok, we've changed the URL to that from
         | https://github.com/microsoft/Recognizers-Text/tree/master/Ja.... Thanks!
        
         | johns wrote:
         | The link should be updated to this.
        
       ___________________________________________________________________
       (page generated 2023-01-03 23:01 UTC)