[HN Gopher] Microsoft.Recognizers.Text: numbers, units, and date...
___________________________________________________________________
Microsoft.Recognizers.Text: numbers, units, and date/time in
multiple languages
Author : nailer
Score : 84 points
Date : 2023-01-03 17:30 UTC (5 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| revskill wrote:
| There's no Ruby yet.
| s4i wrote:
| Can someone be awesome and ELI5 this? Not sure if I'm dumb or the
| people behind this have a very focused target audience in mind.
| marcosdumay wrote:
| From what I understood, it's a library that you can send strings
| like "twenty thousand ninety eight" to, and it tells you
| that the number 20098 is in there.
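|
| To make that concrete, here is a toy sketch in Python of the idea:
| find runs of English number words with a regex and add them up.
| It is not the library's code or API. The names words_to_number and
| find_numbers are made up for illustration, and it only handles
| simple English cardinals.

```python
import re

# Hypothetical lookup tables for this sketch (simple English cardinals only).
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_number(text):
    """Convert a run of number words such as 'ninety eight' to an int."""
    total = 0    # groups already closed by a scale word (thousand, million)
    current = 0  # the group currently being built
    for word in re.findall(r"[a-z]+", text.lower()):
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


# One alternation of every known word, longest first.
_WORD = "|".join(
    sorted([*UNITS, *TENS, "hundred", *SCALES], key=len, reverse=True)
)
# A run is one number word followed by more, separated by spaces or hyphens.
NUMBER_RUN = re.compile(
    rf"\b(?:{_WORD})(?:[\s-]+(?:and\s+)?(?:{_WORD}))*\b",
    re.IGNORECASE,
)


def find_numbers(text):
    """Return (matched text, value) pairs for each run of number words."""
    return [(m.group(0), words_to_number(m.group(0)))
            for m in NUMBER_RUN.finditer(text)]


print(find_numbers("I owe you twenty thousand ninety eight dollars"))
```

| A real recognizer also has to deal with ordinals, fractions,
| digits mixed with words, and other languages, which is where
| the pattern files get long.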
| trilobyte wrote:
| +1 for this. I have some sense reading the page but would love
| an ELI5.
| kapp_in_life wrote:
| It seems like it's meant for standardizing/parsing freeform
| input, which I guess could be useful for things like
| chatbots?
|
| For example you send the user localized text, "Did you
| receive your package?" then parse the yes/no and continue
| your decision tree with something like
| https://www.nuget.org/packages/Microsoft.Recognizers.Text.Ch...
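|
| A rough Python sketch of that flow, assuming a tiny hand-written
| pattern table per language. The PATTERNS table, recognize_boolean,
| and the step names are hypothetical stand-ins for illustration,
| not the package's actual API.

```python
import re

# Hypothetical per-culture patterns; a real recognizer covers far more.
PATTERNS = {
    "en": {
        True: r"\b(?:yes|yep|yeah|sure|ok(?:ay)?)\b",
        False: r"\b(?:no|nope|nah)\b",
    },
    "es": {
        True: r"\b(?:sí|si|claro)\b",
        False: r"\b(?:no)\b",
    },
}


def recognize_boolean(text, culture="en"):
    """Return True/False for the earliest yes/no word, or None if absent."""
    hits = []
    for value, pattern in PATTERNS[culture].items():
        for match in re.finditer(pattern, text, re.IGNORECASE):
            hits.append((match.start(), value))
    if not hits:
        return None
    hits.sort()  # the earliest match in the text wins
    return hits[0][1]


# Continue the decision tree based on the parsed answer.
NEXT_STEP = {True: "close_ticket", False: "open_investigation", None: "ask_again"}

answer = recognize_boolean("nope, not yet")
print(NEXT_STEP[answer])
```

| The None branch matters in practice: freeform replies often
| contain neither a yes nor a no, and the bot has to ask again.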
| UglyToad wrote:
| Used this for a quick proof of concept I was putting together
| where I needed to parse the output of an OCRed date and number
| (price) string (with known country). Seems to work very nicely on
| the sample size of 3 I tried with - so I can't speak to how well
| it works on a more diverse sample set - but it's great to have
| something like this available for free.
|
| In my second job we maintained our own, also regex-based, logic
| for this and it was a nightmare so having a library to do it is
| quite the timesaver!
| [deleted]
| KRAKRISMOTT wrote:
| Are these done via NER or old school heuristics?
| Closi wrote:
| It's all Regex.
|
| > The Patterns folder contains all the regular expressions that
| fulfill the recognizers logic. It's divided by supported
| language.
| sergiotapia wrote:
| jesus! https://github.com/microsoft/Recognizers-Text/blob/master/Ja...
| shadowgovt wrote:
| Yep. Regular expressions are great, but they rapidly fall
| over into write-only code for nontrivial applications.
| Closi wrote:
| Worth noting that these are machine-generated from a more
| verbose and clear codebase.
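|
| As an illustration of how that kind of generation can work, here
| is a small Python sketch that expands named fragments into one
| flat regex. The {Name} reference syntax, the DEFINITIONS table,
| and expand are assumptions for this example. I have not checked
| them against the project's actual generator.

```python
import re

# Readable source definitions: each pattern may reference others by {Name}.
DEFINITIONS = {
    "Digit": r"\d",
    "Year": r"(?:19|20){Digit}{Digit}",
    "Month": r"(?:0?[1-9]|1[0-2])",
    "Day": r"(?:0?[1-9]|[12]{Digit}|3[01])",
    "IsoDate": r"\b{Year}-{Month}-{Day}\b",
}

# Matches {Name} references only; numeric quantifiers like {2} are left alone.
REFERENCE = re.compile(r"\{([A-Za-z]+)\}")


def expand(name, definitions):
    """Recursively inline every {Name} reference in the named pattern."""
    return REFERENCE.sub(
        lambda m: expand(m.group(1), definitions),
        definitions[name],
    )


flat = expand("IsoDate", DEFINITIONS)
print(flat)  # the dense, hard-to-read form
print(re.search(flat, "shipped 2023-01-03 at noon").group(0))
```

| The flat output is what looks scary in the generated files. The
| named definitions are what a maintainer would actually edit.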
| meindnoch wrote:
| https://github.com/microsoft/Recognizers-Text/blob/master/Pa...
| binarymax wrote:
| This is awesome! A nice replacement for the Duckling service if
| you need fast regex-based NER in your code.
| sc4les wrote:
| Finally! Maintaining Duckling as a native Python library isn't
| trivial, unfortunately.
| stuaxo wrote:
| It could do with a link early on to some docs about what a
| recogniser is.
| xnx wrote:
| Now if only Microsoft would put some of their recognizer smarts
| into Excel so that ZIP codes like "02201" don't get converted
| into "2,201".
| [deleted]
| hnlmorg wrote:
| You already can do that. Albeit it doesn't detect it as a zip
| code specifically, but you can format the cells so that values
| that look like numbers aren't automatically converted into
| numbers.
|
| Excel does have its quirks (CSV handling is one of my pet
| peeves) but it does also have a surprising amount of
| sophisticated adjustable logic that is often hidden in plain
| sight.
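|
| The same trade-off shows up outside Excel. A small Python sketch,
| assuming a naive "looks like a number, so make it a number" rule
| and a per-column opt-out. naive_cell and read_rows are
| hypothetical names, and this says nothing about how Excel itself
| is implemented.

```python
import csv
import io


def naive_cell(value):
    """Mimic auto-detection: anything that parses as an int becomes one."""
    try:
        return int(value)
    except ValueError:
        return value


def read_rows(text, text_columns=()):
    """Parse CSV text, keeping the named columns as plain strings."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: (val if key in text_columns else naive_cell(val))
         for key, val in row.items()}
        for row in reader
    ]


data = "zip,qty\n02201,3\n"
print(read_rows(data))                        # zip loses its leading zero
print(read_rows(data, text_columns={"zip"}))  # zip stays "02201"
```

| Marking the column as text up front is the equivalent of
| formatting the cells before the data goes in.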
| xnx wrote:
| Definitely. I wish there was a global/permanent setting to
| not strip leading zeros. I imagine the design decision might
| have been made decades ago for Lotus 1-2-3 compatibility or
| something, but it's almost certainly cost much more time and
| mistakes than it has saved.
| trekkie1024 wrote:
| This might be the setting you're looking for?
| https://www.howtogeek.com/816620/microsoft-is-finally-fixing...
| midasuni wrote:
| That would break the business logic of millions of small
| businesses globally who rely on write-only Excel spreadsheets
| to encode their business logic.
| gardenfelder wrote:
| It's way bigger than that: it covers not just JS but Java, C#, and
| more, and many languages.
|
| https://github.com/microsoft/Recognizers-Text
| dang wrote:
| Ok, we've changed the URL to that from
| https://github.com/microsoft/Recognizers-Text/tree/master/Ja....
| Thanks!
| johns wrote:
| The link should be updated to this.
___________________________________________________________________
(page generated 2023-01-03 23:01 UTC)