[HN Gopher] Show HN: uFuzzy.js - A tiny, efficient fuzzy search ...
___________________________________________________________________
Show HN: uFuzzy.js - A tiny, efficient fuzzy search that doesn't
suck
Hello HN! I became frustrated with the unpredictable/poor match
quality and opaqueness of "relevance scores" in existing fuzzy and
fulltext search libs, so I tried something different and this is
the result. The main selling point is the result quality /
ordering, with best-in-class memory overhead and excellent
performance being bonuses. The API is pretty stable at this point,
but I'm looking for feedback before committing to 1.0.

TL;DR: The test corpus is a 4MB json file with 162k words/phrases,
so give it a second for initial download. You can also drag/drop
your own text/json corpus into the UI to try it against your own
dataset.

Live demo/compare with a few other libs (there are many more in the
codebase, in various states of completion, WIP):
https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uF...

In isolation for perf assessment:
https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uF...

To increase fuzziness and get broader results, try setting intraMax=1
(core) and enabling outOfOrder (userland):
https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uF...

Also play with the sortPreset selector to swap out the default
Array.sort() for one in userland that prioritizes typeahead-ness
(the resultset remains identical).

Still TODO:
- Example of stripping diacritics
- Example of using non-latin charsets
- Example of prefix-caching to improve typeahead perf even further
- Example of poor man's document search (matching multiple object
  properties)

That's all, thanks!
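
For the first TODO item, one common way to strip diacritics before searching is Unicode decomposition followed by removal of combining marks. This is only a minimal sketch using standard JavaScript string methods; `stripDiacritics` is a hypothetical helper name and is not part of uFuzzy.

```javascript
// Hypothetical helper (not part of uFuzzy): fold accented letters to
// their base letters so "creme" can match "Crème".
function stripDiacritics(str) {
  // NFD splits "è" into "e" + a combining grave accent;
  // \p{M} then matches (and removes) the combining marks.
  return str.normalize('NFD').replace(/\p{M}/gu, '');
}

const folded = ['Crème Brûlée', 'Ångström', 'naïve'].map(stripDiacritics);
console.log(folded);
```

The folded strings would be used as the search corpus, while the originals are kept for display.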
Author : leeoniya
Score : 49 points
Date : 2022-09-30 14:44 UTC (8 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| simonw wrote:
| I'm impressed. 3.54KB minified, great performance, good results.
| moralestapia wrote:
| Thank you for this!
|
| I am also quite frustrated with the current state of full text
| search in the javascript world. All libs I've tried miss the most
| basic examples and their community seems to ignore it. Will give
| yours a try but it already looks much better from the comparison
| page.
|
| Edit: Nope, your lib doesn't seem to handle substitution well
| (THE most common type of typo), so yep, we are back to square one
| ...
| leeoniya wrote:
| right, the core is regexp based and there is no string distance
| assertion of any kind, so this won't be a use case uFuzzy can
| accommodate.
|
| the intro does mention that it would make a poor spellcheck :)
|
| FlexSearch actually does pretty well and can work for you,
| though can get quite memory hungry depending on your
| tokenization settings. try other libs in my compare demo, too.
| there are a lot of options!
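
To illustrate why a regexp-based matcher tolerates missing characters but not substituted ones, here is a simplified sketch. It is not uFuzzy's actual implementation: `buildFuzzyRegExp` and `fuzzyFilter` are hypothetical names, and the `intraMax` parameter only approximates the option of the same name by allowing that many extra characters between consecutive needle characters.

```javascript
// Escape characters that have special meaning inside a RegExp.
function escapeRegExp(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical sketch: allow up to `intraMax` extra characters between
// consecutive needle characters (Infinity means an unbounded gap).
function buildFuzzyRegExp(needle, intraMax = 0) {
  const gap =
    intraMax === Infinity ? '.*?' : intraMax > 0 ? `.{0,${intraMax}}?` : '';
  return new RegExp(Array.from(needle, escapeRegExp).join(gap), 'i');
}

function fuzzyFilter(haystack, needle, intraMax = 0) {
  const re = buildFuzzyRegExp(needle, intraMax);
  return haystack.filter((item) => re.test(item));
}

const haystack = ['example', 'exmple', 'exemple', 'sample'];

// A dropped letter in the needle is absorbed by the gap allowance.
console.log(fuzzyFilter(haystack, 'exmple', 1));

// A substituted letter ("b" for "a") is not: every needle character must
// still appear in order, so nothing matches.
console.log(fuzzyFilter(haystack, 'exbmple', 1));
```

Handling substitutions would need some form of edit-distance check on top of this, which is the part the comment above says is absent.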
| [deleted]
| throwaway14356 wrote:
| nice!
|
| for a language that evolved to manipulate text documents it is
| odd that it has no features of this kind. startsWith, endsWith and
| indexOf seem an amazingly unsophisticated set of tools.
|
| autocomplete ui is also terrible compared to phones?
|
| why?
| chiefalchemist wrote:
| Nice. I didn't know it but I was about to be looking for
| something like this.
|
| How difficult - or not - would it be to use it with
| https://bootstrap-table.com?
| leeoniya wrote:
| not familiar with it, but depends on what you need.
|
| filtering on one column? easy.
|
| filtering multiple columns via AND? also easy.
|
| filtering multiple columns plus highlighting matched parts in
| each? will take more work, but shouldn't be daunting.
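
The "multiple columns via AND" case can be sketched independently of any table or search library. Everything here is an assumption for illustration: `matches` is a placeholder case-insensitive substring test where a fuzzy matcher could be swapped in, and `filterRows` and the sample rows are hypothetical.

```javascript
// Placeholder matcher (assumption): case-insensitive substring test.
// A fuzzy matcher could be swapped in here.
const matches = (text, query) =>
  String(text ?? '').toLowerCase().includes(query.toLowerCase());

// Keep a row only if every column with a non-empty query matches (AND).
function filterRows(rows, queries) {
  const active = Object.entries(queries).filter(([, q]) => q !== '');
  return rows.filter((row) =>
    active.every(([col, q]) => matches(row[col], q))
  );
}

const rows = [
  { name: 'Ada Lovelace', city: 'London' },
  { name: 'Alan Turing', city: 'London' },
  { name: 'Grace Hopper', city: 'New York' },
];

// Both column queries must match for a row to be kept.
console.log(filterRows(rows, { name: 'a', city: 'lon' }));
```

Highlighting would additionally require the matcher to return match positions per column, which this sketch does not attempt.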
| forrestthewoods wrote:
| Nice. Here's the fuzzymatcher I wrote years ago. My main
| implementation was C++ but there's a JS version and web demo.
|
| https://www.forrestthewoods.com/blog/reverse_engineering_sub...
| leeoniya wrote:
| heh, think i have yours in my comparison demos (on phone
| currently, cannot verify)
|
| you can make uFuzzy behave similarly by setting intraMax to
| Infinity (just remove 0 from the field). but the results are
| usually too fuzzy in this config, though it depends on the
| corpus and application (auto-complete vs search)
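
Under a simplified model where the gap between matched characters is unbounded, matching reduces to a plain subsequence test, which shows why such a config can feel too fuzzy. This is a sketch of that idea only, not the code of either library discussed above; `isSubsequence` and the word list are made up for the example.

```javascript
// Hypothetical helper: true if every character of `needle` appears in
// `text` in the same order, with any number of characters in between.
function isSubsequence(needle, text) {
  const n = needle.toLowerCase();
  const t = text.toLowerCase();
  let i = 0;
  for (const ch of t) {
    if (i < n.length && ch === n[i]) i++;
  }
  return i === n.length;
}

const words = ['catalog', 'concatenate', 'communication', 'dog'];

// "cat" matches "communication" too (c ... a, t), which is rarely what
// someone typing "cat" into a search box wants.
console.log(words.filter((w) => isSubsequence('cat', w)));
```

Matchers built on this idea generally depend on a scoring or ranking step to push loose matches like "communication" down the list.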
___________________________________________________________________
(page generated 2022-09-30 23:00 UTC)