https://github.com/leeoniya/uFuzzy Skip to content Toggle navigation Sign up * Product + Actions Automate any workflow + Packages Host and manage packages + Security Find and fix vulnerabilities + Codespaces Instant dev environments + Copilot Write better code with AI + Code review Manage code changes + Issues Plan and track work + Discussions Collaborate outside of code + Explore + All features + Documentation + GitHub Skills + Blog * Solutions + By Plan + Enterprise + Teams + Compare all + By Solution + CI/CD & Automation + DevOps + DevSecOps + Case Studies + Customer Stories + Resources * Open Source + GitHub Sponsors Fund open source developers + The ReadME Project GitHub community articles + Repositories + Topics + Trending + Collections * Pricing [ ] * # In this repository All GitHub | Jump to | * No suggested jump to results * # In this repository All GitHub | Jump to | * # In this user All GitHub | Jump to | * # In this repository All GitHub | Jump to | Sign in Sign up {{ message }} leeoniya / uFuzzy Public * Notifications * Fork 2 * Star 98 A tiny, efficient fuzzy search that doesn't suck License MIT license 98 stars 2 forks Star Notifications * Code * Issues 1 * Pull requests 0 * Actions * Projects 0 * Security * Insights More * Code * Issues * Pull requests * Actions * Projects * Security * Insights leeoniya/uFuzzy This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository. main Switch branches/tags [ ] Branches Tags Could not load branches Nothing to show {{ refName }} default View all branches Could not load tags Nothing to show {{ refName }} default View all tags 1 branch 3 tags Code * Clone HTTPS GitHub CLI [https://github.com/l] Use Git or checkout with SVN using the web URL. [gh repo clone leeoni] Work fast with our official CLI. Learn more. * Open with GitHub Desktop * Download ZIP Launching GitHub Desktop If nothing happens, download GitHub Desktop and try again. Launching GitHub Desktop If nothing happens, download GitHub Desktop and try again. Launching Xcode If nothing happens, download Xcode and try again. Launching Visual Studio Code Your codespace will open once ready. There was a problem preparing your codespace, please try again. Latest commit @leeoniya leeoniya update bench numbers for automated runs ... 73ccf03 Sep 30, 2022 update bench numbers for automated runs 73ccf03 Git stats * 100 commits Files Permalink Failed to load latest commit information. Type Name Latest commit message Commit time demos auto-adapt Lunr needle Sep 30, 2022 dist 0.6.0 Sep 30, 2022 src don't boost interRgt Sep 30, 2022 .editorconfig a commitment Sep 16, 2022 .gitignore a commitment Sep 16, 2022 LICENSE Initial commit Sep 16, 2022 README.md update bench numbers for automated runs Sep 30, 2022 bench.png readme bench updates Sep 30, 2022 jsconfig.json a commitment Sep 16, 2022 package.json 0.6.0 Sep 30, 2022 rollup.config.js a commitment Sep 16, 2022 uFuzzy.png shorter image Sep 23, 2022 View code [ ] # mFuzzy Overview Features Demos Installation Node Browser Usage API Options A biased appraisal of similar work Search quality Performance Benchmark README.md # mFuzzy A tiny, efficient, fuzzy search that doesn't suck. This is my fuzzy . There are many like it, but this one is mine. --------------------------------------------------------------------- Overview uFuzzy is a fuzzy search library designed to match a relatively short search phrase (needle) against a large list of short-to-medium phrases (haystack). It might be best described as a more forgiving String.indexOf(). Common use cases are list filtering, auto-complete/ suggest, and title/name/description/filename/function searches. Each uFuzzy match must contain all alpha-numeric characters from the needle in the same sequence, so is likely a poor fit for applications like spellcheck or fulltext/document search. However, its speed leaves ample headroom to match out-of-order terms by combining results from all permutations of the needle. When held just right, it can efficiently match against multiple object properties, too. --------------------------------------------------------------------- Features * Junk-free, high quality results that are dataset-independent. No need to fine-tune indexing options or boosting params to attain some arbitrary quality score cut-off. * Straightforward fuzziness control that can be explained to your grandma in 5min. * Sorting you can reason about and customize using a simple Array.sort() which gets access to each match's stats/counters. There's no composite, black box "score" to understand. * Concise set of options that don't interact in mysterious ways to drastically alter combined behavior. * Fast with low resource usage - there's no index to build, so startup is below 1ms with near-zero memory overhead. Searching a three-term phrase in a 162,000 phrase dataset takes 12ms with in-order terms or 50ms with out-of-order terms. * Micro, with zero dependencies - currently < 4KB min uFuzzy demo --------------------------------------------------------------------- Demos NOTE: The testdata.json file is a diverse 162,000 string/phrase dataset 4MB in size, so first load may be slow due to network transfer. Try refreshing once it's been cached by your browser. First, uFuzzy in isolation to demonstrate its performance. https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy& search=super%20ma Now the same comparison page, booted with fuzzysort, QuickScore, and Fuse.js: https://leeoniya.github.io/uFuzzy/demos/compare.html?libs= uFuzzy,fuzzysort,QuickScore,Fuse&search=super%20ma Here is the full library list but with a reduced dataset (just hearthstone_750, urls_and_titles_600) to avoid crashing your browser: https://leeoniya.github.io/uFuzzy/demos/compare.html?lists= hearthstone_750,urls_and_titles_600&search=moo --------------------------------------------------------------------- Installation Node npm i @leeoniya/ufuzzy const uFuzzy = require('@leeoniya/ufuzzy'); Browser --------------------------------------------------------------------- Usage uFuzzy works in 3 phases: 1. Filter - This filters the full haystack with a fast RegExp compiled from your needle without doing any extra ops. It returns an array of matched indices in original order. 2. Info - This collects more detailed stats about the filtered matches, such as start offsets, fuzz level, prefix/suffix counters, etc. It also gathers substring match positions for range highlighting. Finally, it filters out any matches that don't conform to the desired prefix/suffix rules. To do all this it re-compiles the needle into two more-expensive RegExps that can partition each match. Therefore, it should be run on a reduced subset of the haystack, usually returned by the Filter phase. The uFuzzy demo is gated at <= 1,000 filtered items, before moving ahead with this phase. 3. Sort - This does an Array.sort() to determine final result order, utilizing the info object returned from the previous phase. A custom sort function can be provided via a uFuzzy option: {sort: (info, haystack, needle) => idxsOrder}. let haystack = [ 'puzzle', 'Super Awesome Thing (now with stuff!)', 'FileName.js', '/feeding/the/catPic.jpg', ]; let needle = 'feed cat'; let opts = {}; let uf = new uFuzzy(opts); // pre-filter let idxs = uf.filter(haystack, needle); // sort/rank only when <= 1,000 items if (idxs.length <= 1e3) { let info = uf.info(idxs, haystack, needle); // order is a double-indirection array (a re-order of the passed-in idxs) // this allows corresponding info to be grabbed directly by idx, if needed let order = uf.sort(info, haystack, needle); // render post-filtered & ordered matches for (let i = 0; i < order.length; i++) { // using info.idx here instead of idxs because uf.info() may have // further reduced the initial idxs based on prefix/suffix rules console.log(haystack[info.idx[order[i]]]); } } else { // render pre-filtered but unordered matches for (let i = 0; i < idxs.length; i++) { console.log(haystack[i]); } } --------------------------------------------------------------------- API A liberally-commented 100 LoC uFuzzy.d.ts file. --------------------------------------------------------------------- Options Options with an inter prefix apply to allowances in between search terms, while those with an intra prefix apply to allowances within each search term. Option Description Default Examples Max number of Searching "cat"... extra chars 0 can match: cat, scat, cat intraMax allowed 0 ch, vacate between each char 1 also matches: cart, chapt within a term er, outcast Max number of Searching "where is"... extra chars Infinity can match: where interMax allowed between Infinity is, where have blah wisdom terms 5 cannot match: where have blah wisdom Partial regexp [a-z\d] matches only for allowed extra alpha-numeric intraChars chars between [a-z\d] (case-insensitive) each char within [\w-] would match a term alpha-numeric, undercore, and hyphen Callback for (term, Do your own thing, maybe... intraFilt excluding results match, - Length diff threshold based on term & index) => - Levenshtein distance match true - Term offset or content Partial regexp . matches all chars interChars for allowed chars . [^a-z\d] would only match between terms whitespace and punctuation Searching "mania"... 0 any - anywhere: romanian Determines 1 loose - whitespace, interLft allowable term 0 punctuation, alpha-num, left boundary case-change transitions: TrackMania, maniac 2 strict - whitespace, punctuation: maniacally Searching "mania"... 0 any - anywhere: romanian Determines 1 loose - whitespace, interRgt allowable term 0 punctuation, alpha-num, right boundary case-change transitions: ManiaStar 2 strict - whitespace, punctuation: mania_foo Default: Search sort, (info, prioritizes full term sort Custom result haystack, matches and char density sorting function needle) => Demo: Typeahead sort, idxsOrder prioritizes start offset and match length --------------------------------------------------------------------- A biased appraisal of similar work This assessment is extremely narrow and, of course, biased towards my use cases, text corpus, and my complete expertise in operating my own library. It is highly probable that I'm not taking full advantage of some feature in other libraries that may significantly improve outcomes along some axis; I welcome improvement PRs from anyone with deeper library knowledge than afforded by my hasty 10min skim over any "Basic usage" example and README doc. Search quality Can-of-worms #1. Before we discuss performance let's talk about search quality, because speed is irrelevant when your results are a strange medly of "Oh yeah!" and "WTF?". Search quality is very subjective. What constitutes a good top match in a "typeahead / auto-suggest" case can be a poor match in a "search / find-all" scenario. Some solutions optimize for the latter, some for the former. It's common to find knobs that skew the results in either direction, but these are often by-feel and imperfect, being little more than a proxy to producing a single, composite match "score". Let's take a look at some matches produced by the most popular fuzzy search library, Fuse.js and some others for which match highlighting is implemented in the demo. Searching for the partial term "twili", we see these results appearing above numerous obvious "twilight" results: https://leeoniya.github.io/uFuzzy/demos/compare.html?libs= uFuzzy,fuzzysort,QuickScore,Fuse&search=twili * twirling * **The total number of received alerts that were invalid. * Tom Clancy's Ghost Recon Wildlands - ASIA Pre-order Standard Uplay Activation * theHunter(tm): Call of the Wild - Bearclaw Lite CB-60 Not only are these poor matches in isolation, but they actually rank higher than literal substrings. Finishing the search term to "twilight", still scores bizzare results higher: https://leeoniya.github.io/uFuzzy/demos/compare.html?libs= uFuzzy,fuzzysort,QuickScore,Fuse&search=twilight * Magic: The Gathering - Duels of the Planeswalkers Wings of Light Unlock * The Wild Eight Some engines do better with partial prefix matches, at the expense of higher startup/indexing cost: https://leeoniya.github.io/uFuzzy/demos/compare.html?libs= uFuzzy,FlexSearch,match-sorter,MiniSearch&search=twili Here, match-sorter returns 1,384 results, but only the first 40 are relevant. How do we know where the cut-off is? https://leeoniya.github.io/uFuzzy/demos/compare.html?libs= uFuzzy,FlexSearch,match-sorter,MiniSearch&search=super Performance Can-of-worms #2. All benchmarks suck, but this one might suck more than others. * I've tried to follow any "best performance" advice when I could find it in each library's docs, but it's a certainty that some stones were left unturned when implementing ~20 different search engines. * Despite my best efforts, result quality is still extremely variable between libraries, and even between search terms. In some cases, results are very poor but the library is very fast; in other cases, the results are better, but the library is quite slow. What use is extreme speed when the search quality is sub-par? This is a subjective, nuanced topic that will surely affect how you interpret these numbers. I consider uFuzzy's search quality second-to-none, so my view of most faster libraries is typically one of quality trade-offs I'm happy not to have made. I encourage you to evaluate the results for all benched search phrases manually to decide this for yourself. * Many fulltext & document-search libraries compared here are designed to work best with exact terms rather than partial matches (which this benchmark is skewed towards). Still, something is better than a hand-wavy YMMV/do-it-yourself dismissal and certainly better than nothing. Benchmark * Each benchmark can be run by changing the libs parameter to the desired library name: https://leeoniya.github.io/uFuzzy/demos/ compare.html?bench&libs=uFuzzy * Results output is suppressed in bench mode to avoid benchmarking the DOM. * Measurements are taken in the Performance secrion of Chrome's DevTools by recording several reloads of the bench page, with forced garbage collection in between. The middle/typical run is used to collect numbers. * The search corpus is 162,000 words and phrases, loaded from a 4MB testdata.json. * The benchmark types and then deletes, character-by-character (every 100ms) the following search terms, triggering a search for each keypress: test, chest, super ma, mania, puzz, prom rem stor, twil. To evaluate the results for each library, or to compare several, simply visit the same page with more libs and without bench: https:// leeoniya.github.io/uFuzzy/demos/compare.html?libs= uFuzzy,fuzzysort,QuickScore,Fuse&search=super%20ma. profile example There are several metrics evaluated: * Init time - how long it takes to load the library and build any required index to perform searching. * Bench runtime - how long it takes to execute all searches. * Memory required - peak JS heap size used during the bench as well as how much is still retained after a forced garbage collection at the end. * GC cost - how much time is needed to collect garbage at the end (main thread jank) Lib Stars Size Init Search Heap Retained (min) (peak) uFuzzy (try) 0 4KB 0.3ms 620ms 25.5MB 7.5MB Fuse.js (try) 23.5KB 40ms 35432ms 372MB 14MB 14.8k FlexSearch (Light) 8.9k 5.9KB 3600ms 130ms 673MB 316MB (try) Lunr.js (try) 8.2k 29.4KB 1900ms 755ms 450MB 231MB LyraSearch (try) 3.3k match-sorter (try) 3.1k 7.3KB 0.03ms 8900ms 73.4MB 7.4MB fuzzysort (try) 3k 5.5KB 50ms 1400ms 176MB 84MB Wade (try) 3k 4KB 770ms 315ms 436MB 42MB fuzzysearch (try) 2.6k Elasticlunr.js (try 1.9k 18.1KB 1000ms 1630ms 224MB 70MB ) MiniSearch (try) 1.5k 22.4KB 550ms 1900ms 423MB 64MB Fuzzyset (try) 1.3k 2.8KB 3100ms 850ms 660MB 238MB search-index (try) 1.3k LiquidMetal (try) 285 ItemJS (try) 260 FuzzySearch (try) 184 FuzzySearch2 (try) 173 QuickScore (try) 131 9.1KB 26ms 7100ms 171MB 12.3MB fzy (try) 115 fuzzyMatch (try) 0 About A tiny, efficient fuzzy search that doesn't suck Topics search autocomplete fuzzy-search typeahead fuzzy-matching ranking-algorithm filter-list typeahead-search Resources Readme License MIT license Stars 98 stars Watchers 1 watching Forks 2 forks Releases 3 0.6.0 Latest Sep 30, 2022 + 2 releases Packages 0 No packages published Languages * JavaScript 100.0% Footer (c) 2022 GitHub, Inc. Footer navigation * Terms * Privacy * Security * Status * Docs * Contact GitHub * Pricing * API * Training * Blog * About You can't perform that action at this time. You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.