https://github.com/leeoniya/uFuzzy

Skip to content Toggle navigation
 
Sign up

  * Product
      +  
        Actions
        Automate any workflow
      +  
        Packages
        Host and manage packages
      +  
        Security
        Find and fix vulnerabilities
      +  
        Codespaces
        Instant dev environments
      +  
        Copilot
        Write better code with AI
      +  
        Code review
        Manage code changes
      +  
        Issues
        Plan and track work
      +  
        Discussions
        Collaborate outside of code
      + Explore
      + All features
      + Documentation
      + GitHub Skills
      + Blog
  * Solutions
      + By Plan
      + Enterprise
      + Teams
      + Compare all
      + By Solution
      + CI/CD & Automation
      + DevOps
      + DevSecOps
      + Case Studies
      + Customer Stories
      + Resources
  * Open Source
      +  
        GitHub Sponsors
        Fund open source developers
      +  
        The ReadME Project
        GitHub community articles
      + Repositories
      + Topics
      + Trending
      + Collections
  * Pricing

[                    ] 

  *  
    #
    In this repository All GitHub |
    Jump to |

  * No suggested jump to results

  *  
    #
    In this repository All GitHub |
    Jump to |
  *  
    #
    In this user All GitHub |
    Jump to |
  *  
    #
    In this repository All GitHub |
    Jump to |

Sign in
Sign up
{{ message }}
leeoniya / uFuzzy Public

  * Notifications
  * Fork 2
  * Star 98

A tiny, efficient fuzzy search that doesn't suck

License

MIT license
98 stars 2 forks
Star
Notifications

  * Code
  * Issues 1
  * Pull requests 0
  * Actions
  * Projects 0
  * Security
  * Insights

More

  * Code
  * Issues
  * Pull requests
  * Actions
  * Projects
  * Security
  * Insights

leeoniya/uFuzzy

This commit does not belong to any branch on this repository, and may
belong to a fork outside of the repository.
main
Switch branches/tags
[                    ]
Branches Tags
Could not load branches
Nothing to show
{{ refName }} default View all branches
Could not load tags
Nothing to show
{{ refName }} default
View all tags
1 branch 3 tags
Code

  *  
    Clone
    HTTPS GitHub CLI
    [https://github.com/l]

    Use Git or checkout with SVN using the web URL.

    [gh repo clone leeoni]

    Work fast with our official CLI. Learn more.

  * Open with GitHub Desktop
  * Download ZIP

Launching GitHub Desktop

If nothing happens, download GitHub Desktop and try again.

Launching GitHub Desktop

If nothing happens, download GitHub Desktop and try again.

Launching Xcode

If nothing happens, download Xcode and try again.

Launching Visual Studio Code

Your codespace will open once ready.

There was a problem preparing your codespace, please try again.

Latest commit

@leeoniya
leeoniya update bench numbers for automated runs
...
73ccf03 Sep 30, 2022
update bench numbers for automated runs
73ccf03

Git stats

  * 100 commits

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
demos
auto-adapt Lunr needle
Sep 30, 2022
dist
0.6.0
Sep 30, 2022
src
don't boost interRgt
Sep 30, 2022
.editorconfig
a commitment
Sep 16, 2022
.gitignore
a commitment
Sep 16, 2022
LICENSE
Initial commit
Sep 16, 2022
README.md
update bench numbers for automated runs
Sep 30, 2022
bench.png
readme bench updates
Sep 30, 2022
jsconfig.json
a commitment
Sep 16, 2022
package.json
0.6.0
Sep 30, 2022
rollup.config.js
a commitment
Sep 16, 2022
uFuzzy.png
shorter image
Sep 23, 2022
View code
[                    ]
# mFuzzy Overview Features Demos Installation Node Browser Usage API
Options A biased appraisal of similar work Search quality Performance
Benchmark

README.md

 # mFuzzy

A tiny, efficient, fuzzy search that doesn't suck. This is my fuzzy
. There are many like it, but this one is mine.

---------------------------------------------------------------------

 Overview

uFuzzy is a fuzzy search library designed to match a relatively short
search phrase (needle) against a large list of short-to-medium
phrases (haystack). It might be best described as a more forgiving
String.indexOf(). Common use cases are list filtering, auto-complete/
suggest, and title/name/description/filename/function searches.

Each uFuzzy match must contain all alpha-numeric characters from the
needle in the same sequence, so is likely a poor fit for applications
like spellcheck or fulltext/document search. However, its speed
leaves ample headroom to match out-of-order terms by combining
results from all permutations of the needle. When held just right, it
can efficiently match against multiple object properties, too.

---------------------------------------------------------------------

 Features

  * Junk-free, high quality results that are dataset-independent. No
    need to fine-tune indexing options or boosting params to attain
    some arbitrary quality score cut-off.
  * Straightforward fuzziness control that can be explained to your
    grandma in 5min.
  * Sorting you can reason about and customize using a simple
    Array.sort() which gets access to each match's stats/counters.
    There's no composite, black box "score" to understand.
  * Concise set of options that don't interact in mysterious ways to
    drastically alter combined behavior.
  * Fast with low resource usage - there's no index to build, so
    startup is below 1ms with near-zero memory overhead. Searching a
    three-term phrase in a 162,000 phrase dataset takes 12ms with
    in-order terms or 50ms with out-of-order terms.
  * Micro, with zero dependencies - currently < 4KB min

uFuzzy demo

---------------------------------------------------------------------

 Demos

NOTE: The testdata.json file is a diverse 162,000 string/phrase
dataset 4MB in size, so first load may be slow due to network
transfer. Try refreshing once it's been cached by your browser.

First, uFuzzy in isolation to demonstrate its performance.

https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy&
search=super%20ma

Now the same comparison page, booted with fuzzysort, QuickScore, and
Fuse.js:

https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=
uFuzzy,fuzzysort,QuickScore,Fuse&search=super%20ma

Here is the full library list but with a reduced dataset (just
hearthstone_750, urls_and_titles_600) to avoid crashing your browser:

https://leeoniya.github.io/uFuzzy/demos/compare.html?lists=
hearthstone_750,urls_and_titles_600&search=moo

---------------------------------------------------------------------

 Installation

 Node

npm i @leeoniya/ufuzzy

const uFuzzy = require('@leeoniya/ufuzzy');

 Browser

<script src="./dist/uFuzzy.iife.min.js"></script>

---------------------------------------------------------------------

 Usage

uFuzzy works in 3 phases:

 1. Filter - This filters the full haystack with a fast RegExp
    compiled from your needle without doing any extra ops. It returns
    an array of matched indices in original order.
 2. Info - This collects more detailed stats about the filtered
    matches, such as start offsets, fuzz level, prefix/suffix
    counters, etc. It also gathers substring match positions for
    range highlighting. Finally, it filters out any matches that
    don't conform to the desired prefix/suffix rules. To do all this
    it re-compiles the needle into two more-expensive RegExps that
    can partition each match. Therefore, it should be run on a
    reduced subset of the haystack, usually returned by the Filter
    phase. The uFuzzy demo is gated at <= 1,000 filtered items,
    before moving ahead with this phase.
 3. Sort - This does an Array.sort() to determine final result order,
    utilizing the info object returned from the previous phase. A
    custom sort function can be provided via a uFuzzy option: {sort:
    (info, haystack, needle) => idxsOrder}.

let haystack = [
    'puzzle',
    'Super Awesome Thing (now with stuff!)',
    'FileName.js',
    '/feeding/the/catPic.jpg',
];

let needle = 'feed cat';

let opts = {};

let uf = new uFuzzy(opts);

// pre-filter
let idxs = uf.filter(haystack, needle);

// sort/rank only when <= 1,000 items
if (idxs.length <= 1e3) {
  let info = uf.info(idxs, haystack, needle);

  // order is a double-indirection array (a re-order of the passed-in idxs)
  // this allows corresponding info to be grabbed directly by idx, if needed
  let order = uf.sort(info, haystack, needle);

  // render post-filtered & ordered matches
  for (let i = 0; i < order.length; i++) {
    // using info.idx here instead of idxs because uf.info() may have
    // further reduced the initial idxs based on prefix/suffix rules
    console.log(haystack[info.idx[order[i]]]);
  }
}
else {
  // render pre-filtered but unordered matches
  for (let i = 0; i < idxs.length; i++) {
    console.log(haystack[i]);
  }
}

---------------------------------------------------------------------

 API

A liberally-commented 100 LoC uFuzzy.d.ts file.

---------------------------------------------------------------------

 Options

Options with an inter prefix apply to allowances in between search
terms, while those with an intra prefix apply to allowances within
each search term.

  Option      Description      Default             Examples
           Max number of                  Searching "cat"...
           extra chars                    0 can match: cat, scat, cat
intraMax   allowed           0            ch, vacate
           between each char              1 also matches: cart, chapt
           within a term                  er, outcast
           Max number of                  Searching "where is"...
           extra chars                    Infinity can match: where
interMax   allowed between   Infinity     is, where have blah wisdom
           terms                          5 cannot match: where have
                                          blah wisdom
           Partial regexp                 [a-z\d] matches only
           for allowed extra              alpha-numeric
intraChars chars between     [a-z\d]      (case-insensitive)
           each char within               [\w-] would match
           a term                         alpha-numeric, undercore,
                                          and hyphen
           Callback for      (term,       Do your own thing, maybe...
intraFilt  excluding results match,       - Length diff threshold
           based on term &   index) =>    - Levenshtein distance
           match             true         - Term offset or content
           Partial regexp                 . matches all chars
interChars for allowed chars .            [^a-z\d] would only match
           between terms                  whitespace and punctuation
                                          Searching "mania"...
                                          0 any - anywhere: romanian
           Determines                     1 loose - whitespace,
interLft   allowable term    0            punctuation, alpha-num,
           left boundary                  case-change transitions:
                                          TrackMania, maniac
                                          2 strict - whitespace,
                                          punctuation: maniacally
                                          Searching "mania"...
                                          0 any - anywhere: romanian
           Determines                     1 loose - whitespace,
interRgt   allowable term    0            punctuation, alpha-num,
           right boundary                 case-change transitions:
                                          ManiaStar
                                          2 strict - whitespace,
                                          punctuation: mania_foo
                                          Default: Search sort,
                             (info,       prioritizes full term
sort       Custom result     haystack,    matches and char density
           sorting function  needle) =>   Demo: Typeahead sort,
                             idxsOrder    prioritizes start offset
                                          and match length

---------------------------------------------------------------------

 A biased appraisal of similar work

This assessment is extremely narrow and, of course, biased towards my
use cases, text corpus, and my complete expertise in operating my own
library. It is highly probable that I'm not taking full advantage of
some feature in other libraries that may significantly improve
outcomes along some axis; I welcome improvement PRs from anyone with
deeper library knowledge than afforded by my hasty 10min skim over
any "Basic usage" example and README doc.

 Search quality

Can-of-worms #1.

Before we discuss performance let's talk about search quality,
because speed is irrelevant when your results are a strange medly of
"Oh yeah!" and "WTF?".

Search quality is very subjective. What constitutes a good top match
in a "typeahead / auto-suggest" case can be a poor match in a "search
/ find-all" scenario. Some solutions optimize for the latter, some
for the former. It's common to find knobs that skew the results in
either direction, but these are often by-feel and imperfect, being
little more than a proxy to producing a single, composite match
"score".

Let's take a look at some matches produced by the most popular fuzzy
search library, Fuse.js and some others for which match highlighting
is implemented in the demo.

Searching for the partial term "twili", we see these results
appearing above numerous obvious "twilight" results:

https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=
uFuzzy,fuzzysort,QuickScore,Fuse&search=twili

  * twirling
  * **The total number of received alerts that were invalid.
  * Tom Clancy's Ghost Recon Wildlands - ASIA Pre-order Standard
    Uplay Activation
  * theHunter(tm): Call of the Wild - Bearclaw Lite CB-60

Not only are these poor matches in isolation, but they actually rank
higher than literal substrings.

Finishing the search term to "twilight", still scores bizzare results
higher:

https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=
uFuzzy,fuzzysort,QuickScore,Fuse&search=twilight

  * Magic: The Gathering - Duels of the Planeswalkers Wings of Light
    Unlock
  * The Wild Eight

Some engines do better with partial prefix matches, at the expense of
higher startup/indexing cost:

https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=
uFuzzy,FlexSearch,match-sorter,MiniSearch&search=twili

Here, match-sorter returns 1,384 results, but only the first 40 are
relevant. How do we know where the cut-off is?

https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=
uFuzzy,FlexSearch,match-sorter,MiniSearch&search=super

 Performance

Can-of-worms #2.

All benchmarks suck, but this one might suck more than others.

  * I've tried to follow any "best performance" advice when I could
    find it in each library's docs, but it's a certainty that some
    stones were left unturned when implementing ~20 different search
    engines.
  * Despite my best efforts, result quality is still extremely
    variable between libraries, and even between search terms. In
    some cases, results are very poor but the library is very fast;
    in other cases, the results are better, but the library is quite
    slow. What use is extreme speed when the search quality is
    sub-par? This is a subjective, nuanced topic that will surely
    affect how you interpret these numbers. I consider uFuzzy's
    search quality second-to-none, so my view of most faster
    libraries is typically one of quality trade-offs I'm happy not to
    have made. I encourage you to evaluate the results for all
    benched search phrases manually to decide this for yourself.
  * Many fulltext & document-search libraries compared here are
    designed to work best with exact terms rather than partial
    matches (which this benchmark is skewed towards).

Still, something is better than a hand-wavy YMMV/do-it-yourself
dismissal and certainly better than nothing.

 Benchmark

  * Each benchmark can be run by changing the libs parameter to the
    desired library name: https://leeoniya.github.io/uFuzzy/demos/
    compare.html?bench&libs=uFuzzy
  * Results output is suppressed in bench mode to avoid benchmarking
    the DOM.
  * Measurements are taken in the Performance secrion of Chrome's
    DevTools by recording several reloads of the bench page, with
    forced garbage collection in between. The middle/typical run is
    used to collect numbers.
  * The search corpus is 162,000 words and phrases, loaded from a 4MB
    testdata.json.
  * The benchmark types and then deletes, character-by-character
    (every 100ms) the following search terms, triggering a search for
    each keypress: test, chest, super ma, mania, puzz, prom rem stor,
    twil.

To evaluate the results for each library, or to compare several,
simply visit the same page with more libs and without bench: https://
leeoniya.github.io/uFuzzy/demos/compare.html?libs=
uFuzzy,fuzzysort,QuickScore,Fuse&search=super%20ma.

profile example

There are several metrics evaluated:

  * Init time - how long it takes to load the library and build any
    required index to perform searching.
  * Bench runtime - how long it takes to execute all searches.
  * Memory required - peak JS heap size used during the bench as well
    as how much is still retained after a forced garbage collection
    at the end.
  * GC cost - how much time is needed to collect garbage at the end
    (main thread jank)

        Lib         Stars    Size    Init  Search    Heap    Retained
                            (min)                   (peak)
uFuzzy (try)         0    4KB      0.3ms  620ms   25.5MB    7.5MB
Fuse.js (try)             23.5KB   40ms   35432ms 372MB     14MB
                    14.8k
FlexSearch (Light)   8.9k 5.9KB    3600ms 130ms   673MB     316MB
(try)
Lunr.js (try)        8.2k 29.4KB   1900ms 755ms   450MB     231MB
LyraSearch (try)     3.3k
match-sorter (try)   3.1k 7.3KB    0.03ms 8900ms  73.4MB    7.4MB
fuzzysort (try)      3k   5.5KB    50ms   1400ms  176MB     84MB
Wade (try)           3k   4KB      770ms  315ms   436MB     42MB
fuzzysearch (try)    2.6k
Elasticlunr.js (try  1.9k 18.1KB   1000ms 1630ms  224MB     70MB
)
MiniSearch (try)     1.5k 22.4KB   550ms  1900ms  423MB     64MB
Fuzzyset (try)       1.3k 2.8KB    3100ms 850ms   660MB     238MB
search-index (try)   1.3k
LiquidMetal (try)    285
ItemJS (try)         260
FuzzySearch (try)    184
FuzzySearch2 (try)   173
QuickScore (try)     131  9.1KB    26ms   7100ms  171MB     12.3MB
fzy (try)            115
fuzzyMatch (try)     0

About

A tiny, efficient fuzzy search that doesn't suck

Topics

search autocomplete fuzzy-search typeahead fuzzy-matching 
ranking-algorithm filter-list typeahead-search

Resources

Readme

License

MIT license

Stars

98 stars

Watchers

1 watching

Forks

2 forks

Releases 3

 
0.6.0 Latest
Sep 30, 2022
+ 2 releases

Packages 0

No packages published

Languages

  * JavaScript 100.0%

Footer

 (c) 2022 GitHub, Inc.

Footer navigation

  * Terms
  * Privacy
  * Security
  * Status
  * Docs
  * Contact GitHub
  * Pricing
  * API
  * Training
  * Blog
  * About

You can't perform that action at this time.
You signed in with another tab or window. Reload to refresh your
session. You signed out in another tab or window. Reload to refresh
your session.