https://github.com/thesephist/libsearch
libsearch

Simple, index-free text search for JavaScript, used across my personal projects like YC Vibe Check, linus.zone/entr, and my personal productivity software. Read the annotated source to understand how it works under the hood.

The API

Let's begin with some quick examples:

    import { search } from 'libsearch'; // on Node.js
    // in the browser, use instead: const { search } = window.libsearch;

    const articles = [
        { title: 'Weather in Berkeley, California' },
        { title: 'University report: UC Berkeley' },
        { title: 'Berkeley students rise in solidarity...' },
        { title: 'Californian wildlife returning home' },
    ];

    // basic usage
    search(articles, 'berkeley cali', a => a.title);
    // => [{ title: 'Weather in Berkeley, California' }]
    search(articles, 'california', a => a.title);
    // => [
    //   { title: 'Weather in Berkeley, California' },
    //   { title: 'Californian wildlife returning home' },
    // ]

    // mode: 'word' only returns whole-word matches
    search(articles, 'california', a => a.title, { mode: 'word' });
    // => [{ title: 'Weather in Berkeley, California' }]

    // case sensitivity
    search(articles, 'W', a => a.title, { caseSensitive: true });
    // => [{ title: 'Weather in Berkeley, California' }]

    // empty query returns the full list, unmodified
    search(articles, '', a => a.title);
    // => [{...}, {...}, {...}, {...}]

More formally, libsearch exposes a single API, the search function. This function takes two required arguments and two optional arguments:

    function search<T>(
        items: T[],
        query: string,
        by?: (it: T) => string,
        options?: {
            caseSensitive: boolean,
            mode: 'word' | 'prefix' | 'autocomplete',
        },
    ): T[]

* items is a list of items to search. Typically items will be an array of strings or an array of objects with some string property.
* query is a string query with which to search the list of items.
* by (optional) is a function that takes an item from items and returns the string value by which to search for that item. For example, if items is a list of objects like { name: 'Linus' }, by will need to be a function x => x.name. It defaults to x => String(x), which works when items is of type string[].
* options (optional) is a dictionary of options:
    + caseSensitive makes a search case-sensitive. It's false by default.
    + mode controls the way in which incomplete query words are matched:
        o mode: 'word' requires every query word to match only full, exact words rather than parts of words. For example, the query "California" will match "University of California" but not "Californian University".
        o mode: 'prefix' means that every query word may be an incomplete "prefix" of the matched word. "Uni Cali" will match both "University of California" and "Californian University". Even in this mode, every query word must match somewhere: an item that reads just "California" is not a match for "Uni Cali", because nothing in it matches the query word "Uni".
        o mode: 'autocomplete' is a hybrid of the other two modes that's useful in autocomplete-style searches, where a user is continuously typing in a query as search results are being returned. This mode is identical to mode: 'word', except that the last query word may be incomplete like in mode: 'prefix'. It means "University of Cali" will match "University of California", which is useful because the user may find their match before having typed in their full query.

You can find more examples of how these options combine together in the unit tests.

Installation and usage

On the web, with a <script> tag

Loading libsearch through a <script> tag will expose the search function as window.libsearch.search.

Via NPM

    npm install libsearch
    # or
    yarn add libsearch

And use in your code:

    import { search } from 'libsearch';

    // search(...);

Using TypeScript types

libsearch ships with TypeScript type definitions generated from the source file. Using libsearch from NPM should get them picked up by the TypeScript compiler.

How it works

libsearch lets you perform basic full-text search across a list of JavaScript objects quickly, without requiring a pre-built search index, while offering reasonably good TF-IDF ranking of results. It doesn't deliver the wide array of features that come with libraries like FlexSearch and lunr.js, but is a big step above text.indexOf(query) > -1, and is fast enough to be usable for searching thousands of documents on every keystroke in my experience.

There are two key ideas in how libsearch delivers this:

1.
Transforming queries into regular expressions

Modern JavaScript engines ship with highly optimized regular expression engines, and libsearch takes advantage of this for fast, index-free text search by transforming query strings into regular expression filters at search time.

Most full-text search libraries work by first requiring the developer to build up an "index" data structure mapping search terms to documents in which they appear. This is usually a good tradeoff, because it moves some of the computational work of "searching" to be done ahead of time, so search itself can remain fast and accurate. It also allows for fancy transformations and data cleanup like lemmatization on the indexed data without destroying search speed. But when building prototypes and simple web apps, I often didn't want to incur the complexity of having a separate "indexing" step to get a "good enough" search solution. An index needs to be stored somewhere and maintained constantly as the underlying dataset changes and grows.

The main task of a search index is mapping "tokens" or keywords that appear in the dataset to the documents in which they appear, so that the question "which documents contain the word X?" is fast (O(1)) to answer at search time. Without an index, this turns into an O(n) question, as every document needs to be scanned for the keyword. But often, on modern hardware, for small-enough datasets (of a few MBs) typical in a client-side web app, the n is pretty small, small enough that O(n) on every keystroke isn't noticeable.

libsearch transforms a query like "Uni of California" into a list of regular expression filters, (^|\W)Uni($|\W), (^|\W)of($|\W), (^|\W)California. It then "searches" without needing an index by filtering the corpus through each of those regular expressions.

2.
"Good enough" TF-IDF ranking based on RegExp matches and document length The conventional TF-IDF metric is computed for each word as: (# matches) / (# words in the doc) * log(# total docs / # docs that matched) Getting the number of words in a doc requires tokenizing the document, or at least splitting the document by whitespaces, which is computationally expensive. So libsearch approximates this by using the length of the document (number of characters) instead. Using the regular expression queries described above, libsearch's TF-IDF formula is: (# RegExp matches) / (doc.length) * log(# docs / # docs that matched RegExp) which is computed for each word as the search is performed, and then aggregated at the end for sorting. Development libsearch's source code is written in TypeScript. To allow the library to be used across TypeScript, vanilla Node.js and the web, we compile two builds: * The ES module build, which is just search.ts type-checked and types removed. This is the code imported when libsearch is imported in Node.js * The browser build, which exports the main search function to the window.libsearch global The ES module build is produced with tsc, the TypeScript compiler, and the minified browser build is further produced with Webpack. NPM/Yarn commands: * lint and fmt, which lint and automatically format source code in the repository * test runs unit tests on the latest build of the library; you should run build:tsc before running test * Various build:* commands orchestrate producing the different types of library builds: + build:tsc builds the ES module build + build:w runs build:tsc on every file write + build:cjs builds the browser build from the ES module build + build:all builds both builds, in order * clean removes all generated/build files in dist/ * docs builds the Litterate-based documentation, which lives at thesephist.github.io/libsearch. 
Before pushing to main or publishing, I usually run yarn fmt && yarn build:all && yarn test && yarn docs to make sure I haven't forgotten anything.

License: MIT. First release: Jul 21, 2022.
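To make the two ideas under "How it works" more concrete, here is a minimal sketch of an index-free search built from per-word regular expressions and the length-based TF-IDF approximation. It is not libsearch's actual source: the names escapeRegExp, queryToRegExps and sketchSearch are hypothetical, and the choices of 'autocomplete' as the default mode and of summing per-word scores are assumptions made for illustration.

```javascript
// Illustrative sketch only; helper names are hypothetical, not libsearch's API.

// Escape characters that have special meaning inside a regular expression.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Turn a query into one RegExp filter per query word.
// Assumption: 'autocomplete' is treated as the default mode here.
function queryToRegExps(query, { mode = 'autocomplete', caseSensitive = false } = {}) {
    const words = query.trim().split(/\s+/).filter(word => word !== '');
    const flags = caseSensitive ? 'g' : 'gi';
    return words.map((word, i) => {
        const isLast = i === words.length - 1;
        // 'prefix' lets every word be incomplete; 'autocomplete' only the last one.
        const allowPrefix = mode === 'prefix' || (mode === 'autocomplete' && isLast);
        const suffix = allowPrefix ? '' : '($|\\W)';
        return new RegExp('(^|\\W)' + escapeRegExp(word) + suffix, flags);
    });
}

// Filter items through every RegExp, then rank the survivors by
// (# RegExp matches) / (doc.length) * log(# docs / # docs that matched RegExp),
// summed over query words (summing is an assumption about the aggregation).
function sketchSearch(items, query, by = x => String(x), options = {}) {
    const regexps = queryToRegExps(query, options);
    if (regexps.length === 0) return items; // empty query: return the list as-is

    const texts = items.map(by);
    const scores = new Array(items.length).fill(0);
    const matchesAll = new Array(items.length).fill(true);

    for (const regexp of regexps) {
        const counts = texts.map(text => (text.match(regexp) || []).length);
        const docsMatched = counts.filter(count => count > 0).length;
        counts.forEach((count, i) => {
            if (count === 0) {
                matchesAll[i] = false; // every query word must match somewhere
            } else {
                scores[i] += (count / texts[i].length) * Math.log(items.length / docsMatched);
            }
        });
    }

    return items
        .map((item, i) => ({ item, score: scores[i], keep: matchesAll[i] }))
        .filter(entry => entry.keep)
        .sort((a, b) => b.score - a.score)
        .map(entry => entry.item);
}
```

With the articles list from the quick examples, sketchSearch(articles, 'berkeley cali', a => a.title) keeps only the 'Weather in Berkeley, California' entry, and queryToRegExps('Uni of California') produces the three filters quoted above. Counting matches with a global RegExp can undercount adjacent repeats of the same word, since the boundary character is consumed by the previous match, which is acceptable for a "good enough" ranking signal.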