https://github.com/wolfgarbe/PruningRadixTrie Skip to content Toggle navigation Sign up * Product + Actions Automate any workflow + Packages Host and manage packages + Security Find and fix vulnerabilities + Codespaces Instant dev environments + Copilot Write better code with AI + Code review Manage code changes + Issues Plan and track work + Discussions Collaborate outside of code Explore + All features + Documentation + GitHub Skills + Blog * Solutions For + Enterprise + Teams + Startups + Education By Solution + CI/CD & Automation + DevOps + DevSecOps Resources + Learning Pathways + White papers, Ebooks, Webinars + Customer Stories + Partners * Open Source + GitHub Sponsors Fund open source developers + The ReadME Project GitHub community articles Repositories + Topics + Trending + Collections * Pricing Search or jump to... Search code, repositories, users, issues, pull requests... Search [ ] Clear Search syntax tips Provide feedback We read every piece of feedback, and take your input very seriously. [ ] [ ] Include my email address so I can be contacted Cancel Submit feedback Saved searches Use saved searches to filter your results more quickly Name [ ] Query [ ] To see all available qualifiers, see our documentation. Cancel Create saved search Sign in Sign up You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. You switched accounts on another tab or window. Reload to refresh your session. Dismiss alert {{ message }} wolfgarbe / PruningRadixTrie Public * Notifications * Fork 23 * Star 341 PruningRadixTrie - 1000x faster Radix trie for prefix search & auto-complete seekstorm.com/blog/pruning-radix-trie/ License MIT license 341 stars 23 forks Activity Star Notifications * Code * Issues 4 * Pull requests 0 * Actions * Projects 0 * Security * Insights More * Code * Issues * Pull requests * Actions * Projects * Security * Insights wolfgarbe/PruningRadixTrie This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository. master Switch branches/tags [ ] Branches Tags Could not load branches Nothing to show {{ refName }} default View all branches Could not load tags Nothing to show {{ refName }} default View all tags Name already in use A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch? Cancel Create 1 branch 0 tags Code * Local * Codespaces * Clone HTTPS GitHub CLI [https://github.com/w] Use Git or checkout with SVN using the web URL. [gh repo clone wolfga] Work fast with our official CLI. Learn more about the CLI. * Open with GitHub Desktop * Download ZIP Sign In Required Please sign in to use Codespaces. Launching GitHub Desktop If nothing happens, download GitHub Desktop and try again. Launching GitHub Desktop If nothing happens, download GitHub Desktop and try again. Launching Xcode If nothing happens, download Xcode and try again. Launching Visual Studio Code Your codespace will open once ready. There was a problem preparing your codespace, please try again. Latest commit @wolfgarbe wolfgarbe Update README.md ... e86ca38 Sep 30, 2023 Update README.md e86ca38 Git stats * 68 commits Files Permalink Failed to load latest commit information. Type Name Latest commit message Commit time PruningRadixTrie.Benchmark - Migrate PruningRadixTrie => PruningRadixTrie.Benchmark project folder July 25, 2020 18:30 PruningRadixTrie - Migrate PruningRadixTrie => PruningRadixTrie.Benchmark project folder July 25, 2020 18:30 .gitattributes Add .gitignore and .gitattributes. May 28, 2020 20:13 .gitignore Add .gitignore and .gitattributes. May 28, 2020 20:13 LICENSE Create LICENSE May 28, 2020 20:25 PruningRadixTrie.sln - Migrate PruningRadixTrie => PruningRadixTrie.Benchmark project folder July 25, 2020 18:30 README.md Update README.md September 30, 2023 19:18 View code PruningRadixTrie Performance Dictionary Blog Posts Application: Usage: Ports README.md PruningRadixTrie MIT License PruningRadixTrie - 1000x faster Radix trie for prefix search & auto-complete The PruningRadixTrie is a novel data structure, derived from a radix trie - but 3 orders of magnitude faster. A Radix Trie or Patricia Trie is a space-optimized trie (prefix tree). A Pruning Radix trie is a novel Radix trie algorithm, that allows pruning of the Radix trie and early termination of the lookup. In many cases, we are not interested in a complete set of all children for a given prefix, but only in the top-k most relevant terms. Especially for short prefixes, this results in a massive reduction of lookup time for the top-10 results. On the other hand, a complete result set of millions of suggestions wouldn't be helpful at all for autocompletion. The lookup acceleration is achieved by storing in each node the maximum rank of all its children. By comparing this maximum child rank with the lowest rank of the results retrieved so far, we can heavily prune the trie and do early termination of the lookup for non-promising branches with low child ranks. --------------------------------------------------------------------- Performance Benchmark The Pruning Radix Trie is up to 1000x faster than an ordinary Radix Trie. While 36 ms for an autocomplete might seem fast enough for a single user, it becomes insufficient when we have to serve thousands of users in parallel. Then autocomplete lookups in large dictionaries are only feasible when powered by something much faster than an ordinary radix trie. While a prefix of length=1 is not very useful for the Latin alphabet, it does make sense for CJK languages. Also, there are many more application fields for a fast prefix search algorithm beyond character-wise word completion: Instead of characters, the prefix can be composed of arbitrary items, e.g. whole words in a query completion, or towns in a long routing sequence. Dictionary Terms.txt contains 6 million unigrams and bigrams derived from English Wikipedia, with term frequency counts used for ranking. But you can use any frequency dictionary for any language and domain of your choice. Blog Posts The Pruning Radix Trie -- a Radix trie on steroids 1000x Faster Spelling Correction algorithm SymSpell vs. BK-tree: 100x faster fuzzy string search & spell checking Application: The PruningRadixTrie is perfect for auto-completion, query completion or any other prefix search in large dictionaries. While 37 ms for an auto-complete might seem fast enough for a single user, it becomes a completely different story if we have to serve thousands of users in parallel. Then autocomplete lookups in large dictionaries become only feasible when powered by something much faster than an ordinary radix trie. Usage: Create Object PruningRadixtrie pruningRadixTrie = new PruningRadixtrie(); AddTerm: insert term and term frequency count into Pruning Radix Trie. Frequency counts for same term are summed up. pruningRadixTrie.AddTerm("microsoft", 1000); GetTopkTermsForPrefix: retrieve the top-k most relevant terms for a given prefix from the Pruning Radix Trie. string prefix="micro"; int topK=10; var results = pruningRadixTrie.GetTopkTermsForPrefix(prefix, topK, out long termFrequencyCountPrefix); foreach ((string term,long termFrequencyCount) in results) Console.WriteLine(term+" "+termFrequencyCount); ReadTermsFromFile: Deserialise the Pruning Radix Trie from disk for persistence. pruningRadixTrie.ReadTermsFromFile("terms.txt"); WriteTermsToFile: Serialise the Pruning Radix Trie to disk for persistence. pruningRadixTrie.WriteTermsToFile("terms.txt"); Ports The following third party ports or reimplementations to other programming languages have not been tested by myself whether they are an exact port, error free, provide identical results or are as fast as the original algorithm. Java https://github.com/benldr/JPruningRadixTrie Python https://github.com/otto-de/PyPruningRadixTrie Rust https://github.com/peterall/pruning_radix_trie --------------------------------------------------------------------- PruningRadixTrie is contributed by SeekStorm - the high performance Search as a Service & search API About PruningRadixTrie - 1000x faster Radix trie for prefix search & auto-complete seekstorm.com/blog/pruning-radix-trie/ Topics autocomplete tree trie auto-complete radix-tree trees radix-trie auto-completion patricia-tree prefix-search trie-tree auto-suggest patricia-trie Resources Readme License MIT license Activity Stars 341 stars Watchers 6 watching Forks 23 forks Report repository Releases No releases published Packages 0 No packages published Contributors 2 * * Languages * C# 100.0% Footer (c) 2023 GitHub, Inc. Footer navigation * Terms * Privacy * Security * Status * Docs * Contact GitHub * Pricing * API * Training * Blog * About You can't perform that action at this time.