# winkNLP

## Developer friendly Natural Language Processing

WinkNLP is a JavaScript library for Natural Language Processing (NLP). Designed specifically to make development of NLP applications easier and faster, winkNLP is optimized for the right balance of performance and accuracy. It is built from the ground up with a lean code base that has no external dependency. A test coverage of ~100% and compliance with the Open Source Security Foundation best practices make winkNLP the ideal tool for building production grade systems with confidence.

WinkNLP, with full TypeScript support, runs on Node.js and browsers.

### Build amazing apps quickly

* Wikipedia article timeline
* Context aware word cloud
* Key sentences detection

Head to live examples to explore further.

### Blazing fast

WinkNLP can easily process a large amount of raw text at speeds over 650,000 tokens/second on an M1 Macbook Pro in both browser and Node.js environments. It even runs smoothly on a low-end smartphone's browser.

| Environment | Benchmarking Command |
| --- | --- |
| Node.js | `node benchmark/run` |
| Browser | How to measure winkNLP's speed on browsers? |
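To get a rough feel for what a tokens/second figure means, the sketch below times repeated passes over a sample text. It is not the repository's `benchmark/run` script: `measureThroughput` and `loadCounter` are hypothetical helper names introduced here for illustration, and the fallback to a naive whitespace split (used only when `wink-nlp` or the model is not installed) is an assumption made so that the sketch runs anywhere. Numbers from such a loop will vary with the machine, the text and the Node.js version.

```javascript
// Rough throughput sketch; NOT the repository's benchmark script.
const { performance } = require('perf_hooks');

// Hypothetical helper: runs `countTokens(text)` `runs` times and
// reports the total tokens seen, elapsed seconds and tokens/second.
function measureThroughput(countTokens, text, runs = 5) {
  let tokens = 0;
  const start = performance.now();
  for (let i = 0; i < runs; i += 1) {
    tokens += countTokens(text);
  }
  const seconds = (performance.now() - start) / 1000;
  const tokensPerSecond = seconds > 0 ? tokens / seconds : Infinity;
  return { tokens, seconds, tokensPerSecond };
}

// Hypothetical helper: returns a "text -> token count" function.
// Uses winkNLP when it is installed; otherwise falls back to a naive
// whitespace split (an assumption, only so the sketch still runs).
function loadCounter() {
  try {
    const winkNLP = require('wink-nlp');
    const model = require('wink-eng-lite-web-model');
    const nlp = winkNLP(model);
    return (text) => nlp.readDoc(text).tokens().length();
  } catch (err) {
    return (text) => text.split(/\s+/).filter(Boolean).length;
  }
}

const sample = 'Hello World! How are you? '.repeat(2000);
const result = measureThroughput(loadCounter(), sample);
console.log(
  `${Math.round(result.tokensPerSecond)} tokens/second over ${result.tokens} tokens`
);
```

Because the timer wraps `readDoc()`, the winkNLP path measures the whole pipeline rather than tokenization alone; a longer and more varied text than the repeated sentence used here would give a more representative figure.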
### Features

WinkNLP has a comprehensive natural language processing (NLP) pipeline covering tokenization, sentence boundary detection (sbd), negation handling, sentiment analysis, part-of-speech (pos) tagging, named entity recognition (ner), and custom entity recognition (cer). It offers a rich feature set:

| Feature | Description |
| --- | --- |
| Fast, lossless & multilingual tokenizer | For example, the multilingual text string "¡Hola! नमस्कार! Hi! Bonjour chéri" is tokenized as ["¡", "Hola", "!", "नमस्कार", "!", "Hi", "!", "Bonjour", "chéri"]. The tokenizer processes text at a speed close to 4 million tokens/second on an M1 MBP's browser. |
| Developer friendly and intuitive API | With winkNLP, process any text using a simple, declarative syntax; most live examples have 30-40 lines of code. |
| Best-in-class text visualization | Programmatically mark tokens, sentences, entities, etc. using HTML mark or any other tag of your choice. |
| Extensive text processing features | Remove and/or retain tokens with specific attributes such as part-of-speech, named entity type, token type, stop word, shape and many more; compute Flesch reading ease score; generate n-grams; normalize, lemmatise or stem. Check out how, with the right kind of text preprocessing, even a Naive Bayes classifier achieves impressive (≥90%) accuracy in sentiment analysis and chatbot intent classification tasks. |
| Pre-trained language models | Compact sizes starting from <3MB, which reduces model loading time drastically. |
| Host of utilities & tools | BM25 vectorizer; several similarity methods (Cosine, Tversky, Sørensen-Dice, Otsuka-Ochiai); helpers to get bag of words, frequency table, lemma/stem, stop word removal and many more. |

WinkJS also has packages like Naive Bayes classifier, multi-class averaged perceptron and popular token and string distance methods, which complement winkNLP.

## Documentation

* Concepts -- everything you need to know to get started.
* API Reference -- explains usage of APIs with examples.
* Change log -- version history along with the details of breaking changes, if any.
* Examples -- live examples with code to give you a head start.

## Installation

Use npm install:

    npm install wink-nlp --save

To use winkNLP after installing it, you also need to install a language model that matches your Node.js version. The table below lists the version-specific installation command:

| Node.js Version | Installation |
| --- | --- |
| 16 or 18 | `npm install wink-eng-lite-web-model --save` |
| 14 or 12 | `node -e "require('wink-nlp/models/install')"` |

The wink-eng-lite-web-model is designed to work with Node.js version 16 or 18. It can also work on browsers as described in the next section. This is the recommended model.

The second command installs the wink-eng-lite-model, which works with Node.js version 14 or 12.

#### How to install for Web Browser

If you're using winkNLP in the browser, use the wink-eng-lite-web-model. Learn about its installation and usage in our guide to using winkNLP in the browser. Explore winkNLP recipes on Observable for live browser based examples.

### Get started

Here is the "Hello World!" of winkNLP:

    // Load wink-nlp package.
    const winkNLP = require( 'wink-nlp' );
    // Load english language model.
    const model = require( 'wink-eng-lite-web-model' );
    // Instantiate winkNLP.
    const nlp = winkNLP( model );
    // Obtain "its" helper to extract item properties.
    const its = nlp.its;
    // Obtain "as" reducer helper to reduce a collection.
    const as = nlp.as;

    // NLP Code.
    const text = 'Hello World🌎! How are you?';
    const doc = nlp.readDoc( text );

    console.log( doc.out() );
    // -> Hello World🌎! How are you?

    console.log( doc.sentences().out() );
    // -> [ 'Hello World🌎!', 'How are you?' ]

    console.log( doc.entities().out( its.detail ) );
    // -> [ { value: '🌎', type: 'EMOJI' } ]

    console.log( doc.tokens().out() );
    // -> [ 'Hello', 'World', '🌎', '!', 'How', 'are', 'you', '?' ]

    console.log( doc.tokens().out( its.type, as.freqTable ) );
    // -> [ [ 'word', 5 ], [ 'punctuation', 2 ], [ 'emoji', 1 ] ]

Experiment with winkNLP on RunKit.

### Speed & Accuracy

WinkNLP processes raw text at ~650,000 tokens per second with its wink-eng-lite-web-model, when benchmarked using "Ch 13 of Ulysses by James Joyce" on an M1 Macbook Pro machine with 16GB RAM. The processing included the entire NLP pipeline: tokenization, sentence boundary detection, negation handling, sentiment analysis, part-of-speech tagging, and named entity extraction. This speed is well ahead of the prevailing speed benchmarks.

The benchmark was conducted on Node.js versions 16 and 18.

It pos tags a subset of the WSJ corpus with an accuracy of ~94.7%; this includes tokenization of raw text prior to pos tagging. The present state-of-the-art is at ~97% accuracy, but at lower speeds, and is generally computed using a gold standard pre-tokenized corpus.

Its general purpose sentiment analysis delivers an F-score of ~84.5%, when validated using the Amazon Product Review Sentiment Labelled Sentences Data Set at the UCI Machine Learning Repository. The current benchmark accuracy for specifically trained models can range around 95%.

### Memory Requirement

WinkNLP delivers this performance with minimal load on RAM. For example, it processes the entire History of India Volume I with a total peak memory requirement of under 80MB. The book has around 350 pages, which translates to over 125,000 tokens.

## Need Help?

### Usage query

Please ask at Stack Overflow or discuss at Wink JS GitHub Discussions or chat with us at Wink JS Gitter Lobby.

### Bug report

If you spot a bug and the same has not yet been reported, raise a new issue or consider fixing it and sending a PR.

### New feature

Looking for a new feature? Request it via the new features & ideas discussion forum or consider becoming a contributor.
## About winkJS

WinkJS is a family of open source packages for Natural Language Processing, Machine Learning, and Statistical Analysis in NodeJS. The code is thoroughly documented for easy human comprehension and has a test coverage of ~100% for reliability to build production grade solutions.

## Copyright & License

Wink NLP is copyright 2017-22 GRAYPE Systems Private Limited.

It is licensed under the terms of the MIT License.