https://github.com/explosion/spaCy/releases/tag/v3.0.0

v3.0.0: Transformer-based pipelines, new training system, project templates, custom models, improved component API, type hints & lots more

Latest release, tag v3.0.0 (commit a59f3fc). @ines released this on Feb 1, 2021.

NEW: Want to make the transition from spaCy v2 to spaCy v3 as smooth as possible for you and your organization? We're now offering commercial migration support for your spaCy pipelines! We've put a lot of work into making it easy to upgrade your existing code and training workflows - but custom projects may always need some custom work, especially when it comes to taking advantage of the new capabilities.
Details & application

Quickstart

For the smoothest updating process, we recommend starting with a fresh virtual environment.

    pip install -U spacy

* New in v3.0: New features, backwards incompatibilities and migration guide.
* Installation Quickstart: Install the new version, pipelines and add-ons for your specific setup.
* Training Quickstart: Generate a training config for your specific use case.
* Benchmarks: Results and accuracy comparisons.
* Projects & Project Templates: Get started by cloning a project template.

New features and improvements

* Transformer-based pipelines with support for multi-task learning.
* Retrained model families for 18+ languages and 58 trained pipelines in total, including 5 transformer-based pipelines.
* Retrained pipelines for all supported languages, plus new core pipelines for Macedonian and Russian. Thanks to @borijang, @buriy and @kuk for the contributions!
* New training workflow and config system.
* Implement custom models using any machine learning framework, including PyTorch, TensorFlow and MXNet.
* spaCy Projects for managing end-to-end multi-step workflows from preprocessing to model deployment.
* Integrations with Data Version Control (DVC), Streamlit, Weights & Biases, Ray and more.
* Parallel training and distributed computing with Ray.
* New built-in pipeline components: SentenceRecognizer, Morphologizer, Lemmatizer, AttributeRuler and Transformer.
* New and improved pipeline component API and decorators for custom components.
* Source trained components from other pipelines in your training config.
* Pre-built and more efficient binary wheels for all trained pipeline packages, available by setting --wheel on spacy download.
* DependencyMatcher for matching patterns within the dependency parse using Semgrex operators.
* Support for greedy patterns in Matcher.
* New data structure SpanGroup for efficiently storing collections of potentially overlapping spans via Doc.spans.
* Type hints and type-based data validation for custom registered functions.
* Various new methods, attributes and commands.

Video introductions & tutorials

* spaCy v3: State-of-the-art NLP from Prototype to Production
* spaCy v3: Design concepts explained (behind the scenes)
* spaCy v3: Custom trainable relation extraction component

Trained pipelines (58)

To download a trained pipeline, you can use the spacy download command. See the training documentation for details on how to train your own pipelines on your data.

| Name | Language | POS | TAG | LAS | UAS | NER | Sent | Size | Version |
|---|---|---|---|---|---|---|---|---|---|
| da_core_news_lg | Danish | 0.97 | 0.97 | 0.78 | 0.82 | 0.82 | 0.88 | 547 MB | v3.0.0 |
| da_core_news_md | Danish | 0.96 | 0.96 | 0.78 | 0.82 | 0.81 | 0.86 | 47 MB | v3.0.0 |
| da_core_news_sm | Danish | 0.95 | 0.95 | 0.76 | 0.81 | 0.72 | 0.86 | 17 MB | v3.0.0 |
| de_core_news_lg | German | 0.98 | 0.98 | 0.91 | 0.93 | 0.85 | 0.95 | 546 MB | v3.0.0 |
| de_core_news_md | German | 0.98 | 0.98 | 0.91 | 0.93 | 0.84 | 0.95 | 47 MB | v3.0.0 |
| de_core_news_sm | German | 0.98 | 0.97 | 0.90 | 0.92 | 0.82 | 0.94 | 18 MB | v3.0.0 |
| de_dep_news_trf | German | 0.99 | 0.99 | 0.95 | 0.96 | n/a | 0.98 | 393 MB | v3.0.0 |
| el_core_news_lg | Greek | 0.97 | 0.94 | 0.85 | 0.88 | 0.80 | 1.00 | 544 MB | v3.0.0 |
| el_core_news_md | Greek | 0.96 | 0.93 | 0.84 | 0.87 | 0.79 | 1.00 | 42 MB | v3.0.0 |
| el_core_news_sm | Greek | 0.94 | 0.91 | 0.81 | 0.85 | 0.72 | 1.00 | 12 MB | v3.0.0 |
| en_core_web_lg | English | n/a | 0.97 | 0.90 | 0.92 | 0.86 | 0.89 | 742 MB | v3.0.0 |
| en_core_web_md | English | n/a | 0.97 | 0.90 | 0.92 | 0.85 | 0.89 | 44 MB | v3.0.0 |
| en_core_web_sm | English | n/a | 0.97 | 0.90 | 0.92 | 0.84 | 0.89 | 13 MB | v3.0.0 |
| en_core_web_trf | English | n/a | 0.98 | 0.94 | 0.95 | 0.90 | 0.89 | 438 MB | v3.0.0 |
| es_core_news_lg | Spanish | 0.99 | 0.98 | 0.88 | 0.91 | 0.90 | 1.00 | 547 MB | v3.0.0 |
| es_core_news_md | Spanish | 0.99 | 0.98 | 0.88 | 0.91 | 0.90 | 1.00 | 46 MB | v3.0.0 |
| es_core_news_sm | Spanish | 0.98 | 0.97 | 0.87 | 0.90 | 0.89 | 1.00 | 17 MB | v3.0.0 |
| es_dep_news_trf | Spanish | 0.99 | 0.98 | 0.93 | 0.95 | n/a | 0.97 | 395 MB | v3.0.0 |
| fr_core_news_lg | French | 0.98 | 0.95 | 0.86 | 0.90 | 0.82 | 0.88 | 546 MB | v3.0.0 |
| fr_core_news_md | French | 0.97 | 0.94 | 0.85 | 0.89 | 0.81 | 0.87 | 45 MB | v3.0.0 |
| fr_core_news_sm | French | 0.96 | 0.93 | 0.84 | 0.88 | 0.79 | 0.85 | 16 MB | v3.0.0 |
| fr_dep_news_trf | French | 0.99 | 0.96 | 0.92 | 0.94 | n/a | 0.94 | 381 MB | v3.0.0 |
| it_core_news_lg | Italian | 0.98 | 0.97 | 0.88 | 0.91 | 0.89 | 0.97 | 545 MB | v3.0.0 |
| it_core_news_md | Italian | 0.97 | 0.97 | 0.88 | 0.91 | 0.87 | 0.97 | 44 MB | v3.0.0 |
| it_core_news_sm | Italian | 0.97 | 0.97 | 0.86 | 0.90 | 0.85 | 0.97 | 16 MB | v3.0.0 |
| ja_core_news_lg | Japanese | 0.96 | 0.97 | 0.90 | 0.92 | 0.72 | 0.98 | 531 MB | v3.0.0 |
| ja_core_news_md | Japanese | 0.96 | 0.97 | 0.90 | 0.92 | 0.72 | 0.99 | 41 MB | v3.0.0 |
| ja_core_news_sm | Japanese | 0.96 | 0.97 | 0.90 | 0.92 | 0.64 | 0.99 | 12 MB | v3.0.0 |
| lt_core_news_lg | Lithuanian | 0.96 | 0.89 | 0.68 | 0.75 | 0.80 | 0.82 | 545 MB | v3.0.0 |
| lt_core_news_md | Lithuanian | 0.95 | 0.86 | 0.67 | 0.74 | 0.79 | 0.83 | 44 MB | v3.0.0 |
| lt_core_news_sm | Lithuanian | 0.91 | 0.82 | 0.59 | 0.68 | 0.74 | 0.79 | 15 MB | v3.0.0 |
| mk_core_news_lg | Macedonian | 0.93 | n/a | 0.51 | 0.68 | 0.76 | 0.73 | 312 MB | v3.0.0 |
| mk_core_news_md | Macedonian | 0.93 | n/a | 0.51 | 0.67 | 0.75 | 0.73 | 44 MB | v3.0.0 |
| mk_core_news_sm | Macedonian | 0.92 | n/a | 0.47 | 0.62 | 0.70 | 0.62 | 18 MB | v3.0.0 |
| nb_core_news_lg | Norwegian | 0.97 | 0.97 | 0.87 | 0.89 | 0.85 | 0.94 | 547 MB | v3.0.0 |
| nb_core_news_md | Norwegian | 0.97 | 0.97 | 0.87 | 0.90 | 0.85 | 0.93 | 44 MB | v3.0.0 |
| nb_core_news_sm | Norwegian | 0.97 | 0.97 | 0.85 | 0.88 | 0.77 | 0.93 | 15 MB | v3.0.0 |
| nl_core_news_lg | Dutch | 0.96 | 0.95 | 0.82 | 0.87 | 0.77 | 0.87 | 546 MB | v3.0.0 |
| nl_core_news_md | Dutch | 0.96 | 0.95 | 0.82 | 0.87 | 0.74 | 0.87 | 45 MB | v3.0.0 |
| nl_core_news_sm | Dutch | 0.95 | 0.93 | 0.80 | 0.85 | 0.72 | 0.86 | 16 MB | v3.0.0 |
| pl_core_news_lg | Polish | 0.97 | 0.98 | 0.84 | 0.89 | 0.85 | 0.99 | 584 MB | v3.0.0 |
| pl_core_news_md | Polish | 0.97 | 0.98 | 0.84 | 0.89 | 0.84 | 0.98 | 84 MB | v3.0.0 |
| pl_core_news_sm | Polish | 0.95 | 0.98 | 0.79 | 0.86 | 0.80 | 0.98 | 55 MB | v3.0.0 |
| pt_core_news_lg | Portuguese | 0.97 | 0.90 | 0.86 | 0.90 | 0.91 | 0.95 | 551 MB | v3.0.0 |
| pt_core_news_md | Portuguese | 0.97 | 0.90 | 0.86 | 0.90 | 0.90 | 0.95 | 49 MB | v3.0.0 |
| pt_core_news_sm | Portuguese | 0.97 | 0.89 | 0.85 | 0.89 | 0.88 | 0.92 | 21 MB | v3.0.0 |
| ro_core_news_lg | Romanian | 0.96 | 0.97 | 0.84 | 0.89 | 0.77 | 0.96 | 546 MB | v3.0.0 |
| ro_core_news_md | Romanian | 0.96 | 0.97 | 0.85 | 0.89 | 0.76 | 0.96 | 44 MB | v3.0.0 |
| ro_core_news_sm | Romanian | 0.96 | 0.96 | 0.82 | 0.87 | 0.72 | 0.97 | 15 MB | v3.0.0 |
| ru_core_news_lg | Russian | 0.99 | 0.99 | 0.95 | 0.96 | 0.95 | 1.00 | 491 MB | v3.0.0 |
| ru_core_news_md | Russian | 0.99 | 0.99 | 0.95 | 0.96 | 0.94 | 1.00 | 41 MB | v3.0.0 |
| ru_core_news_sm | Russian | 0.99 | 0.99 | 0.95 | 0.96 | 0.95 | 1.00 | 16 MB | v3.0.0 |
| xx_ent_wiki_sm | MultiLanguage | n/a | n/a | n/a | n/a | 0.82 | n/a | 14 MB | v3.0.0 |
| xx_sent_ud_sm | MultiLanguage | n/a | n/a | n/a | n/a | n/a | 0.86 | 10 MB | v3.0.0 |
| zh_core_web_lg | Chinese | n/a | 0.90 | 0.66 | 0.71 | 0.71 | 0.75 | 577 MB | v3.0.0 |
| zh_core_web_md | Chinese | n/a | 0.90 | 0.65 | 0.70 | 0.70 | 0.76 | 76 MB | v3.0.0 |
| zh_core_web_sm | Chinese | n/a | 0.90 | 0.64 | 0.70 | 0.69 | 0.75 | 47 MB | v3.0.0 |
| zh_core_web_trf | Chinese | n/a | 0.92 | 0.73 | 0.77 | 0.75 | 0.65 | 398 MB | v3.0.0 |

* TAG: Part-of-speech tags (fine-grained tags, i.e. Token.tag_)
* POS: Part-of-speech tags (coarse-grained tags, i.e. Token.pos_)
* UAS: Unlabelled dependencies (parser).
* LAS: Labelled dependencies (parser).
* NER: Named entities (F-score).
* Sent: Sentence segmentation.
* Size: Model file size (zipped archive).

Backwards incompatibilities

For more info on how to migrate from spaCy v2.x, see the detailed migration guide.

API changes

* Pipeline package symlinks, the link command and shortcut names are now deprecated. There can be many different trained pipelines and not just one "English model", so you should always use the full package name like en_core_web_sm explicitly.
* A pipeline's meta.json is now only used to provide meta information like the package name, author, license and labels. It's not used to construct the processing pipeline anymore. This is all defined in the config.cfg, which also includes all settings used to train the pipeline.
* The train, pretrain and debug data commands now only take a config.cfg.
* Language.add_pipe now takes the string name of the component factory instead of the component function.
* Custom pipeline components now need to be decorated with the @Language.component or @Language.factory decorator.
* The Language.update, Language.evaluate and TrainablePipe.update methods now all take batches of Example objects instead of Doc and GoldParse objects, or raw text and a dictionary of annotations.
* The begin_training methods have been renamed to initialize and now take a function that returns a sequence of Example objects to initialize the model instead of a list of tuples.
* Matcher.add and PhraseMatcher.add now only accept a list of patterns as the second argument (instead of a variable number of arguments). The on_match callback becomes an optional keyword argument.
* The Doc flags like Doc.is_parsed or Doc.is_tagged have been replaced by Doc.has_annotation.
* The spacy.gold module has been renamed to spacy.training.
* The PRON_LEMMA symbol and -PRON- as an indicator for pronoun lemmas have been removed.
* The TAG_MAP and MORPH_RULES in the language data have been replaced by the more flexible AttributeRuler.
* The Lemmatizer is now a standalone pipeline component and doesn't provide lemmas by default or switch automatically between lookup and rule-based lemmas. You can now add it to your pipeline explicitly and set its mode on initialization.
* Various keyword arguments across functions and methods are now explicitly declared as keyword-only arguments. Those arguments are documented accordingly across the API reference.

Removed or renamed API

| Removed | Replacement |
|---|---|
| Language.disable_pipes | Language.select_pipes, Language.disable_pipe, Language.enable_pipe |
| Language.begin_training, Pipe.begin_training, ... | Language.initialize, Pipe.initialize, ... |
| Doc.is_tagged, Doc.is_parsed, ... | Doc.has_annotation |
| GoldParse | Example |
| GoldCorpus | Corpus |
| KnowledgeBase.load_bulk, KnowledgeBase.dump | KnowledgeBase.from_disk, KnowledgeBase.to_disk |
| Matcher.pipe, PhraseMatcher.pipe | not needed |
| gold.offsets_from_biluo_tags, gold.spans_from_biluo_tags, gold.biluo_tags_from_offsets | training.biluo_tags_to_offsets, training.biluo_tags_to_spans, training.offsets_to_biluo_tags |
| spacy init-model | spacy init vectors |
| spacy debug-data | spacy debug data |
| spacy profile | spacy debug profile |
| spacy link, util.set_data_path, util.get_data_path | not needed, symlinks are deprecated |

The following deprecated methods, attributes and arguments were removed in v3.0. Most of them have been deprecated for a while and many would previously raise errors. Many of them were also mostly internals. If you've been working with more recent versions of spaCy v2.x, it's unlikely that your code relied on them.

| Removed | Replacement |
|---|---|
| Doc.tokens_from_list | Doc.__init__ |
| Doc.merge, Span.merge | Doc.retokenize |
| Token.string, Span.string, Span.upper, Span.lower | Span.text, Token.text |
| Language.tagger, Language.parser, Language.entity | Language.get_pipe |
| keyword-arguments like vocab=False on to_disk, from_disk, to_bytes, from_bytes | exclude=["vocab"] |
| n_threads argument on Tokenizer, Matcher, PhraseMatcher | n_process |
| verbose argument on Language.evaluate | logging (DEBUG) |
| SentenceSegmenter hook, SimilarityHook | user hooks, Sentencizer, SentenceRecognizer |

Contributors

This release is brought to you by @honnibal, @ines, @svlandeg and @adrianeboyd.
Thanks to @AMArostegui, @BramVanroy, @Cristianasp, @DeNeutoy, @DuyguA, @Jan-711, @KKsharma99, @KeshavG-lb, @KoichiYasuoka, @MartinoMensio, @Nuccy90, @PluieElectrique, @SamEdwardes, @Stannislav, @abchapman93, @alexcombessie, @alvaroabascar, @baranitharan2020, @bittlingmayer, @bjascob, @borijang, @bratao, @bryant1410, @buriy, @chopeen, @danielvasic, @delzac, @dhruvrnaik, @erip, @florijanstamenkovic, @forest1988, @gandersen101, @garethsparks, @graue70, @guadiromero, @hertelm, @hiroshi-matsuda-rit, @holubvl3, @idoshr, @jabortell, @jbesomi, @jenojp, @jganseman, @jgutix, @jmargeta, @jumasheff, @kuk, @leyendecker, @lizhe2004, @lorenanda, @mahnerak, @mikeizbicki, @myavrum, @nipunsadvilkar, @oculusrepairo, @ophelielacroix, @rahul1990gupta, @rameshhpathak, @rasyidf, @revuel, @richardliaw, @robertsipek, @snsten, @solarmist, @tamuhey, @thomasbird, @tiangolo, @tilusnet, @timgates42, @vha14, @walterhenry, @wannaphong, @werew, @yosiasz and @zaibacu for the pull requests and contributions!
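
Migration sketch

Several of the API changes listed under "Backwards incompatibilities" are easier to see in code. The following is a minimal sketch, not an official migration recipe. It assumes spaCy v3 is installed and uses a blank English pipeline so that no trained package is needed. The component name "token_counter", the "GREETING" label and the sample text are hypothetical and chosen only for illustration.

```python
import spacy
from spacy.language import Language
from spacy.matcher import Matcher
from spacy.training import Example  # v3: spacy.gold was renamed to spacy.training


# v3: custom components must be registered with a decorator.
# "token_counter" is a hypothetical name used only in this sketch.
@Language.component("token_counter")
def token_counter(doc):
    # Store the token count on the Doc so the effect is easy to inspect.
    doc.user_data["n_tokens"] = len(doc)
    return doc


# A blank pipeline contains only the tokenizer.
nlp = spacy.blank("en")

# v3: add_pipe takes the registered string name, not the function itself.
nlp.add_pipe("token_counter")
doc = nlp("Hello world from spaCy")

# v3: Matcher.add takes a list of patterns as its second argument;
# on_match is an optional keyword argument.
matcher = Matcher(nlp.vocab)
pattern = [{"LOWER": "hello"}, {"LOWER": "world"}]
matcher.add("GREETING", [pattern], on_match=None)
matches = matcher(doc)

# v3: Doc.has_annotation replaces flags such as Doc.is_parsed.
# A blank pipeline has no parser, so no dependency annotation is set.
is_parsed = doc.has_annotation("DEP")

# v3: training and evaluation consume Example objects instead of GoldParse.
# Here the reference annotation marks characters 0-5 as an entity.
example = Example.from_dict(
    nlp.make_doc("Hello world"),
    {"entities": [(0, 5, "GREETING")]},
)
```

In a real project the Example objects would be batched and passed to Language.update or Language.evaluate. This sketch stops at constructing one so that it stays independent of any training config.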