https://github.com/explosion/spaCy/releases/tag/v3.0.0

v3.0.0: Transformer-based pipelines, new training system, project
templates, custom models, improved component API, type hints & lots
more
@ines released this Feb 1, 2021

     NEW: Want to make the transition from spaCy v2 to spaCy v3 as
    smooth as possible for you and your organization? We're now
    offering commercial migration support for your spaCy pipelines!
    We've put a lot of work into making it easy to upgrade your
    existing code and training workflows - but custom projects may
    always need some custom work, especially when it comes to taking
    advantage of the new capabilities.

 Quickstart

For the smoothest updating process, we recommend starting with a
fresh virtual environment.

pip install -U spacy

  * New in v3.0: New features, backwards incompatibilities and
    migration guide.
  * Installation Quickstart: Install the new version, pipelines and
    add-ons for your specific setup.
  * Training Quickstart: Generate a training config for your specific
    use case.
  * Benchmarks: Results and accuracy comparisons.
  * Projects & Project Templates: Get started by cloning a project
    template.

 New features and improvements

  * Transformer-based pipelines with support for multi-task learning.
  * Retrained model families for 18+ languages and 58 trained
    pipelines in total, including 5 transformer-based pipelines.
  * Retrained pipelines for all supported languages, plus new core
    pipelines for Macedonian and Russian. Thanks to @borijang, @buriy
    and @kuk for the contributions!
  * New training workflow and config system.
  * Implement custom models using any machine learning framework,
    including PyTorch, TensorFlow and MXNet.
  * spaCy Projects for managing end-to-end multi-step workflows from
    preprocessing to model deployment.
  * Integrations with Data Version Control (DVC), Streamlit, Weights
    & Biases, Ray and more.
  * Parallel training and distributed computing with Ray.
  * New built-in pipeline components: SentenceRecognizer,
    Morphologizer, Lemmatizer, AttributeRuler and Transformer.
  * New and improved pipeline component API and decorators for custom
    components.
  * Source trained components from other pipelines in your training
    config.
  * Pre-built and more efficient binary wheels for all trained
    pipeline packages, available by setting --wheel on spacy
    download.
  * DependencyMatcher for matching patterns within the dependency
    parse using Semgrex operators.
  * Support for greedy patterns in Matcher.
  * New data structure SpanGroup for efficiently storing collections
    of potentially overlapping spans via Doc.spans.
  * Type hints and type-based data validation for custom registered
    functions.
  * Various new methods, attributes and commands.
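
As a rough illustration of two of the items above (Doc.spans and greedy Matcher patterns), here is a minimal sketch. It assumes spaCy v3 is installed and uses a blank English pipeline, so no trained package is needed. The sentence, the "places" key and the "PLACE" rule name are made up for this example.

```python
import spacy
from spacy.matcher import Matcher

# Blank pipeline: tokenizer only, no trained pipeline package required.
nlp = spacy.blank("en")
doc = nlp("I visited New York City last year")

# Doc.spans stores named groups of spans that are allowed to overlap.
# "places" is an arbitrary key chosen for this sketch.
doc.spans["places"] = [doc[2:4], doc[2:5]]
overlapping = [span.text for span in doc.spans["places"]]

# "new" followed by one or more title-cased tokens.
pattern = [{"LOWER": "new"}, {"IS_TITLE": True, "OP": "+"}]

# Default behaviour: every possible match of the "+" operator is returned.
plain_matcher = Matcher(nlp.vocab)
plain_matcher.add("PLACE", [pattern])
all_matches = {doc[start:end].text for _, start, end in plain_matcher(doc)}

# greedy="LONGEST" keeps only the longest of the overlapping matches.
greedy_matcher = Matcher(nlp.vocab)
greedy_matcher.add("PLACE", [pattern], greedy="LONGEST")
longest = [doc[start:end].text for _, start, end in greedy_matcher(doc)]

print(overlapping)
print(sorted(all_matches))
print(longest)
```

Doc.spans keeps both overlapping spans, whereas Doc.ents would reject them; the greedy setting is one way to collapse the overlapping matcher results to a single span.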

 Video introductions & tutorials

  * spaCy v3: State-of-the-art NLP from Prototype to Production
  * spaCy v3: Design concepts explained (behind the scenes)
  * spaCy v3: Custom trainable relation extraction component

 Trained pipelines (58)

To download a trained pipeline, you can use the spacy download
command. See the training documentation for details on how to train
your own pipelines on your data.

Name              Version  Language        POS   TAG   LAS   UAS   NER   Sent    Size
da_core_news_lg   v3.0.0   Danish         0.97  0.97  0.78  0.82  0.82  0.88  547 MB
da_core_news_md   v3.0.0   Danish         0.96  0.96  0.78  0.82  0.81  0.86   47 MB
da_core_news_sm   v3.0.0   Danish         0.95  0.95  0.76  0.81  0.72  0.86   17 MB
de_core_news_lg   v3.0.0   German         0.98  0.98  0.91  0.93  0.85  0.95  546 MB
de_core_news_md   v3.0.0   German         0.98  0.98  0.91  0.93  0.84  0.95   47 MB
de_core_news_sm   v3.0.0   German         0.98  0.97  0.90  0.92  0.82  0.94   18 MB
de_dep_news_trf   v3.0.0   German         0.99  0.99  0.95  0.96   n/a  0.98  393 MB
el_core_news_lg   v3.0.0   Greek          0.97  0.94  0.85  0.88  0.80  1.00  544 MB
el_core_news_md   v3.0.0   Greek          0.96  0.93  0.84  0.87  0.79  1.00   42 MB
el_core_news_sm   v3.0.0   Greek          0.94  0.91  0.81  0.85  0.72  1.00   12 MB
en_core_web_lg    v3.0.0   English         n/a  0.97  0.90  0.92  0.86  0.89  742 MB
en_core_web_md    v3.0.0   English         n/a  0.97  0.90  0.92  0.85  0.89   44 MB
en_core_web_sm    v3.0.0   English         n/a  0.97  0.90  0.92  0.84  0.89   13 MB
en_core_web_trf   v3.0.0   English         n/a  0.98  0.94  0.95  0.90  0.89  438 MB
es_core_news_lg   v3.0.0   Spanish        0.99  0.98  0.88  0.91  0.90  1.00  547 MB
es_core_news_md   v3.0.0   Spanish        0.99  0.98  0.88  0.91  0.90  1.00   46 MB
es_core_news_sm   v3.0.0   Spanish        0.98  0.97  0.87  0.90  0.89  1.00   17 MB
es_dep_news_trf   v3.0.0   Spanish        0.99  0.98  0.93  0.95   n/a  0.97  395 MB
fr_core_news_lg   v3.0.0   French         0.98  0.95  0.86  0.90  0.82  0.88  546 MB
fr_core_news_md   v3.0.0   French         0.97  0.94  0.85  0.89  0.81  0.87   45 MB
fr_core_news_sm   v3.0.0   French         0.96  0.93  0.84  0.88  0.79  0.85   16 MB
fr_dep_news_trf   v3.0.0   French         0.99  0.96  0.92  0.94   n/a  0.94  381 MB
it_core_news_lg   v3.0.0   Italian        0.98  0.97  0.88  0.91  0.89  0.97  545 MB
it_core_news_md   v3.0.0   Italian        0.97  0.97  0.88  0.91  0.87  0.97   44 MB
it_core_news_sm   v3.0.0   Italian        0.97  0.97  0.86  0.90  0.85  0.97   16 MB
ja_core_news_lg   v3.0.0   Japanese       0.96  0.97  0.90  0.92  0.72  0.98  531 MB
ja_core_news_md   v3.0.0   Japanese       0.96  0.97  0.90  0.92  0.72  0.99   41 MB
ja_core_news_sm   v3.0.0   Japanese       0.96  0.97  0.90  0.92  0.64  0.99   12 MB
lt_core_news_lg   v3.0.0   Lithuanian     0.96  0.89  0.68  0.75  0.80  0.82  545 MB
lt_core_news_md   v3.0.0   Lithuanian     0.95  0.86  0.67  0.74  0.79  0.83   44 MB
lt_core_news_sm   v3.0.0   Lithuanian     0.91  0.82  0.59  0.68  0.74  0.79   15 MB
mk_core_news_lg   v3.0.0   Macedonian     0.93   n/a  0.51  0.68  0.76  0.73  312 MB
mk_core_news_md   v3.0.0   Macedonian     0.93   n/a  0.51  0.67  0.75  0.73   44 MB
mk_core_news_sm   v3.0.0   Macedonian     0.92   n/a  0.47  0.62  0.70  0.62   18 MB
nb_core_news_lg   v3.0.0   Norwegian      0.97  0.97  0.87  0.89  0.85  0.94  547 MB
nb_core_news_md   v3.0.0   Norwegian      0.97  0.97  0.87  0.90  0.85  0.93   44 MB
nb_core_news_sm   v3.0.0   Norwegian      0.97  0.97  0.85  0.88  0.77  0.93   15 MB
nl_core_news_lg   v3.0.0   Dutch          0.96  0.95  0.82  0.87  0.77  0.87  546 MB
nl_core_news_md   v3.0.0   Dutch          0.96  0.95  0.82  0.87  0.74  0.87   45 MB
nl_core_news_sm   v3.0.0   Dutch          0.95  0.93  0.80  0.85  0.72  0.86   16 MB
pl_core_news_lg   v3.0.0   Polish         0.97  0.98  0.84  0.89  0.85  0.99  584 MB
pl_core_news_md   v3.0.0   Polish         0.97  0.98  0.84  0.89  0.84  0.98   84 MB
pl_core_news_sm   v3.0.0   Polish         0.95  0.98  0.79  0.86  0.80  0.98   55 MB
pt_core_news_lg   v3.0.0   Portuguese     0.97  0.90  0.86  0.90  0.91  0.95  551 MB
pt_core_news_md   v3.0.0   Portuguese     0.97  0.90  0.86  0.90  0.90  0.95   49 MB
pt_core_news_sm   v3.0.0   Portuguese     0.97  0.89  0.85  0.89  0.88  0.92   21 MB
ro_core_news_lg   v3.0.0   Romanian       0.96  0.97  0.84  0.89  0.77  0.96  546 MB
ro_core_news_md   v3.0.0   Romanian       0.96  0.97  0.85  0.89  0.76  0.96   44 MB
ro_core_news_sm   v3.0.0   Romanian       0.96  0.96  0.82  0.87  0.72  0.97   15 MB
ru_core_news_lg   v3.0.0   Russian        0.99  0.99  0.95  0.96  0.95  1.00  491 MB
ru_core_news_md   v3.0.0   Russian        0.99  0.99  0.95  0.96  0.94  1.00   41 MB
ru_core_news_sm   v3.0.0   Russian        0.99  0.99  0.95  0.96  0.95  1.00   16 MB
xx_ent_wiki_sm    v3.0.0   MultiLanguage   n/a   n/a   n/a   n/a  0.82   n/a   14 MB
xx_sent_ud_sm     v3.0.0   MultiLanguage   n/a   n/a   n/a   n/a   n/a  0.86   10 MB
zh_core_web_lg    v3.0.0   Chinese         n/a  0.90  0.66  0.71  0.71  0.75  577 MB
zh_core_web_md    v3.0.0   Chinese         n/a  0.90  0.65  0.70  0.70  0.76   76 MB
zh_core_web_sm    v3.0.0   Chinese         n/a  0.90  0.64  0.70  0.69  0.75   47 MB
zh_core_web_trf   v3.0.0   Chinese         n/a  0.92  0.73  0.77  0.75  0.65  398 MB

     TAG: Part-of-speech tags (fine-grained tags, i.e. Token.tag_)
    POS: Part-of-speech tags (coarse-grained tags, i.e. Token.pos_)
    UAS: Unlabelled dependencies (parser). LAS: Labelled dependencies
    (parser). NER: Named entities (F-score). Sent: Sentence
    segmentation. Size: Model file size (zipped archive).

 Backwards incompatibilities

    For more info on how to migrate from spaCy v2.x, see the detailed
    migration guide.

API changes

  * Pipeline package symlinks, the link command and shortcut names
    are now deprecated. There can be many different trained pipelines
    and not just one "English model", so you should always use the
    full package name like en_core_web_sm explicitly.
  * A pipeline's meta.json is now only used to provide meta
    information like the package name, author, license and labels.
    It's not used to construct the processing pipeline anymore. This
    is all defined in the config.cfg, which also includes all
    settings used to train the pipeline.
  * The train, pretrain and debug data commands now only take a
    config.cfg.
  * Language.add_pipe now takes the string name of the component
    factory instead of the component function.
  * Custom pipeline components now need to be decorated with the
    @Language.component or @Language.factory decorator.
  * The Language.update, Language.evaluate and TrainablePipe.update
    methods now all take batches of Example objects instead of Doc
    and GoldParse objects, or raw text and a dictionary of
    annotations.
  * The begin_training methods have been renamed to initialize and
    now take a function that returns a sequence of Example objects to
    initialize the model instead of a list of tuples.
  * Matcher.add and PhraseMatcher.add now only accept a list of
    patterns as the second argument (instead of a variable number of
    arguments). The on_match callback becomes an optional keyword
    argument.
  * The Doc flags like Doc.is_parsed or Doc.is_tagged have been
    replaced by Doc.has_annotation.
  * The spacy.gold module has been renamed to spacy.training.
  * The PRON_LEMMA symbol and -PRON- as an indicator for pronoun
    lemmas have been removed.
  * The TAG_MAP and MORPH_RULES in the language data have been
    replaced by the more flexible AttributeRuler.
  * The Lemmatizer is now a standalone pipeline component and doesn't
    provide lemmas by default or switch automatically between lookup
    and rule-based lemmas. You can now add it to your pipeline
    explicitly and set its mode on initialization.
  * Various keyword arguments across functions and methods are now
    explicitly declared as keyword-only arguments. Those arguments
    are documented accordingly across the API reference.
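
Several of the changes above can be seen together in a short sketch: registering a custom component with @Language.component, adding components by string name, passing a list of patterns to Matcher.add, and checking annotations with Doc.has_annotation. It assumes spaCy v3 is installed. The component name "token_counter", the "GREETING" rule and the sample text are hypothetical and exist only for this example.

```python
import spacy
from spacy.language import Language
from spacy.matcher import Matcher
from spacy.tokens import Doc


# Custom components are registered under a string name.
# "token_counter" is a hypothetical name used only in this sketch.
@Language.component("token_counter")
def token_counter(doc: Doc) -> Doc:
    doc.user_data["n_tokens"] = len(doc)
    return doc


nlp = spacy.blank("en")
# add_pipe takes the registered string name, not the function object.
nlp.add_pipe("token_counter")
nlp.add_pipe("sentencizer")  # built-in rule-based sentence splitter

doc = nlp("Hello world. This is spaCy.")

# Matcher.add: the second argument is a list of patterns,
# and on_match is an optional keyword argument.
matcher = Matcher(nlp.vocab)
greeting = [{"LOWER": "hello"}, {"LOWER": "world"}]
matcher.add("GREETING", [greeting], on_match=None)
matches = matcher(doc)
matched_text = [doc[start:end].text for _, start, end in matches]

# Doc.has_annotation replaces flags such as Doc.is_sentenced or Doc.is_tagged.
has_sents = doc.has_annotation("SENT_START")  # set by the sentencizer
has_tags = doc.has_annotation("TAG")          # no tagger in this pipeline

print(nlp.pipe_names, matched_text, has_sents, has_tags)
```

Passing the function itself to add_pipe, as in v2.x, raises an error in v3, which is why the string name is used here.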

Removed or renamed API

             Removed                          Replacement
Language.disable_pipes            Language.select_pipes,
                                  Language.disable_pipe,
                                  Language.enable_pipe
Language.begin_training,          Language.initialize,
Pipe.begin_training, ...          Pipe.initialize, ...
Doc.is_tagged, Doc.is_parsed, ... Doc.has_annotation
GoldParse                         Example
GoldCorpus                        Corpus
KnowledgeBase.load_bulk,          KnowledgeBase.from_disk,
KnowledgeBase.dump                KnowledgeBase.to_disk
Matcher.pipe, PhraseMatcher.pipe  not needed
gold.offsets_from_biluo_tags,     training.biluo_tags_to_offsets,
gold.spans_from_biluo_tags,       training.biluo_tags_to_spans,
gold.biluo_tags_from_offsets      training.offsets_to_biluo_tags
spacy init-model                  spacy init vectors
spacy debug-data                  spacy debug data
spacy profile                     spacy debug profile
spacy link, util.set_data_path,   not needed, symlinks are deprecated
util.get_data_path
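
To make a few of these renames concrete, the sketch below uses Example in place of GoldParse, Language.initialize in place of begin_training, training.offsets_to_biluo_tags in place of gold.biluo_tags_from_offsets, and Language.select_pipes in place of disable_pipes. It assumes spaCy v3 is installed. The sentence and entity offsets are invented for illustration, and a single update step on one example is not a meaningful training run.

```python
import spacy
from spacy.training import Example, offsets_to_biluo_tags

nlp = spacy.blank("en")
doc = nlp.make_doc("Apple is based in Cupertino")

# Character offsets and labels; toy data invented for this sketch.
entities = [(0, 5, "ORG"), (18, 27, "GPE")]

# Renamed helper: gold.biluo_tags_from_offsets -> training.offsets_to_biluo_tags
tags = offsets_to_biluo_tags(doc, entities)

# Example replaces GoldParse and (text, annotations) tuples.
example = Example.from_dict(doc, {"entities": entities})

ner = nlp.add_pipe("ner")

# initialize replaces begin_training and takes a function that
# returns a sequence of Example objects.
optimizer = nlp.initialize(lambda: [example])

# update takes a batch of Example objects.
losses = {}
nlp.update([example], sgd=optimizer, losses=losses)

# select_pipes temporarily disables components inside the block.
with nlp.select_pipes(disable=["ner"]):
    active_inside = list(nlp.pipe_names)
active_after = list(nlp.pipe_names)

print(tags, sorted(ner.labels), active_inside, active_after)
```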

The following deprecated methods, attributes and arguments were
removed in v3.0. Most of them have been deprecated for a while and
many would previously raise errors. Many of them were also mostly
internals. If you've been working with more recent versions of spaCy
v2.x, it's unlikely that your code relied on them.

                  Removed                          Replacement
Doc.tokens_from_list                         Doc.__init__
Doc.merge, Span.merge                        Doc.retokenize
Token.string, Span.string, Span.upper,       Span.text, Token.text
Span.lower
Language.tagger, Language.parser,            Language.get_pipe
Language.entity
keyword-arguments like vocab=False on        exclude=["vocab"]
to_disk, from_disk, to_bytes, from_bytes
n_threads argument on Tokenizer, Matcher,    n_process
PhraseMatcher
verbose argument on Language.evaluate        logging (DEBUG)
SentenceSegmenter hook, SimilarityHook       user hooks, Sentencizer,
                                             SentenceRecognizer
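
A few of the replacements in this table could look as follows in practice: building a Doc from a word list with Doc.__init__, merging tokens with Doc.retokenize, serializing with exclude=["vocab"], and fetching a component with Language.get_pipe. This is a sketch assuming spaCy v3 is installed; the words and the choice of the sentencizer component are arbitrary.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Doc.__init__ replaces Doc.tokens_from_list: pass the words (and spaces) directly.
doc = Doc(
    nlp.vocab,
    words=["New", "York", "is", "big"],
    spaces=[True, True, True, False],
)

# Doc.retokenize replaces Doc.merge and Span.merge.
with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[0:2])
tokens = [token.text for token in doc]

# exclude=[...] replaces keyword arguments like vocab=False.
data = nlp.to_bytes(exclude=["vocab"])
restored = spacy.blank("en").from_bytes(data, exclude=["vocab"])

# Language.get_pipe replaces attributes like Language.tagger.
nlp.add_pipe("sentencizer")
sentencizer = nlp.get_pipe("sentencizer")

print(tokens, doc.text, type(sentencizer).__name__)
```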

 Contributors

This release is brought to you by @honnibal, @ines, @svlandeg and
@adrianeboyd. Thanks to @AMArostegui, @BramVanroy, @Cristianasp,
@DeNeutoy, @DuyguA, @Jan-711, @KKsharma99, @KeshavG-lb,
@KoichiYasuoka, @MartinoMensio, @Nuccy90, @PluieElectrique,
@SamEdwardes, @Stannislav, @abchapman93, @alexcombessie,
@alvaroabascar, @baranitharan2020, @bittlingmayer, @bjascob,
@borijang, @bratao, @bryant1410, @buriy, @chopeen,
@danielvasic, @delzac, @dhruvrnaik, @erip, @florijanstamenkovic,
@forest1988, @gandersen101, @garethsparks, @graue70, @guadiromero,
@hertelm, @hiroshi-matsuda-rit, @holubvl3, @idoshr, @jabortell,
@jbesomi, @jenojp, @jganseman, @jgutix, @jmargeta, @jumasheff, @kuk,
@leyendecker, @lizhe2004, @lorenanda, @mahnerak, @mikeizbicki,
@myavrum, @nipunsadvilkar, @oculusrepairo, @ophelielacroix,
@rahul1990gupta, @rameshhpathak, @rasyidf, @revuel, @richardliaw,
@robertsipek, @snsten, @solarmist, @tamuhey, @thomasbird, @tiangolo,
@tilusnet, @timgates42, @vha14, @walterhenry, @wannaphong, @werew,
@yosiasz and @zaibacu for the pull requests and contributions!
