https://arxiv.org/abs/2505.13124

Computer Science > Machine Learning

arXiv:2505.13124 (cs)
[Submitted on 19 May 2025]

Title: $\mu$PC: Scaling Predictive Coding to 100+ Layer Networks

Authors: Francesco Innocenti, El Mehdi Achour, Christopher L. Buckley

    Abstract: The biological implausibility of backpropagation (BP)
    has motivated many alternative, brain-inspired algorithms that
    attempt to rely only on local information, such as predictive
    coding (PC) and equilibrium propagation. However, these
    algorithms have notoriously struggled to train very deep
    networks, preventing them from competing with BP in large-scale
    settings. Indeed, scaling PC networks (PCNs) has recently been
    posed as a challenge for the community (Pinchetti et al., 2024).
    Here, we show that 100+ layer PCNs can be trained reliably using
    a Depth-$\mu$P parameterisation (Yang et al., 2023; Bordelon et
    al., 2023) which we call "$\mu$PC". Through an extensive analysis
    of the scaling behaviour of PCNs, we reveal several pathologies
    that make standard PCNs difficult to train at large depths. We
    then show that, despite addressing only some of these
    instabilities, $\mu$PC allows stable training of very deep (up to
    128-layer) residual networks on simple classification tasks with
    competitive performance and little tuning compared to current
    benchmarks. Moreover, $\mu$PC enables zero-shot transfer of both
    weight and activity learning rates across widths and depths. Our
    results have implications for other local algorithms and could be
    extended to convolutional and transformer architectures. Code for
    $\mu$PC is made available as part of a JAX library for PCNs at
    this https URL (Innocenti et al., 2024).
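
The abstract does not spell out the parameterisation, so the following is only a minimal sketch of the general idea behind depth-aware scaling in residual networks, not the authors' code and not the API of their JAX library. It assumes a toy linear residual layer z <- z + a * W z with standard-normal weights, and compares a width-only multiplier a = 1/sqrt(N) with a depth-aware multiplier a = 1/sqrt(N * L), the kind of branch scaling used in Depth-$\mu$P-style schemes. The function name `forward_rms`, the toy layer, and both multipliers are illustrative assumptions; the exact scalings used by $\mu$PC should be taken from the paper itself.

```python
import math
import random


def forward_rms(depth, width, branch_scale, seed=0):
    """RMS activation after `depth` toy linear residual layers.

    Each layer applies z <- z + branch_scale * W z, where W is a fresh
    width x width matrix with standard-normal entries. This is an
    illustrative toy model, not the network studied in the paper.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(width)]
    for _ in range(depth):
        # Compute W z one row at a time so W never has to be stored.
        update = [
            sum(rng.gauss(0.0, 1.0) * zj for zj in z)
            for _ in range(width)
        ]
        z = [zi + branch_scale * ui for zi, ui in zip(z, update)]
    return math.sqrt(sum(zi * zi for zi in z) / width)


WIDTH, DEPTH = 32, 128

# Width-only scaling (assumption): a = 1 / sqrt(N).
rms_width_only = forward_rms(DEPTH, WIDTH, 1.0 / math.sqrt(WIDTH))

# Depth-aware scaling (assumption): a = 1 / sqrt(N * L).
rms_depth_scaled = forward_rms(DEPTH, WIDTH, 1.0 / math.sqrt(WIDTH * DEPTH))

print(f"width-only scaling : {rms_width_only:.3e}")
print(f"depth-aware scaling: {rms_depth_scaled:.3e}")
```

In this toy setting the width-only multiplier lets the activation scale grow roughly geometrically with depth, while the depth-aware multiplier keeps it of order one at 128 layers. This covers only the forward pass; it does not model the activity inference or the local weight updates of a PCN.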

Comments:    34 pages, 41 figures
Subjects:    Machine Learning (cs.LG); Artificial Intelligence
             (cs.AI); Neural and Evolutionary Computing (cs.NE)
ACM classes: I.2.6
Cite as:     arXiv:2505.13124 [cs.LG]
             (or arXiv:2505.13124v1 [cs.LG] for this version)
             https://doi.org/10.48550/arXiv.2505.13124
             arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Francesco Innocenti
[v1] Mon, 19 May 2025 13:54:29 UTC (24,285 KB)