https://arxiv.org/abs/2107.07467

Computer Science > Machine Learning

arXiv:2107.07467 (cs)
[Submitted on 15 Jul 2021]

Title: Only Train Once: A One-Shot Neural Network Training And Pruning
Framework

Authors: Tianyi Chen, Bo Ji, Tianyu Ding, Biyi Fang, Guanyi Wang,
Zhihui Zhu, Luming Liang, Yixin Shi, Sheng Yi, Xiao Tu

    Abstract: Structured pruning is a commonly used technique for
    deploying deep neural networks (DNNs) onto resource-constrained
    devices. However, existing pruning methods are usually
    heuristic, task-specific, and require an extra fine-tuning
    procedure. To overcome these limitations, we propose a framework
    that compresses DNNs into slimmer architectures with competitive
    performance and significant FLOPs reductions by Only-Train-Once
    (OTO). OTO has two key components: (i) we partition the
    parameters of DNNs into zero-invariant groups, enabling us to
    prune zero groups without affecting the output; and (ii) to
    promote zero groups, we formulate a structured-sparsity
    optimization problem and propose a novel optimization algorithm,
    Half-Space Stochastic Projected Gradient (HSPG), to solve it.
    HSPG outperforms the standard proximal methods on group sparsity
    exploration and maintains comparable convergence. To demonstrate
    the effectiveness of OTO, we train and compress full models
    simultaneously from scratch without fine-tuning for inference
    speedup and parameter reduction, and achieve state-of-the-art
    results on VGG16 for CIFAR10, ResNet50 for CIFAR10/ImageNet, and
    BERT for SQuAD.
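
The zero-invariant group idea in point (i) can be illustrated with a small sketch. This is not the authors' implementation. It assumes a toy two-layer ReLU network in which each group holds one hidden unit's incoming weights and bias (row j of W1 plus b1[j]); that grouping, the layer sizes, and the helper names forward and prune_zero_groups are hypothetical choices made for illustration only. When a whole group is zero, the unit outputs ReLU(0) = 0, so removing the unit together with its outgoing column of W2 leaves the network output unchanged.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def forward(W1, b1, W2, b2, x):
    """Toy two-layer network: W2 @ relu(W1 @ x + b1) + b2."""
    h = relu([z + b for z, b in zip(matvec(W1, x), b1)])
    return [z + b for z, b in zip(matvec(W2, h), b2)]


def prune_zero_groups(W1, b1, W2):
    """Drop hidden units whose group (row of W1 and its bias) is all zero.

    Assumed grouping for this sketch: group j = (W1[j], b1[j]).
    The matching column j of W2 is removed as well.
    """
    keep = [
        j for j, (row, b) in enumerate(zip(W1, b1))
        if b != 0.0 or any(w != 0.0 for w in row)
    ]
    W1p = [W1[j] for j in keep]
    b1p = [b1[j] for j in keep]
    W2p = [[row[j] for j in keep] for row in W2]
    return W1p, b1p, W2p, keep


rng = random.Random(0)
n_in, n_hidden, n_out = 4, 6, 3  # hypothetical sizes
W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
W2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b2 = [rng.uniform(-1, 1) for _ in range(n_out)]

# Pretend training drove groups 1 and 4 exactly to zero.
for j in (1, 4):
    W1[j] = [0.0] * n_in
    b1[j] = 0.0

x = [rng.uniform(-1, 1) for _ in range(n_in)]
y_full = forward(W1, b1, W2, b2, x)

W1p, b1p, W2p, kept = prune_zero_groups(W1, b1, W2)
y_pruned = forward(W1p, b1p, W2p, b2, x)

# Largest difference between the full and the pruned network outputs.
max_diff = max(abs(a - b) for a, b in zip(y_full, y_pruned))
print("kept hidden units:", kept)
print("max output difference:", max_diff)
```

The sketch only shows why pruning a zero group is lossless once the group is already zero. How groups are driven to zero during training (the HSPG step in point (ii)) is not shown here.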

Comments: Under Review
Subjects: Machine Learning (cs.LG)
Cite as:  arXiv:2107.07467 [cs.LG]
          (or arXiv:2107.07467v1 [cs.LG] for this version)

Submission history

From: Tianyi Chen
[v1] Thu, 15 Jul 2021 17:15:20 UTC (2,413 KB)
License: CC BY 4.0