https://github.com/yandex/YaFSDP

YaFSDP: Yet another Fully Sharded Data Parallel

License

Apache-2.0 license
YaFSDP

 
 

  * Overview
  * Advantages over FSDP
  * Examples
  * Issues and questions
  * Citation

Overview

 

YaFSDP is a Sharded Data Parallelism framework, designed to work well
with transformer-like neural network architectures.

You can find more info on YaFSDP internals in our blog posts on
Medium and Habr.
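
As a rough illustration of what sharded data parallelism means in general, the sketch below simulates it with plain Python lists. It is not YaFSDP's implementation and uses none of its API: the function names (shard, all_gather, reduce_scatter) are hypothetical stand-ins for the collective operations a real framework runs across devices, and lists of floats stand in for tensors. Each rank stores only its shard of the parameters, the full parameters are gathered before computation, and the summed gradients are scattered back so that each rank keeps only its own slice.

```python
from typing import List


def shard(flat_params: List[float], world_size: int) -> List[List[float]]:
    """Split a flat parameter list into world_size equal shards, zero-padding the tail."""
    shard_len = -(-len(flat_params) // world_size)  # ceiling division
    padding = [0.0] * (shard_len * world_size - len(flat_params))
    padded = flat_params + padding
    return [padded[r * shard_len:(r + 1) * shard_len] for r in range(world_size)]


def all_gather(shards: List[List[float]], numel: int) -> List[float]:
    """Rebuild the full parameter list from every rank's shard and drop the padding."""
    full = [value for rank_shard in shards for value in rank_shard]
    return full[:numel]


def reduce_scatter(per_rank_grads: List[List[float]], world_size: int) -> List[List[float]]:
    """Sum the full gradients of all ranks, then give each rank only its shard of the sum."""
    summed = [sum(values) for values in zip(*per_rank_grads)]
    return shard(summed, world_size)


# Toy setup: 5 parameters sharded over 2 simulated ranks.
params = [1.0, 2.0, 3.0, 4.0, 5.0]
world_size = 2

shards = shard(params, world_size)        # each rank keeps one shard in memory
full = all_gather(shards, len(params))    # gathered before forward/backward

# Each rank computes a full gradient on its own batch (made-up values here).
grads = [[1.0] * 5, [3.0] * 5]
grad_shards = reduce_scatter(grads, world_size)

print(shards)
print(full)
print(grad_shards)
```

In a real framework the gather and scatter steps are network collectives issued per layer, which is where communication and memory overhead arise.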

Advantages over FSDP

 

YaFSDP is up to 20% faster for pre-training LLMs and performs better
under high memory pressure. It is designed to reduce the overhead of
communication and memory operations.

YaFSDP:

[Figure: ya_fsdp]

FSDP:

[Figure: fsdp]

Benchmarks

 

We've compared YaFSDP with FSDP on a variety of pre-training setups
ranging from:

  * 7B to 70B parameters
  * 64 to 256 devices
  * 2048 to 8192 tokens per sequence

model        gpu-count  seq-len  num-ckpt-layers  speedup  YaFSDP iteration time (s)  FSDP iteration time (s)
Llama 2 7B          64     2048                0    9.92%                       0.81                     0.90
Llama 2 7B          64     4096                0    3.43%                       1.16                     1.21
Llama 2 7B          64     8192                0    2.68%                       2.23                     2.29
Llama 2 7B         128     2048                0    9.57%                       0.87                     0.97
Llama 2 7B         128     4096                0    2.42%                       1.19                     1.22
Llama 2 7B         128     8192                0    2.32%                       2.25                     2.31
Llama 2 13B        128     2048                0   12.10%                       1.55                     1.76
Llama 2 13B        128     4096                0    3.49%                       2.06                     2.14
Llama 2 34B        128     2048                0   20.70%                       3.39                     4.27
Llama 2 34B        256     2048                0   21.99%                       3.51                     4.50
Llama 2 34B        256     4096                5    8.35%                       5.33                     5.81
Llama 2 70B        256     2048               10   21.48%                       6.97                     8.87
Llama 2 70B        256     4096               50    7.17%                      11.07                    11.93
Llama 3 8B          64     2048                0   11.91%                       0.97                     1.10
Llama 3 8B          64     4096                0    7.86%                       1.36                     1.48
Llama 3 70B        256     2048               20   26.60%                       7.17                     9.76

Details:

  * In each run, the per-device batch size is set to 1.
  * speedup is the relative decrease in iteration time of the YaFSDP
    run compared with the FSDP run.
  * num-ckpt-layers refers to the number of transformer layers to
    which activation checkpointing was applied.
  * Performance was measured using a cluster of hosts with A100 80 GB
    GPUs.
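
The speedup column can be approximated from the two iteration times. The sketch below assumes that "relative iteration time decrease" means (FSDP time - YaFSDP time) / FSDP time; that formula and both function names are our reading, not something the benchmark scripts are confirmed to use. Because the table lists times rounded to two decimals, the recomputed percentages differ slightly from the published ones. The throughput helper relies on the stated per-device batch size of 1.

```python
def speedup_percent(yafsdp_time_s: float, fsdp_time_s: float) -> float:
    """Relative decrease in iteration time, in percent (assumed definition)."""
    return (fsdp_time_s - yafsdp_time_s) / fsdp_time_s * 100.0


def tokens_per_second(gpu_count: int, seq_len: int, iteration_time_s: float,
                      per_device_batch_size: int = 1) -> float:
    """Tokens processed per second across the whole cluster."""
    tokens_per_iteration = gpu_count * per_device_batch_size * seq_len
    return tokens_per_iteration / iteration_time_s


# Llama 2 7B, 64 GPUs, 2048 tokens per sequence (times taken from the table).
print(f"speedup: {speedup_percent(0.81, 0.90):.2f}%")
print(f"YaFSDP throughput: {tokens_per_second(64, 2048, 0.81):.0f} tokens/s")
print(f"FSDP throughput:   {tokens_per_second(64, 2048, 0.90):.0f} tokens/s")
```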

Examples

 

You can find examples of LLM training using the Hugging Face stack in
the examples folder:

 1. clm.md for causal pre-training
 2. sft.md for supervised fine-tuning

Note that both examples require a Docker image, which can be built
using the docker/build.sh script. The image is based on the NVIDIA
PyTorch image with some patched Hugging Face libraries. Patches for
the libraries can be found in the patches folder.

Issues and questions

 

If you encounter any bugs or have any questions, feel free to open a
GitHub issue.

Citation

 

If you use this codebase, please cite it by using the following
BibTeX entry:

@misc{YaFSDP2024,
  author =       {Mikhail Khrushchev and Anton Frolov and Ruslan Vasilev},
  title =        {YaFSDP: Yet another Fully Sharded Data Parallel},
  howpublished = {\url{https://github.com/yandex/YaFSDP}},
  year =         {2024}
}
