# makeMoE

[makemoelogo]

[databr] Developed using Databricks with ❤️

Sparse mixture of experts language model from scratch, inspired by (and largely based on) Andrej Karpathy's makemore (https://github.com/karpathy/makemore) :)

HuggingFace Community Blog that walks through this: https://huggingface.co/blog/AviSoori1x/makemoe-from-scratch

Part 2, detailing expert capacity: https://huggingface.co/blog/AviSoori1x/makemoe2

This is an implementation of a sparse mixture of experts language model from scratch. It is inspired by, and largely based on, Andrej Karpathy's project 'makemore', and borrows the reusable components from that implementation. Just like makemore, makeMoE is an autoregressive character-level language model, but it uses the aforementioned sparse mixture of experts architecture. Just like makemore, PyTorch is the only requirement (so I hope the from-scratch claim is justified).
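To give a flavor of the core idea before diving into the notebooks, here is a minimal sketch of a sparse MoE block in PyTorch: a small router scores the experts for each token, keeps only the top-k, and combines those experts' outputs with the softmaxed gate weights. The class and variable names are illustrative rather than taken verbatim from this repo, and for simplicity every expert runs on every token; the notebooks below walk through the actual implementation, including noisy top-k gating and expert capacity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    """One feed-forward expert: the same shape as makemore's single FFN."""
    def __init__(self, n_embd):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x):
        return self.net(x)

class SparseMoE(nn.Module):
    """Route each token to its top-k experts and gate-weight their outputs."""
    def __init__(self, n_embd, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(n_embd, num_experts)   # plain (noise-free) top-k gate
        self.experts = nn.ModuleList(Expert(n_embd) for _ in range(num_experts))
        self.top_k = top_k

    def forward(self, x):                              # x: (batch, seq_len, n_embd)
        logits = self.router(x)                        # (batch, seq_len, num_experts)
        topk_logits, topk_idx = logits.topk(self.top_k, dim=-1)
        gates = F.softmax(topk_logits, dim=-1)         # weights over the chosen experts
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            # gate weight for expert i where it was selected, zero elsewhere
            weight = (gates * (topk_idx == i)).sum(dim=-1, keepdim=True)
            out = out + weight * expert(x)             # dense compute, sparse selection
        return out

# Quick shape check: a batch of 4 sequences of 16 tokens with 64-dim embeddings.
moe = SparseMoE(n_embd=64)
print(moe(torch.randn(4, 16, 64)).shape)               # torch.Size([4, 16, 64])
```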
## Significant Changes from the makemore architecture

* Sparse mixture of experts instead of the solitary feed-forward neural net.
* Top-k gating and noisy top-k gating implementations.
* Initialization -- Kaiming He initialization is used here, but the point of this notebook is to be hackable, so you can swap in Xavier/Glorot etc. and take it for a spin.
* Expert capacity -- most recent update (03/18/2024).

## Unchanged from makemore

* The dataset, preprocessing (tokenization), and the language modeling task Andrej chose originally -- generate Shakespeare-like text.
* Causal self-attention implementation.
* Training loop.
* Inference logic.

Publications heavily referenced for this implementation:

* Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer: https://arxiv.org/pdf/1701.06538.pdf
* Mixtral of Experts: https://arxiv.org/pdf/2401.04088.pdf

makeMoE.py is the entirety of the implementation in a single file of PyTorch.

makeMoE_from_Scratch.ipynb walks through the intuition for the entire model architecture and how everything comes together. I recommend starting here.

makeMoE_Concise.ipynb is the consolidated, hackable implementation that I encourage you to hack, understand, improve, and make your own.

The code was entirely developed on Databricks using a single A100 for compute. If you're running this on Databricks, you can scale it on an arbitrarily large GPU cluster with no issues, on the cloud provider of your choice.

I chose to use MLflow (which comes pre-installed on Databricks; it's fully open source and you can pip install it easily elsewhere) as I find it helpful to track and log all the necessary metrics. This is entirely optional but encouraged -- a minimal logging sketch is shown at the end of this README.

Please note that the implementation emphasizes readability and hackability over performance, so there are many ways in which you could improve it. Please try, and let me know!

Hope you find this useful. Happy hacking!!
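For the optional MLflow tracking mentioned above, here is a minimal, self-contained sketch. The hyperparameters and the decaying "loss" curve are placeholders rather than the repo's actual training loop; the point is just the `log_params` / `log_metric` pattern, which works the same on Databricks (pre-installed) or anywhere else after `pip install mlflow`.

```python
import math
import mlflow

# Placeholder hyperparameters and a synthetic decaying "loss" curve --
# swap in your real training loop and metrics.
with mlflow.start_run(run_name="makeMoE-demo"):
    mlflow.log_params({"n_embd": 128, "num_experts": 8, "top_k": 2})
    for step in range(0, 2000, 100):
        fake_loss = 4.0 * math.exp(-step / 500)
        mlflow.log_metric("train_loss", fake_loss, step=step)
```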