# makeMoE

[makemoelogo]

[databr] Developed using Databricks with ❤️

Sparse mixture of experts language model from scratch, inspired by (and largely based on) Andrej Karpathy's makemore (https://github.com/karpathy/makemore) :)

HuggingFace Community Blog that walks through this: https://huggingface.co/blog/AviSoori1x/makemoe-from-scratch

Part 2, detailing expert capacity: https://huggingface.co/blog/AviSoori1x/makemoe2

This is an implementation of a sparse mixture of experts language model from scratch. It is inspired by, and largely based on, Andrej Karpathy's project 'makemore', and borrows the reusable components from that implementation. Just like makemore, makeMoE is an autoregressive character-level language model, but it uses the aforementioned sparse mixture of experts architecture. Just like makemore, PyTorch is the only requirement (so I hope the from-scratch claim is justified).
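To give a flavor of the core idea before diving into the notebooks, here is a minimal sketch of a sparse MoE block in PyTorch: a small router scores the experts for each token, keeps only the top-k, and combines those experts' outputs with the softmaxed gate weights. The class and variable names are illustrative rather than taken verbatim from this repo, and for simplicity every expert runs on every token; the notebooks below walk through the actual implementation, including noisy top-k gating and expert capacity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    """One feed-forward expert: the same shape as makemore's single FFN."""
    def __init__(self, n_embd):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x):
        return self.net(x)

class SparseMoE(nn.Module):
    """Route each token to its top-k experts and gate-weight their outputs."""
    def __init__(self, n_embd, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(n_embd, num_experts)   # plain (noise-free) top-k gate
        self.experts = nn.ModuleList(Expert(n_embd) for _ in range(num_experts))
        self.top_k = top_k

    def forward(self, x):                              # x: (batch, seq_len, n_embd)
        logits = self.router(x)                        # (batch, seq_len, num_experts)
        topk_logits, topk_idx = logits.topk(self.top_k, dim=-1)
        gates = F.softmax(topk_logits, dim=-1)         # weights over the chosen experts
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            # gate weight for expert i where it was selected, zero elsewhere
            weight = (gates * (topk_idx == i)).sum(dim=-1, keepdim=True)
            out = out + weight * expert(x)             # dense compute, sparse selection
        return out

# Quick shape check: a batch of 4 sequences of 16 tokens with 64-dim embeddings.
moe = SparseMoE(n_embd=64)
print(moe(torch.randn(4, 16, 64)).shape)               # torch.Size([4, 16, 64])
```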
## Significant Changes from the makemore architecture

* Sparse mixture of experts instead of the solitary feed-forward neural net.
* Top-k gating and noisy top-k gating implementations.
* Initialization -- Kaiming He initialization is used here, but the point of this notebook is to be hackable, so you can swap in Xavier/Glorot etc. and take it for a spin.
* Expert capacity -- most recent update (03/18/2024).

## Unchanged from makemore

* The dataset, preprocessing (tokenization), and the language modeling task Andrej chose originally -- generate Shakespeare-like text.
* Causal self-attention implementation.
* Training loop.
* Inference logic.

Publications heavily referenced for this implementation:

* Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer: https://arxiv.org/pdf/1701.06538.pdf
* Mixtral of Experts: https://arxiv.org/pdf/2401.04088.pdf

makeMoE.py is the entirety of the implementation in a single file of PyTorch.

makeMoE_from_Scratch.ipynb walks through the intuition for the entire model architecture and how everything comes together. I recommend starting here.

makeMoE_Concise.ipynb is the consolidated, hackable implementation that I encourage you to hack, understand, improve, and make your own.

The code was entirely developed on Databricks using a single A100 for compute. If you're running this on Databricks, you can scale it on an arbitrarily large GPU cluster with no issues, on the cloud provider of your choice.

I chose to use MLflow (which comes pre-installed on Databricks; it's fully open source and you can pip install it easily elsewhere) as I find it helpful to track and log all the necessary metrics. This is entirely optional but encouraged -- a minimal logging sketch is shown at the end of this README.

Please note that the implementation emphasizes readability and hackability over performance, so there are many ways in which you could improve it. Please try, and let me know!

Hope you find this useful. Happy hacking!!
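For the optional MLflow tracking mentioned above, here is a minimal, self-contained sketch. The hyperparameters and the decaying "loss" curve are placeholders rather than the repo's actual training loop; the point is just the `log_params` / `log_metric` pattern, which works the same on Databricks (pre-installed) or anywhere else after `pip install mlflow`.

```python
import math
import mlflow

# Placeholder hyperparameters and a synthetic decaying "loss" curve --
# swap in your real training loop and metrics.
with mlflow.start_run(run_name="makeMoE-demo"):
    mlflow.log_params({"n_embd": 128, "num_experts": 8, "top_k": 2})
    for step in range(0, 2000, 100):
        fake_loss = 4.0 * math.exp(-step / 500)
        mlflow.log_metric("train_loss", fake_loss, step=step)
```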