https://github.com/epfLLM/meditron

epfLLM / meditron

Meditron is a suite of open-source medical Large Language Models (LLMs).

huggingface.co/epfl-llm

License: Apache-2.0 license

Latest commit: 084fe59 (Update README.md), Nov 28, 2023

README.md

Meditron is a suite of open-source medical Large Language Models (LLMs). We release Meditron-7B and Meditron-70B, which are adapted to the medical domain from Llama-2 through continued pretraining on a comprehensively curated medical corpus, including selected PubMed papers and abstracts, a new dataset of internationally-recognized medical guidelines, and a general domain corpus. Meditron-70B, finetuned on relevant data, outperforms Llama-2-70B, GPT-3.5 and Flan-PaLM on multiple medical reasoning tasks.

Advisory Notice

While Meditron is designed to encode medical knowledge from sources of high-quality evidence, it is not yet adapted to deliver this knowledge appropriately, safely, or within professional actionable constraints. We recommend against using Meditron in medical applications without extensive use-case alignment, as well as additional testing, specifically including randomized controlled trials in real-world practice settings.

Model Details

* Developed by: EPFL LLM Team
* Model type: Causal decoder-only transformer language model
* Language(s): English (mainly)
* Model License: LLAMA 2 COMMUNITY LICENSE AGREEMENT
* Code License: APACHE 2.0 LICENSE
* Continue-pretrained from model: Llama-2-70B
* Context length: 4k tokens
* Input: Text only data
* Output: Model generates text only
* Status: This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we enhance the model's performance.
* Knowledge Cutoff: August 2023
* Trainer: epfLLM/Megatron-LLM
* Paper: Meditron-70B: Scaling Medical Pretraining for Large Language Models

How to use

You can load the Meditron model directly from the HuggingFace model hub as follows:

    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("epfl-llm/meditron-70B")
    model = AutoModelForCausalLM.from_pretrained("epfl-llm/meditron-70B")

Medical Training Data

We release code to download and pre-process the data used to train Meditron.

MediTron's domain-adaptive pre-training corpus GAP-Replay combines 48.1B tokens from four corpora:

* Clinical Guidelines: a new corpus of 46K clinical practice guidelines from various healthcare-related sources, including hospitals and international organizations,
* Paper Abstracts: 16.1M abstracts extracted from closed-access PubMed and PubMed Central papers,
* Medical Papers: full-text articles extracted from 5M publicly available PubMed and PubMed Central papers,
* Replay dataset: 400M tokens of general domain pretraining data sampled from RedPajama-v1.

Download instructions

You can download and pre-process the entire GAP-Replay corpus by running ./download.sh in the gap-replay folder.

You can download 36K open-access articles from our Guidelines corpus from the HuggingFace datasets hub.

    from datasets import load_dataset

    dataset = load_dataset("epfl-llm/guidelines")

You can scrape and clean all 46K guidelines (including closed-access sources) by running ./download.sh in the guidelines folder.

More details can be found in the GAP-Replay documentation.

Training Procedure

We used the Megatron-LLM distributed training library, a derivative of Nvidia's Megatron LM project, to optimize training efficiency. Hardware consists of 16 nodes of 8x NVIDIA A100 (80GB) SXM GPUs connected by NVLink and NVSwitch with a single Nvidia ConnectX-6 DX network card and equipped with 2 x AMD EPYC 7543 32-Core Processors and 512 GB of RAM. The nodes are connected via RDMA over Converged Ethernet.

Our three-way parallelism scheme uses the following:

* Data Parallelism (DP -- different GPUs process different subsets of the batches) of 2,
* Pipeline Parallelism (PP -- different GPUs process different layers) of 8,
* Tensor Parallelism (TP -- different GPUs process different subtensors for matrix multiplication) of 8.
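
As a quick sanity check on the numbers above, the three parallelism degrees should multiply to the total GPU count: 16 nodes of 8 GPUs give 128 GPUs, and 2 x 8 x 8 = 128. The sketch below only does that arithmetic. The function and constant names are illustrative assumptions and are not part of Megatron-LLM.

```python
# Illustrative arithmetic only. These names are hypothetical and are
# not Megatron-LLM APIs or configuration keys.

def world_size(dp: int, pp: int, tp: int) -> int:
    """Number of GPUs occupied by a DP x PP x TP layout."""
    return dp * pp * tp


NODES = 16          # from the hardware description above
GPUS_PER_NODE = 8   # 8x A100 per node
TOTAL_GPUS = NODES * GPUS_PER_NODE

# Parallelism degrees quoted in the list above.
DP, PP, TP = 2, 8, 8

# One model replica spans PP * TP GPUs; DP counts the replicas.
GPUS_PER_REPLICA = PP * TP

if __name__ == "__main__":
    print("total GPUs:", TOTAL_GPUS)
    print("GPUs used by layout:", world_size(DP, PP, TP))
    print("GPUs per model replica:", GPUS_PER_REPLICA)
    print("layout fills the cluster:", world_size(DP, PP, TP) == TOTAL_GPUS)
```

Note that the tensor-parallel degree equals the number of GPUs in one node. This is an observation from the figures above, not a statement from the authors about why the layout was chosen.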

Training Hyperparameters (7B)

    Parameter            Value
    bf16                 true
    lr                   3e-4
    eps                  1e-5
    betas                [0.9, 0.95]
    clip_grad            1
    weight decay         0.1
    DP size              16
    TP size              4
    PP size              1
    seq length           2048
    lr scheduler         cosine
    min lr               1e-6
    warmup iteration     2000
    micro batch size     10
    global batch size    1600

Training Hyperparameters (70B)

    Parameter            Value
    bf16                 true
    lr                   1.5e-4
    eps                  1e-5
    betas                [0.9, 0.95]
    clip_grad            1
    weight decay         0.1
    DP size              2
    TP size              8
    PP size              8
    seq length           4096
    lr scheduler         cosine
    min lr               1e-6
    warmup iteration     2000
    micro batch size     2
    global batch size    512

You can see the script we used to pretrain our models through Megatron-LLM here: finetune.sh

Supervised Finetuning

We again used the Megatron-LLM distributed training library for supervised finetuning (single-node and multi-node). We made a file, sft.py, that automatically handles the tokenization and finetuning process through Megatron-LLM. To start a multi-node finetuning process, here is an example:

    cd finetuning
    python sft.py \
        --checkpoint=baseline \
        --size=70 \
        --run_name=cotmedqa \
        --data /pure-mlo-scratch/zechen/meditron/benchmarks/ft_preprocessed/medqa_cot_train.jsonl \
        --val /pure-mlo-scratch/zechen/meditron/benchmarks/ft_preprocessed/medqa_cot_validation.jsonl \
        --micro_batch=4 --nodes=4 \
        --addr= \
        --save_interval=200 \
        --pp=4 \
        --seq 4096 \
        --rank=

Run the above command on the nodes with rank 0, rank 1, rank 2, and rank 3 to start a 4-node finetuning process.

Important!: Make sure to have the proper paths defined in sft.py and finetune_sft.sh.

Finetuning Hyperparameters

    Parameter            Value
    bf16                 true
    lr                   2e-5
    eps                  1e-5
    betas                [0.9, 0.95]
    clip_grad            1
    weight decay         0.1
    DP size              16
    TP size              4
    PP size              1
    seq length           2048 or 4096
    lr scheduler         cosine
    min lr               2e-6
    warmup ratio         0.1
    added tokens         [<|im_start|>, <|im_end|>]

Uses

Meditron-70B is being made available for further testing and assessment as an AI assistant to enhance clinical decision-making and democratize access to an LLM for healthcare use.
Potential use cases may include but are not limited to:

* Medical exam question answering
* Supporting differential diagnosis
* Disease information (symptoms, cause, treatment) query
* General health information query

It is possible to use this model to generate text, which is useful for experimentation and understanding its capabilities. It should not be used directly for production or work that may impact people.

We do not recommend using this model for natural language generation in a production environment, finetuned or otherwise.

Downstream Use

Meditron-70B is a foundation model that can be finetuned, instruction-tuned, or RLHF-tuned for specific downstream tasks and applications. The main way we have used this model is finetuning for downstream question-answering tasks, but we encourage using this model for additional applications.

Specific formatting needs to be followed to prompt our finetuned models, including the <|im_start|>, <|im_end|> tags, and system, question, answer identifiers.

    """
    <|im_start|>system
    {system_message}<|im_end|>
    <|im_start|>question
    {prompt}<|im_end|>
    <|im_start|>answer
    """

Note: the above formatting is not a requirement if you use your own formatting option for the finetuning of the model.

Medical Benchmark Inference & Evaluation

Requirements

Before you start, please install the necessary packages:

    vllm >= 0.2.1
    transformers >= 4.34.0
    datasets >= 2.14.6
    torch >= 2.0.1

For detailed instructions to run inference and evaluation with medical benchmarks, please read the inference & evaluation instructions.
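
The prompt layout described under Downstream Use can be assembled with a small helper. This is a minimal sketch: `build_prompt` and the example messages are hypothetical names chosen for illustration and are not part of the Meditron repository. The newline placement is assumed to follow the template shown under Downstream Use.

```python
# Hypothetical helper for illustration; not part of the Meditron repository.

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_prompt(system_message: str, prompt: str) -> str:
    """Fill the system/question/answer template for a finetuned model.

    The returned string ends right after the "answer" identifier so that
    the model's generation continues from there.
    """
    return (
        f"{IM_START}system\n{system_message}{IM_END}\n"
        f"{IM_START}question\n{prompt}{IM_END}\n"
        f"{IM_START}answer\n"
    )


if __name__ == "__main__":
    # Example inputs are placeholders, not recommended clinical prompts.
    text = build_prompt(
        "You are a helpful medical assistant.",
        "What are common symptoms of anemia?",
    )
    print(text)
```

The resulting string would then be passed to whichever generation backend is in use. If you finetune with your own formatting, as the note under Downstream Use allows, the helper would need to change to match.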

Model Deployment

For detailed instructions to deploy meditron models and have an interactive chat session, please read the Model Deployment documentation.

Citation

If you use this software or our paper, please cite them:

    @misc{chen2023meditron70b,
      title={MEDITRON-70B: Scaling Medical Pretraining for Large Language Models},
      author={Zeming Chen and Alejandro Hernandez-Cano and Angelika Romanou and Antoine Bonnet and Kyle Matoba and Francesco Salvi and Matteo Pagliardini and Simin Fan and Andreas Kopf and Amirkeivan Mohtashami and Alexandre Sallinen and Alireza Sakhaeirad and Vinitra Swamy and Igor Krawczuk and Deniz Bayazit and Axel Marmet and Syrielle Montariol and Mary-Anne Hartley and Martin Jaggi and Antoine Bosselut},
      year={2023},
      eprint={2311.16079},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
    }

    @software{epfmedtrn,
      author = {Zeming Chen and Alejandro Hernandez-Cano and Angelika Romanou and Antoine Bonnet and Kyle Matoba and Francesco Salvi and Matteo Pagliardini and Simin Fan and Andreas Kopf and Amirkeivan Mohtashami and Alexandre Sallinen and Alireza Sakhaeirad and Vinitra Swamy and Igor Krawczuk and Deniz Bayazit and Axel Marmet and Syrielle Montariol and Mary-Anne Hartley and Martin Jaggi and Antoine Bosselut},
      title = {MediTron-70B: Scaling Medical Pretraining for Large Language Models},
      month = November,
      year = 2023,
      url = {https://github.com/epfLLM/meditron}
    }