# Ferret: Refer and Ground Anything Anywhere at Any Granularity

An End-to-End MLLM that Accepts Any-Form Referring and Grounds Anything in Response. [Paper]

Haoxuan You\*, Haotian Zhang\*, Zhe Gan, Xianzhi Du, Bowen Zhang, Zirui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang

[\*: equal contribution]

## Overview

Figure: Diagram of the Ferret model.

Key contributions:
* Ferret model: a hybrid region representation combined with a spatial-aware visual sampler enables fine-grained, open-vocabulary referring and grounding in an MLLM.
* GRIT dataset (~1.1M): a large-scale, hierarchical, robust ground-and-refer instruction-tuning dataset.
* Ferret-Bench: a multimodal evaluation benchmark that jointly requires referring/grounding, semantics, knowledge, and reasoning.

## Release
* [12/14] We released the checkpoints (7B, 13B).
* [10/30] We released the code of the FERRET model and Ferret-Bench.

Usage and License Notices: The data and code are intended and licensed for research use only. They are also restricted to uses that follow the license agreements of LLaMA, Vicuna, and GPT-4.
The dataset is licensed under CC BY-NC 4.0 (non-commercial use only), and models trained on the dataset should not be used outside of research purposes.

## Contents
* Install
* Train
* Evaluation
* Demo

## Install

1. Clone this repository and navigate to the FERRET folder:
   ```bash
   git clone https://github.com/apple/ml-ferret
   cd ml-ferret
   ```
2. Install the package:
   ```bash
   conda create -n ferret python=3.10 -y
   conda activate ferret
   pip install --upgrade pip  # enable PEP 660 support
   pip install -e .
   pip install pycocotools
   pip install protobuf==3.20.0
   ```
3. Install additional packages for training:
   ```bash
   pip install ninja
   pip install flash-attn --no-build-isolation
   ```

## Train

FERRET is trained on 8 A100 GPUs with 80GB of memory each. To train on fewer GPUs, reduce `per_device_train_batch_size` and increase `gradient_accumulation_steps` accordingly. Always keep the global batch size the same, where global batch size = `per_device_train_batch_size` × `gradient_accumulation_steps` × `num_gpus`.

### Hyperparameters

We use a set of finetuning hyperparameters similar to those of LLaVA (Vicuna).

| Hyperparameter | Global Batch Size | Learning rate | Epochs | Max length | Weight decay |
| --- | ---: | ---: | ---: | ---: | ---: |
| FERRET-7B | 128 | 2e-5 | 3 | 2048 | 0 |
| FERRET-13B | 128 | 2e-5 | 3 | 2048 | 0 |

### Prepare Vicuna checkpoint and LLaVA's projector

Before you start, prepare our base model, Vicuna, an instruction-tuned chatbot. Download its weights by following the instructions here. FERRET uses Vicuna v1.3. Then download LLaVA's first-stage pre-trained projector weights (7B, 13B).

### FERRET Training

Training scripts are provided for both model sizes (7B, 13B).

## Evaluation

Please see the evaluation doc (EVAL.md) for details.

## Checkpoints

We extracted the delta between our pre-trained model and Vicuna. First download the Vicuna weights as described in the previous section. Then download our prepared weight offsets (7B, 13B) using `wget` or `curl`, and unzip them.
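Assuming the released offsets are element-wise additive deltas, as in the delta tooling of LLaVA (the codebase FERRET builds upon), applying them amounts to adding each delta tensor to the matching Vicuna tensor. The following is a minimal conceptual sketch of that arithmetic on toy data, not the repository's `ferret.model.apply_delta`. The function name `apply_additive_delta` and the nested-list tensor representation are hypothetical, and so are the shape rules it encodes: a delta tensor may be larger than its base counterpart (for example, an embedding table extended with new tokens), and tensors that Vicuna lacks are copied through unchanged.

```python
from typing import Dict, List

# Toy stand-ins for framework tensors: each parameter is a 2-D matrix
# stored as nested lists of floats (rows x columns).
Matrix = List[List[float]]
StateDict = Dict[str, Matrix]


def apply_additive_delta(base: StateDict, delta: StateDict) -> StateDict:
    """Conceptual sketch: target = delta + base, parameter by parameter.

    Assumptions (not taken from ferret.model.apply_delta):
    * the delta is element-wise additive;
    * a delta matrix may be larger than its base matrix; only the
      overlapping top-left block receives the base values;
    * parameters that exist only in the delta are copied through unchanged;
    * the target keeps the delta's set of parameter names.
    """
    target: StateDict = {}
    for name, d in delta.items():
        t = [row[:] for row in d]  # copy so the caller's delta is not mutated
        b = base.get(name)
        if b is not None:
            # `or` short-circuits, so d[i] is only indexed when len(b) <= len(d).
            if len(b) > len(d) or any(len(brow) > len(d[i]) for i, brow in enumerate(b)):
                raise ValueError(f"base parameter {name!r} is larger than its delta")
            for i, brow in enumerate(b):
                for j, value in enumerate(brow):
                    t[i][j] += value
        target[name] = t
    return target


if __name__ == "__main__":
    base = {"w": [[1.0, 2.0]], "emb": [[1.0], [1.0]]}
    # "emb" gains one row (a new token); "proj" has no Vicuna counterpart.
    delta = {"w": [[0.5, -2.0]], "emb": [[0.25], [0.5], [3.0]], "proj": [[7.0]]}
    print(apply_additive_delta(base, delta))
    # {'w': [[1.5, 0.0]], 'emb': [[1.25], [1.5], [3.0]], 'proj': [[7.0]]}
```

With real checkpoints, the same idea applies to framework tensors loaded from the Vicuna and delta directories; the provided `ferret.model.apply_delta` module is the supported way to do that.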
Lastly, apply the offset to the Vicuna weights by running the following script:

```bash
# 7B
python3 -m ferret.model.apply_delta \
    --base ./model/vicuna-7b-v1-3 \
    --target ./model/ferret-7b-v1-3 \
    --delta path/to/ferret-7b-delta

# 13B
python3 -m ferret.model.apply_delta \
    --base ./model/vicuna-13b-v1-3 \
    --target ./model/ferret-13b-v1-3 \
    --delta path/to/ferret-13b-delta
```

Notices: Apple's rights in the attached weight differentials are hereby licensed under the CC-BY-NC license. Apple makes no representations with regard to LLaMA or any other third-party software, which are subject to their own terms.

The next section describes how to set up a local demo with the pre-trained weights.

## Demo

To run the demo, you need FERRET checkpoints available locally, either trained yourself or reconstructed from the released deltas as described in the Checkpoints section. The demo uses a Gradio web UI. Run the following commands one by one.

### Launch a controller

```bash
python -m ferret.serve.controller --host 0.0.0.0 --port 10000
```

### Launch a Gradio web server

```bash
python -m ferret.serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --add_region_feature
```

### Launch a model worker

The worker loads the checkpoint and runs inference on the GPU. Each worker serves the single model specified by `--model-path`; replace `./checkpoints/FERRET-13B-v0` with the path to your local checkpoint.

```bash
CUDA_VISIBLE_DEVICES=0 python -m ferret.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path ./checkpoints/FERRET-13B-v0 --add_region_feature
```

Wait until the process finishes loading the model and you see "Uvicorn running on ...". Then refresh the Gradio web UI; the model you just launched appears in the model list.

Figure: Example of the Ferret interactive demo.
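The demo runs as three cooperating processes: a controller at a fixed address, a Gradio web server that gets the list of available models from the controller, and one model worker per model that registers with the controller under the URL passed as `--worker`. All three must agree on the controller URL, and each worker needs its own port. The hypothetical helper `demo_launch_plan` (not part of the repository) rebuilds the three demo commands from a few parameters, using only the flags shown in the Demo section, so those addresses stay consistent. It only constructs command lines and does not start any process.

```python
import shlex
from typing import Dict, List, Tuple

# (process name, extra environment variables, argv), in launch order.
LaunchStep = Tuple[str, Dict[str, str], List[str]]


def demo_launch_plan(model_path: str,
                     controller_port: int = 10000,
                     worker_port: int = 40000,
                     gpu: int = 0) -> List[LaunchStep]:
    """Hypothetical helper: rebuild the README's demo commands.

    Uses only the flags shown in the README. The controller comes first so
    the web server and the worker have something to connect to.
    """
    if controller_port == worker_port:
        raise ValueError("controller and worker need different ports")
    controller_url = f"http://localhost:{controller_port}"
    worker_url = f"http://localhost:{worker_port}"
    return [
        ("controller", {},
         ["python", "-m", "ferret.serve.controller",
          "--host", "0.0.0.0", "--port", str(controller_port)]),
        ("web_server", {},
         ["python", "-m", "ferret.serve.gradio_web_server",
          "--controller", controller_url,
          "--model-list-mode", "reload", "--add_region_feature"]),
        # The worker is pinned to one GPU and advertises worker_url
        # to the controller.
        ("worker", {"CUDA_VISIBLE_DEVICES": str(gpu)},
         ["python", "-m", "ferret.serve.model_worker",
          "--host", "0.0.0.0", "--controller", controller_url,
          "--port", str(worker_port), "--worker", worker_url,
          "--model-path", model_path, "--add_region_feature"]),
    ]


if __name__ == "__main__":
    # Example path (an assumption): the --target directory of the 13B delta step.
    for name, env, argv in demo_launch_plan("./model/ferret-13b-v1-3"):
        env_prefix = "".join(f"{k}={v} " for k, v in env.items())
        print(f"# {name}\n{env_prefix}{shlex.join(argv)}")
```

To launch instead of print, each `argv` could be passed to `subprocess.Popen(argv, env={**os.environ, **env})` in the listed order, which mirrors running the manual commands one by one.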
## Citation

If you find Ferret useful, please cite it using this BibTeX:

```bibtex
@article{you2023ferret,
  title={Ferret: Refer and Ground Anything Anywhere at Any Granularity},
  author={You, Haoxuan and Zhang, Haotian and Gan, Zhe and Du, Xianzhi and Zhang, Bowen and Wang, Zirui and Cao, Liangliang and Chang, Shih-Fu and Yang, Yinfei},
  journal={arXiv preprint arXiv:2310.07704},
  year={2023}
}
```

## Acknowledgement
* LLaVA: the codebase we built upon.
* Vicuna: the LLM codebase.