https://github.com/apple/ml-ferret

apple / ml-ferret Public

Latest commit

262a943 "checkpoints release" by Haotian-Zhang, Dec 15, 2023 (3 commits)

Files

Name                 Latest commit message   Commit time
experiments          first code commit       October 30, 2023 20:44
ferret               checkpoints release     December 14, 2023 21:16
figs                 first code commit       October 30, 2023 20:44
scripts              checkpoints release     December 14, 2023 21:16
CODE_OF_CONDUCT.md   first commit            October 6, 2023 14:48
CONTRIBUTING.md      first code commit       October 30, 2023 20:44
EVAL.md              first code commit       October 30, 2023 20:44
LICENSE              first commit            October 6, 2023 14:48
README.md            checkpoints release     December 14, 2023 21:16
pyproject.toml       first code commit       October 30, 2023 20:44

README.md

 Ferret: Refer and Ground Anything Anywhere at Any Granularity

An End-to-End MLLM that Accepts Any-Form Referring and Grounds
Anything in Response. [Paper]

Haoxuan You*, Haotian Zhang*, Zhe Gan, Xianzhi Du, Bowen Zhang, Zirui
Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang [*: equal
contribution]

 Overview

                       [ferret_fig_diagram_v2]
                      Diagram of Ferret Model.

Key Contributions:

  * Ferret Model - Hybrid Region Representation + Spatial-aware
    Visual Sampler enable fine-grained and open-vocabulary referring
    and grounding in MLLM.
  * GRIT Dataset (~1.1M) - A Large-scale, Hierarchical, Robust
    ground-and-refer instruction tuning dataset.
  * Ferret-Bench - A multimodal evaluation benchmark that jointly
    requires Referring/Grounding, Semantics, Knowledge, and Reasoning.

 Release

  * [12/14] We released the checkpoints (7B, 13B).
  * [10/30] We released the code of the FERRET model and Ferret-Bench.

Usage and License Notices: The data and code are intended and
licensed for research use only. They are also restricted to uses that
follow the license agreements of LLaMA, Vicuna and GPT-4. The dataset
is CC BY NC 4.0 (allowing only non-commercial use), and models trained
using the dataset should not be used outside of research purposes.

 Contents

  * Install
  * Train
  * Evaluation
  * Demo

 Install

 1. Clone this repository and navigate to the ml-ferret folder

git clone https://github.com/apple/ml-ferret
cd ml-ferret

 2. Install Package

conda create -n ferret python=3.10 -y
conda activate ferret
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
pip install pycocotools
pip install protobuf==3.20.0

 3. Install additional packages for training

pip install ninja
pip install flash-attn --no-build-isolation

 Train

FERRET is trained on 8 A100 GPUs with 80GB memory each. To train on
fewer GPUs, you can reduce the per_device_train_batch_size and increase
the gradient_accumulation_steps accordingly. Always keep the global
batch size unchanged, where global batch size =
per_device_train_batch_size x gradient_accumulation_steps x num_gpus.
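
The batch-size rule above can be checked with a few lines of Python. This is only a sketch: the helper name `gradient_accumulation_steps` is hypothetical, the global batch size of 128 is taken from the hyperparameter table in this README, and the per-device batch size of 16 is an illustrative assumption rather than a value read from the training scripts.

```python
def gradient_accumulation_steps(global_batch_size: int,
                                per_device_batch_size: int,
                                num_gpus: int) -> int:
    """Return the accumulation steps that keep the global batch size fixed.

    global_batch_size = per_device_batch_size * accumulation_steps * num_gpus
    """
    samples_per_step = per_device_batch_size * num_gpus
    if samples_per_step <= 0:
        raise ValueError("batch size and GPU count must be positive")
    if global_batch_size % samples_per_step != 0:
        raise ValueError(
            "global batch size must be divisible by "
            "per_device_batch_size * num_gpus"
        )
    return global_batch_size // samples_per_step


# Assumed per-device batch size of 16 (illustrative only).
for gpus in (8, 4, 2):
    steps = gradient_accumulation_steps(128, 16, gpus)
    print(f"{gpus} GPUs -> gradient_accumulation_steps = {steps}")
```

If the per-device batch size also has to shrink to fit in memory, pass the smaller value and the helper returns the correspondingly larger number of accumulation steps.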

 Hyperparameters

We use a set of hyperparameters similar to that of LLaVA (Vicuna) in
finetuning.

Model        Global batch size   Learning rate   Epochs   Max length   Weight decay
FERRET-7B    128                 2e-5            3        2048         0
FERRET-13B   128                 2e-5            3        2048         0

 Prepare Vicuna checkpoint and LLaVA's projector

Before you start, prepare our base model Vicuna, which is an
instruction-tuned chatbot. Please download its weights following the
instructions here. Vicuna v1.3 is used in FERRET.

Then download LLaVA's first-stage pre-trained projector weight (7B,
13B).

 FERRET Training

The scripts are provided (7B, 13B).

 Evaluation

Please see this doc for the details.

 Checkpoints

We extracted the delta between our pre-trained model and Vicuna.
Please first download the Vicuna weights following the instructions
above. Then download our prepared weight offsets (7B, 13B) using wget
or curl, and unzip the downloaded offsets. Lastly, apply the offsets
to the Vicuna weights by running the following commands:

# 7B
python3 -m ferret.model.apply_delta \
    --base ./model/vicuna-7b-v1-3 \
    --target ./model/ferret-7b-v1-3 \
    --delta path/to/ferret-7b-delta
# 13B
python3 -m ferret.model.apply_delta \
    --base ./model/vicuna-13b-v1-3 \
    --target ./model/ferret-13b-v1-3 \
    --delta path/to/ferret-13b-delta
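
Conceptually, applying a delta means adding the released offsets to the base weights, parameter by parameter. The sketch below illustrates only that idea on plain Python dictionaries. `apply_delta_sketch` is a hypothetical helper, not the repository's `ferret.model.apply_delta`, which works on real model checkpoints and may handle cases this sketch ignores.

```python
def apply_delta_sketch(base: dict, delta: dict) -> dict:
    """Return target weights where target[name] = base[name] + delta[name].

    Both arguments map a parameter name to a flat list of floats.
    This is a toy stand-in for real checkpoint tensors.
    """
    if base.keys() != delta.keys():
        raise KeyError("base and delta must contain the same parameter names")
    target = {}
    for name, base_values in base.items():
        delta_values = delta[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise sum recovers the fine-tuned weights.
        target[name] = [b + d for b, d in zip(base_values, delta_values)]
    return target


# Toy values, chosen only for illustration.
base = {"layer.weight": [1.0, 2.0], "layer.bias": [0.5]}
delta = {"layer.weight": [0.25, -1.0], "layer.bias": [0.5]}
print(apply_delta_sketch(base, delta))
```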

Notices: Apple's rights in the attached weight differentials are
hereby licensed under the CC-BY-NC license. Apple makes no
representations with regards to LLaMa or any other third party
software, which are subject to their own terms.

See the next section for how to set up a local demo with the
pre-trained weights.

 Demo

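To run our demo, you need FERRET checkpoints available locally, either
trained yourself or reconstructed from the released deltas as described
in the Checkpoints section. The demo uses a Gradio web UI. Please run
the following commands one by one.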

 Launch a controller

python -m ferret.serve.controller --host 0.0.0.0 --port 10000

 Launch a gradio web server

python -m ferret.serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --add_region_feature

 Launch a model worker

This is the worker that loads the checkpoint and runs inference on the
GPU. Each worker is responsible for a single model specified in
--model-path.

CUDA_VISIBLE_DEVICES=0 python -m ferret.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path ./checkpoints/FERRET-13B-v0 --add_region_feature

Wait until the process finishes loading the model and you see
"Uvicorn running on ...". Now, refresh your Gradio web UI, and you
will see the model you just launched in the model list.
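
The three launch commands above share the controller URL and the worker port, so it can help to generate them from one place. The sketch below only builds and prints the command lines, using exactly the modules and flags shown in this README. `build_demo_commands` is a hypothetical helper and is not part of the repository.

```python
import shlex


def build_demo_commands(model_path: str,
                        controller_port: int = 10000,
                        worker_port: int = 40000) -> dict:
    """Build the controller, web server and worker commands as argument lists."""
    controller_url = f"http://localhost:{controller_port}"
    worker_url = f"http://localhost:{worker_port}"
    return {
        "controller": [
            "python", "-m", "ferret.serve.controller",
            "--host", "0.0.0.0", "--port", str(controller_port),
        ],
        "gradio_web_server": [
            "python", "-m", "ferret.serve.gradio_web_server",
            "--controller", controller_url,
            "--model-list-mode", "reload", "--add_region_feature",
        ],
        "model_worker": [
            "python", "-m", "ferret.serve.model_worker",
            "--host", "0.0.0.0", "--controller", controller_url,
            "--port", str(worker_port), "--worker", worker_url,
            "--model-path", model_path, "--add_region_feature",
        ],
    }


# Print the commands in the order they should be started.
for name, cmd in build_demo_commands("./checkpoints/FERRET-13B-v0").items():
    print(f"{name}: {shlex.join(cmd)}")
```

Each argument list can be passed to `subprocess.Popen` if you prefer to start the processes from Python. Set `CUDA_VISIBLE_DEVICES` in the worker's environment to choose the GPU, as in the command above.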

                            [ferret_demo]
                 Example of Ferret Interactive Demo.

 Citation

If you find Ferret useful, please cite using this BibTeX:

@article{you2023ferret,
  title={Ferret: Refer and Ground Anything Anywhere at Any Granularity},
  author={You, Haoxuan and Zhang, Haotian and Gan, Zhe and Du, Xianzhi and Zhang, Bowen and Wang, Zirui and Cao, Liangliang and Chang, Shih-Fu and Yang, Yinfei},
  journal={arXiv preprint arXiv:2310.07704},
  year={2023}
}

 Acknowledgement

  * LLaVA: the codebase we built upon.
  * Vicuna: the LLM codebase.

About

Stars: 1.3k
Watchers: 43
Forks: 41
Languages: Python 97.4%, Shell 2.6%
