https://ai.google.dev/gemma/docs/paligemma

PaliGemma

PaliGemma is a lightweight open vision-language model (VLM) inspired
by PaLI-3 and built on open components such as the SigLIP vision model
and the Gemma language model. It takes both images and text as inputs
and can answer questions about images with detail and context. This
lets PaliGemma support tasks such as captioning images and short
videos, detecting objects, and reading text embedded in images.
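
As a rough illustration of how the text input can steer the tasks
listed above, the sketch below builds task-style text prompts. The
prefix strings (caption, ocr, answer, detect) follow the conventions
described in the PaliGemma model card rather than anything stated on
this page, and the names TASK_TEMPLATES and build_prompt are
hypothetical helpers, not part of any PaliGemma library.

```python
# Assumption: these task prefixes follow the PaliGemma model card;
# this page itself does not list them.
TASK_TEMPLATES = {
    "caption": "caption {lang}",      # describe the image
    "ocr": "ocr",                     # read text embedded in the image
    "vqa": "answer {lang} {text}",    # answer a question about the image
    "detect": "detect {text}",        # locate the named object
}


def build_prompt(task: str, text: str = "", lang: str = "en") -> str:
    """Return the text half of an (image, text) input pair.

    Hypothetical helper for illustration only.
    """
    if task not in TASK_TEMPLATES:
        raise ValueError(f"unknown task: {task!r}")
    template = TASK_TEMPLATES[task]
    # Tasks whose template uses {text} need a question or object name.
    if "{text}" in template and not text.strip():
        raise ValueError(f"task {task!r} requires non-empty text")
    return template.format(lang=lang, text=text.strip())


if __name__ == "__main__":
    print(build_prompt("caption"))
    print(build_prompt("vqa", "What color is the car?"))
    print(build_prompt("detect", "cat"))
```

The resulting string would be passed to the model together with an
image; how that call looks depends on the framework you use to load
the checkpoint.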

There are two sets of PaliGemma models, a general purpose set and a
research-oriented set:

  * PaliGemma - General purpose pretrained models that can be
    fine-tuned on a variety of tasks.
  * PaliGemma-FT - Research-oriented models that are fine-tuned on
    specific research datasets.

Important: Most PaliGemma models require tuning in order to produce
useful results, except for the paligemma-3b-mix variant. Make sure
you perform fine-tuning on these models and test the output before
deploying them to end users.
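
One way to keep this rule visible in a deployment script is a small
guard like the sketch below. It assumes checkpoint names are
hyphen-separated and that the mix variant is identifiable by a "mix"
segment in its name; the function needs_fine_tuning and the example
name "paligemma-3b-pt-224" are illustrative assumptions, not an
official API or an exhaustive list of checkpoints.

```python
def needs_fine_tuning(checkpoint_name: str) -> bool:
    """Return True unless the name contains a "mix" segment.

    Hypothetical guard: assumes hyphen-separated checkpoint names in
    which the mix variant carries a "mix" segment.
    """
    segments = checkpoint_name.lower().split("-")
    return "mix" not in segments


if __name__ == "__main__":
    # The second name is an assumed example of a non-mix checkpoint.
    for name in ("paligemma-3b-mix", "paligemma-3b-pt-224"):
        status = "tune before use" if needs_fine_tuning(name) else "usable as-is"
        print(f"{name}: {status}")
```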

Key benefits include:

  * Multimodal comprehension

    Simultaneously understands both images and text.
  * Versatile base model

    Can be fine-tuned on a wide range of vision-language tasks.
  * Off-the-shelf exploration

    Comes with a checkpoint fine-tuned on a mixture of tasks for
    immediate research use.

Learn more

View the model card

PaliGemma's model card contains detailed information about the model,
implementation information, evaluation information, model usage and
limitations, and more.

View on Kaggle

View more code, Colab notebooks, information, and discussions about
PaliGemma on Kaggle.

Run in Colab

Run a working example for fine-tuning PaliGemma with JAX in Colab.

Except as otherwise noted, the content of this page is licensed under
the Creative Commons Attribution 4.0 License, and code samples are
licensed under the Apache 2.0 License. For details, see the Google
Developers Site Policies. Java is a registered trademark of Oracle
and/or its affiliates.

Last updated 2024-05-14 UTC.
