https://viper.cs.columbia.edu/

ViperGPT: Visual Inference via Python Execution for Reasoning

Dídac Surís*, Sachit Menon*, Carl Vondrick
*Equal contribution
Columbia University

ViperGPT decomposes visual queries into interpretable steps.

Abstract

Answering visual queries is a complex task that requires both visual
processing and reasoning. End-to-end models, the dominant approach
for this task, do not explicitly differentiate between the two,
limiting interpretability and generalization. Learning modular
programs presents a promising alternative, but has proven challenging
due to the difficulty of learning both the programs and modules
simultaneously. We introduce ViperGPT, a framework that leverages
code-generation models to compose vision-and-language models into
subroutines to produce a result for any query. ViperGPT utilizes a
provided API to access the available modules, and composes them by
generating Python code that is later executed. This simple approach
requires no further training, and achieves state-of-the-art results
across various complex visual tasks.
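
The generate-then-execute loop described above can be sketched as follows. This is an illustrative toy, not the released implementation: `API_SPEC`, `fake_code_model`, `find`, and `answer` are hypothetical names, the code-generation model is replaced by a function that returns a fixed program, and the "image" is a plain dict rather than pixels. The entry-point name `execute_command` is likewise an assumption of this sketch.

```python
# Toy sketch of "generate Python, then execute it". All names here are
# hypothetical stand-ins, not the actual ViperGPT API.

# API description that would be placed in the prompt of the code model.
API_SPEC = '''
def find(image, name):
    """Return the list of objects called `name` detected in `image`."""
'''


def find(image: dict, name: str) -> list:
    """Toy vision module: looks objects up in a dict instead of running a detector."""
    return image.get(name, [])


def fake_code_model(prompt: str) -> str:
    """Stand-in for a code-generation model; always returns the same program."""
    return (
        "def execute_command(image):\n"
        "    cups = find(image, 'cup')\n"
        "    return 'yes' if len(cups) > 0 else 'no'\n"
    )


def answer(image: dict, query: str) -> str:
    """Build the prompt, obtain a program, and run it on the image."""
    prompt = API_SPEC + "\n# Query: " + query + "\n"
    source = fake_code_model(prompt)
    namespace = {"find": find}   # the only module exposed to the program
    exec(source, namespace)      # defines execute_command inside namespace
    return namespace["execute_command"](image)


image = {"cup": ["cup_0", "cup_1"], "plate": ["plate_0"]}
result = answer(image, "Is there a cup on the table?")
print(result)
```

Because the program is ordinary Python text, it can be read and checked before it runs. Running model-written code through `exec` is only reasonable inside a sandbox.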

Logical Reasoning

ViperGPT can perform logic operations because it directly executes
Python code.
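
A minimal sketch of what such a program might look like, assuming a hypothetical `exists` helper that wraps an object detector (here replaced by a set lookup so the example runs on its own). The perception calls return plain booleans, and the logic is carried by Python's own `and`, `or`, and `not`.

```python
def exists(image: dict, name: str) -> bool:
    """Toy stand-in for a detector-backed existence check (hypothetical helper)."""
    return name in image["objects"]


def execute_command(image: dict) -> str:
    # Query: "Is there a cat or a dog, but not both?"
    has_cat = exists(image, "cat")
    has_dog = exists(image, "dog")
    either = has_cat or has_dog
    both = has_cat and has_dog
    return "yes" if either and not both else "no"


print(execute_command({"objects": {"cat", "sofa"}}))
```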


Spatial Understanding

We show ViperGPT's spatial understanding.
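
One way a generated program could express a spatial relation is by comparing bounding-box coordinates. The sketch below is an assumption for illustration: the `Box` class, its fields, and the hard-coded detections are hypothetical, and a real system would obtain the boxes from a detector.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    """Hypothetical detection result: a label plus horizontal extent in pixels."""
    label: str
    left: float
    right: float

    @property
    def horizontal_center(self) -> float:
        return (self.left + self.right) / 2


def execute_command(boxes: List[Box]) -> Optional[str]:
    # Query: "What is immediately to the right of the lamp?"
    # Assumes a lamp was detected; labels.index raises ValueError otherwise.
    ordered = sorted(boxes, key=lambda b: b.horizontal_center)
    labels = [b.label for b in ordered]
    idx = labels.index("lamp")
    return labels[idx + 1] if idx + 1 < len(labels) else None


boxes = [Box("chair", 300, 400), Box("lamp", 100, 180), Box("vase", 10, 60)]
print(execute_command(boxes))
```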


Knowledge

ViperGPT can access the knowledge of large language models.
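
As a rough sketch of how a program might combine perception with language-model knowledge: a vision module names what is in the image, and that name is inserted into a text question. Both `recognize_landmark` and `llm_query` below are hypothetical stubs. The "language model" is a one-entry lookup table so the example runs offline.

```python
def recognize_landmark(image: dict) -> str:
    """Toy stand-in for a vision module that names the landmark in the image."""
    return image["landmark"]


# Stub knowledge source; a real system would call a large language model here.
KNOWLEDGE = {"In which city is the Eiffel Tower located?": "Paris"}


def llm_query(question: str) -> str:
    """Hypothetical text-only query to a language model."""
    return KNOWLEDGE.get(question, "unknown")


def execute_command(image: dict) -> str:
    # Query: "In which city is this landmark located?"
    landmark = recognize_landmark(image)
    return llm_query(f"In which city is the {landmark} located?")


print(execute_command({"landmark": "Eiffel Tower"}))
```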

Consistency

ViperGPT answers similar questions with consistent reasoning.


Math

ViperGPT can count and divide, all using Python.
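
For instance, counting becomes `len` over a list of detections and division becomes an ordinary Python operator. The sketch below assumes a hypothetical `find` helper and a dict-based toy image. The query is made up for illustration.

```python
from typing import Optional


def find(image: dict, name: str) -> list:
    """Toy stand-in for an object detector (hypothetical helper)."""
    return image.get(name, [])


def execute_command(image: dict) -> Optional[int]:
    # Query: "How many apples go in each basket if they are shared equally?"
    apples = find(image, "apple")
    baskets = find(image, "basket")
    if not baskets:          # avoid dividing by zero when no basket is found
        return None
    return len(apples) // len(baskets)


image = {"apple": ["a0", "a1", "a2", "a3", "a4", "a5"], "basket": ["b0", "b1", "b2"]}
print(execute_command(image))
```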

Attributes

We show some ViperGPT examples involving attributes.


Relational Reasoning

ViperGPT can reason about relations.


Negation

Negation is programmatic, not neural.
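
A small sketch of the idea, under the assumption that a perception module answers only the positive question ("is this object red?") and the program applies Python's `not` to the result. The `is_red` helper and the dict-based objects are hypothetical stand-ins.

```python
from typing import Dict, List


def is_red(obj: Dict[str, str]) -> bool:
    """Toy stand-in for a neural attribute check (hypothetical helper)."""
    return obj["color"] == "red"


def execute_command(objects: List[Dict[str, str]]) -> int:
    # Query: "How many cars are not red?"
    cars = [o for o in objects if o["label"] == "car"]
    # The module only answers the positive question; negation happens in Python.
    not_red_cars = [c for c in cars if not is_red(c)]
    return len(not_red_cars)


objects = [
    {"label": "car", "color": "red"},
    {"label": "car", "color": "blue"},
    {"label": "car", "color": "white"},
    {"label": "bus", "color": "green"},
]
print(execute_command(objects))
```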


More Results


BibTeX

@article{surismenon2023vipergpt,
            author    = {Sur\'is, D\'idac and Menon, Sachit and Vondrick, Carl},
            title     = {ViperGPT: Visual Inference via Python Execution for Reasoning},
            journal   = {arXiv preprint arXiv:2303.08128},
            year      = {2023},
}

  

This research is based on work partially supported by the DARPA MCS
program under Federal Agreement No. N660011924032 and the NSF CAREER
Award #2046910. DS is supported by the Microsoft PhD Fellowship and
SM is supported by the NSF GRFP.
