ViperGPT: Visual Inference via Python Execution for Reasoning

Sachit Menon

About this Session

Answering queries about visual inputs is a complex task that requires both visual processing and reasoning. In this talk, Sachit will demonstrate how large language models can be instrumental in reasoning within such settings, which extend beyond traditional language tasks. ViperGPT utilizes a provided API to access computer vision modules and composes them by generating Python code that is subsequently executed. This simple approach requires no additional training and achieves state-of-the-art results across various complex visual tasks. Sachit will also discuss how ViperGPT inspired the development of code-based agents and share insights on the future potential of such agents.
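The generate-then-execute idea described above can be illustrated with a toy sketch. Everything here is a hypothetical stand-in rather than ViperGPT's actual code: `ImagePatch` fakes a vision module with a list of labels, `fake_llm` returns a hard-coded program where a real system would query a code-generating language model, and `answer` simply runs the generated program on the image.

```python
class ImagePatch:
    """Toy stand-in for a vision API; a real system would call detection models."""

    def __init__(self, objects):
        # `objects` is a list of labels standing in for detector output.
        self.objects = list(objects)

    def find(self, name):
        """Return one patch per object whose label matches `name`."""
        return [ImagePatch([o]) for o in self.objects if o == name]


def fake_llm(query):
    """Placeholder for a code-generating language model (hard-coded here)."""
    return (
        "def execute_command(image):\n"
        "    muffins = image.find('muffin')\n"
        "    kids = image.find('kid')\n"
        "    return len(muffins) // len(kids)\n"
    )


def answer(query, image):
    """Generate a program for the query, execute it, and return its result."""
    program = fake_llm(query)
    namespace = {}
    # Assumption: the generated code is trusted. Real model output would
    # need to be sandboxed before being executed like this.
    exec(program, namespace)
    return namespace["execute_command"](image)


# A fake "image" containing eight muffins and two kids.
image = ImagePatch(["muffin"] * 8 + ["kid"] * 2)
result = answer("How many muffins can each kid have for it to be fair?", image)
print(result)
```

The point of the design is that the language model never sees pixels: it only writes a short program against the API, and the vision modules and the Python interpreter do the perception and arithmetic.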

About the Speaker

Sachit Menon is a PhD student in Computer Science at Columbia University advised by Professor Carl Vondrick. His research centres around models trained at scale and ways to use them for novel tasks, such as using large language models to perform visual reasoning.

About Tech Talks

A regular series by Soroco, Tech Talks are expert-led technical sessions that dive deep into a specific area of technology and give engineers valuable insights and tools. They also examine fascinating research and use cases, and facilitate larger conversations around cutting-edge tech.