ViperGPT: Visual Inference via Python Execution for Reasoning

About this Session

Answering queries about visual inputs is a complex task that requires both visual processing and reasoning. In this talk, I'll show you how large language models can be useful for reasoning in such settings that fall outside what you'd normally think of as language. ViperGPT uses a provided API to access computer vision modules, then composes them by generating Python code that is later executed. This simple approach requires no further training, and achieves state-of-the-art results across various complex visual tasks. I'll discuss how ViperGPT served as the inspiration for code-based agents, with thoughts on what the future of such agents may hold.
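To make the idea concrete, here is a minimal sketch of composing vision modules through generated Python code. It is not ViperGPT's actual API: `ToyImage`, `find`, `execute_command`, and `run_query` are hypothetical names chosen for illustration, the "image" is just a list of labels, and the "generated" program is hard-coded where a language model would normally write it from the query.

```python
from dataclasses import dataclass


@dataclass
class ToyImage:
    # Hypothetical stand-in for an image: the labels of the objects in it.
    labels: list


def find(image, name):
    """Hypothetical vision module: return the objects matching `name`.

    A real system would call an object detector here.
    """
    return [label for label in image.labels if label == name]


# Assumption: a language model would produce this program from a query such
# as "How many muffins can each kid have?". It is hard-coded in this sketch.
GENERATED_PROGRAM = '''
def execute_command(image):
    muffins = find(image, "muffin")
    kids = find(image, "kid")
    return len(muffins) // len(kids)
'''


def run_query(image, program_source):
    # Run the generated code in a namespace that exposes the provided API,
    # then call the function it defines.
    namespace = {"find": find}
    exec(program_source, namespace)
    return namespace["execute_command"](image)


image = ToyImage(labels=["muffin"] * 8 + ["kid"] * 2)
answer = run_query(image, GENERATED_PROGRAM)
print(answer)
```

The reasoning (counting, then dividing) lives in ordinary Python, while perception is delegated to the modules behind the API. That separation is why no further training is needed in this style of approach.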

About the Speaker

Sachit Menon is a PhD student in Computer Science at Columbia University advised by Professor Carl Vondrick. His research centres around models trained at scale and ways to use them for novel tasks, such as using large language models to perform visual reasoning.

About Tech Talks

A regular series by Soroco, Tech Talks are expert-led technical sessions that dive deep into a specific area of technology and give engineers valuable insights and tools. They also examine fascinating research and use cases, and facilitate larger conversations around cutting-edge tech.

Registration is now closed for this Tech Talk