TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture

Deep Papers

Deep Papers is a podcast series featuring deep dives on today’s most important AI papers and research. Hosted by Arize AI founders and engineers, each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning.

November 24, 2025 • Arize AI

Runtime: 23:44

We dive into the latest paper from Google and a team of academic researchers: "TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture."

One of the paper's authors, Research Scientist Yongchao Chen, walks through the research and its implications.

The paper proposes Tool-Use Mixture (TUMIX), an ensemble framework that runs multiple agents in parallel, each employing distinct tool-use strategies and answer paths. Agents in TUMIX iteratively share and refine responses based on the question and previous answers. In experiments, TUMIX achieves significant gains over state-of-the-art tool-augmented and test-time scaling methods.
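The share-and-refine loop described above can be sketched in Python. This is a toy illustration, not the paper's implementation: the three "agents" are hypothetical deterministic functions standing in for LLM agents with different tool-use strategies, and the names `code_agent`, `addition_agent`, `text_agent` and `tumix_style_rounds` are invented for the sketch. Two further rules are assumptions, since the episode description does not specify them: the loop stops early once every agent gives the same answer, and a majority vote over the last round picks the final answer.

```python
import operator
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional

# Hypothetical agent interface: (question, previous round's answers or None) -> answer.
Agent = Callable[[str, Optional[List[str]]], str]

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def code_agent(question: str, previous: Optional[List[str]]) -> str:
    """Stand-in for a code-execution strategy: computes "a op b" exactly."""
    left, op, right = question.split()
    return str(OPS[op](int(left), int(right)))


def addition_agent(question: str, previous: Optional[List[str]]) -> str:
    """Stand-in for a different strategy: multiplication by repeated addition."""
    left, op, right = question.split()
    if op != "*":
        return "unsupported"
    return str(sum(int(left) for _ in range(int(right))))


def text_agent(question: str, previous: Optional[List[str]]) -> str:
    """Stand-in for text-only reasoning: slips in round 1, then refines."""
    if previous is None:
        return "50"  # hypothetical first-round mistake
    # Refinement: adopt the most common answer from the previous round.
    return Counter(previous).most_common(1)[0][0]


def tumix_style_rounds(question: str, agents: List[Agent], max_rounds: int = 3) -> str:
    """Toy share-and-refine loop (assumed stopping and aggregation rules)."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    previous: Optional[List[str]] = None
    for _ in range(max_rounds):
        # All agents answer in parallel; each sees the question and the
        # full set of answers from the previous round (None in round 1).
        with ThreadPoolExecutor() as pool:
            answers = list(pool.map(lambda agent: agent(question, previous), agents))
        # Assumed early stop: every agent gives the same answer.
        if len(set(answers)) == 1:
            return answers[0]
        previous = answers
    # Assumed aggregation: majority vote over the last round's answers.
    return Counter(previous).most_common(1)[0][0]


print(tumix_style_rounds("17 * 3", [code_agent, text_agent, addition_agent]))  # 51
```

With these toy agents, the first round disagrees ("51", "50", "51"). In the second round the text agent adopts its peers' majority answer, all three agree, and the loop returns "51". The point of mixing strategies is that agents with different tools tend to make different mistakes, so sharing answers lets one agent's error be corrected by the others.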

To learn more about AI observability and evaluation, join the Arize AI Slack community or get the latest on LinkedIn and X.

Hosts: Dylan Couzon, Parth Shisode