Deep Papers

Deep Papers is a podcast series featuring deep dives on today’s most important AI papers and research. Hosted by Arize AI founders and engineers, each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning.

Episodes

60 episodes

CUGA Agent: From Benchmarks to Business Impact of IBM's Generalist Agent

We dive into the latest paper from a team of researchers at IBM: "From Benchmarks to Business Impact: Deploying IBM Generalist Agent in Enterprise Production." We're excited to host several of the paper's authors, who walk us through the resear...

February 11, 2026 • 23:04

TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture

We dive into the latest paper from Google and a team of academic researchers: "TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture." Hear from one of the pa...

November 24, 2025 • 23:44

Meta AI Researcher Explains ARE and Gaia2: Scaling Up Agent Environments and Evaluations

In our latest paper reading, we had the pleasure of hosting Grégoire Mialon — Research Scientist at Meta Superintelligence Labs — to walk us through Meta AI’s groundbreaking

November 10, 2025 • 22:34

Georgia Tech's Santosh Vempala Explains Why Language Models Hallucinate, His Research With OpenAI

Santosh Vempala, Frederick Storey II Chair of Computing and Distinguished Professor in the School of Computer Science at Georgia Tech, explains

October 14, 2025 • 31:24

Atropos Health’s Arjun Mukerji, PhD, Explains RWESummary: A Framework and Test for Choosing LLMs to Summarize Real-World Evidence (RWE) Studies

Large language models are increasingly used to turn complex study output into plain-English summaries. But how do we know which models are safest and most reliable for healthcare? In this most recent community AI research paper read...

September 22, 2025 • 26:22

Stan Miasnikov, Distinguished Engineer, AI/ML Architecture, Consumer Experience at Verizon Walks Us Through His New Paper

This episode dives into "Category-Theoretic Analysis ...

September 06, 2025 • 48:11

Small Language Models are the Future of Agentic AI

We had the privilege of hosting Peter Belcak – an AI Researcher working on the reliability and efficiency of agentic systems at NVIDIA – who walked us through his new paper making the rounds in AI circles titled “

September 05, 2025 • 31:15

Watermarking for LLMs and Image Models

In this AI research paper reading, we dive into "A Watermark for Large Language Models" with the paper's author John Kirchenbauer. This paper is a timely exploration of techniques for embedding invisible but detectable signals in AI...

July 30, 2025 • 42:56

Self-Adapting Language Models: Paper Authors Discuss Implications

The authors of the new paper *Self-Adapting Language Models (SEAL)* shared a behind-the-scenes look at their work, motivations, results, and future directions. The paper introduces a novel method for enabling large language models (LLMs) ...

July 08, 2025 • 31:26

The Illusion of Thinking: What the Apple AI Paper Says About LLM Reasoning

This week we discuss The Illusion of Thinking, a new paper from researchers at Apple that challenges today’s evaluation methods and introduces a new benchmark: synthetic puzzles with controllable complexity and clean logic. Their fi...

June 20, 2025 • 30:35

Accurate KV Cache Quantization with Outlier Tokens Tracing

We discuss Accurate KV Cache Quantization with Outlier Tokens Tracing, a deep dive into improving the efficiency of LLM inference. The authors enhance KV Cache quantization, a technique for reducing memory and compute costs during inference, by...

June 04, 2025 • 25:11

Scalable Chain of Thoughts via Elastic Reasoning

In this week's episode, we talk about Elastic Reasoning, a novel framework designed to enhance the efficiency and scalability of large reasoning models by explicitly separating the reasoning process into two distinct phases: thinking a...

May 16, 2025 • 28:54

Sleep-time Compute: Beyond Inference Scaling at Test-time

What if your LLM could think ahead, preparing answers before questions are even asked? In this week's paper read, we dive into a

May 02, 2025 • 30:24

LibreEval: The Largest Open Source Benchmark for RAG Hallucination Detection

For this week's paper read, we dive into our own research. We wanted to create a replicable, evolving dataset that can keep pace with model training so that you always know you're testing with data your model has never seen before. We al...

April 18, 2025 • 27:19

AI Benchmark Deep Dive: Gemini 2.5 and Humanity's Last Exam

This week we talk about modern AI benchmarks, taking a close look at Google's recent Gemini 2.5 release and its performance on key evaluations, notably Hu...

April 04, 2025 • 26:11

Model Context Protocol (MCP)

We cover Anthropic’s groundbreaking Model Context Protocol (MCP). Though it was released in November 2024, we've been seeing a lot of hype around it lately, and thought it was ...

March 25, 2025 • 15:03

AI Roundup: DeepSeek’s Big Moves, Claude 3.7, and the Latest Breakthroughs

This week, we're mixing things up a little bit. Instead of diving deep into a single research paper, we cover the biggest AI developments from the past few weeks. We break down key announcements, including: DeepSeek’s Big Launc...

February 28, 2025 • 30:23

How DeepSeek is Pushing the Boundaries of AI Development

This week, we dive into DeepSeek. SallyAnn DeLucia, Product Manager at Arize, and Nick Luzio, a Solutions Engineer, break down key insights on a model that has been dominating headlines for its significant breakthrough in inference speed over other...

February 21, 2025 • 29:54

Multiagent Finetuning: A Conversation with Researcher Yilun Du

We talk to Google DeepMind Senior Research Scientist (and incoming Assistant Professor at Harvard), Yilun Du, about his latest paper, "Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains." This paper introduces a multiagent fi...

February 04, 2025 • 30:03

Training Large Language Models to Reason in Continuous Latent Space

LLMs have typically been restricted to reason in the "language space," where chain-of-thought (CoT) is used to solve complex reasoning problems. But a new paper argues that language space may not always be the best for reasoning. In this paper ...

January 14, 2025 • 24:58

LLMs as Judges: A Comprehensive Survey on LLM-Based Evaluation Methods

We discuss a major survey of work and research on LLM-as-Judge from the last few years. "LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods" systematically examines the LLMs-as-Judge framework across five dimensions: functio...

December 23, 2024 • 28:57

Merge, Ensemble, and Cooperate! A Survey on Collaborative LLM Strategies

LLMs have revolutionized natural language processing, showcasing remarkable versatility and capabilities. But individual LLMs often exhibit distinct strengths and weaknesses, influenced by differences in their training corpora. This diversity p...

December 10, 2024 • 28:47

Agent-as-a-Judge: Evaluate Agents with Agents

This week, we break down the “Agent-as-a-Judge” framework—a new agent evaluation paradigm that’s kind of like getting robots to grade each other’s homework. Where typical evaluation methods focus solely on outcomes or demand extensive manual wo...

November 22, 2024 • 24:54

Introduction to OpenAI's Realtime API

We break down OpenAI’s Realtime API. Learn how to seamlessly integrate powerful language models into your applications for instant, context-aware responses that drive user engagement. Whether you’re building chatbots, dynamic content tools, or ...

November 12, 2024 • 29:56

Swarm: OpenAI's Experimental Approach to Multi-Agent Systems

As multi-agent systems grow in importance for fields ranging from customer support to autonomous decision-making, OpenAI has introduced Swarm, an experimental framework that simplifies the process of building and managing these systems. Swarm, ...

October 29, 2024 • 46:46
