Deep Papers

The Shrek Sampler: How Entropy-Based Sampling is Revolutionizing LLMs

Arize AI

In this byte-sized podcast, Harrison Chu, Director of Engineering at Arize, breaks down the Shrek Sampler. 

This innovative Entropy-Based Sampling technique--nicknamed the 'Shrek Sampler'--is transforming LLMs. Harrison talks about how this method improves upon traditional sampling strategies by leveraging entropy and varentropy to produce more dynamic and intelligent responses. Explore its potential to enhance open-source AI models and enable human-like reasoning in smaller language models.

Learn more about AI observability and evaluation in our course, join the Arize AI Slack community or get the latest on LinkedIn and X.

Today, I wanted to talk about Entropy-Based Sampling, colloquially known as the Shrek Sampler: what it is, and why everyone is talking about it. Fun fact: it's called the Shrek Sampler because the author of the repo, the person who's been posting all of this on X, has a profile picture of Shrek dressed as the main antagonist from the Western movie No Country for Old Men.

Before we go further, what is a sampler? Well, the first thing you need to know is that an LLM doesn't just output the next token at each forward pass. Instead, every time you talk to an LLM like ChatGPT or Claude, what happens under the hood is that the LLM returns a list of thousands and thousands of probabilities, each probability representing the likelihood of a given token being the next token of the output. Sampling is the process by which we determine, out of that list, which token to use.

The simplest way would be to always choose the most probable token. But what researchers have found is that when you always choose the most probable token, argmax (this will come into play later), you tend to get output that is really boring and uninteresting. Instead, what developers of these AI systems do is introduce methods like top-k sampling and temperature-based sampling to induce some element of controlled randomness, where likely outcomes are more often seen, but some element of surprise can still happen.
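To make that concrete, here's a minimal sketch of the difference between greedy argmax decoding and temperature/top-k sampling. The random logits, vocabulary size, k, and temperature here are illustrative assumptions, not values from the repo being discussed.

```python
# Sketch: greedy decoding vs. temperature + top-k sampling.
# The logits are random stand-ins; a real LLM emits one logit per
# vocabulary token at each forward pass.
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=50_000)  # pretend 50k-token vocabulary

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

def greedy(logits):
    # Argmax: always pick the single most probable token.
    return int(np.argmax(logits))

def sample_top_k(logits, k=50, temperature=0.8):
    # Temperature rescales logits: lower = sharper, higher = flatter.
    scaled = logits / temperature
    # Keep only the k most probable tokens, then sample among them.
    top_k_idx = np.argsort(scaled)[-k:]
    probs = softmax(scaled[top_k_idx])
    return int(rng.choice(top_k_idx, p=probs))

print(greedy(logits), sample_top_k(logits))
```

Run the sampling function twice and you'll usually get different tokens, which is exactly the controlled randomness described above.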

By the way, this is why when you enter the same prompt twice, you'll get different results. So what is unique about the Shrek Sampler? Unlike the naive sampling strategies we just talked about, at the end of each forward pass the Shrek Sampler takes in the list of probabilities of all possible tokens and computes two values: entropy and varentropy.

And it makes sense if you think about it. When the model returns its list of thousands and thousands of probabilities, it's not just giving a list of words to use; it's also implicitly expressing how uncertain it is about what to do next. The Shrek Sampler exploits that by taking those two values and switching its sampling strategy based on them.
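For reference, entropy is the expected surprisal of the distribution, and varentropy is the variance of that surprisal. Here's a hedged sketch of how you might compute both from a probability vector; treat this NumPy version as illustrative only, not the repo's actual implementation, which works on the model's log-softmax outputs.

```python
# Sketch: entropy and varentropy of a next-token distribution,
# assuming `probs` is the model's probability vector over the vocabulary.
import numpy as np

def entropy_and_varentropy(probs, eps=1e-12):
    surprisal = -np.log(probs + eps)  # -log p(token); eps avoids log(0)
    entropy = float(np.sum(probs * surprisal))          # expected surprisal
    varentropy = float(np.sum(probs * (surprisal - entropy) ** 2))  # its variance
    return entropy, varentropy

# One dominant token -> low entropy, low varentropy (model is confident).
print(entropy_and_varentropy(np.array([0.97, 0.01, 0.01, 0.01])))
# Flat distribution -> high entropy (model has no idea what comes next).
print(entropy_and_varentropy(np.array([0.25, 0.25, 0.25, 0.25])))
```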

So, for example, if uncertainty is really low, if entropy and varentropy are both really low, it'll just take the argmax: the highest-probability token, and output that. But if uncertainty is really high, it'll do something really interesting, like insert a pause token to get the model to reconsider the path it was going down.
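Here's one way that switching logic might look in code. The thresholds, the pause-token id, and the fallback temperature are all hypothetical placeholders; the real sampler has more branches and tuned values, so this is a sketch of the idea, not the author's implementation.

```python
# Sketch: switch sampling strategy based on entropy/varentropy.
import numpy as np

PAUSE_TOKEN_ID = 2564   # hypothetical "pause and reconsider" token id
LOW, HIGH = 0.5, 3.0    # hypothetical thresholds, in nats

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_sample(logits, entropy, varentropy, rng=np.random.default_rng()):
    if entropy < LOW and varentropy < LOW:
        # Confident and stable: greedy argmax is fine.
        return int(np.argmax(logits))
    if entropy > HIGH and varentropy > HIGH:
        # Confused and unstable: inject a pause token so the model
        # reconsiders the path it was going down.
        return PAUSE_TOKEN_ID
    # Otherwise: ordinary temperature sampling for controlled randomness.
    return int(rng.choice(len(logits), p=softmax(logits / 0.8)))

# Low uncertainty -> takes the argmax branch.
demo_logits = np.random.default_rng(0).normal(size=1_000)
print(adaptive_sample(demo_logits, entropy=0.2, varentropy=0.1))
```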

Here's an example that the author posted, where the author asks a 1-billion-parameter open-source model which value is greater, 9.11 or 9.9. And you can see a pause token is inserted right when the model is about to go off the rails, getting it to reconsider and start a series of chain-of-thought reasoning steps that eventually gets it to the right answer.

And I think this is partly why the community, especially the open-source AI community, is super excited about the results. You essentially have this o1-style reasoning result being replicated in an open-source model of one billion parameters, one that you and I could download and, if we tried really, really hard, get to run on a machine at home.

By the way, if you want to check this out for yourself, we'll link out to the repository below. You don't need to read a paper. It's just code: sampler.py is all you need.
