Deep Papers

A Deep Dive Into Generative's Newest Models: Gemini vs Mistral (Mixtral-8x7B)–Part I

December 27, 2023 Arize AI
A Deep Dive Into Generative's Newest Models: Gemini vs Mistral (Mixtral-8x7B)–Part I
Deep Papers
More Info
Deep Papers
A Deep Dive Into Generative's Newest Models: Gemini vs Mistral (Mixtral-8x7B)–Part I
Dec 27, 2023
Arize AI

For the last paper read of the year, Arize CPO & Co-Founder, Aparna Dhinakaran, is joined by a Dat Ngo (ML Solutions Architect) and Aman Khan (Product Manager) for an exploration of the new kids on the block: Gemini and Mixtral-8x7B. 

There's a lot to cover, so this week's paper read is Part I in a series about Mixtral and Gemini. In Part I, we  provide some background and context for Mixtral 8x7B from Mistral AI, a high-quality sparse mixture of experts model (SMoE) that outperforms Llama 2 70B on most benchmarks with 6x faster inference Mixtral also matches or outperforms GPT3.5 on most benchmarks. This open-source model was optimized through supervised fine-tuning and direct preference optimization. 

Stay tuned for Part II in January, where we'll build on this conversation in and discuss Gemini-developed by teams at DeepMind and Google Research. 

Link to transcript and live recording: https://arize.com/blog/a-deep-dive-into-generatives-newest-models-mistral-mixtral-8x7b/

To learn more about ML observability, join the Arize AI Slack community or get the latest on our LinkedIn and Twitter.

Show Notes

For the last paper read of the year, Arize CPO & Co-Founder, Aparna Dhinakaran, is joined by a Dat Ngo (ML Solutions Architect) and Aman Khan (Product Manager) for an exploration of the new kids on the block: Gemini and Mixtral-8x7B. 

There's a lot to cover, so this week's paper read is Part I in a series about Mixtral and Gemini. In Part I, we  provide some background and context for Mixtral 8x7B from Mistral AI, a high-quality sparse mixture of experts model (SMoE) that outperforms Llama 2 70B on most benchmarks with 6x faster inference Mixtral also matches or outperforms GPT3.5 on most benchmarks. This open-source model was optimized through supervised fine-tuning and direct preference optimization. 

Stay tuned for Part II in January, where we'll build on this conversation in and discuss Gemini-developed by teams at DeepMind and Google Research. 

Link to transcript and live recording: https://arize.com/blog/a-deep-dive-into-generatives-newest-models-mistral-mixtral-8x7b/

To learn more about ML observability, join the Arize AI Slack community or get the latest on our LinkedIn and Twitter.