The Power of Self-Play in AI Training

Or: How to Get Really, Really, Really Good at Chess

The Lightwave 

Practical Insights for Skeptics & Users Alike…in (Roughly) Two Minutes or Less

“The beautiful thing about learning is that nobody can take it away from you…”

B.B. King (who apparently never saw the Men in Black movies)

Yesterday, we dove into the basics of Supervised/Unsupervised Learning and Muppet GIFs.

Today we’ll look at AlphaZero, the groundbreaking artificial intelligence system developed by DeepMind that revolutionized chess by combining supervised and unsupervised learning approaches in a novel way.

Here's how these two fundamental machine learning techniques work together to create AlphaZero's chess prowess:

Supervised Learning: The Foundation

While traditional chess engines rely heavily on supervised learning from human-labeled game data, AlphaZero takes a different approach.

It starts with only the basic rules of chess, skipping the usual step of learning from huge data sets of historical games.

This allows AlphaZero to develop strategies free from human biases or preconceptions.

In other words, AlphaZero’s emphasis on Unsupervised Learning is what makes self-play and discovery possible.

Learning by Playing

This self-play is a critical breakthrough. Whereas previous models focused on analyzing millions of past human games (Structured Data), AlphaZero plays millions of games against itself, using reinforcement learning to improve its strategies based on game outcomes.

This self-play approach allows AlphaZero to:

  • Discover novel tactics and strategies

  • Develop a deep understanding of chess positions

  • Learn from a vast amount of data it generates itself
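The self-play idea above can be sketched on a toy game. This is an illustrative sketch, not DeepMind's actual code: a Nim-like game (take 1 or 2 stones; whoever takes the last stone wins) where the agent starts knowing only the rules and learns state values purely from the outcomes of games it plays against itself.

```python
import random

# Illustrative self-play sketch (not DeepMind's code) on a Nim-like game:
# players alternately take 1 or 2 stones; whoever takes the last stone wins.
# The agent starts with only the rules and learns state values from outcomes.

def legal_moves(pile):
    return [m for m in (1, 2) if m <= pile]

def self_play_train(episodes=5000, epsilon=0.1, lr=0.1, start=10, seed=0):
    rng = random.Random(seed)
    value = {0: 0.0}  # value[pile] = estimated win chance for the player to move
    for _ in range(episodes):
        pile, history = start, []
        while pile > 0:
            moves = legal_moves(pile)
            if rng.random() < epsilon:   # explore occasionally
                move = rng.choice(moves)
            else:                        # leave the opponent the worst position
                move = min(moves, key=lambda m: value.get(pile - m, 0.5))
            history.append(pile)
            pile -= move
        # The player who moved last took the final stone and won; walk the
        # game backwards, alternating win/loss credit between the two players.
        outcome = 1.0
        for state in reversed(history):
            old = value.get(state, 0.5)
            value[state] = old + lr * (outcome - old)
            outcome = 1.0 - outcome
    return value

def best_move(value, pile):
    return min(legal_moves(pile), key=lambda m: value.get(pile - m, 0.5))
```

After training, the table rediscovers this game's known theory on its own: piles that are multiples of 3 are losses for the player to move, so from a pile of 4 the learned policy takes 1 stone (leaving 3), and from 5 it takes 2. Nobody labeled any positions; the "training data" was entirely self-generated.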

Combining the Approaches


AlphaZero's brilliance comes from how it integrates these Supervised/Unsupervised learning methods:

  1. Initial Knowledge: Starts with only basic chess rules (minimal supervised input).

  2. Continuous Self-Improvement: Uses unsupervised self-play to generate data and learn.

  3. Reinforcement: Applies reinforcement learning to refine its strategies based on game outcomes.

  4. Neural Network Training: Updates its neural network using insights from self-play, creating a constantly evolving supervised model.

This cycle of self-play, learning, and improvement allows AlphaZero to achieve superhuman performance without relying on human expertise or pre-existing game databases.
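Step 4 of that cycle can be sketched under simplified, hypothetical assumptions. AlphaZero-style systems train their network on (position, search visit frequencies, game outcome) examples, minimizing a combined policy cross-entropy plus value squared-error loss. In this sketch a single linear layer stands in for the deep network, and the "self-play" examples are synthetic stand-ins for real game data.

```python
import math
import random

rng = random.Random(0)
N_FEATURES, N_MOVES = 8, 4

# Toy stand-in "network": one linear layer each for the policy and value heads.
W_policy = [[0.0] * N_MOVES for _ in range(N_FEATURES)]
w_value = [0.0] * N_FEATURES

def softmax(zs):
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    s = sum(es)
    return [e / s for e in es]

def predict(x):
    logits = [sum(x[i] * W_policy[i][j] for i in range(N_FEATURES))
              for j in range(N_MOVES)]
    u = sum(x[i] * w_value[i] for i in range(N_FEATURES))
    return softmax(logits), math.tanh(u)

def train_step(x, pi_target, z_outcome, lr=0.05):
    p, v = predict(x)
    for i in range(N_FEATURES):
        for j in range(N_MOVES):
            # Cross-entropy gradient w.r.t. a policy logit is (p - pi_target).
            W_policy[i][j] -= lr * x[i] * (p[j] - pi_target[j])
        # Gradient of half the squared value error, through tanh.
        w_value[i] -= lr * (v - z_outcome) * (1 - v * v) * x[i]

def mean_loss(data):
    total = 0.0
    for x, pi, z in data:
        p, v = predict(x)
        total += -sum(t * math.log(q + 1e-9) for t, q in zip(pi, p)) + (v - z) ** 2
    return total / len(data)

# Synthetic "self-play" examples (hypothetical stand-ins for MCTS visit
# frequencies and final game outcomes).
data = []
for _ in range(50):
    x = [rng.gauss(0, 1) for _ in range(N_FEATURES)]
    pi = softmax(x[:N_MOVES])
    z = math.tanh(sum(x))
    data.append((x, pi, z))

loss_before = mean_loss(data)
for _ in range(200):
    for x, pi, z in data:
        train_step(x, pi, z)
loss_after = mean_loss(data)
```

Each pass of the real cycle regenerates this data with the newest network, so the model is always training on games slightly better than the ones before.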


The Result

By combining these approaches, AlphaZero developed a playing style that is both highly effective and often described as more "human-like" and creative compared to traditional chess engines.

It has made moves that surprised chess grandmasters and has contributed new insights to chess strategy.

"I can't disguise my satisfaction that it plays with a very dynamic style, much like my own!”

- Gary Kasparov, former World Chess Champion


AlphaZero’s success demonstrates the power of combining different learning approaches in AI. For businesses and researchers, this suggests that:

  • Limiting initial biases can lead to innovative solutions

  • Self-generated data can be incredibly valuable for training AI systems

  • Continuous learning and adaptation are key to achieving high performance

Thanks for reading. See you tomorrow.