After Go and Chess, AI Is Back to Defeat Mere Humans: This Time It's Stratego

2022-07-23 05:53:11 | By Ms. Vicky Liao

DeepMind has been a pioneer in building AI models that mimic a human's cognitive ability to play games. Games are a common testbed for assessing a model's abilities. After mastering games like Go, chess and checkers, DeepMind has launched DeepNash, an AI model that can play Stratego at an expert level.

Mastering a game like Stratego is a significant achievement for AI research because it presents a challenging benchmark for learning strategic interactions at a massive scale. Stratego's complexity rests on two key aspects. First, there are 10^535 possible states in the game, vastly more than in Texas hold 'em poker (10^164 states) or Go (10^360 states). Second, at the start of the game, any given situation in Stratego requires reasoning over 10^66 possible deployments for each player.

DeepNash learns to play Stratego in a model-free, self-play manner, without the need for human demonstrations. It outperforms previous state-of-the-art AI agents and achieves expert human-level performance in the most complex variant of the game, Stratego Classic.

DeepNash, at its core, is based on a model-free reinforcement learning algorithm called Regularised Nash Dynamics (R-NaD).

DeepNash combines R-NaD with a deep neural network architecture and converges to an approximate Nash equilibrium by directly modifying the underlying multi-agent learning dynamics. With this technique, DeepNash was able to beat the existing state-of-the-art AI methods in Stratego, even achieving an all-time best ranking of #3 on the Gravon games platform against human expert players.

DeepNash employs an end-to-end approach, learning the deployment phase as well as gameplay. The model uses deep reinforcement learning coupled with a game-theoretic approach. Its goal is to learn an approximate Nash equilibrium through self-play, a technique that guarantees the agent will perform well even against a worst-case opponent.
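That worst-case guarantee can be made concrete with the notion of exploitability: how much a best-responding opponent can gain against a fixed policy, which is exactly zero at a Nash equilibrium. Below is a minimal Python sketch on a toy rock-paper-scissors matrix game; the game and the function are illustrative assumptions for exposition, not part of DeepNash itself.

```python
import numpy as np

# Payoff matrix for the row player in rock-paper-scissors (a toy
# zero-sum game); the column player receives the negated payoff.
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

def exploitability(x, y, A):
    """How much a worst-case (best-responding) opponent can gain against
    the fixed policies x and y. Equals zero exactly at a Nash equilibrium."""
    br_row = np.max(A @ y)      # best payoff any row strategy gets vs y
    br_col = np.max(-A.T @ x)   # best payoff any column strategy gets vs x
    return br_row + br_col

uniform = np.ones(3) / 3
biased = np.array([0.5, 0.3, 0.2])
print(exploitability(uniform, uniform, A))  # 0.0: uniform play cannot be exploited
print(exploitability(biased, biased, A))    # ~0.6: a best response profits from the bias
```

Driving this quantity toward zero is what "performing well against a worst-case opponent" means in practice.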

Stratego's intractable search space defeats all existing search techniques. To resolve this, DeepNash takes an orthogonal, search-free route and proposes a new method, R-NaD, which combines model-free reinforcement learning in self-play with a game-theoretic algorithmic idea.

This combined approach does not require modelling private states from public data. The challenge, however, lies in scaling up this model-free reinforcement learning approach with R-NaD until self-play becomes competitive against human experts in Stratego, a feat that had not been achieved before.

DeepNash learns a Nash equilibrium in Stratego through self-play and model-free reinforcement learning. The idea of combining model-free RL with self-play has been tried before, but such learning algorithms have proved empirically hard to stabilise when scaled up to complex games.

The idea behind the R-NaD algorithm is to define a learning update rule whose induced dynamical system admits a Lyapunov function. This function decreases during learning, which in turn guarantees convergence to a fixed point corresponding to a Nash equilibrium.
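The mechanics are easiest to see on a small matrix game. The following Python sketch follows the R-NaD recipe of reward transformation, inner-loop dynamics and reference-policy update; the rock-paper-scissors game, the hyperparameters and the replicator-style inner update are illustrative assumptions, not DeepMind's actual implementation, which drives deep neural networks rather than a payoff table.

```python
import numpy as np

# Rock-paper-scissors payoff matrix for the row player (zero-sum game).
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

def rnad_sketch(A, eta=0.2, lr=0.1, inner_steps=1000, outer_steps=30):
    """Toy Regularised Nash Dynamics loop on a matrix game.

    Each outer step: (1) transform rewards with an eta-weighted log-ratio
    penalty pulling each policy toward a reference policy, (2) run the
    inner dynamics toward the fixed point of that regularised game, then
    (3) promote the fixed point to be the new reference policy.
    """
    x = np.array([0.6, 0.3, 0.1])      # deliberately biased starting policies
    y = np.array([0.2, 0.5, 0.3])
    x_ref, y_ref = x.copy(), y.copy()  # initial reference policies

    for _ in range(outer_steps):
        for _ in range(inner_steps):
            # Regularised payoffs: game reward minus the regularisation penalty.
            rx = A @ y - eta * (np.log(x) - np.log(x_ref))
            ry = -A.T @ x - eta * (np.log(y) - np.log(y_ref))
            # Replicator-style (multiplicative-weights) inner update.
            x = x * np.exp(lr * (rx - x @ rx)); x /= x.sum()
            y = y * np.exp(lr * (ry - y @ ry)); y /= y.sum()
        x_ref, y_ref = x.copy(), y.copy()  # reference-policy update

    return x, y

x, y = rnad_sketch(A)
print(x.round(3), y.round(3))  # both approach [0.333 0.333 0.333], the Nash equilibrium
```

The reference-policy update is the key design choice: each regularised game is well-behaved enough to solve reliably, and it is the sequence of its fixed points that the Lyapunov argument drives toward the Nash equilibrium of the original game.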

To test DeepNash's capabilities, it was evaluated against both human expert players and the latest state-of-the-art Stratego bots. The former tests were performed on Gravon, a well-known online gaming platform for Stratego players; the latter against known Stratego bots such as Celsius, Asmodeus and PeternLewis.

Despite training only through self-play, DeepNash won the overwhelming majority of its games against all of these bots. In the few matches it lost to Celsius1.1, the latter adopted a high-risk strategy, grabbing a significant material advantage early in the game by capturing pieces with a high-ranking piece.

Although DeepNash was designed with the sole aim of learning a Nash equilibrium policy during training, it also acquired the qualitative behaviour of a top player. It generated a wide range of deployments, making it difficult for human players to find patterns to exploit, and it demonstrated the capability to make non-trivial trade-offs between information and material, execute bluffs, and take risks when needed.
