Hanabi is a very fun game to play, and it is currently our main project. The game combines several complexities: randomness, partial observability, a large observation space, and so on. However, its biggest challenge is the existence of a non-verbal way to communicate information. This information, which some call intention, is conveyed through the choice of actions and, in particular, through the hint system.
If you have not seen the game before, this is a good time to watch one of the following reviews:
English: Hanabi Review - with Tom Vasel
French: LudoChrono - Hanabi
Portuguese: TRAPAÇAS no De Quem é a Vez? - MAGIC MAZE & HANABI - Jhonny Drumond, Thati Lopes e Victor Lamoglia
German: Hanabi (Spiel des Jahres 2013) - Spiel - Kartenspiel - Board Game - Review #10
Because of this special communication system, it is possible to generate an astronomical number of winning strategies that are mutually incompatible. When players consciously agree on a strategy beforehand, the game can be completed successfully; if, however, each player picks a strategy at random, the result is a disaster. Unlike in zero-sum competitive games such as Go, the choices an agent makes depend on its interpretation of its partners' intentions, so it is impossible to find a single strategy that works for everyone.
Because of this communication peculiarity, Hanabi was proposed by Google as a good testbed for creating agents that can successfully engage in cooperative tasks in an ad-hoc setting, i.e., paired with agents they have not seen before. To do this, we believe an agent should be able to build models of the other agents that are sufficient to finish the game. These models must have two main properties. First, they must be rich enough to support a universal model-based strategy that successfully completes the game. Second, they must be reduced and lightweight enough to be produced after only a small number of games. That is why Hanabi seems like the perfect ground on which to form and test meta-learning algorithms applied to decision making.
Our roadmap has three stops. First, we create a pool of diverse agents that each carry a winning strategy. These agents are fixed and non-adaptive, and they serve as data points in the space of possible strategies. To create these pools, we try three different evolutionary methods: the first uses rule-based agents, the second reward shaping, and the third neuroevolution.
With the pool complete, the second step is to form models of these agents. Through reinforcement learning, an agent will then be trained to play alongside the pool agents, conditioned on these models.