## An expectation maximization Yahtzee AI

By Vermeille on Saturday 5 March 2016, 20:30

## Yahtzee

I will describe a simple AI I wrote for the game of Yahtzee. The solution is not optimal because of one small point: it maximizes the score of each turn rather than the score of the whole game (more on this below). There are probably smarter ways to write this program, but I needed to write it quickly to play with my friends on New Year's Eve (I had less than 3 days, actually), so my priority was a solution that would almost guarantee me a win, not a beautiful and optimal one. If you know how to make it better, let me know in the comments.

As I am self-taught in probability and statistics, my notation and terminology might not be accurate. You're more than welcome to help me improve them in the comments.

### Description

The game of Yahtzee is a mix of poker and dice rolls: you roll 5 dice, may reroll any of them up to two times, and score points depending on the combination you get. Each combination can be scored only once, and if no combination was made, the player must sacrifice one of them, so that the number of turns is fixed.

The combinations are:

Name | Score | Description |
---|---|---|
One | Sum of 1s | Number of 1s obtained |
Two | Sum of 2s | Number of 2s obtained × 2 |
Three | Sum of 3s | Number of 3s obtained × 3 |
Four | Sum of 4s | Number of 4s obtained × 4 |
Five | Sum of 5s | Number of 5s obtained × 5 |
Six | Sum of 6s | Number of 6s obtained × 6 |
Set | Sum of 3 dice | Three identical dice. Score is the sum of those 3. |
Full House | 25 | Three identical dice + two identical dice. |
Quad | Sum of 4 dice | Four identical dice. Score is the sum of those 4. |
Straight | 30 | Four dice in sequence (1234 / 2345 / 3456) |
Full straight | 40 | Five dice in sequence (12345 / 23456) |
Yahtzee | 50 | Five identical dice |
Luck | Sum of dice | Any combination. Usually used when nothing else works. |

On each turn, a player does the following:

- Roll all five dice. The player can then select a combination and end the turn, or...
- Select some dice to roll again. Then, the player can select a combination and end the turn, or...
- Select some dice to roll again. Then, the player MUST select a combination to score or sacrifice.

## The AI

### The numbers

The game has a fairly low dimensionality. Each of the 5 dice can take values from 1 to 6, so the (naive) number of possible games is \(6^5 = 7776\). But this is actually an upper bound: the dice are not ordered, and many of these games are equivalent (11234 is equivalent to 12431, etc.). The real number of distinct games is given by the formula for unordered combinations with repetition. With \(n = 6\) and \(k = 5\):

\[C'_k(n) = \binom{n+k-1}{k}\]
\[C'_5(6) = C_5(10) = \binom{10}{5} = \frac{10!}{5!\,(10-5)!} = 252\]

This is, fortunately, far from intractable: we can brute-force all of them.
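Both counts can be checked by brute force. A minimal Python sketch (variable names are illustrative): enumerate the \(6^5\) ordered rolls, then collapse equivalent ones by sorting them.

```python
from itertools import product

# Every ordered roll of 5 six-sided dice: (d1, d2, d3, d4, d5).
ordered_rolls = list(product(range(1, 7), repeat=5))

# Sorting a roll forgets the order of the dice, so equivalent rolls
# such as 11234 and 12431 collapse to the same key.
unordered_rolls = {tuple(sorted(roll)) for roll in ordered_rolls}

print(len(ordered_rolls))    # 7776
print(len(unordered_rolls))  # 252
```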

It will also be useful later to know how many outcomes are possible for any number of dice.

Number of dice | Number of outcomes |
---|---|
0 | 1 |
1 | 6 |
2 | 21 |
3 | 56 |
4 | 126 |
5 | 252 |
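This table is the combinations-with-repetition formula above evaluated for \(k = 0, \dots, 5\). A small Python sketch that reproduces it (the helper name `unordered_outcomes` is hypothetical):

```python
from math import comb


def unordered_outcomes(k, faces=6):
    """Number of distinct (unordered) results when rolling k dice."""
    # Combinations with repetition: C(n + k - 1, k) with n = faces.
    return comb(faces + k - 1, k)


for k in range(6):
    print(k, unordered_outcomes(k))
```

`math.comb` requires Python 3.8 or later.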

The number of possible *actions* (sets of dice to reroll) is the number of subsets of the 5 dice, i.e. \(2^k = 2^5 = 32\).
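One convenient way to represent these 32 actions, an assumption of this sketch rather than something the text prescribes, is a 5-bit mask where bit \(i\) set means "reroll die \(i\)". The helper `reroll_positions` is hypothetical:

```python
def reroll_positions(mask, n_dice=5):
    """Positions (0-based) of the dice rerolled by action `mask`.

    Bit i of `mask` set means "reroll die i".
    """
    return [i for i in range(n_dice) if (mask >> i) & 1]


actions = list(range(2 ** 5))     # 0 = keep everything, 31 = reroll all
print(len(actions))               # 32
print(reroll_positions(0b00101))  # [0, 2]
```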

### The program

The program is fairly simple to use: given a roll, it tells you which dice to reroll (if any), and the associated expected score.

First, we need to precompute the score that each roll gets for every combination. I enumerate each of the (ordered) possible games, compute its score for each combination, and store the results in a \(7776 \times 13\) table.
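A minimal Python sketch of this precomputation, following the rules table above. The names `COMBINATIONS`, `scores` and `TABLE` are hypothetical, and Full House is assumed to require exactly three dice of one value and two of another (so five identical dice do not count as a Full House):

```python
from collections import Counter
from itertools import product

# Column order of the table: 13 combinations, as in the rules table.
COMBINATIONS = ["One", "Two", "Three", "Four", "Five", "Six", "Set",
                "Full House", "Quad", "Straight", "Full straight",
                "Yahtzee", "Luck"]


def scores(roll):
    """Score of a 5-dice `roll` for each of the 13 combinations."""
    counts = Counter(roll)
    faces = set(roll)
    # One..Six: sum of the dice showing that face.
    upper = [face * counts[face] for face in range(1, 7)]
    # Set / Quad: sum of the 3 (or 4) identical dice, 0 if there are none.
    three = max((3 * f for f, c in counts.items() if c >= 3), default=0)
    four = max((4 * f for f, c in counts.items() if c >= 4), default=0)
    # Assumption: a Full House is exactly a 3 + 2 split.
    full_house = 25 if sorted(counts.values()) == [2, 3] else 0
    runs = ({1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 5, 6})
    straight = 30 if any(run <= faces for run in runs) else 0
    full_straight = 40 if faces in ({1, 2, 3, 4, 5}, {2, 3, 4, 5, 6}) else 0
    yahtzee = 50 if len(faces) == 1 else 0
    return upper + [three, full_house, four, straight,
                    full_straight, yahtzee, sum(roll)]


# Precompute the 7776 x 13 table, keyed by the ordered roll.
TABLE = {roll: scores(roll) for roll in product(range(1, 7), repeat=5)}
```

For example, `TABLE[(1, 2, 3, 4, 5)]` is `[1, 2, 3, 4, 5, 0, 0, 0, 0, 30, 40, 0, 15]`: the upper combinations, a Straight, a Full straight and Luck.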

The user is then prompted to enter the hand they got. The objective is the following:

\[\text{action}^* = \underset{\text{action}}{\operatorname{argmax}}\; \mathbb{E}[\text{best score} \mid \text{action}]\]

That is: find the subset of dice to reroll that leads to the best expected score, where each possible outcome of the reroll is scored with its best-scoring combination. \(\text{action}\) ranges over the 32 possible subsets of dice to reroll, and \(\text{action}^*\) is the best choice according to this (eager) strategy.

This expectation can be computed as follows:

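\[\text{action}^* = \underset{\text{action}}{\operatorname{argmax}}\; \frac{1}{|G(\text{action})|} \sum_{g \in G(\text{action})} \max_{c}\, \text{score}(c \mid g)\]

where \(G(\text{action})\) is the set of games that can result from the action, and \(\text{score}(c \mid g)\) is the score of game \(g\) for combination \(c\). This uniform average is only a correct expectation if \(G(\text{action})\) lists *ordered* outcomes (\(6^k\) of them when \(k\) dice are rerolled), because unordered outcomes are not equally likely: when rerolling two dice, 11 has probability 1/36 while 12 has probability 2/36.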
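Averaging uniformly only gives an expectation when the outcomes being averaged are equally likely. A quick Python check with a hypothetical hand (three 3s kept, two dice rerolled, aiming for a Yahtzee) shows the difference between averaging over ordered and unordered outcomes, and how weighting each unordered outcome by its number of orderings recovers the right value:

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations_with_replacement, product
from math import factorial

# Keep three 3s and reroll the two other dice: a Yahtzee needs (3, 3).
target = (3, 3)

# Correct: the 36 ordered outcomes of two dice are equally likely.
ordered = list(product(range(1, 7), repeat=2))
p_ordered = Fraction(sum(o == target for o in ordered), len(ordered))

# Biased: treating the 21 unordered outcomes as equally likely.
unordered = list(combinations_with_replacement(range(1, 7), 2))
p_unordered = Fraction(sum(o == target for o in unordered), len(unordered))


def multiplicity(outcome):
    """Number of ordered outcomes that sort to this unordered outcome."""
    m = factorial(len(outcome))
    for c in Counter(outcome).values():
        m //= factorial(c)
    return m


# Fixed: weight each unordered outcome by its number of orderings.
p_weighted = Fraction(sum(multiplicity(o) for o in unordered if o == target), 6 ** 2)

print(p_ordered, p_unordered, p_weighted)  # 1/36 1/21 1/36
```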

This is an eager policy that maximizes the score of each *turn*. As such, it does not take into account the *points wasted* by using up a combination now, which matters when maximizing the score of the *whole game*: for instance, scoring 11234 as Luck (11 points) uses up a combination that could later have been worth up to 30 points. As I was unable to find an optimal solution to this (and I would really like to know if there is one), I chose to apply a (quite arbitrary) penalty to each combination's score, based on how far that score is from the best possible score for the combination:

\[\text{penalty}(\text{combination}, \text{current score}) = \exp\left(-\frac{\text{best possible score for combination} - \text{current score}}{100}\right)\]
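A small Python sketch of this penalty. The per-combination maxima in `BEST_POSSIBLE` are derived from the rules table (for instance Set is at most \(3 \times 6 = 18\)). How the penalty is combined with the score is not spelled out, so `penalized` multiplies them as an assumption; both helper names are hypothetical:

```python
import math

# Best score each combination can ever reach, derived from the rules table.
BEST_POSSIBLE = {
    "One": 5, "Two": 10, "Three": 15, "Four": 20, "Five": 25, "Six": 30,
    "Set": 18, "Full House": 25, "Quad": 24, "Straight": 30,
    "Full straight": 40, "Yahtzee": 50, "Luck": 30,
}


def penalty(combination, current_score):
    """exp(-(best possible - current) / 100); 1.0 when nothing is wasted."""
    return math.exp(-(BEST_POSSIBLE[combination] - current_score) / 100)


def penalized(combination, current_score):
    # Assumption: the penalty factor multiplies the raw score.
    return current_score * penalty(combination, current_score)


print(penalty("Yahtzee", 50))  # 1.0
```

The factor stays in \((0, 1]\): it equals 1 when the roll reaches the best possible score for the combination, and shrinks as more points are left on the table.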

In code terms, this would lead to something like:

- Read the input hand \(r\).
- Initialize \(e\), the expected score of each possible reroll, to 0.
- For each possible game \(g_i\):
  - \(d = \text{dice to reroll to go from } r \text{ to } g_i\)
  - \(e[d] \mathrel{+}= \frac{1}{\text{number of possible outcomes for } d} \times \text{maximum (penalized) score for } g_i\)
- Return \(\underset{d}{\operatorname{argmax}}\; e[d]\).
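A minimal Python sketch of the steps above; it is not the original program. Assumptions: actions are 5-bit masks (bit \(i\) means "reroll die \(i\)"), Full House is a strict 3 + 2 split, the penalty multiplies the score, and the `available` parameter (combinations not yet used) is a hypothetical addition. All names are illustrative:

```python
import math
from collections import Counter
from itertools import product

# Order: One..Six, Set, Full House, Quad, Straight, Full straight, Yahtzee, Luck.
# Best score each combination can reach, used by the penalty.
BEST = [5, 10, 15, 20, 25, 30, 18, 25, 24, 30, 40, 50, 30]
RUNS_OF_FOUR = ({1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 5, 6})
RUNS_OF_FIVE = ({1, 2, 3, 4, 5}, {2, 3, 4, 5, 6})


def scores(roll):
    """Raw score of a 5-dice roll for each of the 13 combinations."""
    counts, faces = Counter(roll), set(roll)

    def of_a_kind(n):
        # Sum of the n identical dice, or 0 if no face appears n times.
        return max((n * f for f, c in counts.items() if c >= n), default=0)

    return [f * counts[f] for f in range(1, 7)] + [
        of_a_kind(3),
        25 if sorted(counts.values()) == [2, 3] else 0,  # assumes a strict 3 + 2
        of_a_kind(4),
        30 if any(run <= faces for run in RUNS_OF_FOUR) else 0,
        40 if faces in RUNS_OF_FIVE else 0,
        50 if len(faces) == 1 else 0,
        sum(roll),
    ]


def best_penalized(roll, available):
    """Best score over the available combinations, after the penalty.

    Assumption: the penalty exp(-(best - score) / 100) multiplies the score.
    """
    s = scores(roll)
    return max(s[c] * math.exp(-(BEST[c] - s[c]) / 100) for c in available)


def expected_value(hand, mask, available):
    """Average best penalized score when rerolling the dice selected by mask."""
    positions = [i for i in range(5) if (mask >> i) & 1]
    # The 6**k ordered outcomes of k rerolled dice are equally likely.
    outcomes = list(product(range(1, 7), repeat=len(positions)))
    total = 0.0
    for values in outcomes:
        roll = list(hand)
        for pos, value in zip(positions, values):
            roll[pos] = value
        total += best_penalized(roll, available)
    return total / len(outcomes)


def best_action(hand, available=range(13)):
    """Eager choice: the reroll mask with the highest expected score."""
    available = list(available)
    values = {mask: expected_value(hand, mask, available) for mask in range(32)}
    mask = max(values, key=values.get)
    return mask, values[mask]


if __name__ == "__main__":
    mask, value = best_action((1, 1, 3, 5, 6))
    rerolled = [i for i in range(5) if (mask >> i) & 1]
    print("reroll positions", rerolled, "expected score", round(value, 2))
```

This sketch loops over the 32 actions rather than over games: every action then averages over exactly \(6^k\) equally likely outcomes, and a rerolled die that lands on its previous value is still counted. Like the policy described above, it only looks one reroll ahead.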

And that's it.

## Conclusion

I won, statistically, which is good. The downside: my friends were angry, because according to them, taking time to make an often "obvious" choice was not worth it :D. Make sure your friends enjoy maths and/or CS before doing something like this!

The code is available on my GitHub page. As I said, don't expect magnificent code.