Cepheus Takes on Texas Hold’em: Computer program teaches itself to compute winning moves

Xiuqi Cao | xiuqi.cao@yale.edu March 29, 2015

Over the past few decades, researchers have developed artificial intelligence that can beat humans in a variety of games. In 1994, there was checkers world champion Chinook. Then came Deep Blue, which defeated world chess champion Garry Kasparov in a 1997 match, and Jeopardy! title-holder Watson in 2011. Now, there is a revolutionary new program in town. Its name: Cepheus.

Developed by a team led by Dr. Michael Bowling of the University of Alberta in Canada, Cepheus is a computer program designed to master Texas hold’em. With Cepheus as a model, researchers hope to develop technologies that assist humans in making complex decisions when the right answer is ambiguous. The implications of this computer program go far beyond the game of poker, ultimately suggesting a way to reduce human error and its pitfalls.

Dr. Michael Bowling and his colleague work on their program, Cepheus. Image courtesy of the University of Alberta.

What separates Cepheus from other artificial intelligence programs is that it can make the most beneficial decision in games with imperfect information, such as Texas hold’em. In these games, participants do not know all of the factors that could contribute to each round of play. In Texas hold’em, players do not know which cards their opponents are holding, so it is nearly impossible to decide with certainty how valuable their own hand is as they place bets in each round. It is thus more difficult for a Texas hold’em player to predict opponents’ strategies than it is for a checkers, chess, or Jeopardy! player, who has complete information during the game before deciding on the next move.

In order to construct a program that could master imperfect-information games, Bowling’s team developed Cepheus to be capable of learning from its mistakes. First, they chose to work with a simple version of Texas hold’em called heads-up limit hold’em, in which two players bet fixed initial sums of money and raise their bets in set increments. Heads-up limit hold’em allowed researchers to study imperfect-information games using simplified calculations.

To solve heads-up limit hold’em, Cepheus used an established algorithm called Counterfactual Regret Minimization (CFR), which analyzes previous decisions to determine whether alternative choices could have resulted in better outcomes. After conducting this analysis for each hand, the algorithm tallies how much better each alternative would have done than the choice it actually made, and retains this number as a regret value. The more a decision cost the program relative to an alternative, the higher the regret attached to that alternative. When the program faces the same decision in the future, it compares the stored regret values and favors the choices with the highest accumulated regret, that is, the ones it most often wished it had made. After practicing against itself for thousands of rounds using CFR, Cepheus drove its regret low enough that it was effectively unbeatable by humans in heads-up limit hold’em.
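The regret bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: it uses rock-paper-scissors instead of poker, it implements plain regret matching (the update rule at the core of CFR) rather than the full algorithm over a game tree, and the function names `payoff`, `strategy_from_regrets`, and `train` are hypothetical, not taken from Cepheus. Here an action's regret is how much better it would have done than the move actually played, and actions with more accumulated regret are played more often.

```python
import random

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors (toy game, NOT poker)


def payoff(a, b):
    """Payoff to the player choosing a against b: +1 win, 0 tie, -1 loss."""
    return [0, 1, -1][(a - b) % 3]


def strategy_from_regrets(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / ACTIONS] * ACTIONS  # no positive regret yet: play uniformly


def train(iterations, seed=0):
    """Self-play between two regret-matching players.

    Returns each player's average strategy over all iterations.
    """
    rng = random.Random(seed)
    regrets = [[0.0] * ACTIONS for _ in range(2)]
    strategy_sums = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        moves = [rng.choices(range(ACTIONS), weights=s)[0] for s in strategies]
        for p in range(2):
            mine, theirs = moves[p], moves[1 - p]
            actual = payoff(mine, theirs)
            for a in range(ACTIONS):
                # Regret of action a: what it would have earned against the
                # opponent's move, minus what the chosen move actually earned.
                regrets[p][a] += payoff(a, theirs) - actual
                strategy_sums[p][a] += strategies[p][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]
```

Calling `train(20000)` should return average strategies that drift toward playing each move about a third of the time, which is the unexploitable strategy for this toy game. A poker solver applies the same idea at every decision point of a vastly larger game, which is where the heavy computation comes in.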

Cepheus has yet to completely master the standard Texas hold’em game. Researchers are unsure whether Cepheus’ results from heads-up limit hold’em can be extrapolated to standard Texas hold’em, where there are typically three or more players and therefore a greater number of possible outcomes during the game. In order to analyze even more complicated situations with more players and more sets of information, Cepheus would need to play in a fair game without collusion among the other players. When Cepheus was placed against two other computers, an optimal test environment in which the program’s identity could be hidden, it was able to produce positive results.

Although Cepheus can only play poker, the concept behind it could have groundbreaking implications for the fields of game theory and artificial intelligence. In areas like finance and economics where speculation is key, future programs derived from Cepheus could help investors make optimal decisions. In medicine, Cepheus could lead to the development of technology that helps doctors choose between treatment options by analyzing the patient’s history as well as the efficacy of the drugs and treatments under consideration. Even in politics, world leaders could use CFR technology to analyze the results of historical events and make the most beneficial decisions for mankind. While human error impacts our decisions in uncertain circumstances, from politics to play, the ability to analyze past events could help prevent us from repeating past mistakes.

You can challenge Cepheus yourself at http://poker-play.srv.ualberta.ca/.