Image courtesy of Professor Seong-Whan Lee.
At the 2018 Pyeongchang Winter Olympics, the South Korean women’s curling team won their country’s first medal in curling. Before their shockingly successful run, the sport was barely known to many South Koreans. Almost overnight, the country was swept up in a curling craze.
Curling has been called “chess on ice” for the high level of strategy involved in every move. On the surface, the rules of curling seem simple enough: two teams take turns throwing stones across the ice, each trying to get its stones as close to the bullseye as possible. However, every throw requires an incredible amount of calculation to find the ideal velocity, angle, and direction of curl needed to knock out the opponent’s stones while stopping one’s own stone on the bullseye. Inspired by the sport’s need for quick decision-making and adaptation, a team of Korean and German researchers developed a deep learning curling robot that could hold its own, and even win, against top-ranked human teams.
Meet Curly, the artificial intelligence (AI) curling robot system and star athlete extraordinaire.
The lean, mean curling machine is made up of two physical parts: skip-Curly and thrower-Curly. Thrower-Curly rotates and releases the stone, while skip-Curly processes the locations and trajectories of all stones using imaging techniques. The robots have long necks that let them survey the entire sheet of ice, giving them a gawky, giraffe-like appearance. Once the game is in play, however, thrower-Curly tucks in its neck, transforming into a surprisingly agile athlete equipped with video analysis, data communication, and throwing control modules strong enough to accelerate a hefty curling stone.
While Curly’s physical ability is impressive, the real secret to its prowess lies in its curling AI, which acts as a strategy-planning model, a curling simulator, and an adaptive Deep Reinforcement Learning (DRL) model.
Reinforcement learning describes a learning problem in which the goal is to maximize a long-term reward. DRL is an incredibly powerful technology because it can learn by itself through trial and error, typically in virtual simulations. However, these simulations are usually run in fixed, stationary environments, which means that every time the environment changes, the DRL model has to relearn from scratch. This works for stationary tasks like chess, but not so well for robotic star athletes.
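To make the trial-and-error idea concrete, here is a minimal, purely illustrative sketch of a reinforcement learning loop on a toy problem. The environment and every name in it are hypothetical and have nothing to do with Curly’s actual software; the point is only to show how repeated attempts and rewards gradually shape behavior.

# Illustrative only: a tiny tabular Q-learning loop that shows the
# trial-and-error idea behind reinforcement learning. The toy environment
# and every name here are hypothetical, not part of Curly's software.
import random

N_STATES, N_ACTIONS = 5, 2          # toy task: walk right until reaching state 4

# value estimates for each (state, action) pair, all starting at zero
q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def step(state, action):
    """Toy dynamics: action 1 moves right, action 0 moves left."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0    # reward only at the goal
    return nxt, reward, nxt == N_STATES - 1

for episode in range(500):                          # learn purely by trial and error
    state, done = 0, False
    while not done:
        if random.random() < 0.1:                   # explore occasionally...
            action = random.randrange(N_ACTIONS)
        else:                                       # ...otherwise exploit current estimates
            action = max(range(N_ACTIONS), key=lambda a: q[state][a])
        nxt, reward, done = step(state, action)
        # nudge the estimate of long-term reward toward what was just observed
        q[state][action] += 0.1 * (reward + 0.9 * max(q[nxt]) - q[state][action])
        state = nxt

print(q)  # action 1 (move right) ends up with the higher value in every non-goal state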
“Curly learned states from millions of actions in simulations […] In the real world, we may not even be able to perform hundreds of actions for the purpose of learning in each case. The system can never replicate the real world,” said Seong-Whan Lee, senior author of the study and Professor of Brain and Cognitive Engineering at Korea University.
After all, it is one thing to simulate moves on a chessboard and another to play a physical sport in live, changing conditions. Changing conditions in the real world are one of the largest obstacles to applying AI outside the confines of a controlled lab environment, and many researchers have worked to narrow the gap between simulated environments and real-world scenarios. This is why the researchers chose to study DRL in the context of curling.
“Curling is challenging because it requires precise throwing (robot control problem) and strategic planning to win. Moreover, the environmental characteristics change at every moment, and every throw has an impact on the outcome of the match,” Lee said.
Ice is slippery and unpredictable. Over the course of a single curling match, there are many differences in the smoothness of the ice, wear of the pebble, temperature, and humidity. No two throws have the same conditions. In fact, if you apply the exact same direction, force, and curl to a stone, the stone’s final trajectory could be far off enough to account for the difference between a gold medal and finishing off the podium.
The typical relearning process that standard DRL models go through to accommodate these different factors is impractical in a curling tournament because of time constraints. That’s the problem that Lee, in collaboration with Professor Müller of the Berlin Institute of Technology, set out to solve.
“Our proposed adaptation framework can compensate for uncertainties and nonstationarities that are unavoidable in the curling game by augmenting standard DRL through temporal features, which helps to lower the distance gaps to competitive levels. Specifically, our framework is that Curly performs adaptive actions that can respond to the environment changes that occur continuously with every shot,” Lee said.
Unlike existing DRL models, their proposed adaptation framework learns to compensate for the unknowns over a short time period. When applied to curling, the AI is able to recalculate the best throwing strategy for each turn, making Curly a formidable force on the ice.
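One way to picture what “augmenting standard DRL through temporal features” might look like in practice is sketched below: the policy’s input is extended with a short history of recent throw errors, so the network can infer how the ice is currently behaving and compensate on the next shot. Everything here, from the function names to the array shapes, is an assumption made for illustration, not the authors’ implementation.

# Hypothetical sketch of the "temporal features" idea: instead of feeding the
# policy only the current stone layout, we append a short history of recent
# throw outcomes so the model can infer how the ice is drifting and adapt.
# Names, shapes, and the update rule are illustrative, not Curly's code.
from collections import deque

import numpy as np

HISTORY = 3                                        # how many recent throws to remember

recent_errors = deque([np.zeros(2)] * HISTORY, maxlen=HISTORY)

def build_policy_input(stone_layout: np.ndarray) -> np.ndarray:
    """Concatenate the current game state with temporal features
    (the last few observed throw errors) into one input vector."""
    temporal = np.concatenate(list(recent_errors))
    return np.concatenate([stone_layout.ravel(), temporal])

def record_throw(planned_stop: np.ndarray, actual_stop: np.ndarray) -> None:
    """After each throw, store how far the stone strayed from the plan;
    this is the signal the adaptive policy uses to compensate."""
    recent_errors.append(actual_stop - planned_stop)

# Example: a two-stone layout plus the error history becomes one policy input.
layout = np.array([[0.0, 1.2], [0.4, -0.8]])
record_throw(planned_stop=np.array([0.0, 0.0]), actual_stop=np.array([0.15, -0.05]))
print(build_policy_input(layout).shape)            # (4 + 2 * HISTORY,) = (10,)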
Curly plays through seamless communication between skip-Curly and thrower-Curly. First, skip-Curly uses active computer vision to identify where the stones are on the ice and transmits their positions to the curling AI. Knowing where the stones sit and what that means for the state of the game, the strategy-planning AI computes the optimal throw. The crucial step comes when the adaptation DRL model adjusts that strategy, factoring in any uncertainties in the icy environment; this is the heart of Lee and Müller’s study, incorporating and adjusting for variability in a real-world setting. Finally, the instructions are sent to thrower-Curly, which delivers the stone with a rubber grip.
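For readers who like to see the moving parts laid out, here is a toy, self-contained sketch of how such a vision-to-thrower pipeline could be wired together. Every function and data structure below is a simplified stand-in invented for this illustration, not the real system’s interface.

# A toy sketch of the skip-to-thrower pipeline described above. Every function
# and value here is a simplified stand-in invented for illustration; none of
# this is the real Curly interface.
from dataclasses import dataclass, replace

@dataclass
class ThrowPlan:
    speed: float   # release speed (illustrative units)
    angle: float   # aim angle relative to the centerline
    curl: float    # handle rotation: +1 in-turn, -1 out-turn

def detect_stones(camera_frame):
    """Stand-in for skip-Curly's vision: return (x, y) positions of stones in play."""
    return [(0.0, 1.2), (0.4, -0.8)]            # dummy positions for the sketch

def plan_strategy(stones):
    """Stand-in for the strategy planner: choose an idealized throw for this layout."""
    target_x = -stones[0][0]                    # naive rule: aim away from the lead stone
    return ThrowPlan(speed=2.3, angle=target_x, curl=1.0)

def adapt_to_ice(plan, recent_speed_errors):
    """Stand-in for the adaptation step: nudge the plan using recent misses."""
    bias = sum(recent_speed_errors) / max(len(recent_speed_errors), 1)
    return replace(plan, speed=plan.speed - 0.5 * bias)

def play_one_throw(camera_frame, recent_speed_errors):
    stones = detect_stones(camera_frame)                  # 1. vision (skip-Curly)
    plan = plan_strategy(stones)                          # 2. strategy planning
    adapted = adapt_to_ice(plan, recent_speed_errors)     # 3. adaptation for drifting ice
    return adapted                                        # 4. handed off to thrower-Curly

print(play_one_throw(camera_frame=None, recent_speed_errors=[0.06, 0.02]))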
After more than two years of trial and error, the researchers and Curly were ready to test their skills on the big stage.
Curly played four matches against expert human teams: a top Korean women’s curling team and the national wheelchair reserve curling team. You might notice that Curly can’t sweep, so these matches weren’t exactly reflective of an Olympic match. Still, Lee was surprised at Curly’s success: the robot won three of the four matches it played against the two Korean national teams. Curly also played collaborative games in which it showed the human players its computed throw strategies, and the human players executed and swept the stone; the collaboration between robot and human athletes was highly successful. Compared with the non-adapted algorithm, the error in the distance a stone traveled was significantly reduced under the proposed adaptive DRL framework.
“Even elite players don’t always get the stone to the target. They can be off by an average of 3 to 4 feet (0.8 meters to 1.3 meters). Curly was able to match that margin of error. […] Our results indicate that the gap between physics-based simulators and the real world can be narrowed,” Lee explained.
The framework has implications for the application of AI to changing environments beyond Curly’s stunning athletic career. Curly’s success adds fuel to the defining question of the 21st century: can AI outsmart humanity?
On May 12, 1997, IBM’s Deep Blue supercomputer defeated chess world champion Garry Kasparov. More recently, in 2016, AlphaGo beat the legendary Go player Lee Sedol 4-1, a big step forward for machine learning, because Go is exceptionally complex: there are significantly more possible board configurations than atoms in the universe.
After this historic victory, a newer iteration known as AlphaGo Zero (which, like Curly, uses DRL) was introduced in 2017. In AlphaGo Zero’s case, the computer started with a neural network that had absolutely zero knowledge of Go. Similar to how Curly learned the sport of curling through simulations, AlphaGo Zero learned Go by playing against itself. After just eight hours of learning, AlphaGo Zero bested the former version of itself, AlphaGo.
The performance of AlphaGo Zero shone a light on the powerful ability of AI to learn complex tasks using DRL, unconstrained by the limits of human knowledge. Curly takes that one step further by showing that robots have the potential to adapt to the “real world,” opening the door for AI to become increasingly integrated into our homes and workplaces, and, in Curly’s case, maybe even to bask in the glory of a gold-medal finish at the Olympics.