2022 Competition Results

This year we ran two tracks: the Standard Track, which is based on Angry Birds Chrome, and the new Novelty Track, which is based on Science Birds. Overall, we had 11 participating agents.

For the Standard Track, agents have 30 minutes per round to solve 8 new Angry Birds levels that we designed ourselves. Each level contains only game objects that are known to the agent. We did not run any of the agents on the new competition levels before the competition, so the levels were unknown to all participants. During a match, all agents can see the current high score per level for every agent in the same match and can use this information to select which level to solve next. Levels can be replayed in any order until the time is up. Each agent runs on an individual laptop and can use its full computational power. We ran 4 agents per round, and every round was played live at IJCAI'22 in Vienna. The overall score per agent is the sum of its highest score for each of the 8 game levels. The two agents with the highest overall score progress to the next round, until we have a winner.
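
As a small illustration of this scoring scheme, here is a minimal sketch (in Python) of how an overall score and a round ranking could be computed; the function and data names are our own illustration and not part of the competition software:

    from collections import defaultdict

    def overall_scores(shot_results):
        """shot_results: list of (agent, level, score) tuples recorded during one round.
        Returns each agent's overall score: the sum of its highest score per level."""
        best = defaultdict(dict)                  # agent -> {level: best score so far}
        for agent, level, score in shot_results:
            best[agent][level] = max(score, best[agent].get(level, 0))
        return {agent: sum(levels.values()) for agent, levels in best.items()}

    # Illustrative round data (made-up scores): levels can be replayed, only the best attempt counts.
    round_results = [
        ("Bambirds", 1, 65_000), ("Bambirds", 1, 71_200),   # level 1 replayed, best attempt kept
        ("Agent X", 1, 68_300),  ("Agent X", 2, 54_100),
    ]
    ranking = sorted(overall_scores(round_results).items(), key=lambda kv: kv[1], reverse=True)
    finalists = [agent for agent, _ in ranking[:2]]         # the top two progress to the next round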

The surprise winner from 2021, Agent X, played as the defending champion. The 2020 champion Bambirds, who lost the 2021 final by a large margin, submitted a new and improved agent. Both teams were again the dominant agents this year and easily made it into the Grand Final. In the final showdown of the competition, the excitement was huge. Could the new Bambirds agent beat Agent X, or was Agent X still better? The final lasted for 30 minutes and was incredibly exciting. Both teams had different strategies and played the levels in a different order, so the leaderboard changed almost every time an agent solved a level. In the end, the result was extremely close: Bambirds had a final score of 172,320, Agent X a final score of 169,890. Bambirds regained their title and are crowned the new AIBIRDS 2022 Champion! Congratulations to Diedrich Wolter and Felix Haase from the University of Bamberg in Germany! Well done on further improving their already excellent agent.

 

The Novelty Track was very different from the Standard Track, not only because it was based on Science Birds, which has slightly different physics from Angry Birds, but also because we introduced novelty that was unknown to the participants. This novelty was of five different types, which we call novelty levels. Novelty level 1 is new game entities that are visibly different from known game objects and have different properties. These could be new birds with new special powers, new pigs, or new blocks and other new game objects. Novelty level 2 is existing game entities with a changed parameter, which could be any of the parameters that define the game entity. The difficulty here is that agents can only determine what has changed by interacting with the modified game entity and observing how it behaves. Novelty level 3 is a change in game representation. For example, the game could be upside down or black and white. Novelty level 4 is new static relations between game entities, and novelty level 5 is new dynamic interactions between game entities. Novelty levels 4 and 5 were used for the first time this year.

Before the competition we released test novelties for the different novelty levels so that participants could experiment and develop their agents. But the novelties we used in the competition (2 per novelty level) were unknown to participants. The Novelty Track is much more realistic, as the real Angry Birds game very often introduces novelty: new game entities, new capabilities, or new game versions. The capability to deal with novelty is of utmost importance for AI in real-world situations, where novelty occurs very frequently. AI agents need to be able to detect what is novel and to adjust to it. An example is a future household robot: whenever you purchase a new item, the household robot needs to understand what the new item is, what it does, and how it can be used, just like other members of the household.

The novelties we used in the competition are the following:

Novelty Level 1 (Previously unseen objects or entities):

  • Novelty 1.1: New egg-shaped object with the same colour as a pig
  • Novelty 1.2: New object that behaves like a pig and must be destroyed to win the game

Novelty Level 2 (Change in object features):

  • Novelty 2.1: Red bird has increased bounciness
  • Novelty 2.2: Slingshot is higher than normal

Novelty Level 3 (Change in representation):

  • Novelty 3.1: Objects surrounding a pig have the same colour as pigs
  • Novelty 3.2: Object shapes are represented as a circle instead of their real shapes

Novelty Level 4 (New static relations):

  • Novelty 4.1: Slingshot is on the right
  • Novelty 4.2: Wood blocks can float above ice blocks. If the ice block gets destroyed, the wood block falls down

Novelty Level 5 (New dynamic interactions):

  • Novelty 5.1: Birds fly through pigs instead of killing them. Pigs need to be killed by other means
  • Novelty 5.2: Birds can slide on stone blocks (low friction)

In order to evaluate agents on these novelties, we set up a competition with 20 different trials for each of the ten novelties. A trial is a fixed sequence of Angry Birds games; each game can be played only once, and games must be played in the given order. Each trial has an unknown number of standard, non-novel games at the beginning, followed by a fixed number of novel games, i.e., at some point the games change from non-novel to novel. The task of the agents remains to solve each level, i.e., to kill all the pigs with as few birds as possible. How to solve this task can change quite a bit when novelty is introduced, and it is possible that agents unable to deal with novelty cannot solve any games anymore. Agents need to detect the novelty and adjust to it, a very difficult task. Each trial contained between 0 and 15 non-novel games, followed by 40 novel games, but these settings were unknown to participants. In addition to solving the games, agents also had to report when they believed novelty had been introduced. Therefore, agents are evaluated on two aspects: (1) their novelty detection performance, which is based on the percentage of trials where they correctly detect novelty (i.e., they report novelty after novelty occurs) and on the number of novel games they need before they can detect it; and (2) their novelty reaction performance, which is the overall game score they received in the novel games. See here for a more detailed description of these measures. Given that we used 10 novelties with 20 trials per novelty, plus ten trials without novelty, and each trial consists of around 50 games, each agent had to play around 10,500 games. We were therefore not able to run the competition live, but ran it in advance on AWS. Agents had on average two minutes per game and could speed up gameplay by up to 50 times.
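
To make these two measures more concrete, here is a small illustrative sketch (in Python) of how the per-trial quantities could be computed; the exact official formulas are given in the linked description, and all names below are our own illustration:

    def evaluate_trial(novelty_start, detection_game, novel_game_scores):
        # novelty_start:     index of the first novel game in the trial
        # detection_game:    index of the game at which the agent reported novelty (None if it never did)
        # novel_game_scores: game scores the agent achieved in the novel games
        #
        # Detection: a report counts as correct only if it comes at or after the first
        # novel game; fewer novel games played before the report is better.
        detected = detection_game is not None and detection_game >= novelty_start
        detection_delay = (detection_game - novelty_start) if detected else None
        # Reaction: the overall game score obtained in the novel games of this trial.
        reaction_score = sum(novel_game_scores)
        return detected, detection_delay, reaction_score

    # Example: novelty starts at game 12, the agent reports it after playing 3 novel games.
    print(evaluate_trial(12, 15, [31_000, 0, 28_500]))    # (True, 3, 59500)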

One major change we made compared to the 2021 Novelty Track is that this year we used simple games with only a few birds, pigs, and blocks. These games are randomly created from simple game templates, so all non-novel games of the same trial are very similar and come from the same template, as do all novel games of that trial. Switching to novel games requires a specific change in game strategy to still be able to solve the games.

In order to study progress over the past year, we used the best agents from the 2021 competition as well as the new 2022 agents. For several agents (CIMARRON from the University of Massachusetts Amherst, HYDRA from the Palo Alto Research Center and the University of Pennsylvania, OpenMind from Smart Information Flow Technologies) we therefore ran two versions, the 2021 version and the new 2022 version. The overall winner of the competition is the agent with the best novelty reaction performance across all novelties. In addition, we have two other major awards: the agent with the best novelty detection performance across all novelties, and the agent with the best non-novelty performance, which corresponds to the Standard Track. We also have ten subcategories: best novelty detection performance and best novelty reaction performance for each of the five novelty levels.

We presented the results as part of the Competition Session at IJCAI 2022. The presentation, which includes details about the ten novelties we used, can be found here.

The overall winner and AIBIRDS 2022 Novelty Champion is: OpenMind 2022 from Smart Information Flow Technologies (SIFT)! Congratulations to David Musliner and his team! Second place went to CIMARRON 2022 from UMass, whose 2021 agent was the defending champion. Third place went to HYDRA from PARC. 

  • Subcategory winners for the best novelty reaction performance are: HYDRA 2022 for novelty level 1, OpenMind 2022 for novelty level 2, CIMARRON 2022 for novelty levels 3 and 5, and CIMARRON 2021 for novelty level 4. 

The winner of the Novelty Detection Award is OpenMind 2022 from SIFT! Second place went to Dongqing 1 from Bytedance/Monash, and third place to CIMARRON 2022 from UMass.

  • Subcategory winners for the best novelty detection performance are: Dongqing 1 for novelty level 1, and OpenMind 2022 for novelty levels 2-5. 

The winner of the Non-Novelty Award is also OpenMind 2022 from SIFT.  Second place went to CIMARRON 2022, third place to HYDRA 2022. 

We saw a massive jump in performance this year compared to the best 2021 agents. For Novelty Reaction, OpenMind 2022 won with a score of 20,244,030, while last year's winner CIMARRON 2021 only achieved a score of 8,219,260. For Novelty Detection, the winner OpenMind 2022 had a score of 21.37, compared to the 2021 winner OpenMind 2021, which had a score of only 3.9. Even for non-novelty there was a huge improvement: OpenMind 2022 had a score of 22,582,050, while CIMARRON 2021 only had a score of 6,117,740. Detailed results, including results per novelty, can be found in the presentation slides. Results for the three award categories are in the table at the end of this page.

For the first time since 2019, we ran a Man vs Machine Challenge again, where we test whether the best AI agents can already beat human players. As in previous years, the human players still won. The winner was Hannu Laaksonen with a score of 251,710, second place went to Joonseuk Lee (247,910), and third place to Felix Haase (241,900). The best-performing AI agent, Felix's Bambirds agent, only scored 144,630, while Agent X scored only 84,370.

That was the end of a very exciting competition. We saw many improvements and amazing shots. In the Standard Track we saw a new and old champion, Bambirds, who won back their title after losing it in 2021. In the Novelty Track we saw a massive jump in performance from 2021 and one agent, OpenMind from SIFT, who dominated all categories. We hope to be able to test this again at next year's competition at IJCAI 2023.

We hope to see many improved agents and many new agents at our next competition in 2023. Angry Birds remains a very challenging problem for AI and the new Novelty Track makes it even more challenging. We are still waiting to see an exceptionally good deep learning agent. We encourage and challenge all members of the AI community to take on this problem and to develop AI that can successfully deal with a physical environment. See you in 2023! 

Jochen, Katya, Vimu, Chathura, Cheng and Peng. 

 

The main results of the competition can be found in the following table:   

Standard Track

Grand Final
  1. Bambirds        172,320
  2. Agent X         169,890

Novelty Track

Novelty Reaction (out of 8,000 novel games)
  1. OpenMind 2022   20,244,030
  2. Cimarron 2022   19,641,910
  3. Hydra 2022      14,259,000

Novelty Detection (max score 39)
  1. OpenMind 2022   21.37
  2. Dongqing 1       8.22
  3. Cimarron 2022    5.36

Non-Novelty Performance (out of 2,500 non-novel games)
  1. OpenMind 2022   22,582,050
  2. Cimarron 2022   15,834,490
  3. Hydra 2022      15,586,050