The University of Texas at Austin

UTCS Artificial Intelligence

Multi-objective Neuroevolution of NPCs

Neuroevolution with multiple objectives allows for populations of agents to evolve different solutions based on various trade-offs inherent in any given domain. In particular, these methods can be used to evolve complex NPC behavior for games involving multiple objectives.

Multi-objective neuroevolution has so far been applied to a battle game domain in which a player fights against several attacking monster agents by using a bat as a weapon. As work on this project continues, multi-objective methods will be applied to increasingly complex and difficult games with more objectives.


The player controls the green agent, and evolved neural networks control the yellow monsters. The monsters try to attack the player, and the player tries to hit them with a bat. The challenge is to evolve complex behavior for the monsters that combines several objectives, such as attacking, assisting others in attacking, staying alive, and avoiding getting hit.
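The idea of a trade-off between objectives can be made concrete with Pareto dominance, one common way to compare individuals in multi-objective evolution. The sketch below is illustrative only: this page does not state which selection scheme was used, and the objective tuples and scores are hypothetical values chosen for the example.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b in every objective and strictly
    better in at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Indices of the individuals that no other individual dominates."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical monster scores: (damage dealt, negated damage received, time alive).
scores = [
    (50, -30, 200),  # aggressive: hits hard but gets hit
    (0, 0, 600),     # evasive: survives but does no damage
    (20, -10, 400),  # balanced
    (10, -20, 300),  # worse than "balanced" in every objective
]
front = pareto_front(scores)
```

Under these assumed scores, the first three individuals represent different trade-offs and none dominates another, while the fourth is dominated by the balanced one and drops out of the front.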

In the movies below, the player is controlled by a static computer strategy. It constantly moves towards the nearest monster in front of it while swinging its bat.

Baiting Strategy

A monster being chased turns by small amounts to avoid facing the player as it backs up. The player turns to pursue it, which enables the player to eventually catch up with the monster, since turning slightly while moving puts less distance between player and monster than moving straight. However, since the player also turns to pursue the monster, the monsters chasing the player eventually catch up. Sometimes the monster being pursued will incur damage as a result of these slight turns, but regardless of whether the bait takes damage or not, any monsters pursuing from behind are eventually able to catch up and attack. One monster takes a risk so that the rest can catch up. Once the player is hit, the monsters are usually able to bounce the player back and forth for a few hits before it regains enough control to start pursuing a new monster.

Charging Strategy

Although monsters using the baiting strategy avoid damage and do a good job inflicting damage, they waste a lot of simulation time avoiding the player when they could be hitting it.

In some simulations a riskier strategy was learned that does not have this problem. It turns out that if the player moves forward after being hit, the monster that hit it can rush the player and hit it again in mid-swing. Being hit cancels the player's swing and knocks it back again. As long as the player rushes forward it is vulnerable to attack, because moving forward gives the monster just enough time to reach the player before the bat reaches the monster.

Wait and Strike Strategy

Recent experiments have revealed another effective strategy learned by multi-objective neuroevolution in this domain. This strategy involves waiting for just the right moment to rush in and strike the player.

Monsters start out surrounding the player, and approach it if they are not at risk of being hit by the bat. If the player is swinging at them they try to spread out off to the sides. If a monster can angle itself directly towards the player's left side, it will rush in to hit. This works because the player's bat swings from right to left, leaving a small opening on the left side. The manner in which the monsters spread out around the player also makes it easier for other monsters to find their chance to rush in.

After rushing in, these monsters will sometimes knock the player back several times the way that charging monsters do.

A Subsymbolic Model of Schizophrenic Language
The DISCERN system is trained to paraphrase stories consisting of multiple scripts and including emotional content and self-reference; it is then lesioned in various ways to model possible underlying causes of schizophrenia. This demo allows the user to specify the lesions and their intensity, and observe the resulting impaired storytelling graphically and in individual stories.

Learning in Fractured Domains
Hand-coded keepaway policy:


Keepaway policy evolved with Cascade-NEAT:


Half-field soccer policy evolved with Cascade-NEAT:

Evolving Cooperation in Multiagent Systems
In tasks such as pursuit and evasion, multiple agents need to coordinate their behavior to achieve a common goal. Using the Multi-agent ESP method, such agents can be effectively evolved in separate networks, rewarded together as a team. This demo shows two examples of evolved behavior in the prey-capture task in a toroidal grid world.

In the role-based animation, the predator agents (red, green, and blue squares) do not sense each other directly. Instead, they learn to coordinate through stigmergy, i.e. through changes in the environment that result from their actions. The red agent has learned the role of a blocker, waiting in the path of the prey (shown as X). The other two are chasers, driving the prey towards the blocker until the prey has nowhere to run (remember the world is a toroid). This kind of role-based cooperation is easier to learn, more robust, and more effective than communication-based cooperation in this task. The team learns behavior similar to a well-trained soccer team, where the players know what to expect from their teammates, making direct communication unnecessary.

In the communication-based animation, the predators broadcast their locations to all other predators; their coordination is therefore based on communication. The predators all first chase the prey vertically, from different directions, forcing it to flee horizontally in the end. At that point, the red agent assumes the behavior of the blocker and the other two chase the prey towards it until it is caught between them (the world wraps around at that point). In this typical behavior of communicating agents, the team members use different strategies at different times. The behavior is more flexible, but harder to learn, and neither as robust nor as effective. It resembles play in pickup soccer, where the players have to constantly observe what their teammates are doing and adapt to it.
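Because the world is a toroid, an agent that leaves one edge of the grid re-enters at the opposite edge, which is why the prey can be trapped between a blocker and chasers. A minimal sketch of wrap-around movement and distance is given below; the function names, the square grid, and the grid size are assumptions made for illustration and are not taken from the demo itself.

```python
from typing import Tuple

Cell = Tuple[int, int]


def step(pos: Cell, move: Cell, size: int) -> Cell:
    """Apply a move on a size x size grid whose edges wrap around."""
    return ((pos[0] + move[0]) % size, (pos[1] + move[1]) % size)


def wrap_delta(a: int, b: int, size: int) -> int:
    """Signed shortest offset from a to b along one wrapped axis."""
    d = (b - a) % size
    return d - size if d > size // 2 else d


def torus_distance(p: Cell, q: Cell, size: int) -> int:
    """Number of horizontal and vertical steps between two cells,
    taking the shorter way around on each axis."""
    return abs(wrap_delta(p[0], q[0], size)) + abs(wrap_delta(p[1], q[1], size))
```

On a hypothetical 10 x 10 grid, cells (1, 1) and (9, 9) are only four steps apart, since the shorter route on each axis crosses the edge.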

The conclusion is that role-based cooperation is a surprisingly effective approach in certain multi-agent domains such as prey capture.


Multi-modal Behavior in NPCs
In this game the player controls the green agent, and evolved neural networks control the NPCs, which are yellow (Fight mode) or red (Flight mode). When the NPCs are yellow, they try to attack the player while the player tries to hit them with a bat. When the NPCs are red they must attack the player while keeping it from escaping (the player no longer has a bat). The challenge is to evolve multi-modal behavior for NPCs so that they can accomplish both tasks.

In the movies below, the player is controlled by a static computer strategy. In Fight mode, this bot constantly moves towards the nearest NPC in front of it while swinging its bat. In Flight mode the bot moves backwards away from the nearest NPC in front of it (away from what it can see).
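The static bot strategy described above can be sketched roughly as follows. This is a guess at the logic, not the code used in the demo: the 180-degree notion of "in front", the action dictionary, the turn-to-face behavior in Flight mode, and the idle behavior when no NPC is visible are all assumptions.

```python
import math
from typing import Dict, List, Optional, Tuple

Vec = Tuple[float, float]


def nearest_in_front(pos: Vec, heading: float, npcs: List[Vec]) -> Optional[Vec]:
    """Closest NPC within 90 degrees of the heading (assumed field of view)."""
    facing = (math.cos(heading), math.sin(heading))
    visible = []
    for n in npcs:
        dx, dy = n[0] - pos[0], n[1] - pos[1]
        if dx * facing[0] + dy * facing[1] > 0:  # positive dot product: in front
            visible.append((math.hypot(dx, dy), n))
    return min(visible)[1] if visible else None


def bot_action(pos: Vec, heading: float, npcs: List[Vec], mode: str) -> Dict[str, object]:
    """Choose a move for the bot; mode is "fight" or "flight"."""
    target = nearest_in_front(pos, heading, npcs)
    if target is None:
        # Assumption: with nothing in view the bot stands still.
        return {"move": 0.0, "turn": 0.0, "swing": mode == "fight"}
    bearing = math.atan2(target[1] - pos[1], target[0] - pos[0])
    # Signed turn towards the target, normalized to [-pi, pi).
    turn = (bearing - heading + math.pi) % (2 * math.pi) - math.pi
    if mode == "fight":
        return {"move": 1.0, "turn": turn, "swing": True}   # advance and swing
    return {"move": -1.0, "turn": turn, "swing": False}     # back away, no bat
```

Note that an NPC behind the bot is ignored in both modes, which is the property the evolved monsters exploit when they approach from behind.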

a. ModeMutation Result

ModeMutation is a mutation operator for neuroevolution that adds a new output mode to the output layer of a neural network in order to encourage multi-modal behavior. In the movie, the green bot plays the Fight or Flight game, and the NPCs try to kill the bot. The NPCs have evolved several distinct output modes, and each number displayed on an NPC indicates a different output mode. The NPCs use a baiting strategy against the bot. Notice that the output mode of the NPCs chasing the bot is 0, but that the output mode of the bait alternates between 1 and 2. When the NPCs become red, they are in the Flight task. Here the NPCs generalize the mode 0 behavior to work in the Flight task as well. They have to effectively corral the bot to keep it from escaping.
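To illustrate what adding an output mode might look like, the sketch below uses a toy single-layer linear network in which each mode is a separate block of output units. How the active mode is chosen is not described on this page; the sketch assumes each mode carries one extra "preference" output and that the mode with the highest preference controls the agent. The class name, weight range, and network shape are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


class MultiModeNet:
    """Toy linear network whose output layer is split into modes."""

    def __init__(self, n_inputs: int, n_actions: int, rng: random.Random):
        self.n_inputs = n_inputs
        self.n_actions = n_actions
        self.rng = rng
        # Each mode holds n_actions action rows plus one preference row.
        self.modes: List[List[List[float]]] = []
        self.mode_mutation()  # start with a single mode (mode 0)

    def mode_mutation(self) -> None:
        """Append a new output mode with small random weights.
        Existing modes are left untouched."""
        rows = [
            [self.rng.uniform(-0.1, 0.1) for _ in range(self.n_inputs)]
            for _ in range(self.n_actions + 1)
        ]
        self.modes.append(rows)

    def activate(self, inputs: Sequence[float]) -> Tuple[int, List[float]]:
        """Return (active mode index, action outputs of that mode)."""
        best_mode, best_pref, best_out = 0, float("-inf"), []
        for m, rows in enumerate(self.modes):
            outs = [sum(w * x for w, x in zip(row, inputs)) for row in rows]
            if outs[-1] > best_pref:  # last unit is the assumed preference output
                best_mode, best_pref, best_out = m, outs[-1], outs[:-1]
        return best_mode, best_out


net = MultiModeNet(n_inputs=3, n_actions=2, rng=random.Random(0))
net.mode_mutation()  # the network now has modes 0 and 1
mode, actions = net.activate([1.0, 0.5, -0.5])
```

The returned mode index plays the role of the number displayed on each NPC in the movie: it identifies which block of outputs is currently driving the agent.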

b. 1Mode Result 1

This shows the comparatively poor performance of the 1Mode method (only one network mode). Each network in the population tends to specialize in certain objectives, but none does particularly well in all of them. These NPCs develop a boring behavior for the Fight trial that avoids all contact with the bot. They do no damage to it, though they also take no damage and manage to live through the whole trial. However, the NPCs do well in the Flight trial that follows and exhibit corralling behavior. Doing damage in the Fight trial is sacrificed for the sake of the other objectives, thus putting this individual near the edge of the trade-off surface between objectives.

c. 1Mode Result 2

This movie also shows that the 1Mode method does not simultaneously optimize all objectives. This network meets the goal of killing the bot once (50 damage), but all NPCs are killed! Also, in the Flight trial following the Fight trial, the NPCs perform very poorly and let the bot escape very quickly. NPCs in this same 1Mode population that actually did well in the Flight task almost invariably did poorly in the Fight task. This is how the population as a whole could reach the required goals in terms of average performance, yet still fail to contain individuals that do well in all objectives.
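A small numeric example shows how population averages can satisfy every goal even though no single individual does. The scores and goal thresholds below are hypothetical numbers chosen for illustration, not results from the experiments.

```python
# Hypothetical (Fight damage dealt, Flight score) for four specialists.
population = [(50, 5), (50, 10), (0, 95), (0, 90)]
goals = (20, 40)  # assumed per-objective goals

# Average of each objective across the population.
averages = tuple(sum(col) / len(population) for col in zip(*population))

# Do the averages reach every goal?
meets_on_average = all(a >= g for a, g in zip(averages, goals))

# Individuals that reach every goal on their own.
all_rounders = [ind for ind in population
                if all(s >= g for s, g in zip(ind, goals))]
```

Here the averages are 25.0 and 50.0, so both goals are met on average, yet the list of all-rounders is empty because every individual specializes in only one task.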

INSOMNet Demo and package
INSOMNet is a subsymbolic sentence processing system that produces explicit and graded semantic graph representations. The novel technique of semantic self-organization allows the network to learn typical semantic dependencies between nodes in a graph, which helps INSOMNet process novel sentences. The technique makes it possible to assign case roles flexibly, while retaining the cognitively plausible behavior that characterizes connectionist modeling. INSOMNet has been shown to scale up to sentences of realistic complexity, including those with dysfluencies in the input and damage in the network. The network also exhibits the crucial cognitive properties of incremental processing, expectations, semantic priming, and nonmonotonic revision of an interpretation during sentence processing. INSOMNet therefore constitutes a significant step towards building a cognitive parser that works with the everyday language that people use.

Geoquery
A natural-language system that answers questions on U.S. geography.

Restaurant Query
A natural-language system that answers questions about restaurants in the California Bay Area.

Learning to Sportscast
Current state-of-the-art language learners require annotated corpora as training data. However, constructing such corpora is difficult and time-consuming. On the other hand, children acquire language through exposure to linguistic input in the context of a rich, relevant, perceptual environment. By connecting words and phrases to objects and events in the world, the semantics of language is grounded in perceptual experience (Harnad, 1990). Ideally, a machine learning system could learn language in a similar manner. Our ultimate goal is to build a system that can exploit the large amount of linguistic data available naturally in the world with minimal supervision. Although there has been some interesting computational work in grounded language learning (Roy, 2002; Bailey et al., 1997; Yu & Ballard, 2004), most of the focus has been on dealing with raw perceptual data, and the complexity of the language involved has been very modest. To help make progress, we study the problem in a simulated environment that retains many of the important properties of a dynamic world with multiple agents and actions while avoiding many of the complexities of robotics and vision. Specifically, we use the Robocup simulator, which provides a fairly detailed physical simulation of robot soccer. Our immediate goal is to build a system that learns to semantically interpret and generate language in the Robocup soccer domain by observing an ongoing commentary of the game paired with the dynamic simulator state. While several groups have constructed Robocup commentator systems (Andre et al., 2000) that provide a textual natural-language (NL) transcript of the simulated game, their systems use manually developed templates and are incapable of learning.