Machine Learning Research (GCI Task)

#1
Sources:
http://ai-depot.com/GameAI/Learning.html
https://link.springer.com/content/pdf/10.1007/s10994-006-8919-x.pdf
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.137.591&rep=rep1&type=pdf

The first source is a brief overview of how machine learning could be used in game design and how it works in general. Paintball is used as an example throughout the article. A “learning agent” is described as consisting of 4 elements: a curiosity element, learning element, performance element, and performance analyzer. The performance element is what makes the actual decisions based on stimuli received. The performance analyzer quantifies the outcome of the situation, essentially “scoring” the performance element. The learning element takes the score and tries to improve the performance element after each test. Finally, the curiosity element makes sure the learning agent doesn’t get stuck with bad habits that accomplish the task at hand, but not in the best possible way.
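To make the four elements concrete, here is a purely illustrative sketch of such a learning agent in Python. The paintball-style actions, the scoring, and all names are my own assumptions, not from the article:

```python
import random

random.seed(0)  # deterministic demo

class LearningAgent:
    """Minimal sketch of the four-element agent described above."""

    def __init__(self, actions, curiosity=0.1):
        self.actions = actions
        self.values = {a: 0.0 for a in actions}  # what the learning element has learned
        self.curiosity = curiosity               # curiosity element: exploration rate

    def performance_element(self):
        """Make the actual decision based on experience so far."""
        # Curiosity element: occasionally deviate from the current best
        # action so the agent doesn't settle into a merely adequate habit.
        if random.random() < self.curiosity:
            return random.choice(self.actions)
        return max(self.actions, key=self.values.get)

    def learning_element(self, action, score, lr=0.2):
        """Take the analyzer's score and nudge the agent toward better play."""
        self.values[action] += lr * (score - self.values[action])

def performance_analyzer(action):
    """Quantify the outcome of a decision; here the hidden best move is 'flank'."""
    return 1.0 if action == "flank" else 0.0

agent = LearningAgent(["charge", "hide", "flank"])
for _ in range(500):
    action = agent.performance_element()
    agent.learning_element(action, performance_analyzer(action))
```

After a few hundred trials the agent settles on the best-scoring action; without the curiosity element, a greedy agent could get stuck forever on whichever action it tried first.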

The second source is also very brief. The beginning explains the reasons why machine learning and games go hand-in-hand with each other and why machine learning for games is a good idea. It then lists several different ways to combine machine learning and games, including several of particular interest such as “learning about players” and “behavior capture of players.” After that is a summary of several other papers on more specific aspects of machine learning in games. Here, the first one is the most useful.

This source is an in-depth look at the possibilities and applications of machine learning in game design. It gives a list of four computational requirements (Speed, Effectiveness, Robustness, and Efficiency) and four functional requirements (Clarity, Variety, Consistency, and Scalability) which all effective implementations must meet. The main focus of the paper, however, is on dynamic scripting. This method maintains a "rulebase" for each agent class. Each time an agent is created, it receives its own set of rules drawn from that rulebase. Each rule has a weight which represents its chance of being used in the agent. Each rule's chance of being used is calculated independently, so an agent can have no rules, every rule, or anything in between. The rules within the rulebase always remain the same; instead, each rule's weight is adjusted.
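A minimal sketch of that scheme as described above (fixed rules, adjustable weights, each rule's chance computed independently); the rule names and numbers below are invented for illustration:

```python
import random

random.seed(1)  # deterministic demo

# The rulebase itself never changes; only the weights do.
rulebase = {
    "open_with_fireball": 10,
    "drink_potion_when_hurt": 10,
    "flee_when_outnumbered": 10,
}

def select_rules(rulebase, max_weight=20):
    """Each rule is included independently with probability weight/max_weight,
    so a script can contain no rules, every rule, or anything in between."""
    return [rule for rule, w in rulebase.items() if random.random() < w / max_weight]

def adjust_weight(rulebase, rule, delta, min_w=1, max_w=20):
    """Learning step: raise the weight of a rule that helped win a fight,
    lower one that hurt; clamping keeps every rule selectable."""
    rulebase[rule] = max(min_w, min(max_w, rulebase[rule] + delta))

script = select_rules(rulebase)                       # this agent's rules
adjust_weight(rulebase, "open_with_fireball", +3)     # contributed to a win
adjust_weight(rulebase, "flee_when_outnumbered", -3)  # contributed to a loss
```

Because the rules themselves are hand-written and static, the designer keeps control over what agents *can* do, while the weights control what they *tend* to do.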

The applications for Terasology are limited in its present state, but machine learning could play a huge role in the final release of Terasology. In the “What is Terasology?” (https://github.com/MovingBlocks/Terasology/wiki/What-is-Terasology) wiki page, autonomous NPCs are referenced several times. The tech tree is planned to be “heavily aided by minions” and another feature described is “Autonomous NPC societies that grow on their own and can both be source of minions as well as valuable trading partners. Or be your greatest enemy…” Given how much of a role NPCs will play in the final game, it would be a shame to see them turn out like the NPCs in Minecraft, whose behavior, despite being complex, has become extremely predictable and exploitable over the years due to the fact that it is completely static. It would also be a great selling point for the game to have machine-learning controlled NPCs and would help a lot to set it apart from other Minecraft-like games.

A possible implementation for these NPCs' machine learning would be to ship several pre-trained agents, giving the impression that the NPCs have had experiences in the past. Prior to the game's release, the agents would receive training, and these trained agents would be included in the game's files. Simpler NPCs, which the agents would recognize as "players," might shorten the process. For example, an NPC could be programmed to attack any other NPC which comes close to it, simulating an experience with an offensive player. The agents would be separated into groups which received similar training data, and each settlement of NPCs would pick a group at random, with each individual selecting an agent from that group. This would give each group an overall sense of identity while still preserving the individuality of each NPC. The training data would be the interactions with the player. Each group would be classified based on its experiences with players in the past and the amount of training it has received, representing the "age" of the settlement. The training program would remain active while the game is played, though, so the choices the player makes can still affect the NPCs' behavior.
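The group-per-settlement idea could be sketched like this; the group names and agent identifiers are all invented for illustration:

```python
import random

random.seed(4)  # deterministic demo

# Pretrained agent "groups" would ship with the game; each settlement picks
# one group, and each NPC in it picks an individual agent from that group.
pretrained_groups = {
    "hostile_veterans": ["agent_h1", "agent_h2", "agent_h3"],
    "friendly_traders": ["agent_t1", "agent_t2", "agent_t3"],
}

def populate_settlement(n_npcs):
    """One shared group gives the settlement a collective identity; each NPC
    drawing its own agent from that group keeps individuals distinct."""
    group = random.choice(list(pretrained_groups))
    return group, [random.choice(pretrained_groups[group]) for _ in range(n_npcs)]

group, npcs = populate_settlement(5)
```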

Going further into the specifics of a single agent, the rulebase for dynamic scripting could consist of rules describing how to interact with the player. For example, one rule may be "if the player buys an item, raise the price" and another may be "if I get attacked, run away." Other rules might not involve the player directly, for example "stock up on armor rather than weapons in case I get attacked." Each of these rules would be designed for certain situations: the second rule might work if a player is only trying to take the town that the NPC lives in, but not if the player's goal is to kill the NPC at all costs.

Using the rules of dynamic scripting, the performance element would act based on the rules, while the learning and curiosity elements modify the weights of the rules. The performance analyzer would score the NPC using variables within it, particularly health, hunger, thirst, and wealth. The first three are self-explanatory, although thirst could possibly be excluded. How wealth is determined will depend on how the trading system is implemented: if an actual currency is added, NPCs would simply keep track of that; if not, wealth could be based on how many items the NPC has, or a system of "value" could be assigned to each item in the game.

At the beginning of each session, rules would be picked based on the weights. At the end of each session, the performance analyzer would give a score, and the learning and curiosity elements would reevaluate the weights based on the current score and rules and the scores obtained with previous rules. The length of a single "session" would be randomly determined at the beginning of each session, and the sessions would run back-to-back with no time in between.
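The per-session loop described above might look roughly like this; the rule names, NPC variables, and scoring formula are all my assumptions, not a real design:

```python
import random

random.seed(2)  # deterministic demo

rulebase = {
    "raise_price_after_sale": 10.0,
    "flee_when_attacked": 10.0,
    "stockpile_armor": 10.0,
}

def performance_analyzer(npc):
    """Score a session from the NPC's own variables (higher is better)."""
    return npc["health"] + npc["wealth"] - npc["hunger"]

def run_session(active_rules):
    """Stand-in for playing one session: returns the NPC's state afterwards.
    A real implementation would simulate the game for the session's length."""
    npc = {"health": 50, "hunger": 20, "wealth": 0}
    if "raise_price_after_sale" in active_rules:
        npc["wealth"] += 15      # pretend raising prices earned money
    if "flee_when_attacked" in active_rules:
        npc["health"] += 10      # pretend fleeing avoided damage
    return npc

baseline = performance_analyzer(run_session([]))
for _ in range(20):  # back-to-back sessions
    # Pick this session's rules from the weights (independent chances).
    active = [r for r, w in rulebase.items() if random.random() < w / 20]
    score = performance_analyzer(run_session(active))
    # Learning element: credit or blame the rules active this session.
    for r in active:
        rulebase[r] = min(20.0, max(1.0, rulebase[r] + 0.1 * (score - baseline)))
```

Over many sessions the rules that improve the NPC's health and wealth gain weight, while clamping the weights keeps every rule in play, which is the curiosity element's job of preventing premature lock-in.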

Another possible application of machine learning in Terasology would be an agent that actually plays the game. This would be a much more complex system, however. First, because the game's IO layer sits between the agent and the game itself, the agent needs to learn not only which in-game actions produce a favorable outcome, but which inputs cause those actions. For example, the agent would need to learn the correct timing for pressing the spacebar if it wants to jump up a block. Similarly, the agent would need to learn to connect what is represented graphically on-screen to what is actually happening in-game. For example, it would need to recognize that if the entire screen turns brown, it probably walked into a tree. Overall, having an AI play the game is feasible, but would require much more work.
 

vampcat

Moderator
Contributor
#2
Hello ContronThePanda!

Based on a quick read, I can see that the first 3 paragraphs of your post are about the external links. However, you never really expand on that knowledge in the context of Terasology. For instance, what are the curiosity element, learning element, performance element, and performance analyzer for our NPC that you treat as an agent in the last paragraph? What kind of ruleset could the NPC have?

"The agents would be separated into groups which received similar training data"
What kind of training will you be providing to the agents? More specifically, how? Will you play the game a thousand times over for each NPC, to familiarize them with your behaviour? Somehow, I can't see that being feasible.

Also, as a side question, what do you think about using Machine Learning to train an agent to play Terasology? Is it feasible?
 

MandarJ

New Member
Contributor
#4
Hey, I'm still not sure how you plan to train the agents. In an average game, a player will have very limited interactions with NPCs (not enough for effective training), so you can't rely on those as training data. How exactly would you train the agent? And what kind of data would you feed it?
 

vampcat

Moderator
Contributor
#5
As MandarJ said, I'm not sure how you're going to train the agents.

"The length of a single "session" would be randomly determined at the beginning of each session and the sessions would run back-to-back with no time in between."
So you plan on having a person sit down in front of a computer, and go through back-to-back sessions of interacting with the same NPC again and again, a million times over? Or do you plan on having this automated via scripts, which would lead to all NPCs being trained by the exact same input a million times over? I don't think you're understanding the practical problem here.

On top of that, regarding the feasibility of training an agent to play a game:
"For example, it would need to recognize that if the entire screen turns brown, it probably walked into a tree."
Which brown? Exactly which shade of brown represents a tree, which represents dirt, which represents a door... they'll be similar, maybe even exactly the same, but have different meanings. To add to it, as time progresses in Terasology, the ambient lighting changes, leading to differences in color. So a tree can be anything from almost white (due to bloom, when viewed against the sun) to black (during the night). If that wasn't enough, we then have multiple biomes, each of which can have different tree styles (and different ambient lighting). Do you see the problem here?
This isn't quite as easy as training a bot on a game of flappy bird.
 
#6
I think I mentioned it briefly in the post, but here's a better explanation. Prior to the game's release, the developers would provide some training to the agents. Of course, actually playing the game for long enough would get really tedious, especially when training multiple agents, so much simpler "fake players" (driven by basic scripts) could be created to speed up the process. These would have basic pathfinding algorithms to seek out the NPCs, and the NPCs would recognize them as "players." Each one would have its own very basic algorithm for deciding what to do with an NPC: as soon as it finds one, it either attacks or buys a completely random item. Different groups would be trained by fake players with different probabilities of doing either one. These probabilities would be static, i.e. each fake player would always have the same probability of taking any given action. They don't learn; only the NPCs do.

The fake players would also have a few other basic rules to keep the NPCs from developing bad habits. For example, if a fake player gets killed by an NPC, it will switch its behavior toward that NPC from then on, so the NPC doesn't learn that killing players for no reason is OK, but does learn that fighting and killing an aggressive player can lead to a better outcome (like players, fake players respawn as well). Every now and then there would be a few human-controlled sessions to make sure that the agents don't get totally overfitted to the fake players.

The sessions wouldn't necessarily be back-to-back during the development phase, but after release, while the player is interacting with the NPC for real, the sessions would be placed back-to-back so that the player can interact with the NPC seamlessly. If the NPC needs to sleep, that could also serve as a gap between sessions, as could the NPC's entity getting unloaded. Once the agents are trained, they would be included in the game files and used to pre-set the agents used in the actual game. From there, the new agents would receive additional training data from their interactions with the player.
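A fake player with a static attack probability and the "switch behavior after being killed" rule could be sketched like this; all names and numbers are illustrative:

```python
import random

random.seed(3)  # deterministic demo

class FakePlayer:
    """Scripted trainer NPC-stand-in: static action odds, never learns."""

    def __init__(self, attack_prob):
        self.attack_prob = attack_prob  # static; fake players never learn
        self.switched = set()           # NPCs that have killed this player

    def act(self, npc_id):
        """Attack or trade; after being killed by an NPC, invert the odds so
        an aggressive fake player turns peaceful toward it (and vice versa)."""
        p = (1 - self.attack_prob) if npc_id in self.switched else self.attack_prob
        return "attack" if random.random() < p else "buy_random_item"

    def on_killed_by(self, npc_id):
        # Respawn with switched behavior toward the killer, so NPCs learn
        # that fighting off an aggressor pays, but unprovoked killing doesn't.
        self.switched.add(npc_id)

raider = FakePlayer(attack_prob=1.0)   # trains NPC groups against hostile players
trader = FakePlayer(attack_prob=0.05)  # trains NPC groups used to peaceful players
raider.on_killed_by("npc_42")          # npc_42 fought back and won
```

Different training groups would simply instantiate fake players with different `attack_prob` values; since the probabilities never change, the same fake players can be reused across many sessions.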

The last part was intentionally oversimplified; that still probably wasn't the best example, though. My point was that while training a bot to play Terasology would be feasible, it would have to interpret the game's graphics before it could react to anything. While this can be accomplished, and it is made easier with APIs like Serpent, it is much more difficult than having an in-game character accomplish tasks directly.