Being able to accomplish tasks with multiple learners through learning has long been a goal of the multiagent systems and machine learning communities. One of the main approaches people have taken is reinforcement learning, but due to certain conditions and restrictions, applying reinforcement learning in a multiagent setting has not achieved the same level of success when compared to its single agent counterparts.
This thesis aims to make coordination better for agents in cooperative games by improving on reinforcement learning algorithms in several ways. I begin by examining certain pathologies that can lead to the failure of reinforcement learning in cooperative games, and in particular the pathology of relative overgeneralization. In relative overgeneralization, agents do not learn to optimally collaborate because during the learning process each agent instead converges to behaviors which are robust in conjunction with the other agent's exploratory (and thus random), rather than optimal, choices. One solution to this is so-called lenient learning, where agents are forgiving of the poor choices of their teammates early in the learning cycle. In the first part of the thesis, I develop a lenient learning method to deal with relative overgeneralization in independent learner settings with small stochastic games and discrete actions.
I then examine certain issues in a more complex multiagent domain involving parameterized action Markov decision processes, motivated by the RoboCup 2D simulation league. I propose two methods, one batch method and one actor-critic method, based on state of the art reinforcement learning algorithms, and show experimentally that the proposed algorithms can train the agents in a significantly more sample-efficient way than more common methods.
I then broaden the parameterized-action scenario to consider both repeated and stochastic games with continuous actions. I show how relative overgeneralization prevents the multiagent actor-critic model from learning optimal behaviors and demonstrate how to use Soft Q-Learning to solve this problem in repeated games.
Finally, I extend imitation learning to the multiagent setting to solve related issues in stochastic games, and prove that given the demonstration from an expert, multiagent Imitation Learning is exactly the multiagent actor-critic model in Maximum Entropy Reinforcement Learning framework. I further show that when demonstration samples meet certain conditions the relative overgeneralization problem can be avoided during the learning process.
|Commitee:||Axtell, Robert, Kosecka, Jana, Lien, Jyh-Ming|
|School:||George Mason University|
|School Location:||United States -- Virginia|
|Source:||DAI-B 80/07/(E), Dissertation Abstracts International|
|Subjects:||Artificial intelligence, Computer science|
|Keywords:||Adversarial learning, Deep learning, Multiagent learning, Reinforcement learning|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved
The supplemental file or files you are about to download were provided to ProQuest by the author as part of a
dissertation or thesis. The supplemental files are provided "AS IS" without warranty. ProQuest is not responsible for the
content, format or impact on the supplemental file(s) on our system. in some cases, the file type may be unknown or
may be a .exe file. We recommend caution as you open such files.
Copyright of the original materials contained in the supplemental file is retained by the author and your access to the
supplemental files is subject to the ProQuest Terms and Conditions of use.
Depending on the size of the file(s) you are downloading, the system may take some time to download them. Please be