Dissertation/Thesis Abstract

Learning to Play Cooperative Games via Reinforcement Learning
by Wei, Ermo, Ph.D., George Mason University, 2018, 162; 13420351
Abstract (Summary)

Enabling multiple agents to accomplish tasks together through learning has long been a goal of the multiagent systems and machine learning communities. One of the main approaches is reinforcement learning, but due to certain conditions and restrictions, applying reinforcement learning in multiagent settings has not achieved the same level of success as its single-agent counterpart.

This thesis aims to improve coordination among agents in cooperative games by extending reinforcement learning algorithms in several ways. I begin by examining certain pathologies that can cause reinforcement learning to fail in cooperative games, in particular the pathology of relative overgeneralization. Under relative overgeneralization, agents fail to learn to collaborate optimally because during learning each agent instead converges to behaviors that are robust in conjunction with its teammates' exploratory (and thus random), rather than optimal, choices. One solution is so-called lenient learning, in which agents forgive the poor choices of their teammates early in the learning cycle. In the first part of the thesis, I develop a lenient learning method to deal with relative overgeneralization in independent-learner settings with small stochastic games and discrete actions.
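To make the idea concrete, below is a minimal, illustrative sketch of a lenient independent Q-learner in Python; the class name, temperature schedule, and the constant kappa are my own hypothetical choices rather than the thesis's algorithm. A value-decreasing update is accepted only rarely while the per-action temperature is hot, so a teammate's random exploratory choices do not drag down an action's estimated value early in learning.

    import math
    import random
    from collections import defaultdict

    class LenientQLearner:
        # Illustrative independent learner that forgives low payoffs early on.
        # The temperature schedule and constant kappa are hypothetical choices,
        # not parameters taken from the thesis.
        def __init__(self, alpha=0.1, init_temp=50.0, decay=0.995, kappa=0.1):
            self.q = defaultdict(float)                 # Q-value per own action
            self.temp = defaultdict(lambda: init_temp)  # per-action leniency temperature
            self.alpha = alpha
            self.decay = decay
            self.kappa = kappa

        def update(self, action, reward):
            # A value-decreasing update is accepted only with probability
            # exp(-kappa * temperature): a hot temperature means high leniency.
            accept = math.exp(-self.kappa * self.temp[action])
            if reward > self.q[action] or random.random() < accept:
                self.q[action] += self.alpha * (reward - self.q[action])
            self.temp[action] *= self.decay             # cool down; leniency fades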

I then examine certain issues in a more complex multiagent domain involving parameterized-action Markov decision processes, motivated by the RoboCup 2D simulation league. I propose two methods, one batch method and one actor-critic method, based on state-of-the-art reinforcement learning algorithms, and show experimentally that the proposed algorithms train agents significantly more sample-efficiently than more common methods.
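For readers unfamiliar with parameterized-action spaces, the sketch below illustrates their structure, a discrete action type paired with continuous parameters for that type. The action names and parameter dimensions are hypothetical, RoboCup-flavored examples, not the league's actual interface.

    import random

    # Hypothetical parameterized action space: each discrete action type
    # carries its own continuous parameters.
    PARAM_MEANS = {
        "dash": [80.0, 0.0],    # power, direction
        "turn": [15.0],         # angle
        "kick": [50.0, 30.0],   # power, direction
    }

    def sample_parameterized_action(discrete_probs, param_means, noise=5.0):
        # Pick the discrete action type, then perturb that type's continuous
        # parameter means with Gaussian exploration noise.
        actions = list(discrete_probs)
        weights = [discrete_probs[a] for a in actions]
        a = random.choices(actions, weights=weights)[0]
        params = [m + random.gauss(0.0, noise) for m in param_means[a]]
        return a, params

    # Example: a policy that mostly dashes.
    print(sample_parameterized_action({"dash": 0.7, "turn": 0.2, "kick": 0.1}, PARAM_MEANS))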

I then broaden the parameterized-action scenario to consider both repeated and stochastic games with continuous actions. I show how relative overgeneralization prevents the multiagent actor-critic model from learning optimal behaviors and demonstrate how to use Soft Q-Learning to solve this problem in repeated games.
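As a rough illustration of the Soft Q-Learning machinery mentioned above, the following sketch computes a soft (log-sum-exp) state value and samples from the induced energy-based policy over a finite set of sampled actions; this sampled approximation is my own simplification of the continuous-action setting studied in the thesis. Because the energy-based policy keeps probability mass on every high-value mode instead of collapsing onto a single action, it is a natural tool against relative overgeneralization.

    import math
    import random

    def soft_value(q_values, alpha=1.0):
        # Sampled soft state value: V(s) ~= alpha * log( mean_a exp(Q(s, a) / alpha) ),
        # where alpha is the entropy temperature.
        m = max(q / alpha for q in q_values)  # shift for numerical stability
        return alpha * (m + math.log(
            sum(math.exp(q / alpha - m) for q in q_values) / len(q_values)))

    def sample_soft_policy(actions, q_values, alpha=1.0):
        # Draw an action from the energy-based policy pi(a|s) proportional to
        # exp(Q(s, a) / alpha), which spreads probability over all good actions.
        m = max(q / alpha for q in q_values)
        weights = [math.exp(q / alpha - m) for q in q_values]
        return random.choices(actions, weights=weights)[0]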

Finally, I extend imitation learning to the multiagent setting to solve related issues in stochastic games, and prove that, given demonstrations from an expert, multiagent imitation learning is exactly the multiagent actor-critic model in the maximum-entropy reinforcement learning framework. I further show that when the demonstration samples meet certain conditions, the relative overgeneralization problem can be avoided during learning.
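For context, the maximum-entropy reinforcement learning objective referenced here has the standard form below (standard notation, not copied from the thesis), where the temperature \alpha trades off reward against the entropy bonus that keeps each agent's policy stochastic:

    J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[ r(s_t, a_t) + \alpha\,\mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]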

Indexing (document details)
Advisor: Luke, Sean
Committee: Axtell, Robert; Kosecka, Jana; Lien, Jyh-Ming
School: George Mason University
Department: Computer Science
School Location: United States -- Virginia
Source: DAI-B 80/07/(E), Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Artificial intelligence, Computer science
Keywords: Adversarial learning, Deep learning, Multiagent learning, Reinforcement learning
Publication Number: 13420351
ISBN: 9780438887220
Copyright © 2019 ProQuest LLC. All rights reserved.