Artificially intelligent assistive agents are playing an increasing role in our workplaces and homes. In contrast to the currently predominant conversational agents, whose intelligence derives from dialogue trees and functionality modules, an autonomous domestic or workplace robot must carry out more complicated reasoning with more exhaustive knowledge of its surroundings. For example, a construction or military robot could be brought to new locations about which it knows very little, yet be expected to carry out important tasks with minimal supervision. Such a robot must make good decisions, learn from experience, respond to feedback, and rely on feedback only as much as necessary. In this thesis, we narrow the focus of a robot assistant to a simple room-tidying task in a simulated domestic environment. Given an item, the robot must choose where to put it among many destinations, and then receives feedback from a human operator if the operator chooses to leave it. We frame the problem as a contextual bandit, a reinforcement learning approach frequently used in recommendation systems. We evaluate accuracy, and the time until correct decisions outnumber incorrect ones, as the human operator's feedback rate ranges from every episode down to only 25% of episodes. We also evaluate learning from episodes in which the human does not leave feedback. Additionally, we maintain historical episode data and remove episodes without feedback when they are no longer useful. To improve early-episode performance, we incorporate a priori knowledge into action selection through commonsense reasoning with ConceptNet. We show that combining these methods in epsilon-greedy action selection can increase early-episode accuracy from approximately 10% to 40%, reduce the number of episodes before a majority of decisions are correct by 15%-28%, and increase the cumulative reward at this point by 80%.
Additionally, we show that replacing epsilon-greedy action selection with LinUCB can achieve total accuracy of approximately 90%, even when feedback is left only 25% of the time.
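To make the setup concrete, the following is a minimal sketch of disjoint LinUCB action selection with feedback that arrives in only a fraction of episodes. It is an illustration under stated assumptions, not the thesis implementation: the one-hot item context, the rule that item i belongs at destination i, the +1/-1 reward, the `alpha` value, and the names `LinUCB` and `simulate` are all hypothetical, and the commonsense prior and history pruning described above are omitted.

```python
import numpy as np


class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per action (destination)."""

    def __init__(self, n_actions, n_features, alpha=1.0):
        self.alpha = alpha  # exploration weight (assumed value)
        # A[a] accumulates I + sum of x x^T; b[a] accumulates reward * x.
        self.A = np.stack([np.eye(n_features) for _ in range(n_actions)])
        self.b = np.zeros((n_actions, n_features))

    def select(self, x):
        """Return the action with the highest upper confidence bound."""
        scores = np.empty(len(self.A))
        for a in range(len(self.A)):
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]  # ridge estimate of reward weights
            scores[a] = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
        return int(np.argmax(scores))

    def update(self, a, x, reward):
        """Fold one observed (context, action, reward) triple into the model."""
        self.A[a] += np.outer(x, x)
        self.b[a] += reward * x


def simulate(n_items=4, n_episodes=400, feedback_rate=0.25, seed=0):
    """Toy tidying loop; returns the fraction of correct placements."""
    rng = np.random.default_rng(seed)
    agent = LinUCB(n_actions=n_items, n_features=n_items, alpha=0.5)
    correct = 0
    for _ in range(n_episodes):
        item = rng.integers(n_items)
        x = np.eye(n_items)[item]      # hypothetical one-hot item context
        action = agent.select(x)
        is_correct = action == item    # hypothetical rule: item i -> destination i
        correct += is_correct
        # The operator leaves feedback only some of the time; episodes
        # without feedback are simply skipped in this sketch.
        if rng.random() < feedback_rate:
            agent.update(action, x, 1.0 if is_correct else -1.0)
    return correct / n_episodes
```

Calling `simulate(feedback_rate=0.25)` and `simulate(feedback_rate=1.0)` gives a rough feel for how sparse feedback slows learning in this toy problem; the resulting numbers are not comparable to the results reported in the thesis.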
|Committee:||Leeds, Daniel; Weiss, Gary|
|Department:||Computer and Information Science|
|School Location:||United States -- New York|
|Source:||MAI 81/4(E), Masters Abstracts International|
|Subjects:||Artificial intelligence, Robotics, Computer science|
|Keywords:||Commonsense reasoning, ConceptNet, Contextual bandits, Infrequent feedback, LinUCB|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved