We consider autonomous robots as having associated control policies that determine their actions in response to perceptions of the environment. Often, these controllers are explicitly transferred from a human via programmatic description or physical instantiation. Alternatively, Robot Learning from Demonstration (RLfD) can enable a robot to learn a policy solely from observing demonstrations of the task itself. We focus on interactive, teleoperative teaching, in which the user manually controls the robot and provides demonstrations while receiving feedback from the learner. With regression, the collected perception-actuation pairs are used to directly estimate the underlying policy mapping.
This dissertation contributes an RLfD methodology for interactive, mixed-initiative learning of unknown tasks. The goal of the technique is to enable users to implicitly instantiate autonomous robot controllers that perform desired tasks as well as the demonstrator, as measured by task-specific metrics. With standard regression techniques, we show that such “on-par” learning is restricted to policies typified by a many-to-one mapping (a unimap) from perception to actuation. Thus, controllers that are representable as multi-state Finite State Machines (FSMs) and that exhibit a one-to-many mapping (a multimap) cannot be learned this way. Learning them requires addressing three issues: model selection (how many subtasks, or FSM states), policy learning (for each subtask), and transitioning (between subtasks). Previous work in RLfD has assumed knowledge of the task decomposition and learned either the subtask policies or the transitions between them in isolation.
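To illustrate why standard regression is limited to unimaps, the following minimal sketch fits an ordinary least-squares line to pooled data from two subtasks that share the same perceptions but prescribe opposite actions. The data, the subtask names (`approach`, `retreat`), and the helper functions are hypothetical and are not taken from the dissertation; they only show how a single regressor averages a one-to-many mapping into an action that matches neither subtask.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def mean_abs_error(a, b, xs, ys):
    """Average absolute gap between predicted and demonstrated actions."""
    return sum(abs(a * x + b - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical 1-D perceptions shared by both subtasks.
perceptions = [i / 10 for i in range(11)]

# Two hypothetical subtasks: each one alone is a unimap (one action per perception).
approach = [1.0 + x for x in perceptions]
retreat = [-1.0 - x for x in perceptions]

# Pooling them yields a multimap: every perception now has two valid actions.
pooled_x = perceptions + perceptions
pooled_y = approach + retreat

pooled_fit = fit_line(pooled_x, pooled_y)
approach_fit = fit_line(perceptions, approach)

pooled_error = mean_abs_error(*pooled_fit, pooled_x, pooled_y)
approach_error = mean_abs_error(*approach_fit, perceptions, approach)

print("pooled fit (a, b):", pooled_fit, "error:", pooled_error)
print("approach fit (a, b):", approach_fit, "error:", approach_error)
```

In this toy setting the pooled fit collapses to a nearly flat line between the two subtasks, while the fit to a single subtask recovers its mapping almost exactly.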
We instead address both model selection and policy learning simultaneously. The presented technique uses an infinite mixture of experts and treats the multimap data from an FSM controller as being generated from overlapping unimaps. The algorithm automatically determines the number of unimap experts (model selection) and learns a unimap for each one (policy learning). On data from both synthetic and robot soccer multimaps, we show that the discovered subtasks can be switched between to reperform the original task. Although the resulting approximations do not match the demonstrator's level of skill, they represent a significant improvement over those learned for the same tasks with unimap regression.
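The idea of choosing the number of experts and fitting each expert at the same time can be sketched with a much simpler stand-in than the infinite mixture of experts used in the dissertation. The sketch below is not the author's algorithm: it assumes linear experts, uses hard assignments (each sample belongs to the expert that predicts it best), and grows the number of experts until the fit error drops below a hypothetical tolerance. All names, data, and thresholds are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Closed-form least squares for y = a * x + b; flat line if x has no spread."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return a, mean_y - a * mean_x


def residual(expert, x, y):
    a, b = expert
    return abs(a * x + b - y)


def fit_experts(xs, ys, k, iterations=20):
    """Hard-assignment fitting of k linear experts (a stand-in, not a true mixture)."""
    n = len(xs)
    # Deterministic initialisation: split the samples into k bands by action value.
    order = sorted(range(n), key=lambda i: ys[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * k // n
    experts = [(0.0, 0.0)] * k
    for _ in range(iterations):
        for j in range(k):
            members = [i for i in range(n) if labels[i] == j]
            if members:  # an expert with no samples keeps its previous parameters
                experts[j] = fit_line([xs[i] for i in members],
                                      [ys[i] for i in members])
        new_labels = [min(range(k), key=lambda j: residual(experts[j], x, y))
                      for x, y in zip(xs, ys)]
        if new_labels == labels:
            break
        labels = new_labels
    return experts


def select_model(xs, ys, max_k=5, tolerance=1e-6):
    """Grow the expert count until the best-expert error falls below tolerance."""
    for k in range(1, max_k + 1):
        experts = fit_experts(xs, ys, k)
        error = sum(min(residual(e, x, y) for e in experts)
                    for x, y in zip(xs, ys)) / len(xs)
        if error < tolerance:
            break
    return k, experts, error


# Hypothetical multimap: two subtasks demonstrated over the same perceptions.
perceptions = [i / 10 for i in range(11)]
xs = perceptions + perceptions
ys = [1.0 + x for x in perceptions] + [-1.0 - x for x in perceptions]

num_experts, experts, error = select_model(xs, ys)
print("experts found:", num_experts)
print("expert parameters (a, b):", experts)
```

On this toy multimap the loop stops once each subtask has its own expert. A Bayesian nonparametric mixture, as described in the abstract, would instead infer the number of experts as part of the model rather than through an explicit error threshold.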
Advisor: Jenkins, Odest Chadwicke
School Location: United States -- Rhode Island
Source: DAI-B 71/11, Dissertation Abstracts International
Subjects: Robotics, Artificial intelligence
Keywords: Machine learning, Multimap regression, Robot learning
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved