Sequential prediction problems arise commonly in many areas of robotics and information processing: e.g., predicting a sequence of actions over time to achieve a goal in a control task, interpreting an image through a sequence of local image patch classifications, or translating speech to text through an iterative decoding procedure.
Learning predictors that can reliably perform such sequential tasks is challenging. Specifically, as predictions influence future inputs in the sequence, the data-generation process and executed predictor are inextricably intertwined. This can often lead to a significant mismatch between the distribution of examples observed during training (induced by the predictor used to generate training instances) and test executions (induced by the learned predictor). As a result, naively applying standard supervised learning methods—that assume independently and identically distributed training and test examples—often leads to poor test performance and compounding errors: inaccurate predictions lead to untrained situations where more errors are inevitable.
This thesis proposes general iterative learning procedures that leverage interactions between the learner and teacher to provably learn good predictors for sequential prediction tasks. Through repeated interactions, our approaches can efficiently learn predictors that are robust to their own errors and predict accurately during test executions. Our main approach uses existing no-regret online learning methods to provide strong generalization guarantees on test performance.
We demonstrate how to apply our main approach in various sequential prediction settings: imitation learning, model-free reinforcement learning, system identification, structured prediction and submodular list predictions. Its efficiency and wide applicability are exhibited over a large variety of challenging learning tasks, ranging from learning video game playing agents from human players and accurate dynamic models of a simulated helicopter for controller synthesis, to learning predictors for scene understanding in computer vision, news recommendation and document summarization. We also demonstrate the applicability of our technique on a real robot, using pilot demonstrations to train an autonomous quadrotor to avoid trees seen through its onboard camera (monocular vision) when flying at low-altitude in natural forest environments.
Our results throughout show that unlike typical supervised learning tasks where examples of good behavior are sufficient to learn good predictors, interaction is a fundamental part of learning in sequential tasks. We show formally that some level of interaction is necessary, as without interaction, no learning algorithm can guarantee good performance in general.
|Advisor:||Bagnell, J. Andrew|
|Commitee:||Atkeson, Christopher G., Gordon, Geoffrey J., Langford, John|
|School:||Carnegie Mellon University|
|School Location:||United States -- Pennsylvania|
|Source:||DAI-B 75/02(E), Dissertation Abstracts International|
|Subjects:||Statistics, Robotics, Artificial intelligence|
|Keywords:||Imitation learning, Online learning, Reinforcement learning, Sequential prediction|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved
The supplemental file or files you are about to download were provided to ProQuest by the author as part of a
dissertation or thesis. The supplemental files are provided "AS IS" without warranty. ProQuest is not responsible for the
content, format or impact on the supplemental file(s) on our system. in some cases, the file type may be unknown or
may be a .exe file. We recommend caution as you open such files.
Copyright of the original materials contained in the supplemental file is retained by the author and your access to the
supplemental files is subject to the ProQuest Terms and Conditions of use.
Depending on the size of the file(s) you are downloading, the system may take some time to download them. Please be