Dissertation/Thesis Abstract

Regularization Based Multitask Learning With Applications in Computational Biology
by Widmer, Christian, Dr.Nat., Technische Universitaet Berlin (Germany), 2015, 153; 10695936
Abstract (Summary)

In this work, we consider a problem that biologists are very good at: deciphering biological processes by integrating knowledge from experiments in different biological entities, such as organisms, tissues, tumor types or proteins, while respecting their differences and commonalities. We look at this problem from a supervised learning point of view, aiming to solve the same inference task in different biological entities. In this thesis, we investigate two branches of transfer learning: domain adaptation, where information is transferred from source tasks with abundant information to target tasks with little information, and multitask learning, where information is mutually shared between several tasks.

In the case of domain adaptation, we show simple extensions of prevalent regularized risk minimization frameworks to handle information transfer and derive efficient solvers for classification and structured output learning. We present an algorithm tailored for the setting of hierarchical task relationships. This setting is particularly relevant to computational biology, where different tasks often correspond to different organisms, whose relationship is defined by a phylogeny. We perform experimental analyses on synthetic data sets and on problems from sequence biology and prokaryotic gene finding to explore the properties of our algorithms and demonstrate their performance.

Multitask learning, a machine learning technique that has recently received considerable attention, considers the problem of simultaneously learning models for several tasks that are related to each other. Using modern mathematical optimization techniques, we develop a general framework for multitask learning that encompasses a large number of existing multitask learning formulations and carefully explore useful special cases, including several novel formulations. A main feature of our general framework is the ability to learn or refine task similarities using non-sparse multiple kernel learning (MKL). We derive an efficient dual coordinate descent solver for the special case of the hinge loss, which brings the performance of state-of-the-art linear SVM solvers for binary classification to multitask learning. We explore the application of our framework to important problems ranging from computational immunotherapy and bioimaging to sequence biology. We further provide run-time experiments on a large range of data sets.

As a whole, this thesis encompasses the design of transfer learning algorithms by means of carefully engineered regularization terms, the effort of creating and making available software that implements these ideas efficiently, and the application to a plethora of practical problems, where transfer learning may be regarded as a principled way of obtaining more cost-effective predictors.
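The abstract does not state the exact formulations used in the thesis. As a rough illustration of what extending regularized risk minimization for information transfer can look like, the sketch below fits a linear target model whose weights are penalized for moving away from a previously trained source model. The squared loss, the plain gradient descent solver, and the names `fit_target`, `w_src`, `lam`, `lr` and `steps` are all assumptions made for this example; the thesis itself targets the hinge loss and more efficient solvers.

```python
def fit_target(X, y, w_src, lam, lr=0.1, steps=2000):
    """Minimize (lam/2)*||w - w_src||^2 + (1/(2n))*sum_i (w.x_i - y_i)^2.

    Hypothetical illustration only: squared loss and gradient descent
    are assumptions, not the solver described in the thesis.
    """
    n = len(X)
    w = list(w_src)  # start from the source model
    for _ in range(steps):
        # Gradient of the regularizer pulls w back toward w_src.
        grad = [lam * (wj - sj) for wj, sj in zip(w, w_src)]
        # Gradient of the averaged squared loss on the target data.
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Two target examples; the source model disagrees with them.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.0]
w_src = [1.0, 1.0]

w_weak = fit_target(X, y, w_src, lam=0.0)    # ignores the source model
w_mixed = fit_target(X, y, w_src, lam=0.5)   # compromise between both
```

A larger `lam` keeps the target model closer to the source model, which is the desired behavior when the target task has little data. The step size must be small relative to `lam` for this simple solver to converge.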

Indexing (document details)
Advisor: Müller, Klaus-Robert; Rätsch, Gunnar
School: Technische Universitaet Berlin (Germany)
School Location: Germany
Source: DAI-C 81/1(E), Dissertation Abstracts International
Subjects: Bioengineering, Computer Engineering, Bioinformatics
Keywords: Multiple kernel learning, Sequence biology
Publication Number: 10695936
ISBN: 9781392595503
Copyright © 2019 ProQuest LLC. All rights reserved.