In this work, we consider a problem that biologists are very good at: deciphering biological processes by integrating knowledge from experiments on different biological entities, such as organisms, tissues, tumor types or proteins, while respecting their differences and commonalities. We look at this problem from a supervised learning point of view, aiming to solve the same inference task in different biological entities. In this thesis, we investigate two branches of transfer learning: domain adaptation, where information is transferred from source tasks with abundant information to target tasks with little information, and multitask learning, where information is mutually shared between several tasks.

In the case of domain adaptation, we show simple extensions of prevalent regularized risk minimization frameworks to handle information transfer and derive efficient solvers for classification and structured output learning. We present an algorithm tailored for the setting of hierarchical task relationships. This setting is particularly relevant to computational biology, where different tasks often correspond to different organisms, whose relationship is defined by a phylogeny. We perform experimental analyses on synthetic data sets, problems from sequence biology and prokaryotic gene finding to explore the properties of our algorithms and demonstrate their performance.

Multitask learning, a machine learning technique that has recently received considerable attention, considers the problem of simultaneously learning models for several tasks that are related to each other. Using modern mathematical optimization techniques, we develop a general framework for multitask learning that encompasses a large number of existing multitask learning formulations and carefully explore useful special cases, including several novel formulations.
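To make the domain-adaptation idea above concrete, the following is a minimal illustrative sketch (not the thesis's actual formulation) of extending a regularized risk minimization objective for information transfer: instead of shrinking the target model's weights toward zero, the regularizer pulls them toward a model trained on the abundant source data. The function names and the squared loss are assumptions chosen for brevity.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    # Plain regularized risk minimization (ridge regression):
    #   min_w ||X w - y||^2 + lam * ||w||^2
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def fit_adapted(X, y, w_src, lam=1.0):
    # Domain adaptation via a modified regularizer that pulls the
    # target model toward the source model instead of toward zero:
    #   min_w ||X w - y||^2 + lam * ||w - w_src||^2
    # Closed form: (X^T X + lam I) w = X^T y + lam * w_src
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * w_src)
```

With little target data and a large `lam`, the adapted solution stays close to the source model; as target data accumulates, the data term dominates and the model departs from the source where the evidence warrants it.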
A main feature of our general framework is the ability to learn or refine task similarities using non-sparse multiple kernel learning (MKL). We derive an efficient dual coordinate descent solver for the special case of the hinge loss, which brings the performance of state-of-the-art linear SVM solvers for binary classification to multitask learning. We explore the application of our framework to important problems in computational immunotherapy, bioimaging and sequence biology, and further provide run-time experiments on a large range of data sets. As a whole, this thesis encompasses the design of transfer learning algorithms by means of carefully engineered regularization terms, the creation and release of software that implements these ideas efficiently, and the application to a plethora of practical problems, where transfer learning may be regarded as a principled way of obtaining more cost-effective predictors.
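As a rough illustration of the multitask idea, the sketch below jointly fits one linear model per task while a graph regularizer couples the models of related tasks; this is one simple special case of the kind of formulation such frameworks cover, not the thesis's actual algorithm. The squared loss, gradient descent solver, and fixed similarity matrix `A` are all simplifying assumptions (the thesis works with the hinge loss and can learn the task similarities via MKL).

```python
import numpy as np

def fit_multitask(Xs, ys, A, lam=1.0, lr=0.005, steps=5000):
    """Jointly fit one linear model per task with a graph regularizer.

    Illustrative objective (A is assumed symmetric):
        sum_t ||X_t w_t - y_t||^2 + lam * sum_{s,t} A[s,t] * ||w_s - w_t||^2

    A[s,t] >= 0 encodes how strongly tasks s and t are coupled.
    """
    T = len(Xs)
    d = Xs[0].shape[1]
    W = np.zeros((T, d))
    for _ in range(steps):
        G = np.zeros_like(W)
        for t in range(T):
            # Gradient of the per-task squared loss.
            G[t] = 2 * Xs[t].T @ (Xs[t] @ W[t] - ys[t])
            # Gradient of the pairwise coupling term (A symmetric).
            for s in range(T):
                G[t] += 4 * lam * A[s, t] * (W[t] - W[s])
        W -= lr * G
    return W
```

Setting `lam` to zero recovers independent per-task learning; larger values force the models of strongly connected tasks toward each other, which is how information is shared between tasks with few examples.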
|Advisors:||Müller, Klaus-Robert; Rätsch, Gunnar|
|School:||Technische Universität Berlin (Germany)|
|Source:||DAI-C 81/1(E), Dissertation Abstracts International|
|Subjects:||Bioengineering, Computer Engineering, Bioinformatics|
|Keywords:||Multiple kernel learning, Sequence biology|