In this thesis, we explore several topics in machine learning, with special attention to applications to biological data.
In the first part, we analyze the pre-validation method. Given a predictor of outcomes derived from a high-dimensional dataset, pre-validation is a useful technique for comparing that predictor to competing predictors on the same dataset. For microarray data, it allows one to compare a newly derived predictor of disease outcome with standard clinical predictors on the same dataset. We study pre-validation analytically to determine whether the inferences drawn from it are valid. We show that while pre-validation generally works well, the straightforward "one degree of freedom" analytical test can be biased, and we propose a permutation test to remedy this problem. In simulation studies, the permutation test has the correct nominal level and achieves roughly the same power as the analytical test.
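The core pre-validation step can be sketched as follows: split the samples into K folds, fit the predictor on all folds but one, and use that fit to score the held-out fold, so that no sample's pre-validated value depends on its own outcome. This is a minimal sketch, not the thesis's implementation; the stand-in predictor `fit_mean_difference`, the interleaved fold assignment, and all function names are hypothetical choices for illustration. The pre-validated values would then typically enter a downstream model (for example, a logistic regression) alongside the clinical predictors.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Predictor = Callable[[Vector], float]


def fit_mean_difference(X: Sequence[Vector], y: Sequence[int]) -> Predictor:
    """Hypothetical stand-in predictor (not from the thesis).

    Weights each feature by the difference between its mean in class 1 and
    its mean in class 0, and scores a sample by the weighted feature sum.
    Assumes the training data contain both classes.
    """
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    p = len(X[0])
    w = [
        sum(x[j] for x in pos) / len(pos) - sum(x[j] for x in neg) / len(neg)
        for j in range(p)
    ]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


def pre_validate(
    X: Sequence[Vector],
    y: Sequence[int],
    fit: Callable[[Sequence[Vector], Sequence[int]], Predictor],
    n_folds: int,
) -> List[float]:
    """Return one pre-validated prediction per sample.

    Sample i is scored by a predictor fitted without the fold containing i,
    so its own outcome y[i] never influences its pre-validated value.
    """
    n = len(X)
    # Interleaved folds: fold k holds samples k, k + n_folds, k + 2 * n_folds, ...
    folds = [list(range(k, n, n_folds)) for k in range(n_folds)]
    pv = [0.0] * n
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        predictor = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            pv[i] = predictor(X[i])
    return pv


# Toy data: 6 samples, 2 features, binary outcome.
X = [[1.0, 0.0], [0.9, 0.1], [0.1, 1.0], [0.0, 0.8], [1.1, 0.2], [0.2, 0.9]]
y = [1, 1, 0, 0, 1, 0]
pv = pre_validate(X, y, fit_mean_difference, n_folds=3)
```

Because each fold is scored by a model that never saw it, the pre-validated predictor behaves more like a predictor evaluated on new data than the in-sample fitted values would.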
The second part considers the problem of estimating both the parameters and the structure of binary-valued Markov networks. To maximize the penalized log-likelihood, we implement an approximate procedure based on the pseudo-likelihood as defined by Besag, and we generalize it to a fast exact algorithm. Our results show that this exact algorithm is faster than a competing exact method. We also find that the approximate pseudo-likelihood procedure is much faster than the exact methods and only slightly less accurate.
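For a pairwise binary Markov network, Besag's pseudo-likelihood replaces the joint likelihood, whose normalizing constant sums over all 2^p configurations, with the product of each node's conditional distribution given all the others; each conditional is a logistic model. The sketch below only evaluates an L1-penalized log pseudo-likelihood and does not implement the thesis's optimization procedure. It assumes a {0, 1} coding of the variables, a symmetric parameter matrix `theta` whose diagonal holds node parameters and whose off-diagonal entries hold edge parameters, and a penalty on the edge parameters only; these conventions and the function names are illustrative assumptions.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def log1p_exp(eta: float) -> float:
    """Numerically stable log(1 + exp(eta))."""
    if eta > 0:
        return eta + math.log1p(math.exp(-eta))
    return math.log1p(math.exp(eta))


def log_pseudo_likelihood(theta: Matrix, data: Sequence[Sequence[int]]) -> float:
    """Sum over samples and nodes of log P(x_j | x_{-j}).

    Assumes p(x) is proportional to
    exp(sum_j theta[j][j] * x_j + sum_{j<k} theta[j][k] * x_j * x_k)
    with x in {0, 1}^p and theta symmetric, so each conditional is logistic
    with linear predictor eta_j = theta[j][j] + sum_{k != j} theta[j][k] * x_k.
    """
    p = len(theta)
    total = 0.0
    for x in data:
        for j in range(p):
            eta = theta[j][j] + sum(theta[j][k] * x[k] for k in range(p) if k != j)
            # log P(x_j | rest) = x_j * eta - log(1 + exp(eta)) for x_j in {0, 1}
            total += x[j] * eta - log1p_exp(eta)
    return total


def penalized_log_pseudo_likelihood(
    theta: Matrix, data: Sequence[Sequence[int]], lam: float
) -> float:
    """Log pseudo-likelihood minus an L1 penalty on the edge (off-diagonal) parameters."""
    p = len(theta)
    penalty = sum(abs(theta[j][k]) for j in range(p) for k in range(j + 1, p))
    return log_pseudo_likelihood(theta, data) - lam * penalty
```

With all parameters at zero, every conditional probability is 1/2, so the log pseudo-likelihood of n samples on p nodes is -n p log 2; the L1 penalty drives edge parameters to exactly zero, which is how the network structure is estimated.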
Finally, we develop a path algorithm for the Fused Lasso, an extension of the Lasso model. Assuming the coefficients have a natural ordering, the Fused Lasso adds to the Lasso model (whose L1 penalty on the coefficients has parameter λ1) a second L1 penalty, with parameter λ2, on the differences between neighboring coefficients. The algorithm computes the whole solution path in λ2 with λ1 held fixed. We also develop special versions for certain interesting cases that can be solved very efficiently.
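To make the two penalties concrete, the sketch below evaluates a Fused Lasso objective for given coefficients. It assumes a squared-error loss scaled by 1/2, which the abstract does not specify, and it does not implement the path algorithm; the function name and arguments are hypothetical. The example uses the special case where X is the identity matrix.

```python
from typing import Sequence


def fused_lasso_objective(
    y: Sequence[float],
    X: Sequence[Sequence[float]],
    beta: Sequence[float],
    lam1: float,
    lam2: float,
) -> float:
    """0.5 * ||y - X beta||^2 + lam1 * sum_j |beta_j| + lam2 * sum_{j>=2} |beta_j - beta_{j-1}|.

    The fusion term assumes the coefficients are listed in their natural order.
    """
    residuals = [yi - sum(xij * bj for xij, bj in zip(row, beta)) for yi, row in zip(y, X)]
    loss = 0.5 * sum(r * r for r in residuals)
    lasso_penalty = lam1 * sum(abs(b) for b in beta)
    fusion_penalty = lam2 * sum(abs(beta[j] - beta[j - 1]) for j in range(1, len(beta)))
    return loss + lasso_penalty + fusion_penalty


# Example: X is the 3 x 3 identity, with 3 ordered coefficients.
y = [1.0, 2.0, 2.0]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
unfused = fused_lasso_objective(y, identity, beta=[1.0, 2.0, 2.0], lam1=0.5, lam2=2.0)
fully_fused = fused_lasso_objective(y, identity, beta=[5 / 3] * 3, lam1=0.5, lam2=2.0)
```

With the large fusion weight λ2 = 2, setting all three coefficients equal to their mean attains a lower objective than reproducing y exactly, even though it incurs some squared-error loss. A path algorithm in λ2 tracks how the minimizing coefficients merge into such fused groups as λ2 increases.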
|School Location:||United States -- California|
|Source:||DAI-B 70/07, Dissertation Abstracts International|
|Keywords:||Cross-validation, Machine learning, Markov networks, Prevalidation|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved.