This thesis explores several topics in machine learning, with special attention to applications to biological data.
The first part analyzes the pre-validation method. Given a predictor of outcomes derived from a high-dimensional dataset, pre-validation is a useful technique for comparing it to competing predictors on the same dataset. For microarray data, it allows one to compare a newly derived predictor of disease outcome to standard clinical predictors on the same dataset. We study pre-validation analytically to determine whether the inferences drawn from it are valid. We show that while pre-validation generally works well, the straightforward "one degree of freedom" analytical test can be biased, and we propose a permutation test to remedy this problem. In simulation studies, we show that the permutation test has the correct nominal level and achieves roughly the same power as the analytical test.
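The idea of pre-validation followed by a permutation test can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the toy difference-of-class-means predictor, the fold assignment, the use of a plain correlation as the test statistic, and all function names. In particular, it omits the comparison against clinical predictors and only shows how each sample is scored by a rule fitted without it, and how the whole procedure is repeated on permuted outcomes.

```python
import random


def fit_centroid_rule(X, y):
    """Toy predictor (an assumption): score = <mean(class 1) - mean(class 0), x>."""
    p = len(X[0])
    sums = {0: [0.0] * p, 1: [0.0] * p}
    counts = {0: 0, 1: 0}
    for row, label in zip(X, y):
        counts[label] += 1
        for j, value in enumerate(row):
            sums[label][j] += value
    # max(..., 1) guards against an empty class in a training fold.
    means = {c: [s / max(counts[c], 1) for s in sums[c]] for c in (0, 1)}
    weights = [m1 - m0 for m0, m1 in zip(means[0], means[1])]
    return lambda row: sum(w * v for w, v in zip(weights, row))


def prevalidate(X, y, n_folds=5):
    """Score every sample with a rule fitted on the other folds only."""
    n = len(y)
    scores = [0.0] * n
    for k in range(n_folds):
        test = [i for i in range(n) if i % n_folds == k]
        train = [i for i in range(n) if i % n_folds != k]
        rule = fit_centroid_rule([X[i] for i in train], [y[i] for i in train])
        for i in test:
            scores[i] = rule(X[i])
    return scores


def correlation(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def permutation_pvalue(X, y, n_perm=200, seed=0):
    """One-sided p-value: redo the full pre-validation on permuted outcomes."""
    rng = random.Random(seed)
    observed = correlation(prevalidate(X, y), y)
    exceed = 0
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        if correlation(prevalidate(X, y_perm), y_perm) >= observed:
            exceed += 1
    # The +1 terms count the observed statistic as one of the permutations.
    return (exceed + 1) / (n_perm + 1)
```

Because the predictor is refitted inside every permutation, the reference distribution reflects the full pre-validation procedure rather than a fixed set of scores.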
The second part considers estimating both the parameters and the structure of binary-valued Markov networks. To maximize the penalized log-likelihood, we implement an approximate procedure based on the pseudo-likelihood as defined by Besag and generalize it to a fast exact algorithm. Our results show that this exact algorithm is faster than a competing exact method. We also find that the approximate pseudo-likelihood procedure is much faster than the exact methods and only slightly less accurate.
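As a rough illustration of the pseudo-likelihood idea, the sketch below evaluates an L1-penalized negative pseudo-log-likelihood for a binary Markov network with variables coded as 0/1. The pseudo-likelihood replaces the joint likelihood with the product of each variable's conditional distribution given the others, which for this model is a logistic function. The parameterization (a symmetric matrix theta with node terms on the diagonal), the averaging over samples, the penalty on off-diagonal entries only, and the function names are assumptions for this example, not details taken from the thesis.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def penalized_neg_pseudo_loglik(theta, samples, lam):
    """Average negative pseudo-log-likelihood plus an L1 penalty on edges.

    theta   : p x p symmetric list of lists; theta[s][s] is the node term,
              theta[s][t] the interaction between nodes s and t.
    samples : list of length-p lists with entries in {0, 1}.
    lam     : non-negative penalty weight (hypothetical name).
    """
    p = len(theta)
    total = 0.0
    for x in samples:
        for s in range(p):
            # Log-odds of x[s] = 1 given all the other variables.
            eta = theta[s][s] + sum(theta[s][t] * x[t] for t in range(p) if t != s)
            total += log_sigmoid(eta) if x[s] == 1 else log_sigmoid(-eta)
    # Each edge is penalized once (upper triangle only).
    penalty = lam * sum(abs(theta[s][t]) for s in range(p) for t in range(s + 1, p))
    return -total / len(samples) + penalty
```

Each conditional term costs only a sum over the other nodes, so no partition function has to be computed, which is why pseudo-likelihood procedures are typically much cheaper than exact likelihood methods.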
Finally, we develop a path algorithm for the Fused Lasso, an extension of the Lasso model. The Fused Lasso adds an L1 penalty with parameter λ2 on the differences of neighboring coefficients in the Lasso model, assuming the coefficients have a natural ordering. The algorithm computes the entire solution path in λ2 with the Lasso penalty parameter λ1 held fixed. We also develop special versions for certain interesting cases that can be solved very efficiently.
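To make the two penalties concrete, the following sketch evaluates a Fused Lasso objective for a given coefficient vector. It only computes the criterion and is not the path algorithm described in the thesis. The squared-error loss with a factor of 1/2 and the function and argument names are assumptions for this example.

```python
def fused_lasso_objective(X, y, beta, lam1, lam2):
    """Squared-error loss + lam1 * sum|beta_j| + lam2 * sum|beta_j - beta_{j-1}|.

    X    : list of rows (each a list of p floats)
    y    : list of responses
    beta : list of p coefficients in their natural order
    """
    residuals = [
        yi - sum(xij * bj for xij, bj in zip(row, beta))
        for row, yi in zip(X, y)
    ]
    loss = 0.5 * sum(r * r for r in residuals)
    # Lasso penalty: encourages individual coefficients to be zero.
    sparsity = lam1 * sum(abs(b) for b in beta)
    # Fusion penalty: encourages neighboring coefficients to be equal.
    fusion = lam2 * sum(abs(b1 - b0) for b0, b1 in zip(beta, beta[1:]))
    return loss + sparsity + fusion
```

For example, with an identity design, y = [1, 2, 4], beta = [1, 1, 3], lam1 = 1 and lam2 = 2, the loss is 1, the Lasso penalty is 5 and the fusion penalty is 4, for a total of 10.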
Advisor: Tibshirani, Robert
Committee:
School: Stanford University
School Location: United States -- California
Source: DAI-B 70/07, Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Statistics
Keywords: Cross-validation, Machine learning, Markov networks, Prevalidation
Publication Number: 3364502
ISBN: 978-1-109-24291-1