Dissertation/Thesis Abstract

Topics in statistical learning
by Hofling, Holger, Ph.D., Stanford University, 2009, 132; 3364502
Abstract (Summary)

In this thesis, we explore several topics in machine learning, with special attention to applications to biological data.

In the first part, we analyze the pre-validation method. Given a predictor of outcomes derived from a high-dimensional dataset, pre-validation is a useful technique for comparing it to competing predictors on the same dataset. For microarray data, it allows one to compare a newly derived predictor for disease outcome to standard clinical predictors on the same dataset. We study pre-validation analytically to determine whether the inferences drawn from it are valid. We show that while pre-validation generally works well, the straightforward "one degree of freedom" analytical test can be biased, and we propose a permutation test to remedy this problem. In simulation studies, we show that the permutation test has the correct nominal level and achieves roughly the same power as the analytical test.
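The core idea of pre-validation can be illustrated with a minimal sketch: each observation's predicted value comes from a model fitted without that observation's fold, so the resulting vector can be entered alongside clinical covariates in a second-stage model. The ridge regression used as the internal predictor, the function names, and the fold count are illustrative assumptions, not details taken from the thesis, and the permutation test itself is not shown.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    # Closed-form ridge coefficients (no intercept). This is an assumed
    # stand-in for whatever high-dimensional predictor is being evaluated.
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

def prevalidate(X, y, n_folds=5, alpha=1.0, seed=0):
    # Build a pre-validated predictor: entry i is predicted by a model
    # that was fitted without the fold containing observation i.
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    z = np.empty(n)
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        beta = ridge_fit(X[train], y[train], alpha)
        z[held_out] = X[held_out] @ beta
    return z
```

The returned vector would then be used as a single covariate, next to the clinical predictors, in an ordinary regression of the outcome; the significance test on that covariate is the step the thesis examines.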

The second part considers the problems of estimating the parameters as well as the structure of binary-valued Markov networks. For maximizing the penalized log-likelihood, we implement an approximate procedure based on the pseudo-likelihood as defined by Besag and generalize it to a fast exact algorithm. Our results show that this procedure is faster than a competing exact method. We also find that the approximate pseudo-likelihood is much faster than the exact methods and only slightly less accurate.
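For orientation, the penalized log pseudo-likelihood for a binary Markov network can be sketched as follows. The parametrization is an assumption made for illustration: variables take values in {0, 1}, `theta` is a symmetric matrix whose diagonal holds node terms and whose off-diagonal entries hold edge terms, and the L1 penalty with weight `lam` is applied to the edge terms only. The thesis may use a different scaling or parametrization.

```python
import numpy as np

def penalized_log_pseudolikelihood(theta, X, lam):
    # theta: symmetric (p, p) array; diagonal = node terms, off-diagonal = edge terms.
    # X: (n, p) array with entries in {0, 1}.
    X = np.asarray(X, dtype=float)
    theta = np.asarray(theta, dtype=float)
    off_diag = theta - np.diag(np.diag(theta))
    # eta[i, s] = theta_ss + sum over t != s of theta_st * x_it,
    # the log-odds of x_is = 1 given all other variables.
    eta = X @ off_diag + np.diag(theta)
    # Sum of the conditional Bernoulli log-likelihoods over all nodes and samples.
    log_pl = np.sum(X * eta - np.logaddexp(0.0, eta))
    # L1 penalty on each edge parameter, counted once.
    penalty = lam * np.sum(np.abs(np.triu(theta, k=1)))
    return log_pl - penalty
```

Because each conditional is a logistic regression of one node on the others, the pseudo-likelihood avoids the partition function that makes the exact likelihood expensive.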

Finally, we develop a path algorithm for the Fused Lasso, an extension of the Lasso model. The Fused Lasso adds an L1 penalty with parameter λ2 on the differences of neighboring coefficients in the Lasso model, assuming the coefficients have a natural ordering. The algorithm calculates the whole solution path for the λ2 penalty with the Lasso penalty parameter λ1 held fixed. We also develop special versions for certain interesting cases that can be solved very efficiently.
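One common way to write the Fused Lasso criterion described above is shown here. The squared-error loss with a factor of 1/2 and the symbols y, X, β, and p are notational assumptions; the abstract itself specifies only the two penalties and their parameters.

```latex
\hat{\beta} = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p}
  \; \frac{1}{2} \lVert y - X\beta \rVert_2^2
  \; + \; \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert
  \; + \; \lambda_2 \sum_{j=2}^{p} \lvert \beta_j - \beta_{j-1} \rvert
```

Setting λ2 = 0 recovers the ordinary Lasso, while increasing λ2 with λ1 fixed progressively fuses neighboring coefficients to a common value.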

Indexing (document details)
Advisor: Tibshirani, Robert
School: Stanford University
School Location: United States -- California
Source: DAI-B 70/07, Dissertation Abstracts International
Subjects: Statistics
Keywords: Cross-validation, Machine learning, Markov networks, Prevalidation
Publication Number: 3364502
ISBN: 9781109242911