PubH 7475/8400 Home Page
PubH 7475/8400 Statistical Learning and Data Mining
Spring 2004 Updates
Instructor:
Dr. Wei Pan, weip@biostat.umn.edu
Class: 12:45-2:00pm T&Th, MoosT 2-580
- You can now go to the Biostat front desk at Mayo A460
to pick up your HWK5 and course project report.
Congratulations to everyone for your excellent work!
- Week 15 plan: Student presentations.
- HWK5 due on May 10.
Note: In Problem 1, for hierarchical clustering you only need to obtain the
clustering results; there is no need to assess 1) the number of clusters or
2) predictive performance. For the other two clustering methods, however, you
need to do both 1) and 2) (any simple method is fine).
- Week 13 plan: Unsupervised learning (Chapter 14) and semi-supervised
learning;
-
Download:
Wang, J., and Shen, X. (2006). Large margin semi-supervised learning.
-
Download:
Wang, J., Shen, X., and Pan, W. (2006). On transductive support vector
machines.
-
Download:
Wei Pan, Xiaotong Shen (2007).
Semi-Supervised Learning via Constraints.
Research Report 2007-007, Division of Biostatistics,
University of Minnesota, 2007.
-
Download:
Wei Pan, Xiaotong Shen, Aixiang Jiang, and Robert P. Hebbel (2006).
Semi-supervised learning via penalized mixture model with application
to microarray sample classification.
Bioinformatics, 22, 2388-2395.
- Week 14 plan: Model selection criteria (Chapter 7);
-
Download:
Efron, B. (2004).
The Estimation of Prediction Error: Covariance Penalties and Cross-Validation
(with discussion).
JASA, 99, 619-632.
-
Download:
Edward I. George and Dean P. Foster (2000). Calibration and empirical Bayes variable selection. Biometrika, 87, 731-747.
-
Download:
Shen, Xiaotong; Huang, Hsin-Cheng (2006).
Optimal Model Assessment, Selection, and Combination.
JASA, 101, 554-568.
-
Download:
Shen X., Ye J. (2002). Adaptive Model Selection.
JASA, 97, 210-221.
-
Download:
Yuan, Zheng; Yang, Yuhong (2005).
Combining Linear Regression Models: When and How?
JASA, 100, 1202-1214.
- Download:
Yuhong Yang (2005). Can the strengths of AIC and BIC be shared?
A conflict between model identification and regression estimation.
Biometrika, 92, 937-950.
-
Download:
Christophe Ambroise and Geoffrey J. McLachlan (2002).
Selection bias in gene extraction on the basis of microarray
gene-expression data.
PNAS,
- HWK4 due on April 26.
- Week 12: SVM (Chapter 12);
-
Download:
Wang, L., and Shen, X. (2006). On L1-norm multi-class support vector machines: methodology and theory. To appear in JASA.
-
Download:
Shen, X., Tseng, G.C., Zhang, X., Wong, W.H. (2003).
On psi-Learning.
JASA, 98, 724-734.
- (Review)
Download:
Javier M. Moguerza and Alberto Muñoz (2006).
Support Vector Machines with Applications.
Statistical Science, 21, 322-336.
Comments and rejoinder, 337-362.
- Week 11: Neural networks (Chapter 11); Support vector machines
(Chapter 12)
- (Review)
Download:
Bing Cheng, D. M. Titterington (1994).
Neural Networks: A Review from a Statistical Perspective.
Statistical Science, 9, 2-30.
Comments and rejoinder, 31-54.
- (Review)
Download:
B. D. Ripley (1994).
Neural Networks and Related Methods for Classification.
JRSS-B, 56, 409-456.
- Week 10 plan: MART (Chapter 10); Neural networks (Chapter 11)
- HWK3 due on April 10
- Week 9:
AdaBoost (10.1); Thursday's class cancelled.
- Week 8:
bagging (8.7; Pan, 1999),
random forest (Breiman).
- Week 7: Ensemble methods: Bayes model averaging (BMA) and
stacking (8.8), ARM (Yang, 2003),
input-dependent weighting (Pan et al 2006).
- Download:
Saharon Rosset, Ji Zhu, Trevor Hastie (2004).
Boosting as a Regularized Path to a Maximum Margin Classifier.
JMLR 5:941--973.
- Download
Friedman's MART papers.
- Download
Breiman's bagging paper (TR #421) and
random forest paper (TR #567).
- Pan W (1999).
Shrinking Classification Trees for Bootstrap Aggregation.
Pattern Recognition Letters, 20: 961-965.
Download:
as Research Report 1998-006, Division of Biostatistics,
University of Minnesota.
-
Pan W, Xiao G and Huang X (2006).
Input Dependent Weights for Model Combination
and Model Selection with Multiple Sources of Data.
Statistica Sinica, 16:523-540.
Download:
as Research Report 2004-029, Division of Biostat, U of Minnesota.
- Download:
Yang, Y (2003). Regression with multiple candidate models: selecting or mixing? Statistica Sinica, vol. 13, 783-809.
- Download:
Yuhong Yang (2001). Adaptive regression by mixing, JASA, vol. 96, 574-588.
- Week 6: Classification and regression trees (CART) (9.2);
Generalized Additive Models (GAM) (9.1).
- Week 5:
linear regression (4.2), logistic regression (4.4).
CART (9.2).
- HWK2 due on Feb 27
- Week 4: methods based on derived inputs: PCR, PLS;
linear models for classification:
intro (4.1), linear regression (4.2), LDA and QDA (4.3);
RDA (4.3.1), nearest shrunken centroid (PAM).
- Download:
Peter J. Bickel, Elizaveta Levina (2004),
Some theory for Fisher's linear discriminant function, `naive Bayes', and some alternatives when there are many more variables than observations.
Bernoulli, 10, 989--1010.
- Download
Tibshirani et al.'s shrunken centroid paper (2002).
- Week 3: Linear regression: LS (3.1-3.3); Subset selection,
shrinkage methods: ridge, Lasso (3.4.1-3.4.5);
elastic net (Zou and Hastie 2005), adaptive LASSO (Zou 2006)
- Download:
Fan, J. and Fan, Y. (2007). High dimensional classification using features annealed independence rules. Manuscript.
- Download:
Eitan Greenshtein, Ya'Acov Ritov (2004),
Persistence in high-dimensional linear predictor selection and the virtue of overparametrization.
Bernoulli, 10, 971--988.
- Download:
Hui Zou (2006),
The Adaptive Lasso and Its Oracle Properties. JASA,
101, 1418-1429.
- Download:
Hui Zou and Trevor Hastie (2005),
Regularization and variable selection via the elastic net.
Journal of the Royal Statistical Society: Series B,
67, 301-320.
- Download:
Xiaohong Huang and Wei Pan (2003),
Linear regression and two-class classification with gene expression data.
Bioinformatics, 19, 2072-2078.
- Download:
Dudoit S., Fridlyand J, Speed T. P. (2002).
Comparison of Discrimination Methods for the Classification of
Tumors Using Gene Expression Data.
JASA, 97, 77-87.
- HWK1 due on Jan 30
- Download
Breiman L. (2001), Statistical Modeling: The Two Cultures
(with comments and a rejoinder by the author). Statist. Sci. 16, iss. 3,
199-231.
- Download
Hand, D.J. (2006), Classifier Technology and the Illusion of Progress
(with comments and a rejoinder by the author). Statist. Sci. 21, iss. 1,
1-34.
- Week 2: Curse of dimensionality (2.5);
Model selection and assessment (2.7, 2.8.1, 2.9, 7.10);
- Week 1: Introduction (Chapter 1); Overview (2.1-2.3, p.21).