PubH 7475/8475 Update Page
PubH 7475/8475/Stat 8933 Statistical Learning and Data Mining
Spring 2018 Updates
Instructors:
Dr. Wei Pan, weip@biostat.umn.edu
Dr. Xiaotong Shen, xshen@umn.edu
Class: 9:45 AM - 11:00 AM, M&W, G55 Peik Gymnasium Hall
- Why Medicine Needs Deep Learning - Brendan Frey.
Lecture video
- Deep learning for genomics: Introduction and examples - James Zou.
Lecture video
- Stanford University CS231n, Spring 2017,
Lectures. Lectures 5 and 9 on CNNs.
- Final course project report is due on May 7th,
by 4:00pm in the School of Statistics Main Office, Ford 313.
- Mid-term exam is scheduled on Monday of Week 8 (March 5, 2018).
It will be a closed-book exam: no books or notes allowed.
- HWK4 due on March 21.
- Weeks 7-8: Unsupervised learning (Chapter 14); semi-supervised
learning;
notes 1,
notes 2.
-
Download:
Wagstaff et al (2001). Constrained K-means Clustering with Background Knowledge.
-
Download:
Liu B, Shen X, Pan W (2013). Semi-supervised spectral clustering with application to detect
population stratification. Frontiers in Genetics. 4:215. doi:10.3389/fgene.2013.00215.
-
Download:
Wang J, Shen X, Pan W. (2009). On efficient large margin semisupervised learning: method and theory. Journal of Machine Learning Research, 10, 719-742.
-
Download:
Wang, J., Shen, X., and Pan, W. (2006). On transductive support vector
machines. Contemp. Math., 43, 7-19.
-
Download:
Wei Pan, Xiaotong Shen, Aixiang Jiang, and Robert P. Hebbel (2006).
Semi-supervised learning via penalized mixture model with application
to microarray sample classification.
Bioinformatics, 22, 2388-2395.
-
Download:
Tibshirani R, Walther G (2005). Clustering
validation by prediction strength. JCGS, 14, 511-528.
-
Download:
Liu Y, Hayes DN, Nobel A, Marron JS (2008). Statistical Significance of Clustering for High-Dimension, Low–Sample Size Data. JASA, 103, 1281-1293.
-
Download:
McShane LM, Radmacher MD, Freidlin B, Yu R, Li MC, Simon R. (2002).
Methods for assessing reproducibility of clustering patterns observed in analyses of microarray data.
Bioinformatics. 18(11):1462-9.
-
Download:
Candes EJ, Li X, Ma Y, Wright J (2009).
Robust Principal Component Analysis?
-
Download:
Shen Y, Wen Z, Zhang Y (2011). Augmented Lagrangian alternating direction method for
matrix separation based on low-rank factorization.
- (Review)
Download:
Zhang Z, Jordan MI (2008). Multiway spectral clustering: a margin-based perspective.
Stat Sci, 23, 383-403.
- (Review)
Download:
von Luxburg. A tutorial on spectral clustering.
-
Download:
Ng AY, Jordan MI, Weiss Y. On spectral clustering: analysis and an algorithm.
-
Download:
Pan W, Shen X, Liu B (2013). Cluster Analysis: Unsupervised Learning via Supervised Learning with a Non-convex Penalty.
Journal of Machine Learning Research, 14, 1865-.
-
Download:
Pan W, Shen X (2007). Penalized Model-Based Clustering with Application to Variable
Selection. Journal of Machine Learning Research, 8, 1145-1164.
-
Download:
Li J, Ray S, Lindsay BG (2007). A nonparametric statistical approach to clustering via
mode identification. Journal of Machine Learning Research, 8, 1687-1723.
- (Review)
Download:
Xu R, Wunsch D (2005). Survey of clustering algorithms.
IEEE Transactions on Neural Networks, 16, 645-678.
- Week 6:
Support vector machines (Chapter 12); Neural networks (Chapter 11 and CNNs).
notes 1,
notes 2.
Mengli's slides and example on CNNs in R.
-
Download:
LeCun et al (1998). Gradient-based learning applied to document recognition.
Proc of IEEE. (Comment: Section I, p.5-7, is most helpful for understanding
convolutional NNs.)
-
Download:
Krizhevsky A, Sutskever I, Hinton G. (2012).
ImageNet Classification with Deep Convolutional Neural Networks. NIPS.
-
Download:
Zhou J and Troyanskaya OG (2015). Predicting effects of noncoding variants with deep learning–based sequence model. Nature Methods, 12, 931-934.
-
Download:
Silver et al. (2016).
Mastering the game of Go with deep neural networks and tree search.
Nature, 529, 484-489.
-
Download:
Wang, L., and Shen, X. (2007). On L1-norm multi-class support vector machines: methodology and theory. JASA, 102, 583-594.
-
Download:
Shen, X., Tseng, G.C., Zhang, X., Wong, W.H. (2003).
On psi-Learning.
JASA, 98, 724-734.
-
Wang J, Shen X, Liu Y. (2008).
Probability estimation for large margin classifiers. Biometrika. 95, 149-167.
-
Download:
Geman S, Bienenstock E, Doursat R (1992). Neural networks and the bias/variance dilemma.
Neural Computation, 4, 1-58.
-
Download:
Kondor RI, Lafferty J. Diffusion kernels on graphs and other discrete structures.
- (Review)
Download:
Hofmann T, Scholkopf B and Smola AJ (2008).
Kernel methods in machine learning.
Annals of Statistics, 36, 1171-1220.
- (Review)
Download:
Javier M. Moguerza and Alberto Muñoz (2006).
Support Vector Machines with Applications.
Statistical Science, 21, 322-336.
Comments and rejoinder, 337-362.
- (Review)
Download:
Bing Cheng, D. M. Titterington (1994).
Neural Networks: A Review from a Statistical Perspective.
Statistical Science, 9, 2-30.
Comments and rejoinder, 31-54.
- (Review)
Download:
B. D. Ripley (1994).
Neural Networks and Related Methods for Classification.
JRSS-B, 56, 409-456.
- HWK3 due on Feb 28
- Week 5: Random forest (Chapter 15); Boosting (Chapter 10).
- Download
Breiman L (2001). Random forests. Machine Learning, 45, 5-32.
- Download R package gbm vignette.
- Download
Friedman's MART papers.
- Download:
Saharon Rosset, Ji Zhu, Trevor Hastie (2004).
Boosting as a Regularized Path to a Maximum Margin Classifier.
JMLR 5:941-973.
- Week 4:
Classification and regression trees (CART) (9.2);
Bagging (8.7);
If time permits, model averaging: Bayes model averaging (BMA) and
stacking (8.8), ARM (Yang, 2003).
notes 1,
notes 2
- Download
Loh W-Y (2014). Fifty years of classification and regression trees (with discussion), International Statistical Review, 82, 329-370.
- Download
Breiman L (1996). Bagging predictors. Machine Learning, 24, 123-140.
- Download
Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999).
Bayesian model averaging: a tutorial (with comments). Stat Sci, 14:382-417.
- Download:
Yang Y (2003). Regression with multiple candidate models: selecting or mixing? Statistica Sinica, vol. 13, 783-809.
- Download:
Yang Y (2001). Adaptive regression by mixing, JASA, vol. 96, 574-588.
- Download:
Shen X, Huang H-C (2006)
Optimal model assessment, selection and combination. JASA 101:554-568.
- Download:
Newton MA, Quintana FA, den Boon JA, Sengupta S, Ahlquist P (2007)
Random-set methods identify distinct aspects of the enrichment signal in
gene-set analysis. Annals of Applied Statistics, 1:85-106.
- Download:
Pan W, Kim J, Zhang Y, Shen X, Wei P (2014)
A powerful and adaptive association test for rare variants. Genetics, 197(4):1081-1095.
-
Download:
Pan W, Xiao G and Huang X (2006).
Input Dependent Weights for Model Combination
and Model Selection with Multiple Sources of Data.
Statistica Sinica, 16:523-540.
-
Download:
Zhang Y, Yang Y (2015).
Cross-validation for selecting a model selection procedure.
To appear in J of Econometrics.
- HWK2 due on Feb 12
- Week 3: Linear regression: LS (3.1-3.2); Subset selection (3.3),
shrinkage methods: ridge, Lasso (3.4.1-3.4.3);
other penalties (3.8): SCAD (Fan and Li 2001), elastic net (Zou and Hastie 2005),
adaptive LASSO (Zou 2006), TLP (Shen et al 2012), group lasso, fused lasso...;
SIS;
methods based on derived inputs (3.5-3.6): PCR, PLS.
notes
- Download:
Fan J, Lv J (2008). Sure independence screening for ultrahigh dimensional feature space.
JRSS-B 70, 849-911.
- Download:
Zhu L-P, Li L, Li R, Zhu L-X. (2011). Model-Free Feature Screening for
Ultrahigh-Dimensional Data. JASA 106, 1464-1475.
- Download:
Li, R., Zhong, W. and Zhu, L. (2012). Feature screening via distance correlation learning. Journal of the American Statistical Association, 107, 1129-1139.
- Download:
Fan J, Li R (2001). Variable selection via nonconcave penalized likelihood and
its oracle properties.
Journal of the American Statistical Association 96 (456), 1348-1360.
- Download:
Eitan Greenshtein, Ya'Acov Ritov (2004),
Persistence in high-dimensional linear predictor selection and the virtue of overparametrization.
Bernoulli, 10, 971-988.
- Download:
Zou H (2006),
The Adaptive Lasso and Its Oracle Properties. JASA,
101, 1418-1429.
- Download:
Zou H, Hastie T (2005),
Regularization and variable selection via the elastic net.
Journal of the Royal Statistical Society: Series B,
67, 301-320.
- Download:
Austin E, Pan W, Shen X. (2013).
Penalized Regression and Risk Prediction in Genome-Wide Association Studies.
Stat Anal Data Min. 6(4). doi: 10.1002/sam.11183.
- Download:
Zhu Y, Shen X, Pan W (2013).
Simultaneous grouping pursuit and feature selection over an undirected graph.
JASA, 108, 713-725.
- Download:
Kim S, Pan W, Shen X (2013).
Network-based penalized regression with application to genomic data.
Biometrics. 69(3), 582-593.
- Download:
Friedman J, Hastie T, Hoefling H, Tibshirani R (2007). Pathwise Coordinate Optimization.
The Annals of Applied Statistics, 1(2), 302–332.
- Download:
Jerome Friedman, Trevor Hastie, Robert Tibshirani (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22.
- Download:
S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, (2011).
Distributed optimization and statistical learning via the alternating direction method
of multipliers.
Foundations and Trends in Machine Learning, 3(1):1-122.
- Download:
Huang X, Pan W, Park S, Han X, Miller LW, Hall J. (2004).
Modeling the relationship between LVAD support time and gene expression changes in the human heart by penalized partial least squares.
Bioinformatics. 20(6):888-94. (proposed a simple penalized PLS method)
- Download:
Chun H and Keles S (2010).
Sparse partial least squares regression for simultaneous dimension reduction and variable selection. JRSS-B, 72(1):3-25. (R package "spls")
- Week 2:
Overview (2.1-2.3);
Model selection and assessment (2.9, 7.10);
read Curse of dimensionality (2.5).
Linear models for classification:
intro (4.1), linear regression (4.2), LDA and QDA (4.3);
RDA (4.3.1), nearest shrunken centroid (18.2), logistic regression (4.4),
penalized logistic regression (18.3.2, 18.4).
notes
- Download:
Banfield JD, Raftery AE (1993). Model-based Gaussian and non-gaussian clustering.
Biometrics, 49, 803-821.
Note:
Section 2 contains a discussion on the eigen-decomposition of a covariance matrix of a Normal
distribution.
- Download:
Peter J. Bickel, Elizaveta Levina (2004),
Some theory for Fisher's linear discriminant function, 'naive Bayes', and some alternatives when there are many more variables than observations.
Bernoulli, 10, 989-1010.
- Download
Tibshirani et al. (2002). Diagnosis of multiple cancer types by shrunken
centroids of gene expression. PNAS, 99, 6567-6572.
- Download:
Xiaohong Huang and Wei Pan (2003),
Linear regression and two-class classification with gene expression data.
Bioinformatics, 19, 2072-2078.
- Download:
Dudoit S., Fridlyand J, Speed T. P. (2002).
Comparison of Discrimination Methods for the Classification of
Tumors Using Gene Expression Data.
JASA, 97, 77-87.
- Download:
Fan J, Fan Y (2008). High-dimensional classification using features annealed independence rules. Ann Statist, 36, 2605-2637.
- Download:
Mai, Q., Zou, H., and Yuan, M. (2012). A Direct Approach to Sparse Discriminant Analysis in Ultra-high Dimensions. Biometrika, 99(1), 29-42.
- HWK1 due on Jan 31
Note: A hardcopy of your HWK answers plus code is due
at the beginning of the class on the due date; you can
have separate files/pages for the answers and code, or you can mark
out (or hand-print) your answers with the mixed code and output.
Again, no late HWK is accepted without prior approval
or a legitimate reason (e.g. illness).
- Download
WSJ: Big Data Is on the Rise, Bringing Big Questions.
(A subscription may be needed.)
- Download
WSJ: Big Data's Big Problem: Little Talent.
(A subscription may be needed.)
- Download
McKinsey Global Institute (June 2011). Big data: The next frontier for innovation, competition, and productivity.
- Download
Donoho D. (2015), 50 years of Data Science.
- Download
Breiman L. (2001), Statistical Modeling: The Two Cultures
(with comments and a rejoinder by the author). Statist. Sci. 16, iss. 3,
199-231.
- Download
Hand, D.J. (2006), Classifier Technology and the Illusion of Progress
(with comments and a rejoinder by the author). Statist. Sci. 21, iss. 1,
1-34.
- Download
S. Guha, R. Hafen, J. Xia, J. Rounds, J. Li, B. Xi, and W. S. Cleveland (2012), Large complex data: divide and recombine (D&R) with RHIPE, Stat 1, 53-67.
- Download
Cleveland W.S. (2001, republished 2014),
Data science: An action plan for expanding the technical areas of the field of statistics.
Statistical Analysis and Data Mining 7, iss. 6, 414-417.
- Download
B. Yu (2014). Let us own data science. Institute of Mathematical Statistics (IMS) Presidential Address, ASC-IMS Joint Conference, Sydney, July, 2014.
- Week 1 (one class on W): Introduction (Chapter 1);
notes