PubH 7475/8475 Update Page
PubH 7475/8475/Stat 8056 Statistical Learning and Data Mining
Spring 2019 Updates
Instructors:
Dr. Wei Pan, weip@biostat.umn.edu
Dr. Xiaotong Shen, xshen@umn.edu
Class: 9:45 AM - 11:00 AM, M&W, Rapson 45
- Why Medicine Needs Deep Learning - Brendan Frey.
Lecture video
- Deep learning for genomics: Introduction and examples - James Zou.
Lecture video
- Stanford University CS231n, Spring 2017,
Lectures. Lectures 5 and 9 on CNNs.
- Week 10:
Wednesday (4/10): Community detection in networks.
notes.
-
Download:
Newman MEJ. Detecting community structure in networks.
-
Download:
Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E. (2008).
Fast unfolding of communities in large networks. arXiv:0803.0476
-
Download, or
preprint:
Zhao Y, Levina E, Zhu J (2012). Consistency of community detection in networks under degree-corrected stochastic block models. Ann. Statist.
Volume 40, Number 4 (2012), 2266-2292.
-
Download:
Fortunato S (2010). Community detection in graphs. Physics Reports 486, 75-174.
-
Download:
David Meunier, Renaud Lambiotte and Edward T. Bullmore (2010).
Modular and hierarchically modular organization of brain networks.
Front. Neurosci., 4, 200.
- The mid-term exam is scheduled for Monday of Week 8 (March 11, 2019).
It will be a closed-book, one-hour in-class exam: no books or notes allowed;
it will cover the material up to the end of class on 3/6.
- HWK4 due on March 25.
- Weeks 7-8: Unsupervised learning (Chapter 14); semi-supervised
learning;
notes 1,
notes 2.
-
Download:
Wagstaff et al (2001). Constrained K-means Clustering with Background Knowledge.
-
Download:
Liu B, Shen X, Pan W (2013). Semi-supervised spectral clustering with application to detect
population stratification. Frontiers in Genetics. 4:215. doi:10.3389/fgene.2013.00215.
-
Download:
Wang J, Shen X, Pan W. (2009). On efficient large margin semisupervised learning: method and theory. Journal of Machine Learning Research, 10, 719-742.
-
Download:
Wang, J., Shen, X., and Pan, W. (2006). On transductive support vector
machines. Contemp. Math., 43, 7-19.
-
Download:
Wei Pan, Xiaotong Shen, Aixiang Jiang, and Robert P. Hebbel (2006).
Semi-supervised learning via penalized mixture model with application
to microarray sample classification.
Bioinformatics, 22, 2388-2395.
-
Download:
Tibshirani R, Walther G (2005). Clustering
validation by prediction strength. JCGS, 14, 511-528.
-
Download:
Liu Y, Hayes DN, Nobel A, Marron JS (2008). Statistical Significance of Clustering for High-Dimension, Low–Sample Size Data. JASA, 103, 1281-1293.
-
Download:
McShane LM, Radmacher MD, Freidlin B, Yu R, Li MC, Simon R. (2002).
Methods for assessing reproducibility of clustering patterns observed in analyses of microarray data.
Bioinformatics. 18(11):1462-9.
-
Download:
Candes EJ, Li X, Ma Y, Wright J (2009).
Robust Principal Component Analysis?
-
Download:
Shen Y, Wen Z, Zhang Y (2011). Augmented Lagrangian alternating direction method for
matrix separation based on low-rank factorization.
- (Review)
Download:
Zhang Z, Jordan MI (2008). Multiway spectral clustering: a margin-based perspective.
Stat Sci, 23, 383-403.
- (Review)
Download:
von Luxburg. A tutorial on spectral clustering.
-
Download:
Ng AY, Jordan MI, Weiss Y. On spectral clustering: analysis and an algorithm.
-
Download:
Pan W, Shen X, Liu B (2013). Cluster Analysis: Unsupervised Learning via Supervised Learning with a Non-convex Penalty.
Journal of Machine Learning Research, 14, 1865-.
-
Download:
Pan W, Shen X (2007). Penalized Model-Based Clustering with Application to Variable
Selection. Journal of Machine Learning Research, 8, 1145-1164.
-
Download:
Li J, Ray S, Lindsay BG (2007). A nonparametric statistical approach to clustering via
mode identification. Journal of Machine Learning Research, 8, 1687-1723.
- (Review)
Download:
Xu R, Wunsch D (2005). Survey of clustering algorithms.
IEEE Transactions on Neural Networks, 16, 645-678.
- Week 6:
CNNs. Three case studies: handwritten digit recognition (MNIST data);
protein subcellular localization prediction with
microscopic cell images (Xiao et al 2019);
enhancer-promoter interaction prediction with DNA seq
(Zhuang et al 2019).
Mengli's
slides
and
example on CNNs in R.
SLDS'18 slides
-
Download:
LeCun et al (1998). Gradient-based learning applied to document recognition.
Proc of IEEE. (Comment: Section I. p.5-7 most helpful to understand
convolutional NNs.)
-
Download:
Krizhevsky A, Sutskever I, Hinton G. (2012).
ImageNet Classification with Deep Convolutional Neural Networks. NIPS.
-
Download:
Zhou J and Troyanskaya OG (2015). Predicting effects of noncoding variants with deep learning–based sequence model. Nature Methods, 12, 931-934.
-
Download:
Silver et al. (2016).
Mastering the game of Go with deep neural networks and tree search.
Nature, 529, 484-489.
-
Download:
Xiao M, Shen X, Pan W. (2019).
Application of deep convolutional neural networks in classification of protein subcellular localization with microscopy images.
Genetic Epi.
-
Download:
Zhuang Z, Shen X, Pan W. (2019).
A simple convolutional neural network for prediction of enhancer–promoter interactions with DNA sequence data.
Bioinformatics.
- HWK3 due on Feb 27
- Week 5: Boosting (Chapter 10);
Support vector machines (Chapter 12); Feed-forward neural networks (Chapter 11).
notes 1,
notes 2.
- Go to an info page for R package gbm.
- Download
Friedman's MART papers.
- Download:
Saharon Rosset, Ji Zhu, Trevor Hastie (2004).
Boosting as a Regularized Path to a Maximum Margin Classifier.
JMLR 5:941--973.
-
Download:
Wang, L., and Shen, X. (2007). On L1-norm multi-class support vector machines: methodology and theory. JASA, 102, 583-594.
-
Download:
Shen, X., Tseng, G.C., Zhang, X., Wong, W.H. (2003).
On psi-Learning.
JASA, 98, 724-734.
-
Download:
Wang J, Shen X, Liu Y. (2008).
Probability estimation for large margin classifiers. Biometrika. 95, 149-167.
-
Download:
Geman S, Bienenstock E, Doursat R (1992). Neural networks and the bias/variance dilemma.
Neural Computation, 4, 1-58.
-
Download:
Kondor RI, Lafferty J. Diffusion kernels on graphs and other discrete structures.
- (Review)
Download:
Hofmann T, Scholkopf B and Smola AJ (2008).
Kernel methods in machine learning.
Annals of Statistics, 36, 1171-1220.
- (Review)
Download:
Javier M. Moguerza and Alberto Muñoz (2006).
Support Vector Machines with Applications.
Statistical Science, 21, 322-336.
Comments and rejoinder, 337-362.
- (Review)
Download:
Bing Cheng, D. M. Titterington (1994).
Neural Networks: A Review from a Statistical Perspective.
Statistical Science, 9, 2-30.
Comments and rejoinder, 31-54.
- (Review)
Download:
B. D. Ripley (1994).
Neural Networks and Related Methods for Classification.
JRSS-B, 56, 409-456.
- Week 4:
Classification and regression trees (CART) (9.2);
Bagging (8.7);
Bayes model averaging (BMA) and
stacking (8.8), ARM (Yang, 2003);
Random forest (Chapter 15); Boosting: AdaBoost.
notes 1,
notes 2
- Download
Loh W-Y (2014). Fifty years of classification and regression trees (with discussion), International Statistical Review, 82, 329-370.
- Download
Breiman L (1996). Bagging predictors. Machine Learning, 24, 123-140.
- Download
Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999).
Bayesian model averaging: a tutorial (with comments). Stat Sci, 14:362-417.
- Download:
Yang Y (2003). Regression with multiple candidate models: selecting or mixing? Statistica Sinica, vol. 13, 783-809.
- Download:
Yang Y (2001). Adaptive regression by mixing, JASA, vol. 96, 574-588.
- Download:
Shen X, Huang H-C (2006)
Optimal model assessment, selection and combination. JASA 101:554-568.
- Download:
Newton MA, Quintana FA, den Boon JA, Sengupta S, Ahlquist P (2007)
Random-set methods identify distinct aspects of the enrichment signal in
gene-set analysis. Annals of Applied Statistics, 1:85-106.
- Download:
Pan W, Kim J, Zhang Y, Shen X, Wei P (2014)
A powerful and adaptive association test for rare variants. Genetics, 197(4):1081-1095.
-
Download:
Pan W, Xiao G and Huang X (2006).
Input Dependent Weights for Model Combination
and Model Selection with Multiple Sources of Data.
Statistica Sinica, 16:523-540.
-
Download:
Zhang Y, Yang Y (2015).
Cross-validation for selecting a model selection procedure.
To appear in J of Econometrics.
- Download
Breiman L (2001). Random forests. Machine Learning, 45, 5-32.
- HWK2 due on Feb 13
- Week 3: Linear regression: LS (3.1-3.2); Subset selection (3.3),
shrinkage methods: ridge, Lasso (3.4.1-3.4.3);
other penalties (3.8): SCAD (Fan and Li 2001), elastic net (Zou and Hastie 2005),
adaptive LASSO (Zou 2006), TLP (Shen et al 2012), group lasso, fused lasso...;
SIS;
methods based on derived inputs (3.5-3.6): PCR, PLS.
notes
- Download:
Fan J, Lv J (2008). Sure independence screening for ultrahigh dimensional feature space.
JRSS-B 70, 849-911.
- Download:
Zhu L-P, Li L, Li R, Zhu L-X. (2011). Model-Free Feature Screening for
Ultrahigh-Dimensional Data. JASA 106, 1464-1475.
- Download:
Li, R., Zhong, W. and Zhu, L. (2012). Feature screening via distance correlation learning. Journal of the American Statistical Association, 107, 1129-1139.
- Download:
Fan J, Li R (2001). Variable selection via nonconcave penalized likelihood and
its oracle properties.
Journal of the American Statistical Association 96 (456), 1348-1360.
- Download:
Eitan Greenshtein, Ya'Acov Ritov (2004),
Persistence in high-dimensional linear predictor selection and the virtue of overparametrization.
Bernoulli, 10, 971--988.
- Download:
Zou H (2006),
The Adaptive Lasso and Its Oracle Properties. JASA,
101, 1418-1429.
- Download:
Zou H, Hastie T (2005),
Regularization and variable selection via the elastic net.
Journal of the Royal Statistical Society: Series B,
67, 301-320.
- Download:
Austin E, Pan W, Shen X. (2013).
Penalized Regression and Risk Prediction in Genome-Wide Association Studies.
Stat Anal Data Min. 6(4). doi: 10.1002/sam.11183.
- Download:
Zhu Y, Shen X, Pan W (2013).
Simultaneous grouping pursuit and feature selection over an undirected graph.
JASA, 108, 713-725.
- Download:
Kim S, Pan W, Shen X (2013).
Network-based penalized regression with application to genomic data.
Biometrics. 69(3), 582-593.
- Download:
Friedman J, Hastie T, Hoefling H, Tibshirani R (2007). Pathwise Coordinate Optimization.
The Annals of Applied Statistics, 1(2), 302–332.
- Download:
Jerome Friedman, Trevor Hastie, Robert Tibshirani (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22.
- Download:
S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, (2011).
Distributed optimization and statistical learning via the alternating direction method
of multipliers.
Foundations and Trends in Machine Learning, 3(1):1-122.
- Download:
Huang X, Pan W, Park S, Han X, Miller LW, Hall J. (2004).
Modeling the relationship between LVAD support time and gene expression changes in the human heart by penalized partial least squares.
Bioinformatics. 20(6):888-94. (proposed a simple penalized PLS method)
- Download:
Chun H and Keles S (2010).
Sparse partial least squares regression for simultaneous dimension reduction and variable selection. JRSS-B, 72(1):3-25. (R package "spls")
Urgent Notice: the U has cancelled all classes on Wednesday due to the extreme weather, so we will have NO class this Wednesday.
Please finish reading the Week 2 notes below (and the corresponding
sections in the textbook). I'll go over the main points next class.
There will also be NO office hours this Tuesday or Wednesday.
Take care!
- Week 2:
Overview (2.1-2.3);
Model selection and assessment (2.9, 7.10);
read Curse of dimensionality (2.5).
Linear models for classification:
intro (4.1), linear regression (4.2), LDA and QDA (4.3);
RDA (4.3.1), nearest shrunken centroid (18.2), logistic regression (4.4),
penalized logistic regression (18.3.2, 18.4).
notes
- Download:
Banfield JD, Raftery AE (1993). Model-based Gaussian and non-gaussian clustering.
Biometrics, 49, 803-821.
Note:
Section 2 contains a discussion on the eigen-decomposition of a covariance matrix of a Normal
distribution.
- Download:
Peter J. Bickel, Elizaveta Levina (2004),
Some theory for Fisher's linear discriminant function, `naive Bayes', and some alternatives when there are many more variables than observations.
Bernoulli, 10, 989--1010.
- Download
Tibshirani et al. (2002). Diagnosis of multiple cancer types by shrunken
centroids of gene expression. PNAS, 99, 6567-6572.
- Download:
Xiaohong Huang and Wei Pan (2003),
Linear regression and two-class classification with gene expression data.
Bioinformatics, 19, 2072-2078.
- Download:
Dudoit S., Fridlyand J, Speed T. P. (2002).
Comparison of Discrimination Methods for the Classification of
Tumors Using Gene Expression Data.
JASA, 97, 77-87.
- Download:
Fan J, Fan Y (2008). High-dimensional classification using features annealed independence rules. Ann Statist, 36, 2605-2637.
- Download:
Mai, Q., Zou, H., and Yuan, M. (2012). A Direct Approach to Sparse Discriminant Analysis in Ultra-high Dimensions. Biometrika, 99(1), 29-42.
- HWK1 due on Feb 4;
however, due to the cancelled class and office hours on Tu&Wed,
the due date is postponed to Feb 6.
Note: A hardcopy of your HWK answers plus code is due
at the beginning of the class on the due date; you can
have separate files/pages for the answers and code, or you can mark
out (or hand-print) your answers with the mixed code and output.
Again, no late HWK will be accepted without prior approval
or a legitimate reason (e.g. illness).
- Download
WSJ: Big Data Is on the Rise, Bringing Big Questions.
(A subscription may be needed.)
- Download
WSJ: Big Data's Big Problem: Little Talent.
(A subscription may be needed.)
- Download
McKinsey Global Institute, June 2011. Big data: The next frontier for innovation, competition, and productivity.
- Download
Donoho D. (2015), 50 years of Data Science.
- Download
Breiman L. (2001), Statistical Modeling: The Two Cultures
(with comments and a rejoinder by the author). Statist. Sci. 16, iss. 3,
199-231.
- Download
Hand, D.J. (2006), Classifier Technology and the Illusion of Progress
(with comments and a rejoinder by the author). Statist. Sci. 21, iss. 1,
1-34.
- Download
S. Guha, R. Hafen, J. Xia, J. Rounds, J. Li, B. Xi, and W. S. Cleveland (2012), Large complex data: divide and recombine (D&R) with RHIPE, Stat 1, 53-67.
- Download
Cleveland W.S. (2001, republished 2014),
Data science: An action plan for expanding the technical areas of the field of statistics.
Statistical Analysis and Data Mining 7, iss. 6, 414-417.
- Download
B. Yu (2014). Let us own data science. Institute of Mathematical Statistics (IMS) Presidential Address, ASC-IMS Joint Conference, Sydney, July, 2014.
- Week 1 (one class on W): Introduction (Chapter 1);
notes