PubH 7475/8475 Update Page
PubH 7475/8475 Statistical Learning and Data Mining
Spring 2021 Updates
Instructors:
Dr. Wei Pan, panxx014@umn.edu
Class: 9:45 AM - 11:00 AM, M&W, Zoom.
- Weeks 13-15: Project presentations, 4/21, 4/26, 4/28 and 5/3.
- Week 13:
Monday (4/19): Community detection in networks.
notes.
-
Download:
Newman MEJ. Detecting community structure in networks.
-
Download:
Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E. (2008).
Fast unfolding of communities in large networks. arXiv:0803.0476
-
Download, or
preprint:
Zhao Y, Levina E, Zhu J (2012). Consistency of community detection in networks under degree-corrected stochastic block models. Ann. Statist.
Volume 40, Number 4 (2012), 2266-2292.
-
Download:
Fortunato S (2010). Community detection in graphs. Physics Reports 486, 75-174.
-
Download:
David Meunier, Renaud Lambiotte and Edward T. Bullmore (2010).
Modular and hierarchically modular organization of brain networks.
Front. Neurosci., 4, 200.
- HWK5 due on April 21;
for those who are presenting on 4/21, it is extended to 4/28.
- Weeks 11-12: Unsupervised learning (Chapter 14); semi-supervised
learning; No class on 3/29.
SSL notes.
-
Download:
Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton, (2020).
A Simple Framework for Contrastive Learning of Visual Representations.
ICML 2020.
-
Download:
Wagstaff et al (2001). Constrained K-means Clustering with Background Knowledge.
-
Download:
Liu B, Shen X, Pan W (2013). Semi-supervised spectral clustering with application to detect
population stratification. Frontiers in Genetics. 4:215. doi:10.3389/fgene.2013.00215.
-
Download:
Wang J, Shen X, Pan W. (2009). On efficient large margin semisupervised learning: method and theory. Journal of Machine Learning Research, 10, 719-742.
-
Download:
Wang, J., Shen, X., and Pan, W. (2006). On transductive support vector
machines. Contemp. Math., 43, 7-19.
-
Download:
Wei Pan, Xiaotong Shen, Aixiang Jiang, and Robert P. Hebbel (2006).
Semi-supervised learning via penalized mixture model with application
to microarray sample classification.
Bioinformatics, 22, 2388-2395.
-
Download:
Tibshirani R, Walther G (2005). Clustering
validation by prediction strength. JCGS, 14, 511-528.
-
Download:
Liu Y, Hayes DN, Nobel A, Marron JS (2008). Statistical Significance of Clustering for High-Dimension, Low–Sample Size Data. JASA, 103, 1281-1293.
-
Download:
McShane LM, Radmacher MD, Freidlin B, Yu R, Li MC, Simon R. (2002).
Methods for assessing reproducibility of clustering patterns observed in analyses of microarray data.
Bioinformatics. 18(11):1462-9.
-
Download:
Candes EJ, Li X, Ma Y, Wright J (2009).
Robust Principal Component Analysis?
-
Download:
Shen Y, Wen Z, Zhang Y (2011). Augmented Lagrangian alternating direction method for
matrix separation based on low-rank factorization.
- (Review)
Download:
Zhang Z, Jordan MI (2008). Multiway spectral clustering: a margin-based perspective.
Stat Sci, 23, 383-403.
- (Review)
Download:
von Luxburg. A tutorial on spectral clustering.
-
Download:
Ng AY, Jordan MI, Weiss Y. On spectral clustering: analysis and an algorithm.
-
Download:
Pan W, Shen X, Liu B (2013). Cluster Analysis: Unsupervised Learning via Supervised Learning with a Non-convex Penalty.
Journal of Machine Learning Research, 14, 1865-.
-
Download:
Pan W, Shen X (2007). Penalized Model-Based Clustering with Application to Variable
Selection. Journal of Machine Learning Research, 8, 1145-1164.
-
Download:
Li J, Ray S, Lindsay BG (2007). A nonparametric statistical approach to clustering via
mode identification. Journal of Machine Learning Research, 8, 1687-1723.
- Week 10 plan: RL; Unsupervised learning (Chapter 14).
RL notes,
clustering notes.
The final course project proposal will be due on March 22.
The midterm exam will be assigned on March 24 and due at 11am CT on March 29.
-
Download:
Volodymyr Mnih et al. (2015).
Human-level control through deep reinforcement learning.
Nature, 518, 529-533.
- (Review)
Download:
Xu R, Wunsch D (2005). Survey of clustering algorithms.
IEEE Transactions on Neural Networks, 16, 645-678.
- HWK4 due on March 17
(but can be automatically extended to March 24 or 31)
- Weeks 8-9: CNNs and
two applications:
protein subcellular localization prediction with
microscopic cell images (Xiao et al 2019);
enhancer-promoter interaction prediction with DNA seq
(Zhuang et al 2019).
RNN.
R/Keras FNN&CNN, RNN.
Mengli's slides and example on CNNs in R.
SLDS'18 slides.
-
Download:
LeCun et al (1998). Gradient-based learning applied to document recognition.
Proc of IEEE. (Comment: Section I. p.5-7 most helpful to understand
convolutional NNs.)
-
Download:
Krizhevsky A, Sutskever I, Hinton G. (2012).
ImageNet Classification with Deep Convolutional Neural Networks. NeurIPS.
-
Download:
Zhou J and Troyanskaya OG (2015). Predicting effects of noncoding variants with deep learning–based sequence model. Nature Methods, 12, 931-934.
-
Download:
Silver et al. (2016).
Mastering the game of Go with deep neural networks and tree search.
Nature, 529, 484-489.
-
Download:
Xiao M, Shen X, Pan W. (2019).
Application of deep convolutional neural networks in classification of protein subcellular localization with microscopy images.
Genetic Epi, 43(3), 330-341.
-
Download:
Zhuang Z, Shen X, Pan W. (2019).
A simple convolutional neural network for prediction of enhancer-promoter interactions with DNA sequence data.
Bioinformatics, 35(17), 2899-2906.
-
Download:
Fan J, Ma C, Zhong Y. (2019).
A Selective Overview of Deep Learning. arXiv:1904.05526.
-
Download:
Bernhard Schölkopf, Francesco Locatello, Stefan Bauer, Nan Rosemary Ke, Nal Kalchbrenner, Anirudh Goyal, Yoshua Bengio. (2021).
Towards Causal Representation Learning. arXiv:2102.11107
-
Download:
Zech JR, Badgeley MA, Liu M, Costa AB, Titano JJ, Oermann EK (2018) Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study. PLoS Med 15(11): e1002683. doi:10.1371/journal.pmed.1002683.
-
Download:
Chaofan Chen, Oscar Li, Daniel Tao, Alina Barnett, Cynthia Rudin, Jonathan K. Su (2019).
This Looks Like That: Deep Learning for Interpretable Image Recognition.
Advances in Neural Information Processing Systems 32 (NeurIPS 2019).
-
Download:
Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra.
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. International Journal of Computer Vision (IJCV) 2019.
-
Download here or at the Journal:
Stephanie Clark, Rob J Hyndman, Dan Pagendam, Louise M Ryan. (2020).
Modern strategies for time series regression.
International Stat Rev, 88(S1), S179-S204.
- Week 7: More on RF;
Support vector machines (Chapter 12);
Feed-forward neural networks (Chapter 11).
notes 1,
notes 2.
- Download
Stefan Wager, Trevor Hastie, Bradley Efron. (2014).
Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife.
JMLR 15(48):1625-1651.
- Download
Lucas Mentch, Giles Hooker. (2016).
Quantifying Uncertainty in Random Forests via Confidence Intervals and Hypothesis Tests.
JMLR 17: 1-41.
- Download
Ishwaran H, Lu M. (2019).
Standard errors and confidence intervals for variable importance in random forest regression, classification, and survival. Stat Med. 38(4):558-582.
- Download
Hugh A. Chipman, Edward I. George, Robert E. McCulloch. (2010).
BART: Bayesian additive regression trees.
Annals of Applied Statistics 4(1), 266-298.
- Download
Lu M, Sadiq S, Feaster DJ, Ishwaran H. (2018).
Estimating Individual Treatment Effect in Observational Data Using Random Forest Methods.
J Comput Graph Stat. 27(1), 209-219.
- Download or
here:
Vincent Dorie, Jennifer Hill, Uri Shalit, Marc Scott, and Dan Cervone. (2019).
Automated versus Do-It-Yourself Methods for Causal Inference: Lessons Learned from a Data Analysis Competition.
Statist Sci, 34, 43-68.
-
Download:
Wang, L., and Shen, X. (2007). On L1-norm multi-class support vector machines: methodology and theory. JASA, 102, 583-594.
-
Download:
Shen, X., Tseng, G.C., Zhang, X., Wong, W.H. (2003).
On psi-Learning.
JASA, 98, 724-734.
-
Download:
Wang J, Shen X, Liu Y. (2008).
Probability estimation for large margin classifiers. Biometrika. 95, 149-167.
-
Download:
Geman S, Bienenstock E, Doursat R (1992). Neural networks and the bias/variance dilemma.
Neural Computation, 4, 1-58.
-
Download:
Kondor RI, Lafferty J. Diffusion kernels on graphs and other discrete structures.
- (Review)
Download:
Hofmann T, Scholkopf B and Smola AJ (2008).
Kernel methods in machine learning.
Annals of Statistics, 36, 1171-1220.
- (Review)
Download:
Javier M. Moguerza and Alberto Muñoz (2006).
Support Vector Machines with Applications.
Statistical Science, 21, 322-336.
Comments and rejoinder, 337-362.
- (Review)
Download:
Bing Cheng, D. M. Titterington (1994).
Neural Networks: A Review from a Statistical Perspective.
Statistical Science, 9, 2-30.
Comments and rejoinder, 31-54.
- (Review)
Download:
B. D. Ripley (1994).
Neural Networks and Related Methods for Classification.
JRSS-B, 56, 409-456.
- HWK3 due on March 1
- Interdisciplinary Health Data Competition.
Important dates: February 17: Registration opens;
March 3: Last day to register;
March 8: Competition kickoff and data release (will be recorded);
March 21: All project submissions due by 11:59 pm CST;
March 26: Finalists announced;
March 30: Finalists' presentations;
March 31: Winners announced.
- Weeks 5&6: Computational algorithms and statistical inference for high-dimensional data; Trees; Ensemble methods:
Bagging (8.7);
Bayes model averaging (BMA) and
stacking (8.8), ARM (Yang, 2003);
Random forest (Chapter 15); Boosting (Chapter 10): AdaBoost and GBM.
notes,
notes
Go to an info page for R package gbm.
- Download
Loh W-Y (2014). Fifty years of classification and regression trees (with discussion), International Statistical Review, 82, 329-370.
- Download
Loh W-Y, He X, Man M (2015).
A regression tree approach to identifying subgroups with differential treatment effects.
Stat Med, 34(11),1818-1833.
https://doi.org/10.1002/sim.6454.
- Download
Athey S, Imbens G (2016).
Recursive partitioning for heterogeneous causal effects.
Proceedings of the National Academy of Sciences, 113(27), 7353-7360. DOI: 10.1073/pnas.1510489113.
- Download
Breiman L (1996). Bagging predictors. Machine Learning, 24, 123-140.
- Download
Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999).
Bayesian model averaging: a tutorial (with comments). Stat Sci, 14:382-417.
- Download:
Yang Y (2003). Regression with multiple candidate models: selecting or mixing? Statistica Sinica, vol. 13, 783-809.
- Download:
Yang Y (2001). Adaptive regression by mixing, JASA, vol. 96, 574-588.
- Download:
Shen X, Huang H-C (2006)
Optimal model assessment, selection and combination. JASA 101:554-568.
- Download:
Newton MA, Quintana FA, den Boon JA, Sengupta S, Ahlquist P (2007)
Random-set methods identify distinct aspects of the enrichment signal in
gene-set analysis. Annals of Applied Statistics, 1:85-106.
- Download:
Pan W, Kim J, Zhang Y, Shen X, Wei P (2014)
A powerful and adaptive association test for rare variants. Genetics, 197(4):1081-1095.
-
Download:
Pan W, Xiao G and Huang X (2006).
Input Dependent Weights for Model Combination
and Model Selection with Multiple Sources of Data.
Statistica Sinica, 16:523-540.
-
Download:
Zhang Y, Yang Y (2015).
Cross-validation for selecting a model selection procedure.
J of Econometrics.
- Download
Breiman L (2001). Random forests. Machine Learning, 45, 5-32.
- Download:
Saharon Rosset, Ji Zhu, Trevor Hastie (2004).
Boosting as a Regularized Path to a Maximum Margin Classifier.
JMLR 5:941-973.
- Download
Friedman's MART papers.
- Download
Hugh A. Chipman, Edward I. George, Robert E. McCulloch. (2010).
BART: Bayesian additive regression trees.
Annals of Applied Statistics 4(1), 266-298.
- Download
Lu M, Sadiq S, Feaster DJ, Ishwaran H. (2018).
Estimating Individual Treatment Effect in Observational Data Using Random Forest Methods.
J Comput Graph Stat. 27(1), 209-219.
- Download or
here:
Vincent Dorie, Jennifer Hill, Uri Shalit, Marc Scott, and Dan Cervone. (2019).
Automated versus Do-It-Yourself Methods for Causal Inference: Lessons Learned from a Data Analysis Competition.
Statist Sci, 34, 43-68.
- HWK2 due on Feb 17
- Week 4:
Other penalties (3.8): SCAD (Fan and Li 2001), elastic net (Zou and Hastie 2005),
adaptive LASSO (Zou 2006), TLP (Shen et al 2012), group lasso, fused lasso...;
SIS;
methods based on derived inputs (3.5-3.6): PCR, PLS.
- Download:
Fan J, Li R (2001). Variable selection via nonconcave penalized likelihood and
its oracle properties.
Journal of the American Statistical Association 96 (456), 1348-1360.
- Download:
Zou H (2006),
The Adaptive Lasso and Its Oracle Properties. JASA,
101, 1418-1429.
- Download:
Zou H, Hastie T (2005),
Regularization and variable selection via the elastic net.
Journal of the Royal Statistical Society: Series B,
67, 301-320.
- Download:
Austin E, Pan W, Shen X. (2013).
Penalized Regression and Risk Prediction in Genome-Wide Association Studies.
Stat Anal Data Min. 6(4). doi: 10.1002/sam.11183.
- Download:
Zhu Y, Shen X, Pan W (2013).
Simultaneous grouping pursuit and feature selection over an undirected graph.
JASA, 108, 713-725.
- Download:
Kim S, Pan W, Shen X (2013).
Network-based penalized regression with application to genomic data.
Biometrics. 69(3), 582-593.
- Download:
Friedman J, Hastie T, Hoefling H, Tibshirani R (2007). Pathwise Coordinate Optimization.
The Annals of Applied Statistics, 1(2), 302-332.
- Download:
Jerome Friedman, Trevor Hastie, Robert Tibshirani (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22.
- Download:
S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, (2011).
Distributed optimization and statistical learning via the alternating direction method
of multipliers.
Foundations and Trends in Machine Learning, 3(1):1-122.
- Download:
Shi C, Song R, Chen Z, Li R. (2019).
Linear hypothesis testing for high dimensional generalized linear models.
Ann Stat, 47(5), 2671-2703.
- Download:
Zhu Y, Shen X, Pan W. (2020).
On High-Dimensional Constrained Maximum Likelihood Inference.
JASA, 115(529), 217-230.
- Download:
Dezeure R, Buhlmann P, Meier L and Meinshausen N (2015).
High-Dimensional Inference: Confidence Intervals, p-Values and R-Software hdi.
Stat Sci, 30(4), 533-558.
- Download:
Fan J, Lv J (2008). Sure independence screening for ultrahigh dimensional feature space.
JRSS-B 70, 849-911.
- Download:
Chun H and Keles S (2010).
Sparse partial least squares regression for simultaneous dimension reduction and variable selection. JRSS-B, 72(1):3-25. (R packages "spls")
- Week 3:
LDA and QDA (4.3);
RDA (4.3.1), nearest shrunken centroid (18.2), logistic regression (4.4);
penalized logistic regression (18.3.2, 18.4).
Linear regression: LS (3.1-3.2); Subset selection (3.3),
shrinkage methods: ridge, Lasso (3.4.1-3.4.3);
notes
- Download:
Banfield JD, Raftery AE (1993). Model-based Gaussian and non-Gaussian clustering.
Biometrics, 49, 803-821.
Note:
Section 2 contains a discussion on the eigen-decomposition of a covariance matrix of a Normal
distribution.
- Download:
Peter J. Bickel, Elizaveta Levina (2004),
Some theory for Fisher's linear discriminant function, 'naive Bayes', and some alternatives when there are many more variables than observations.
Bernoulli, 10, 989-1010.
- Download
Tibshirani et al. (2002). Diagnosis of multiple cancer types by shrunken
centroids of gene expression. PNAS, 99, 6567-6572.
- Download:
Xiaohong Huang and Wei Pan (2003),
Linear regression and two-class classification with gene expression data.
Bioinformatics, 19, 2072-2078.
- Download:
Dudoit S., Fridlyand J, Speed T. P. (2002).
Comparison of Discrimination Methods for the Classification of
Tumors Using Gene Expression Data.
JASA, 97, 77-87.
- Download:
Fan J, Fan Y (2008). High-dimensional classification using features annealed independence rules. Ann Statist, 36, 2605-2637.
- Download:
Mai, Q., Zou, H., and Yuan, M. (2012). A Direct Approach to Sparse Discriminant Analysis in Ultra-high Dimensions. Biometrika, 99(1), 29-42.
- Download:
Cai T, Liu W. (2011).
A Direct Estimation Approach to Sparse Linear Discriminant Analysis.
JASA, 106, 1566-1577.
- Week 2:
Overview (2.1-2.3);
Model selection and assessment (2.9, 7.10);
read Curse of dimensionality (2.5).
Linear models for classification:
intro (4.1), linear regression (4.2);
notes
- HWK1 due on Feb 3
(by the end of the day on Canvas).
Note:
You can
have separate pages for the answers and code, or you can mark
out (or hand-print) your answers with the mixed code and output.
Again, no late HWK is accepted without prior approval
or a legitimate reason (e.g. illness).
- Download
WSJ: Big Data Is on the Rise, Bringing Big Questions.
(A subscription may be needed.)
- Download
WSJ: Big Data's Big Problem: Little Talent.
(A subscription may be needed.)
- Download
McKinsey Global Institute (June 2011). Big data: The next frontier for innovation, competition, and productivity.
- Download
Donoho D. (2015), 50 years of Data Science.
- Download
Breiman L. (2001), Statistical Modeling: The Two Cultures
(with comments and a rejoinder by the author). Statist. Sci. 16, iss. 3,
199-231.
- Download
Hand, D.J. (2006), Classifier Technology and the Illusion of Progress
(with comments and a rejoinder by the author). Statist. Sci. 21, iss. 1,
1-34.
- Download
S. Guha, R. Hafen, J. Xia, J. Rounds, J. Li, B. Xi, and W. S. Cleveland (2012), Large complex data: divide and recombine (D&R) with RHIPE, Stat 1, 53-67.
- Download
Cleveland W.S. (2001, republished 2014),
Data science: An action plan for expanding the technical areas of the field of statistics.
Statistical Analysis and Data Mining 7, iss. 6, 414-417.
- Download
B. Yu (2014). Let us own data science. Institute of Mathematical Statistics (IMS) Presidential Address, ASC-IMS Joint Conference, Sydney, July, 2014.
- Download
Yang S, et al. (2015). Accurate estimation of influenza
epidemics using Google search data via ARGO. PNAS, 112,
14473-8.
- Download
McKinney SM, et al. (2020). International evaluation of an AI
system for breast cancer screening. Nature, 577, 89-94.
- Download
Hollon TC, et al. (2020). Near real-time intraoperative brain
tumor diagnosis using stimulated Raman histology and deep
neural networks. Nat Med, 26, 52-58.
- Week 1 (one class on W): Introduction (Chapter 1);
notes