PubH 7475/8475 Update Page
PubH 7475/8475 Statistical Learning and Data Mining
Spring 2022 Updates
Instructors:
Dr. Wei Pan, panxx014@umn.edu
- Week 14-15: Project presentations
- Week 13 W:
Community detection in networks.
notes.
-
Download:
Newman MEJ. Detecting community structure in networks.
-
Download:
Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E. (2008).
Fast unfolding of communities in large networks. arXiv:0803.0476
-
Download, or
preprint:
Zhao Y, Levina E, Zhu J (2012). Consistency of community detection in networks under degree-corrected stochastic block models. Ann. Statist.
Volume 40, Number 4 (2012), 2266-2292.
-
Download:
Fortunato S (2010). Community detection in graphs. Physics Reports 486, 75-174.
-
Download:
David Meunier, Renaud Lambiotte and Edward T. Bullmore (2010).
Modular and hierarchically modular organization of brain networks.
Front. Neurosci., 4, 200.
- Week 13 M: Semi-supervised learning.
SSL notes.
-
Download:
Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton, (2020).
A Simple Framework for Contrastive Learning of Visual Representations.
ICML 2020.
-
Download:
Peng Liu, Yusi Fang, Zhao Ren, Lu Tang, George C. Tseng (2021).
Outcome-Guided Disease Subtyping for High-Dimensional Omics Data. arXiv:2007.11123
-
Download:
Wagstaff et al (2001). Constrained K-means Clustering with Background Knowledge.
-
Download:
Liu B, Shen X, Pan W (2013). Semi-supervised spectral clustering with application to detect
population stratification. Frontiers in Genetics. 4:215. doi:10.3389/fgene.2013.00215.
-
Download:
Wang J, Shen X, Pan W. (2009). On efficient large margin semisupervised learning: method and theory. Journal of Machine Learning Research, 10, 719-742.
-
Download:
Wang, J., Shen, X., and Pan, W. (2006). On transductive support vector
machines. Contemp. Math., 43, 7-19.
-
Download:
Wei Pan, Xiaotong Shen, Aixiang Jiang, and Robert P. Hebbel (2006).
Semi-supervised learning via penalized mixture model with application
to microarray sample classification.
Bioinformatics, 22, 2388-2395.
- HWK5 due on April 20.
- Week 12: Unsupervised learning (Chapter 14).
-
Download:
Tibshirani R, Walther G (2005). Clustering
validation by prediction strength. JCGS, 14, 511-528.
-
Download:
Liu Y, Hayes DN, Nobel A, Marron JS (2008). Statistical Significance of Clustering for High-Dimension, Low–Sample Size Data. JASA, 103, 1281-1293.
-
Download:
McShane LM, Radmacher MD, Freidlin B, Yu R, Li MC, Simon R. (2002).
Methods for assessing reproducibility of clustering patterns observed in analyses of microarray data.
Bioinformatics. 18(11):1462-9.
-
Download:
Candes EJ, Li X, Ma Y, Wright J (2009).
Robust Principal Component Analysis?
-
Download:
Shen Y, Wen Z, Zhang Y (2011). Augmented Lagrangian alternating direction method for
matrix separation based on low-rank factorization.
- (Review)
Download:
Zhang Z, Jordan MI (2008). Multiway spectral clustering: a margin-based perspective.
Stat Sci, 23, 383-403.
- (Review)
Download:
von Luxburg. A tutorial on spectral clustering.
-
Download:
Ng AY, Jordan MI, Weiss Y. On spectral clustering: analysis and an algorithm.
-
Download:
Pan W, Shen X, Liu B (2013). Cluster Analysis: Unsupervised Learning via Supervised Learning with a Non-convex Penalty.
Journal of Machine Learning Research, 14, 1865-.
-
Download:
Pan W, Shen X (2007). Penalized Model-Based Clustering with Application to Variable
Selection. Journal of Machine Learning Research, 8, 1145-1164.
-
Download:
Li J, Ray S, Lindsay BG (2007). A nonparametric statistical approach to clustering via
mode identification. Journal of Machine Learning Research, 8, 1687-1723.
- Week 11: Unsupervised learning (Chapter 14).
clustering notes.
- (Review)
Download:
Xu R, Wunsch D (2005). Survey of clustering algorithms.
IEEE Transactions on Neural Networks, 16, 645-678.
- Project proposal due on April 6
- HWK4 due on March 30
(which can be extended to April 6)
- Mid-term exam is scheduled in class on Wednesday, March 23, 2022.
It will be a closed-book, one-hour in-class exam: no books or notes allowed;
it will cover the contents up to the end of the class on March 21.
Old exam,
old sum
- Weeks 8-10: FNNs (Chapter 11); CNNs; RNNs;
two applications:
protein subcellular localization prediction with
microscopic cell images (Xiao et al 2019);
enhancer-promoter interaction prediction with DNA seq
(Zhuang et al 2019); RL.
FNN&CNN notes,
R/Keras FNN&CNN,
RNN,
RL notes.
Mengli's
slides
and
example on CNNs in R.
SLDS'18 slides
-
Download:
LeCun et al (1998). Gradient-based learning applied to document recognition.
Proc of IEEE. (Comment: Section I. p.5-7 most helpful to understand
convolutional NNs.)
-
Download:
Krizhevsky A, Sutskever I, Hinton G. (2012).
ImageNet Classification with Deep Convolutional Neural Networks. NeurIPS.
-
Download:
Zhou J and Troyanskaya OG (2015). Predicting effects of noncoding variants with deep learning–based sequence model. Nature Methods, 12, 931-934.
-
Download:
Silver et al. (2016).
Mastering the game of Go with deep neural networks and tree search.
Nature, 529, 484-489.
-
Download:
Xiao M, Shen X, Pan W. (2019).
Application of deep convolutional neural networks in classification of protein subcellular localization with microscopy images.
Genetic Epi, 43(3), 330-341.
-
Download:
Zhuang Z, Shen X, Pan W. (2019).
A simple convolutional neural network for prediction of enhancer-promoter interactions with DNA sequence data.
Bioinformatics, 35(17), 2899-2906.
-
Download:
Fan J, Ma C, Zhong Y. (2019).
A Selective Overview of Deep Learning. arXiv:1904.05526.
-
Download:
Bernhard Schölkopf, Francesco Locatello, Stefan Bauer, Nan Rosemary Ke, Nal Kalchbrenner, Anirudh Goyal, Yoshua Bengio. (2021).
Towards Causal Representation Learning. arXiv:2102.11107
-
Download:
Zech JR, Badgeley MA, Liu M, Costa AB, Titano JJ, Oermann EK (2018) Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study. PLoS Med 15(11): e1002683. doi:10.1371/journal.pmed.1002683.
-
Download:
Chaofan Chen, Oscar Li, Daniel Tao, Alina Barnett, Cynthia Rudin, Jonathan K. Su (2019).
This Looks Like That: Deep Learning for Interpretable Image Recognition.
Advances in Neural Information Processing Systems 32 (NeurIPS 2019).
-
Download:
Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra.
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. International Journal of Computer Vision (IJCV) 2019.
-
Download here or at the
Journal
:
Stephanie Clark, Rob J Hyndman, Dan Pagendam, Louise M Ryan. (2020).
Modern strategies for time series regression.
International Stat Rev, 88(S1), S179-S204.
-
Download:
Volodymyr Mnih et al. (2015).
Human-level control through deep reinforcement learning.
Nature, 518, 529-533.
- Week 7: More on RF;
Support vector machines (Chapter 12).
SVM notes,
- Download
Stefan Wager, Trevor Hastie, Bradley Efron. (2014).
Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife.
JMLR 15(48):1625-1651.
- Download
Lucas Mentch, Giles Hooker. (2016).
Quantifying Uncertainty in Random Forests via Confidence Intervals and Hypothesis Tests.
JMLR 17: 1-41.
- Download
Ishwaran H, Lu M. (2019).
Standard errors and confidence intervals for variable importance in random forest regression, classification, and survival. Stat Med. 38(4):558-582.
- Download
Hugh A. Chipman, Edward I. George, Robert E. McCulloch. (2010).
BART: Bayesian additive regression trees.
Annals of Applied Statistics 4(1), 266-298.
- Download
Lu M, Sadiq S, Feaster DJ, Ishwaran H. (2018).
Estimating Individual Treatment Effect in Observational Data Using Random Forest Methods.
J Comput Graph Stat. 27(1), 209-219.
- Download or
here:
Vincent Dorie, Jennifer Hill, Uri Shalit, Marc Scott, and Dan Cervone. (2019).
Automated versus Do-It-Yourself Methods for Causal Inference: Lessons Learned from a Data Analysis Competition.
Statist Sci, 34, 43-68.
-
Download:
Wang, L., and Shen, X. (2007). On L1-norm multi-class support vector machines: methodology and theory. JASA, 102, 583-594.
-
Download:
Shen, X., Tseng, G.C., Zhang, X., Wong, W.H. (2003).
On psi-Learning.
JASA, 98, 724-734.
-
Download:
Wang J, Shen X, Liu Y. (2008).
Probability estimation for large margin classifiers. Biometrika. 95, 149-167.
-
Download:
Geman S, Bienenstock E, Doursat R (1992). Neural networks and the bias/variance dilemma.
Neural Computation, 4, 1-58.
-
Download:
Kondor RI, Lafferty J. Diffusion kernels on graphs and other discrete structures.
- (Review)
Download:
Hofmann T, Schölkopf B and Smola AJ (2008).
Kernel methods in machine learning.
Annals of Statistics, 36, 1171-1220.
- (Review)
Download:
Javier M. Moguerza and Alberto Muñoz (2006).
Support Vector Machines with Applications.
Statistical Science, 21, 322-336.
Comments and rejoinder, 337-362.
- (Review)
Download:
Bing Cheng, D. M. Titterington (1994).
Neural Networks: A Review from a Statistical Perspective.
Statistical Science, 9, 2-30.
Comments and rejoinder, 31-54.
- (Review)
Download:
B. D. Ripley (1994).
Neural Networks and Related Methods for Classification.
JRSS-B, 56, 409-456.
- HWK3 due on March 2
- Weeks 5&6: Trees; Ensemble methods:
Bagging (8.7);
Bayes model averaging (BMA) and
stacking (8.8), ARM (Yang, 2003);
Random forest (Chapter 15); Boosting (Chapter 10): AdaBoost and GBM.
notes,
notes
Go to an info page for R package gbm.
- Download
Loh W-Y (2014). Fifty years of classification and regression trees (with discussion), International Statistical Review, 82, 329-370.
- Download
Loh W-Y, He X, Man M (2015).
A regression tree approach to identifying subgroups with differential treatment effects.
Stat Med, 34(11),1818-1833.
https://doi.org/10.1002/sim.6454.
- Download
Athey S, Imbens G (2016).
Recursive partitioning for heterogeneous causal effects.
Proceedings of the National Academy of Sciences, 113(27), 7353-7360. DOI: 10.1073/pnas.1510489113.
- Download
Breiman L (1996). Bagging predictors. Machine Learning, 24, 123-140.
- Download
Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999).
Bayesian model averaging: a tutorial (with comments). Stat Sci, 14:382-417.
- Download:
Yang Y (2003). Regression with multiple candidate models: selecting or mixing? Statistica Sinica, vol. 13, 783-809.
- Download:
Yang Y (2001). Adaptive regression by mixing, JASA, vol. 96, 574-588.
- Download:
Shen X, Huang H-C (2006)
Optimal model assessment, selection and combination. JASA 101:554-568.
- Download:
Newton MA, Quintana FA, den Boon JA, Sengupta S, Ahlquist P (2007)
Random-set methods identify distinct aspects of the enrichment signal in
gene-set analysis. Annals of Applied Statistics, 1:85-106.
- Download:
Pan W, Kim J, Zhang Y, Shen X, Wei P (2014)
A powerful and adaptive association test for rare variants. Genetics, 197(4):1081-1095.
-
Download:
Pan W, Xiao G and Huang X (2006).
Input Dependent Weights for Model Combination
and Model Selection with Multiple Sources of Data.
Statistica Sinica, 16:523-540.
-
Download:
Zhang Y, Yang Y (2015).
Cross-validation for selecting a model selection procedure.
J of Econometrics.
-
Download:
Shao J (1997). AN ASYMPTOTIC THEORY FOR LINEAR MODEL SELECTION. Stat Sinica 7:221-264.
- Download
Breiman L (2001). Random forests. Machine Learning, 45, 5-32.
- Download:
Saharon Rosset, Ji Zhu, Trevor Hastie (2004).
Boosting as a Regularized Path to a Maximum Margin Classifier.
JMLR 5:941--973.
- Download
Friedman's MART papers.
- Download
Stefan Wager, Trevor Hastie, Bradley Efron. (2014).
Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife.
JMLR 15(48):1625-1651.
- Download
Lucas Mentch, Giles Hooker. (2016).
Quantifying Uncertainty in Random Forests via Confidence Intervals and Hypothesis Tests.
JMLR 17: 1-41.
- Download
Ishwaran H, Lu M. (2019).
Standard errors and confidence intervals for variable importance in random forest regression, classification, and survival. Stat Med. 38(4):558-582.
- Download
Hugh A. Chipman, Edward I. George, Robert E. McCulloch. (2010).
BART: Bayesian additive regression trees.
Annals of Applied Statistics 4(1), 266-298.
- Download
Lu M, Sadiq S, Feaster DJ, Ishwaran H. (2018).
Estimating Individual Treatment Effect in Observational Data Using Random Forest Methods.
J Comput Graph Stat. 27(1), 209-219.
- Download or
here:
Vincent Dorie, Jennifer Hill, Uri Shalit, Marc Scott, and Dan Cervone. (2019).
Automated versus Do-It-Yourself Methods for Causal Inference: Lessons Learned from a Data Analysis Competition.
Statist Sci, 34, 43-68.
- HWK2 due on Feb 16
- Week 4:
Other penalties (3.8): SCAD (Fan and Li 2001), elastic net (Zou and Hastie 2005),
adaptive LASSO (Zou 2006), TLP (Shen et al 2012), group lasso, fused lasso...;
SIS;
Computational algorithms and statistical inference for high-dimensional data;
methods based on derived inputs (3.5-3.6): PCR, PLS.
- Download:
Fan J, Li R (2001). Variable selection via nonconcave penalized likelihood and
its oracle properties.
Journal of the American Statistical Association 96 (456), 1348-1360.
- Download:
Zou H (2006),
The Adaptive Lasso and Its Oracle Properties. JASA,
101, 1418-1429.
- Download:
Zou H, Hastie T (2005),
Regularization and variable selection via the elastic net.
Journal of the Royal Statistical Society: Series B,
67, 301-320.
- Download:
Austin E, Pan W, Shen X. (2013).
Penalized Regression and Risk Prediction in Genome-Wide Association Studies.
Stat Anal Data Min. 6(4). doi: 10.1002/sam.11183.
- Download:
Zhu Y, Shen X, Pan W (2013).
Simultaneous grouping pursuit and feature selection over an undirected graph.
JASA, 108, 713-725.
- Download:
Kim S, Pan W, Shen X (2013).
Network-based penalized regression with application to genomic data.
Biometrics. 69(3), 582-593.
- Download:
Friedman J, Hastie T, Hoefling H, Tibshirani R (2007). Pathwise Coordinate Optimization.
The Annals of Applied Statistics, 1(2), 302–332.
- Download:
Jerome Friedman, Trevor Hastie, Robert Tibshirani (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22.
- Download:
S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, (2011).
Distributed optimization and statistical learning via the alternating direction method
of multipliers.
Foundations and Trends in Machine Learning, 3(1):1-122.
- Download:
Shi C, Song R, Chen Z, Li R. (2019).
Linear hypothesis testing for high dimensional generalized linear models.
Ann Stat, 47(5), 2671-2703.
- Download:
Zhu Y, Shen X, Pan W. (2020).
On High-Dimensional Constrained Maximum Likelihood Inference.
JASA, 115(529), 217-230.
- Download:
Dezeure R, Bühlmann P, Meier L and Meinshausen N (2015).
High-Dimensional Inference: Confidence Intervals, p-Values and R-Software hdi.
Stat Sci, 30(4), 533-558.
- Download:
Fan J, Lv J (2008). Sure independence screening for ultrahigh dimensional feature space.
JRSS-B 70, 849-911.
- Download:
Chun H and Keles S (2010).
Sparse partial least squares regression for simultaneous dimension reduction and variable selection. JRSS-B, 72(1):3-25. (R package "spls")
- Week 3:
LDA and QDA (4.3);
RDA (4.3.1), nearest shrunken centroid (18.2), logistic regression (4.4);
penalized logistic regression (18.3.2, 18.4).
Linear regression: LS (3.1-3.2); Subset selection (3.3),
shrinkage methods: ridge, Lasso (3.4.1-3.4.3);
notes
- Download:
Banfield JD, Raftery AE (1993). Model-based Gaussian and non-Gaussian clustering.
Biometrics, 49, 803-821.
Note:
Section 2 contains a discussion on the eigen-decomposition of a covariance matrix of a Normal
distribution.
- Download:
Peter J. Bickel, Elizaveta Levina (2004),
Some theory for Fisher's linear discriminant function, `naive Bayes', and some alternatives when there are many more variables than observations.
Bernoulli, 10, 989--1010.
- Download
Tibshirani et al. (2002). Diagnosis of multiple cancer types by shrunken
centroids of gene expression. PNAS, 99, 6567-6572.
- Download:
Xiaohong Huang and Wei Pan (2003),
Linear regression and two-class classification with gene expression data.
Bioinformatics, 19, 2072-2078.
- Download:
Dudoit S., Fridlyand J, Speed T. P. (2002).
Comparison of Discrimination Methods for the Classification of
Tumors Using Gene Expression Data.
JASA, 97, 77-87.
- Download:
Fan J, Fan Y (2008). High-dimensional classification using features annealed independence rules. Ann Statist, 36, 2605-2637.
- Download:
Mai, Q., Zou, H., and Yuan, M. (2012). A Direct Approach to Sparse Discriminant Analysis in Ultra-high Dimensions. Biometrika, 99(1), 29-42.
- Download:
Cai T, Liu W. (2011).
A Direct Estimation Approach to Sparse Linear Discriminant Analysis.
JASA, 106, 1566-1577.
- Week 2:
Overview (2.1-2.3);
Model selection and assessment (2.9, 7.10);
read Curse of dimensionality (2.5).
Linear models for classification:
intro (4.1), linear regression (4.2);
notes
- HWK1 due on Feb 2
(by the end of the day on Canvas).
Note:
You can
have separate pages for the answers and code, or you can mark
out (or hand-print) your answers with the mixed code and output.
Again, no late HWK is accepted without prior approval
or a legitimate reason (e.g. illness).
- Download
WSJ: Big Data Is on the Rise, Bringing Big Questions.
(A subscription may be needed.)
- Download
WSJ: Big Data's Big Problem: Little Talent.
(A subscription may be needed.)
- Download
McKinsey Global Institute (June 2011). Big data: The next frontier for innovation, competition, and productivity.
- Download
Donoho D. (2015), 50 years of Data Science.
- Download
Breiman L. (2001), Statistical Modeling: The Two Cultures
(with comments and a rejoinder by the author). Statist. Sci. 16, iss. 3,
199-231.
- Download
Hand, D.J. (2006), Classifier Technology and the Illusion of Progress
(with comments and a rejoinder by the author). Statist. Sci. 21, iss. 1,
1-34.
- Download
S. Guha, R. Hafen, J. Xia, J. Rounds, J. Li, B. Xi, and W. S. Cleveland (2012), Large complex data: divide and recombine (D&R) with RHIPE, Stat 1, 53-67.
- Download
Cleveland W.S. (2001, republished 2014),
Data science: An action plan for expanding the technical areas of the field of statistics.
Statistical Analysis and Data Mining 7, iss. 6, 414-417.
- Download
B. Yu (2014). Let us own data science. Institute of Mathematical Statistics (IMS) Presidential Address, ASC-IMS Joint Conference, Sydney, July, 2014.
- Download
Yang S, et al. (2015). Accurate estimation of influenza
epidemics using Google search data via ARGO. PNAS, 112,
14473-8.
- Download
McKinney SM, et al. (2020). International evaluation of an AI
system for breast cancer screening. Nature, 577, 89-94.
- Download
Hollon TC, et al. (2020). Near real-time intraoperative brain
tumor diagnosis using stimulated Raman histology and deep
neural networks. Nat Med, 26, 52-58.
- Week 1 (one class on W): Introduction (Chapter 1);
notes