PubH 8475/Stat 8056 Advanced Topics on Machine Learning
Spring 2024
Instructors:
Dr. Wei Pan, panxx014@umn.edu
Dr. Xiaotong Shen, xshen@umn.edu
Class: 9:45 AM - 11:00 AM, M&W, Health Sciences Edu Ctr 2-132.
- Syllabus
- Link to Dr. Shen's part.
- Download R
- Info related to "The Elements of Statistical Learning".
- Book PDF of "An Introduction to Statistical Learning".
- Week 13 M: Network analysis (community detection). notes.
- Group 9 reading list:
- Download:
Newman MEJ. Detecting community structure in networks.
- Download:
Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E (2008). Fast unfolding of communities in large networks. arXiv:0803.0476.
- Download, or preprint:
Zhao Y, Levina E, Zhu J (2012). Consistency of community detection in networks under degree-corrected stochastic block models. Ann. Statist., 40(4), 2266-2292.
- Download:
Fortunato S (2010). Community detection in graphs. Physics Reports, 486, 75-174.
- Download:
David Meunier, Renaud Lambiotte and Edward T. Bullmore (2010). Modular and hierarchically modular organization of brain networks. Front. Neurosci., 4, 200.
- Download:
Langfelder P, Horvath S (2008). WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics, 9, 559.
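Several papers in the list above (Newman; Blondel et al.'s Louvain method) revolve around one quantity: the modularity Q of a partition, i.e. the fraction of edges falling inside communities minus the fraction expected under a random degree-preserving rewiring. A minimal sketch to make the computation concrete (Python rather than the course's R, purely for a self-contained illustration; the toy graph is made up):

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman's modularity Q = sum_c [ e_c/m - (d_c / 2m)^2 ], where e_c is
    the number of edges inside community c and d_c its total degree.

    edges: list of (u, v) pairs of an undirected graph (no self-loops)
    community: dict mapping node -> community label
    """
    m = len(edges)                      # total number of edges
    intra = defaultdict(int)            # edges with both ends in the community
    degree = defaultdict(int)           # total degree per community
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Two triangles joined by a single bridge edge: the natural 2-community split
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
comm = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, comm), 4))  # → 0.3571
```

Putting all six nodes into one community gives Q = 0; the Louvain algorithm of Blondel et al. greedily moves nodes between communities to increase exactly this objective.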
- Week 13 W: Semi-supervised learning. notes.
- Group 8 reading list:
- Download:
Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton (2020). A Simple Framework for Contrastive Learning of Visual Representations. ICML 2020.
- Download:
Peng Liu, Yusi Fang, Zhao Ren, Lu Tang, George C. Tseng (2021). Outcome-Guided Disease Subtyping for High-Dimensional Omics Data. arXiv:2007.11123.
- Download:
Wagstaff et al (2001). Constrained K-means Clustering with Background Knowledge.
- Download:
Liu B, Shen X, Pan W (2013). Semi-supervised spectral clustering with application to detect population stratification. Frontiers in Genetics, 4:215. doi:10.3389/fgene.2013.00215.
- Download:
Wang J, Shen X, Pan W (2009). On efficient large margin semisupervised learning: method and theory. Journal of Machine Learning Research, 10, 719-742.
- Download:
Wang, J., Shen, X., and Pan, W. (2006). On transductive support vector machines. Contemp. Math., 43, 7-19.
- Download:
Wei Pan, Xiaotong Shen, Aixiang Jiang, and Robert P. Hebbel (2006). Semi-supervised learning via penalized mixture model with application to microarray sample classification. Bioinformatics, 22, 2388-2395.
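A recurring idea in the semi-supervised papers above is to let a learner trained on the few labeled points propagate labels to the unlabeled points it is most confident about. A toy self-training sketch (Python for self-containment; the 1-D data, labels, and 1-nearest-neighbor base learner are all made up for illustration, not taken from the course):

```python
def self_train_1nn(labeled, unlabeled, rounds=5):
    """Minimal self-training: in each round, label the unlabeled point
    closest to any labeled point (highest "confidence"), using its nearest
    labeled neighbor's label. Features are 1-D for simplicity.

    labeled: list of (x, y) pairs; unlabeled: list of x values.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        # pick the unlabeled point with the smallest distance to any labeled point
        best = min(pool, key=lambda x: min(abs(x - lx) for lx, _ in labeled))
        nearest = min(labeled, key=lambda p: abs(best - p[0]))
        labeled.append((best, nearest[1]))   # pseudo-label and absorb it
        pool.remove(best)
    return dict(labeled)

# Two labeled points anchor two clusters; four unlabeled points get pulled in
labels = self_train_1nn([(0.0, "low"), (10.0, "high")], [1.0, 2.0, 8.5, 9.0])
print(labels)
```

The same "trust your own confident predictions" loop underlies transductive SVMs and the penalized mixture model of Pan et al. (2006), where the unlabeled points enter the likelihood with estimated class memberships.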
- Week 12: Causal machine learning. notes.
- Group 7 reading list:
- Download:
Angrist, J.D. and G.W. Imbens (1995). Two-stage least squares estimation of average causal effects in models with variable treatment intensity. JASA, 90(430), 431-442.
- Download:
Angrist, J.D., G.W. Imbens, and D.B. Rubin (1996). Identification of causal effects using instrumental variables. JASA, 91, 444-472.
- Download:
Burgess S, Davey Smith G, Davies NM, Dudbridge F, Gill D, Glymour MM, Hartwig FP, Kutalik Z, Holmes MV, Minelli C, Morrison JV, Pan W, Relton CL, Theodoratou E. Guidelines for performing Mendelian randomization investigations: update for summer 2023. Wellcome Open Res. 2023 Aug 4;4:186. doi: 10.12688/wellcomeopenres.15555.3. PMID: 32760811; PMCID: PMC7384151.
- Download:
Xue H, Shen X, Pan W. Constrained maximum likelihood-based Mendelian randomization robust to both correlated and uncorrelated pleiotropic effects. Am J Hum Genet. 2021 Jul 1;108(7):1251-1269.
- Download:
Feizi S, Marbach D, Médard M, Kellis M. Network deconvolution as a general method to distinguish direct dependencies in networks. Nat Biotechnol. 2013 Aug;31(8):726-33.
- Download:
Lin Z, Xue H, Pan W. Combining Mendelian randomization and network deconvolution for inference of causal networks with GWAS summary data. PLoS Genet. 2023 May 18;19(5):e1010762.
- Download:
Jason Hartford, Greg Lewis, Kevin Leyton-Brown, Matt Taddy (2017). Deep IV: A Flexible Approach for Counterfactual Prediction. Proceedings of the 34th International Conference on Machine Learning, PMLR 70:1414-1423.
- Download:
He R, Liu M, Lin Z, Zhuang Z, Shen X, Pan W. DeLIVR: a deep learning approach to IV regression for testing nonlinear causal effects in transcriptome-wide association studies. Biostatistics. 2023 Jan 4:kxac051. doi: 10.1093/biostatistics/kxac051. Epub ahead of print. PMID: 36610078.
- Download:
Yao Y, Chakraborty D, Zhang L, Shen X; Alzheimer's Disease Neuroimaging Initiative; Pan W. Deep causal feature extraction and inference with neuroimaging genetic data. Stat Med. 2023 Sep 10;42(20):3665-3684.
- Download:
Rosenbaum PR, Rubin DB (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41-55.
- Download:
Shiba K, Kawahara T. Using Propensity Scores for Causal Inference: Pitfalls and Tips. J Epidemiol. 2021 Aug 5;31(8):457-463.
- Download:
Dorie et al (2019). Automated versus Do-It-Yourself Methods for Causal Inference: Lessons Learned from a Data Analysis Competition. Stat Sci.
- Download:
Lu M, Sadiq S, Feaster DJ, Ishwaran H. Estimating Individual Treatment Effect in Observational Data Using Random Forest Methods. J Comput Graph Stat. 2018;27(1):209-219.
- Download:
Athey S, Imbens G. Recursive partitioning for heterogeneous causal effects. Proc Natl Acad Sci U S A. 2016 Jul 5;113(27):7353-60.
- Download:
Susan Athey, Stefan Wager (2019). Estimating Treatment Effects with Causal Forests: An Application.
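The instrumental-variables papers above (Angrist-Imbens, Deep IV, DeLIVR, and the Mendelian-randomization work) share one identification idea: an instrument z that shifts the exposure x but affects the outcome y only through x recovers the causal effect of x even under unmeasured confounding. In the just-identified linear case, 2SLS reduces to the ratio-of-covariances (Wald) estimator. A simulated sketch (Python for self-containment; all variable names and data-generating values are made up):

```python
import random

def two_stage_ls(z, x, y):
    """Just-identified linear IV (Wald / 2SLS) estimate:
    beta_IV = cov(z, y) / cov(z, x)."""
    n = len(z)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    cov_zy = sum((a - mz) * (b - my) for a, b in zip(z, y)) / n
    cov_zx = sum((a - mz) * (b - mx) for a, b in zip(z, x)) / n
    return cov_zy / cov_zx

rng = random.Random(0)
n = 20000
u = [rng.gauss(0, 1) for _ in range(n)]                  # unmeasured confounder
z = [rng.gauss(0, 1) for _ in range(n)]                  # instrument: hits x only
x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]  # exposure, confounded by u
y = [2.0 * xi + 3.0 * ui + rng.gauss(0, 1)               # true causal effect of x is 2
     for xi, ui in zip(x, u)]

# Naive OLS slope of y on x is biased upward by u (about 3 here, not 2)
mx, my = sum(x) / n, sum(y) / n
ols = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

print("OLS (biased):", round(ols, 2), " IV (near 2):", round(two_stage_ls(z, x, y), 2))
```

The deep-learning variants in the list (Deep IV, DeLIVR) replace the linear first stage x ~ z and/or second stage with neural networks, keeping the same exclusion-restriction logic.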
- Week 9: Graphical models. notes.
- Group 6 reading list:
- Download:
Chen L, Li C, Shen X, Pan W (2023). Discovery and Inference of a Causal Network with Hidden Confounding. JASA.
- Download:
Friedman, J., Hastie, T., and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9, 432-441.
- Download:
Guo, J., Levina, E., Michailidis, G. and Zhu, J. (2011). Joint estimation of multiple graphical models. Biometrika, 98, 1-15.
- Download:
Jankova, Jana and van de Geer, Sara (2018). Inference in high-dimensional graphical models. arXiv preprint arXiv:1801.08512.
- Download:
Jiao R, Lin N, Hu Z, Bennett DA, Jin L, Xiong M. (2018). Bivariate Causal Discovery and Its Applications to Gene Expression and Imaging Data Analysis. Front Genet. 9:347.
- Download:
Jordan, M.I. (2004). Graphical models. Statistical Science, 19, 140-155.
- Download:
Li, C., Shen, X., and Pan, W. (2020). Likelihood ratio tests for a large directed acyclic graph. Journal of the American Statistical Association, 115, 1304-1319.
- Download:
Li, C., Shen, X., and Pan, W. (2023). Inference for a large directed acyclic graph with unspecified interventions. arXiv:2110.03805. Journal of Machine Learning Research, 24, 73.
- Download:
Li C, Shen X, Pan W. (2023). Nonlinear causal discovery with confounders. JASA.
- Download:
Mazumder, R., and Hastie, T. (2012). The graphical lasso: New insights and alternatives. Electronic Journal of Statistics, 6, 2125-2149.
- Download:
Peters, J. and Buhlmann, P. (2014). Identifiability of Gaussian structural equation models with equal error variances. Biometrika, 101, 219-228.
- Download:
Zhu, Y., Shen, X. and Pan, W. (2014). Structural pursuit over multiple undirected graphs. Journal of the American Statistical Association, 109, 1683-1696.
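For orientation while reading the list above: the graphical lasso of Friedman, Hastie and Tibshirani (2008), revisited by Mazumder and Hastie (2012), estimates a sparse precision (inverse covariance) matrix by maximizing the L1-penalized Gaussian log-likelihood

```latex
\hat{\Theta} \;=\; \arg\max_{\Theta \succ 0}\;\; \log\det\Theta \;-\; \operatorname{tr}(S\,\Theta) \;-\; \lambda \lVert \Theta \rVert_1
```

where S is the sample covariance matrix and lambda >= 0 controls the sparsity. An off-diagonal zero in the estimate, Theta_jk = 0, encodes conditional independence of variables j and k given all the others, i.e. a missing edge in the Gaussian graphical model; the joint-estimation papers (Guo et al.; Zhu, Shen and Pan) penalize several such matrices together so related graphs share structure.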
- Week 4: FNNs (Chapter 11); CNNs. FNN&CNN notes, R/Keras FNN&CNN, an example application.
- Example 7.1.r FNN.
- Group 4 reading list:
- Download:
LeCun et al (1998). Gradient-based learning applied to document recognition. Proc of IEEE. (Comment: Section I, pp. 5-7 most helpful to understand convolutional NNs.)
- Download:
Krizhevsky A, Sutskever I, Hinton G. (2012). ImageNet Classification with Deep Convolutional Neural Networks. NeurIPS.
- Download:
Zhou J and Troyanskaya OG (2015). Predicting effects of noncoding variants with deep learning–based sequence model. Nature Methods, 12, 931-934.
- Download:
Silver et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529, 484-489.
- Download:
Xiao M, Shen X, Pan W. (2019). Application of deep convolutional neural networks in classification of protein subcellular localization with microscopy images. Genetic Epi, 43(3), 330-341.
- Download:
Zhuang Z, Shen X, Pan W. (2019). A simple convolutional neural network for prediction of enhancer-promoter interactions with DNA sequence data. Bioinformatics, 35(17), 2899-2906.
- Download:
Fan J, Ma C, Zhong Y. (2019). A Selective Overview of Deep Learning. arXiv:1904.05526.
- Download:
Zech JR, Badgeley MA, Liu M, Costa AB, Titano JJ, Oermann EK (2018). Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study. PLoS Med 15(11): e1002683. doi:10.1371/journal.pmed.1002683.
- Download:
Chaofan Chen, Oscar Li, Daniel Tao, Alina Barnett, Cynthia Rudin, Jonathan K. Su (2019). This Looks Like That: Deep Learning for Interpretable Image Recognition. Advances in Neural Information Processing Systems 32 (NeurIPS 2019).
- Download:
Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra (2019). Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. International Journal of Computer Vision (IJCV).
- Download here:
Ben Dai, Xiaotong Shen, Lin Yee Chen, Chunlin Li, Wei Pan (2023). Data-Adaptive Discriminative Feature Localization with Statistically Guaranteed Interpretation. AOAS.
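The CNN papers above (LeCun et al.; Krizhevsky et al.; the DNA-sequence CNNs of Zhou-Troyanskaya and Zhuang et al.) rest on one core operation: sliding a small learned filter along the input and applying a nonlinearity, so the same weights detect a local pattern wherever it occurs. A minimal 1-D sketch (Python, illustrative only; like most deep-learning libraries, it computes cross-correlation and calls it convolution):

```python
def conv1d(x, w, bias=0.0):
    """Valid-mode 1-D convolution (cross-correlation): slide filter w over x."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]

def relu(v):
    """Rectified linear unit, applied elementwise."""
    return [max(0.0, vi) for vi in v]

# A hand-set "edge detector" filter [-1, 1]: fires only where the signal jumps up
signal = [0, 0, 0, 1, 1, 1]
feature_map = relu(conv1d(signal, [-1.0, 1.0]))
print(feature_map)  # → [0.0, 0.0, 1.0, 0.0, 0.0]
```

In a trained CNN the filter weights are learned rather than hand-set, many filters run in parallel to produce multiple feature maps, and pooling layers downsample between convolutions; 2-D images just replace the sliding window with a sliding patch.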
- Group 3 reading list:
- Download:
Fan J, Li R (2001). Variable selection via nonconcave penalized likelihood and
its oracle properties.
Journal of the American Statistical Association 96 (456), 1348-1360.
- Download:
Zou H (2006),
The Adaptive Lasso and Its Oracle Properties. JASA,
101, 1418-1429.
- Download:
Zou H, Hastie T (2005),
Regularization and variable selection via the elastic net.
Journal of the Royal Statistical Society: Series B,
67, 301-320.
- Download:
Austin E, Pan W, Shen X. (2013).
Penalized Regression and Risk Prediction in Genome-Wide Association Studies.
Stat Anal Data Min. 6(4). doi: 10.1002/sam.11183.
- Download:
Zhu Y, Shen X, Pan W (2013).
Simultaneous grouping pursuit and feature selection over an undirected graph.
JASA, 108, 713-725.
- Download:
Kim S, Pan W, Shen X (2013).
Network-based penalized regression with application to genomic data.
Biometrics. 69(3), 582-593.
- Download:
Friedman J, Hastie T, Hoefling H, Tibshirani R (2007). Pathwise Coordinate Optimization.
The Annals of Applied Statistics, 1(2), 302-332.
- Download:
Jerome Friedman, Trevor Hastie, Robert Tibshirani (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22.
- Download:
S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, (2011).
Distributed optimization and statistical learning via the alternating direction method
of multipliers.
Foundations and Trends in Machine Learning, 3(1):1-122.
- Download:
Shi C, Song R, Chen Z, Li R. (2019).
Linear hypothesis testing for high dimensional generalized linear models.
Ann Stat, 47(5), 2671-2703.
- Download:
Zhu Y, Shen X, Pan W. (2020).
On High-Dimensional Constrained Maximum Likelihood Inference.
JASA, 115(529), 217-230.
- Download:
Dezeure R, Buhlmann P, Meier L and Meinshausen N (2015).
High-Dimensional Inference: Confidence Intervals, p-Values and R-Software hdi.
Stat Sci, 30(4), 533-558.
- Download:
Ben Dai, Xiaotong Shen, Wei Pan. (2022).
Significance tests of feature relevance for a black-box learner.
IEEE Transactions on Neural Networks and Learning Systems.
- Download:
Fan J, Lv J (2008). Sure independence screening for ultrahigh dimensional feature space.
JRSS-B 70, 849-911.
- Download:
Chun H and Keles S (2010).
Sparse partial least squares regression for simultaneous dimension reduction and variable selection. JRSS-B, 72(1):3-25. (R packages "spls")
- HWK1 due on Feb 19.
- Example 3.2:
use PCR and PLS,
plots.
- Example 3.4:
High-dimensional inference using LASSO.
- Example 3.3:
use other penalties.
plot 1: Enet, SCAD and TLP;
plot 2: group penalties;
plot 3: 2-d Fused Lasso.
- Example 3.1:
use LASSO,
plots;
Lars.
- Week 3: High-dimensional data for GLMs.
Linear regression: LS (3.1-3.2); subset selection (3.3);
shrinkage methods: ridge, Lasso (3.4.1-3.4.3; 3.8), other penalties;
penalized logistic regression (4.4); inference; PCR and PLS (3.5).
notes.
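The coordinate-descent papers in the Group 3 reading list (Friedman et al. 2007; Friedman, Hastie and Tibshirani 2010) rest on the fact that, with all other coefficients held fixed, the lasso update for a single coefficient is a closed-form soft-thresholding step. A minimal sketch (Python, illustrative only; glmnet's actual implementation adds covariance updates, warm starts along a lambda path, and active sets):

```python
def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0): the univariate lasso solution."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent, minimizing
        (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.
    Assumes each column of X is standardized so (1/n) * sum_i x_ij^2 = 1."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)                          # running residual r = y - X b
    for _ in range(n_iter):
        for j in range(p):
            # partial residual "correlation" for coordinate j
            rho = sum(X[i][j] * (r[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam)
            delta = new_bj - b[j]
            if delta != 0.0:             # update residuals only when b_j moves
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new_bj
    return b

# Orthogonal toy design: y depends on column 1 only; lam shrinks 2 down to 1.5
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2, -2, 2, -2]
print(lasso_cd(X, y, lam=0.5))  # → [1.5, 0.0]
```

With an orthogonal design, one sweep already lands on the solution b_j = S((1/n) x_j' y, lam), which is why the true coefficient 2 is shrunk to 1.5 and the irrelevant coefficient is set exactly to zero; correlated designs simply need more sweeps.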
- Group 1 reading list:
- Download
WSJ: Big Data Is on the Rise, Bringing Big Questions.
(A subscription may be needed.)
- Download
WSJ: Big Data's Big Problem: Little Talent.
(A subscription may be needed.)
- Download
McKinsey Global Institute (June 2011). Big data: The next frontier for innovation, competition, and productivity.
- Download
Donoho D. (2015), 50 years of Data Science.
- Download
Breiman L. (2001), Statistical Modeling: The Two Cultures
(with comments and a rejoinder by the author). Statist. Sci. 16, iss. 3,
199-231.
- Download
Hand, D.J. (2006), Classifier Technology and the Illusion of Progress
(with comments and a rejoinder by the author). Statist. Sci. 21, iss. 1,
1-34.
- Download
S. Guha, R. Hafen, J. Xia, J. Rounds, J. Li, B. Xi, and W. S. Cleveland (2012), Large complex data: divide and recombine (D&R) with RHIPE, Stat 1, 53-67.
- Download
Cleveland W.S. (2001, republished 2014),
Data science: An action plan for expanding the technical areas of the field of statistics.
Statistical Analysis and Data Mining 7, iss. 6, 414-417.
- Download
B. Yu (2014). Let us own data science. Institute of Mathematical Statistics (IMS) Presidential Address, ASC-IMS Joint Conference, Sydney, July, 2014.
- Download
Yang S, et al. (2015). Accurate estimation of influenza
epidemics using Google search data via ARGO. PNAS, 112,
14473-8.
- Download
McKinney SM, et al. (2020). International evaluation of an AI
system for breast cancer screening. Nature, 577, 89-94.
- Download
Hollon TC, et al. (2020). Near real-time intraoperative brain
tumor diagnosis using stimulated Raman histology and deep
neural networks. Nat Med, 26, 52-58.
- Week 1 (one class on W): Introduction