UMN Biostatistics Student Seminar
2022-2023
Student Seminar is a bi-weekly seminar series that focuses on student research in the Biostatistics division. Our goal is to enhance students' public speaking abilities in a casual, student-only environment. Both PhD students and MS students are encouraged to present! Presentations can be on a variety of topics, including:
- Your own research! A good opportunity to do a test run of a research presentation.
- Share your skills! In the past, students have presented on useful R packages or their experiences with the job application process.
- Journal-club format - select a paper by someone who is giving a Wednesday Division of Biostatistics seminar (check out the schedule) and introduce us to the topics, with a short discussion!
This school year marks the 9th year of student seminars for the Biostatistics division. The organizers are Jennifer Proper (prope012@umn.edu) and Jonathan Kim (kim00225@umn.edu). If you have any feedback or ideas for future talks, please let us know!
The website will be updated as the year progresses.
Please note that the Fall 2022 seminars will be held in a hybrid format every other Tuesday at noon with options to attend either in person in Mayo A110 or on Zoom. The meeting link will be emailed out before each seminar.
Current Seminars
Click the date/title to reveal each presentation abstract.
September 20, 2022 - TBD
TBD
TBD
October 4, 2022 - Websites in R: Quarto & Github
Quinton Neville
A brief intro to git, Github, and Quarto documents (.qmd, the new and improved .Rmd), followed by a guided tutorial on publicly hosting a professional website to share your work with the world, all within the familiar R and RStudio IDE! At the end we'll take a quick tour of a Quarto website, with examples of more advanced topics built with the same tools, such as R package development and hosting interactive/reactive documents.
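For readers who want a head start before the tutorial, here is a minimal sketch of the `_quarto.yml` configuration at the heart of a Quarto website project (the title, page names, and theme are made-up placeholders; yours will differ):

```yaml
project:
  type: website

website:
  title: "My Website"        # shown in the navbar
  navbar:
    left:
      - href: index.qmd      # each entry is a page in the project
        text: Home
      - href: about.qmd
        text: About

format:
  html:
    theme: cosmo             # any bundled Bootswatch theme name works
```

Running `quarto render` (or clicking Render in RStudio) builds the site into a `_site/` folder that can then be hosted, e.g., on GitHub Pages.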
October 18, 2022 - Multiple Augmented Reduced Rank Regression
Jiuzhou Wang
Statistical approaches that successfully combine multiple datasets are more powerful, efficient, and scientifically informative than separate analyses. To address variation architectures correctly and comprehensively for high-dimensional data across multiple sample sets (i.e., cohorts), we propose multiple augmented reduced rank regression (maRRR), a flexible matrix regression and factorization method to concurrently learn both covariate-driven and auxiliary structured variation. We consider a structured nuclear norm objective that is motivated by random matrix theory, in which the regression or factorization terms may be shared or specific to any number of cohorts. Our framework subsumes several existing methods, such as reduced rank regression and unsupervised matrix factorization, and includes a promising novel approach to regression and factorization of a single dataset (aRRR) as a special case. Simulations demonstrate substantial gains in power from combining multiple datasets, and from parsimoniously accounting for all structured variation.
Resources
Past Skillshare Presentations
- Websites in R: Quarto & Github
- A brief introduction to missing data
- Using the student servers
- Using Rcpp to speed up R
- Using Github with RStudio
- SAS ODS & R Officer
- Parallel Computing with MSI
- Alternate method for parallel computing
- Parallel computing with the “doParallel” and “parallel” packages
- Tutorial on mediation analysis using marginal structural models
- A crash course in statistical genetics
External Funding Opportunities
- Doctoral Dissertation Fellowship
- Interdisciplinary Doctoral Fellowship
- MnDRIVE PhD Graduate Assistantship Program
- Interdisciplinary Biostatistics Training Grant in Genetics and Genomics
Job Listings
- ASA Internships - The internship list for summer 2021
- UFL Jobs - Mostly faculty and post docs
- UW Jobs Board - Mostly faculty and post docs
- ASA JobWeb - The ASA list of bio/stats related jobs
- Purdue Jobs - Jobs from universities and industry
- ISBA Jobs - Bayesian friendly positions
- IMSTAT Jobs - Several faculty listings
- USAJobs - Listings mostly from the Census Bureau, DHHS, DHS, NSF, and DOE
Career Tips
Zoom Recordings
- Websites in R: Quarto & Github by Quinton Neville
- Student Internship Panel - Fall 2021
- A Brief Introduction to Missing Data by Jonathan Kim
- Practical Advice for Gaining Teaching Experience at the U and in the Twin Cities by Aparajita Sur and Sarah Samorodnitsky
- Student Research Panel - Spring 2022
- R Packages and You by Jack Wolf
Contact
E-mail the Student Seminar organizers: Jennifer Proper at prope012@umn.edu or Jonathan Kim at kim00225@umn.edu.
Past Seminars
October 1, 2018 - An Introduction to R Shiny
Brian Hart
This week's seminar will be held in Mayo 1250.
R Shiny is an R package that allows you to make interactive web-apps that run R code in the background.
They can be very useful for sharing statistical results and allowing collaborators to explore your results.
In this seminar, I will go over the basics of what a shiny app is, how to build a shiny app, and how to share
a shiny app. I will also share an example of a shiny app I built for fantasy football projections along with
other apps that are available on the internet.
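As a taste of what such an app looks like, here is a minimal sketch (a generic example, not the fantasy football app from the talk):

```r
library(shiny)

# UI: a slider for the sample size and a placeholder for the plot.
ui <- fluidPage(
  sliderInput("n", "Number of draws:", min = 10, max = 500, value = 100),
  plotOutput("hist")
)

# Server: re-draws the histogram whenever the slider moves.
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = "Random normal draws")
  })
}

# shinyApp() bundles the UI and server into an app object.
app <- shinyApp(ui = ui, server = server)
```

Running `app` (or `runApp()`) in an interactive R session starts a local web server and opens the app in a browser.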
October 29, 2018 - Statistical Consulting During Your Master's Years (and Ph.D. probably): But how?
Adam Kaplan
This week's seminar will be held in Mayo 3-125. To begin, this talk is geared toward Master's students who are eager to begin a consulting experience, or who are somewhat unhappy with the apparent lack of statistical consulting opportunities presented to Master's students; it applies to Ph.D. students as well. During the first semester of my Master's years here at the U of M Biostats, I grew tired of taking only classes. Additionally, my online TA-ing under phenomenal instructors added little to no improvement in my statistician-to-people skills and left my hands empty of real data. Throughout classes we were gifted "clean" data pruned for an already-planned model being taught in the course. Sadly, there is only one course that teaches statistical consulting and how to properly carry yourself as a statistical consultant: PUBH 7465, taught by Ann Brearley and Kyle Rudser, usually offered in the second semester of your second year. The problem with this timing is that you, as a Master's student (speaking from my perspective), have little experience with clients, or with discussing models in a non-statistical way with non-statistically leaning people. Additionally, the research experiences given to Master's students do not fulfill the "research experience" quota that many job/internship applications request. In this talk, I will discuss various approaches, problems, and treatments to these problems for a first statistical consulting opportunity. If you have any suggestions to add to the presentation, or preset questions that I can answer during it, email me at kapla271@umn.edu. I will gladly try to answer all questions.
November 26, 2018 - A Powerful and Versatile Colocalization Test
Yangqing Deng
This week's seminar will be held in Mayo 3-125. In the post-GWAS era, colocalization testing has been playing an increasingly important role in inferring causal
genetic variants and genes from GWAS trait-associated loci. However, colocalization testing is challenging. We first discuss some severe limitations of the existing methods,
thus motivating our development of a general and powerful method. We use extensive simulations to demonstrate the advantages of our method over existing methods. We apply
our method and other methods (when possible) to colocalization analyses of multiple correlated GWAS traits, and of a GWAS trait and gene expression.
December 10, 2018 - Student Research Presentation
Andrew Dilernia
This week's seminar will be held in Mayo 3-125. We aim to concurrently conduct sparse precision matrix estimation and clustering of subjects using a random
covariance method with the EM algorithm. The proposed method is computationally more efficient in the high-dimensional setting than some existing methods due to its use of
the KL Divergence as a dissimilarity measure between matrices rather than an L1 penalty on corresponding matrix entries. The algorithm yields sparse precision matrix
estimates, estimated cluster assignments, and uncertainty measures for these assignments.
March 4, 2019 - Batch Bayesian Optimization Design for Optimizing a Neurostimulator
Adam Kaplan
Mayo 3-125. Biomedical devices that stimulate the spinal cord have shown substantial benefits for paraplegic patients compared to standard physical therapies. These devices have many configurable settings, such as the placement along the lower spine, the pulse width, and the frequency. However, there is an apparent scarcity in the literature on implementing a Phase-I adaptive clinical trial to assess the grid of setting values for these Spinal Cord Epidural Stimulation (SCES) devices in the case of one patient. In this presentation, we present a possible design for a Phase-I adaptive clinical trial that effectively provides the patient with monthly sets of settings to explore and considers an early stopping rule for the trial.
April 1, 2019 - Semiparametric modeling of time-varying activation and connectivity in task-based fMRI data
Jun Young Park
Mayo 3-125. In functional magnetic resonance imaging (fMRI), there is a rise in evidence that the temporal change in the synchronization of brain activity, known as dynamic
functional connectivity (dFC) or time-varying connectivity, provides additional information on brain networks not captured by measures of connectivity that is static over
time. While there have been many developments for statistical models for dFC when the study participants are at rest, there remains a gap in the literature on how to
simultaneously model both dFC and time-varying activation when the study participants are undergoing an experimental task designed to probe at a cognitive process of
interest. We propose a method to estimate the dFC between two regions of interest (ROI) in task-based fMRI where the activation effects are also allowed to vary over time.
Our method uses penalized splines to model both time-varying activation effects and time-varying connectivity, and uses the bootstrap for statistical inference. We validate
our approach using simulations and show that ignoring time-varying activation effects would lead to poor estimation of dFC. Our proposed model, called TVAAC (time-varying
activation and connectivity), can estimate both static and time-varying activation and functional connectivity. We give an empirical illustration of both time-varying
activation and connectivity by using our proposed method to analyze two subjects in an event-related fMRI learning experiment.
April 15, 2019 - Iterated Multi-Source Exchangeability Models for Individualized Inference with an Application to Mobile Sensor Data
Roland Brown
Mayo 3-125. Researchers are increasingly interested in using sensor technology to collect accurate activity information and make individualized inference about treatments,
exposures, and policies. How to optimally combine population data with data from an individual remains an open question. Multi-source exchangeability models (MEMs) are a
Bayesian approach for increasing precision by combining potentially heterogeneous supplemental data sources into analysis of a primary source. MEMs are a potentially
powerful tool for individualized inference but can integrate only a few sources; their model space grows exponentially, making them intractable for high-dimensional
applications. We propose iterated MEMs (iMEMs), which identify a subset of the most exchangeable sources prior to fitting a MEM model. iMEM complexity scales linearly with
the number of sources, and iMEMs greatly increase precision while maintaining desirable asymptotic and small sample properties. We apply iMEMs to individual-level behavior
and emotion data from a smartphone app and show that they achieve individualized inference with up to 99% efficiency gain relative to standard analyses that do not borrow
information.
April 22, 2019 - Writing High Performance Functions with Rcpp
Maria Masotti
Mayo 3-125. Abstract: Sometimes R code just isn't fast enough. You've tried everything in R to increase performance...but it's still too slow!
The package Rcpp allows you to rewrite key functions in C++ in order to eliminate bottlenecks in your R code (loops, recursive functions, etc.).
This is particularly useful if you want to write your own R package. In fact, as of May 2017, 1,026 packages on CRAN and a further 91 on BioConductor deploy Rcpp to accelerate computations and to connect to other C++ projects.
I'll provide a gentle introduction to incorporating C++ into your own R code and into your R packages. I'll also point out great resources that go into much greater detail
into all aspects of Rcpp. Note: you do not need any background in C++ to get started using Rcpp! All coding levels are welcome!
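To make this concrete, here is a minimal sketch of the workflow (a toy example, not from the talk): `Rcpp::cppFunction()` compiles a C++ snippet and exposes it as an ordinary R function.

```r
library(Rcpp)

# Compile a C++ loop and expose it to R; Rcpp generates all the glue code.
cppFunction('
double sumC(NumericVector x) {
  double total = 0;
  for (int i = 0; i < x.size(); ++i) {
    total += x[i];
  }
  return total;
}')

sumC(c(1, 2, 3))  # equivalent to sum(c(1, 2, 3))
```

For a one-off function this compiles in a few seconds; inside a package, `Rcpp::compileAttributes()` manages the same machinery for you.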
April 29, 2019 - Integrative Factorization of Bidimensionally Linked Matrices
Jun Young Park
Mayo 3-125. Advances in molecular "omics" technologies have motivated new methodology for the integration of multiple sources of high-content biomedical data. However,
most statistical methods for integrating multiple data matrices only consider data shared vertically (one cohort on multiple platforms) or horizontally (different cohorts
on a single platform). This is limiting for data that take the form of bidimensionally linked matrices (e.g., multiple cohorts measured on multiple platforms), which are
increasingly common in large-scale biomedical studies. In this paper, we propose BIDIFAC (Bidimensional Integrative Factorization) for integrative dimension reduction and
signal approximation of bidimensionally linked data matrices. Our method factorizes the data into (i) globally shared, (ii) row-shared, (iii) column-shared, and (iv)
single-matrix structural components, facilitating the investigation of shared and unique patterns of variability. For estimation we use a penalized objective function that
extends the nuclear norm penalization for a single matrix. As an alternative to the complicated rank selection problem, we use results from random matrix theory to choose
tuning parameters. We apply our method to integrate two genomics platforms (mRNA and miRNA expression) across two sample cohorts (tumor samples and normal tissue samples)
using the breast cancer data from TCGA.
May 6, 2019 - Detecting Participant Noncompliance Across Multiple Time Points
Ross Peterson
Mayo 3-125.
Background/Aims: Participant noncompliance, in which participants do not follow their assigned treatment protocol, has long complicated the interpretation and conduct of randomized clinical trials. No gold standard exists for detection of noncompliance, but participants' biomarkers can suggest exposure to non-study treatments. However, existing methods can only detect noncompliance based on a single biomarker measurement. We propose a novel method that uses longitudinal biomarker data to model compliance across time when compliance is unobserved. Conditional on longitudinal biomarker data, our method can estimate: 1) the probability of compliance at a single time point of the trial; 2) the probability of compliance at all time points; and 3) the prediction probability of compliance at a future time point.
Methods: We model the joint distribution of the biomarker as a mixture density across time points, in which joint compliance probabilities serve as the weights and joint biomarker densities conditional on compliance serve as the components. To derive the mixture density, we assume that both compliance and the biomarker were generated from corresponding mixed effects models. Modeling the biomarker as a mixture density allows us to calculate compliance probabilities that condition on the longitudinal biomarker data. To evaluate the accuracy of the compliance probabilities, we conduct a Monte Carlo simulation study across three different effects of compliance on the biomarker. We compare probability estimators 1) and 2) to those that ignore the longitudinal correlation in the data according to AUC. As 3) does not have a naive comparator, we plot its calibration lines.
Results: Across all three compliance effects on the biomarker, conditioning on the longitudinal biomarker data uniformly raised AUC. For estimating the probability of compliance at a single time point, conditioning on participants' full biomarker history increased AUC by 4-5 percentage points relative to only conditioning on their most recent biomarker measurement. For full compliance, adjusting for the longitudinal data correlation boosted AUC by 8-10 percentage points relative to ignoring the correlation. The calibration lines for the prediction of compliance closely approximated perfect calibration.
Conclusion: Compared to existing methods that can only use a single biomarker measurement, our method can use all biomarker measurements to more accurately identify noncompliant participants. Our method can also use participants' biomarker history to predict compliance at a future time point.
September 26, 2019 - Non-parametric estimation in an illness-death model with component-wise censoring
Anne Eaton
In disease settings where patients are at risk for death as well as a serious non-fatal event, a composite endpoint defined as the time until the earliest of either death or the non-fatal event can be used to measure prognosis. If the non-fatal event can only be detected at clinic visits, the non-fatal event is interval censored, while date of death is usually known exactly, leading to "component-wise censoring". The method recommended by the FDA to estimate event-free survival for this type of data fails to account for component-wise censoring. We apply an existing non-parametric method (referred to as the Sun et al method) in a novel way to produce unbiased estimates of event-free survival, and use simulations to compare this method to the FDA method and a parametric method. The Sun et al method performs well if patients' visit schedule is independent of their non-fatal event status. We propose an estimator that relaxes this assumption, and explain the intuition behind it. Finally, we illustrate the methods on data from the MRFIT trial, which tested a multifactor intervention aimed at lowering cholesterol and blood pressure to reduce coronary heart disease, using the composite endpoint cardiovascular event free survival.
October 10, 2019 - Estimating the Effect of Somatic Mutations on Survival Using a Pan-Cancer and Polygenic Bayesian Hierarchical Model
Sarah Samorodnitsky
Through my RA with Dr. Eric Lock, we built a novel Bayesian hierarchical survival model based on the somatic mutation profile of patients across 50 genes and 27 cancer types. The pan-cancer quality allows for the model to “borrow” information across cancer types, motivated by the assumption that similar mutation profiles may have similar (but not necessarily identical) effects on survival across different tissues-of-origin or tumor types. The effect of a mutation at each gene was allowed to vary by cancer type while the mean effect of each gene was shared across cancers. Within this framework we considered four parametric survival models (normal, log-normal, exponential, and Weibull), and we compared their performance via a cross-validation approach in which we fit each model on training data and estimate the log-posterior predictive likelihood on test data. We concluded the log-normal model fit the data best and proceeded to investigate the partial effect of each gene on survival via a forward selection procedure. Through this we determined that mutations at TP53, MUC5B, and PIK3CA were together the most useful for predicting patient survival. We validated the model via simulation to ensure that our algorithm for posterior computation gave nominal coverage rates.
October 24, 2019 - Using Github for version control on R code
Chuyu Deng
A tutorial on what is version control, why you should use version control in your research, and how to implement Github in RStudio for this purpose. Presentation slides here!
November 7, 2019 - SAPTrees: Using Conditional Inference Trees to Characterize Heterogeneity in Human Activity Patterns
Roland Brown
Sensor technology has revolutionized our ability to collect objective data on human activity. In particular, the ubiquity of smartphones has allowed researchers to gather a wide range of "digital biomarkers" that can be used to quantify the effects of biobehavioral interventions at both the individual and system level. This paper focuses on the problem of characterizing human activity patterns inferred from smartphone sensor data. Viewing each individual's data as a sequence, and building on techniques for sequence alignment in genetics and genomics, we propose a novel method, the Sequential Activity Pattern Tree (SAPTree) for characterizing population subgroups that are relatively homogeneous with respect to their activity sequences. Our proposed method follows a two-step approach, first calculating a pairwise sequence distance matrix between the activity sequences, followed by a novel implementation of conditional inference trees (CTrees) that utilizes multivariate distance matrix regression in determining splits. One key benefit of our method is that, unlike many data-driven procedures for characterizing heterogeneity, it controls the Type I error rate, i.e., the probability of detecting heterogeneous subgroups when none exist. We additionally present visualizations which can help researchers interpret output from sequence-based decision trees, and apply our method to human activity data collected via smartphone sensors in a recent Minneapolis-area study.
November 14, 2019 - Student Panel on Summer Internships
Ales Kotalik, James Normington, Roland Brown, Torri Simon, Chuyu Deng, Shannon McKearnan
Come to find out more about the different summer internships that our students have done in the past including the FDA, Novartis, IDA, Boston Scientific, Mayo Clinic, Genentech and HealthPartners! We will have general prepared questions for our panelists, and then an open discussion.
November 21, 2019 - Bayesian variable selection in hierarchical difference-in-differences models
James Normington
A popular method for estimating a causal treatment effect with observational data is the difference-in-differences (DiD) model.
In this work, we consider an extension of the classical DiD setting to the hierarchical context. We propose a Bayesian hierarchical difference-in-differences (HDiD) model which estimates the treatment effect by regressing the treatment on a latent variable representing the mean change in group-level outcome. We present theoretical and empirical results showing that a HDiD model that fails to adjust for a particular class of confounding variables, or confounding with the baseline (pre-treatment) outcomes, biases the treatment effect estimate. We propose and implement various approaches to perform variable selection using a structured Bayesian spike-and-slab model in the HDiD context. Our proposed methods leverage the temporal structure within the DiD context to select those covariates that lead to unbiased and efficient estimation of the causal treatment effect. We evaluate the methods' properties through simulation, and we use them to assess the impact of primary care redesign of clinics in Minnesota on the management of diabetes outcomes from 2008 to 2017.
December 5, 2019 - Permutation-based Inference for Spatially Localized Signals in Longitudinal MRI Data
Jun Young Park
Alzheimer's disease is a neurodegenerative disease in which the degree of cortical atrophy in specific structures of the brain serves as a useful imaging biomarker. A massive-univariate analysis, a simplified approach that fits a univariate model for every vertex along the cortex, is insufficient to model cortical atrophy because it does not account for the spatial relatedness of cortical thickness from magnetic resonance imaging (MRI), and it can suffer from Type I error rate control. Using the longitudinal structural MRI data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), we develop a permutation-based inference procedure to detect spatial clusters of vertices showing statistically significant differences in the rates of cortical atrophy. The proposed method uses spatial information to combine the signals adaptively across nearby vertices, yielding high statistical power while maintaining an accurate family-wise error rate (FWER). When the global null hypothesis is rejected, we use a cluster selection algorithm to identify the spatial clusters of significant vertices. We validate our method using simulation studies and apply it to the ADNI data to show its superior performance over existing methods.
February 6, 2020 - SAS ODS & Officer Skillshare
Torri Simon
One of the most time-consuming tasks statisticians face is formatting tables and figures to meet journal
requirements or appeal to client needs. For this talk, I’ll present two tools to make your job easier,
regardless of career path! SAS ODS is an output delivery system, similar to R markdown, that allows you
to send your tables and graphics to a document with very little effort. (No more using the snipping tool
to copy/paste your output!). Similarly, the R package officer allows you to add or remove tables and
graphs from Microsoft documents. The package differs from R markdown in that the plots and figures
exported remain editable in the new documents, making it easy for collaborators or clients to adjust the
figures to their liking.
Slides and related materials are available in the Resources section below.
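As a small illustration of the officer workflow described above (a generic sketch; the output file name `example.docx` is made up):

```r
library(officer)

# Build a Word document with a heading and an editable table.
doc <- read_docx()  # start from officer's blank .docx template
doc <- body_add_par(doc, "Summary of mtcars", style = "heading 1")
doc <- body_add_table(doc, value = head(mtcars), style = "table_template")

# Write the assembled document to disk.
print(doc, target = "example.docx")
```

Unlike a copy/pasted screenshot, the result is a native Word table that collaborators can edit directly.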
February 20, 2020 - Penalized model-based clustering of fMRI data
Andrew DiLernia
Functional magnetic resonance imaging (fMRI) data have become increasingly available and are useful for describing functional connectivity (FC), the relatedness of neuronal activity in regions of the brain. This FC of the brain provides insight into certain neurodegenerative diseases and psychiatric disorders, and thus is of clinical importance. To help inform physicians regarding patient diagnoses, unsupervised clustering of subjects based on FC is desired, allowing the data to inform us of groupings of patients based on shared features of connectivity. Even within these groups of patients, heterogeneity across patients in FC is still present. As such, it is important to allow subject-level differences in connectivity, while still pooling information across patients within each group to describe group-level FC. To this end, we propose a random covariance clustering model (RCCM) to concurrently cluster subjects based on their FC networks, estimate the unique FC networks of each subject, and to infer shared network features. Although current methods exist for estimating FC or clustering subjects using fMRI data, our novel contribution is to cluster or group subjects based on similar FC of the brain while simultaneously providing group- and subject-level FC network estimates. The competitive performance of RCCM relative to other methods is demonstrated through simulations in various settings, achieving both improved clustering of subjects and estimation of FC networks. Utility of the proposed method is demonstrated with application to a resting-state fMRI data set collected on 43 healthy controls and 61 participants diagnosed with schizophrenia.
March 5, 2020 - Batch Bayesian Optimization Design for Optimizing a Neurostimulator
Adam Kaplan
Recently, spinal epidural neurostimulation is being considered for rehabilitation of persons suffering from partial spinal cord injury. The neurostimulator must be programmed by a neurosurgeon, yet little work has been done to develop rigorous methods for optimally programming the device. We propose an adaptive design to efficiently optimize programming of the neurostimulator based on monthly evaluations of patient reported preferences. Preferences for the eligible device configurations are estimated after each month through a conditionally auto-regressive model that assumes preference for one configuration is related to preferences for neighboring configurations. Using the adaptively updated preferences, a group of configurations is programmed into the device for the patient to evaluate during the next month. This selection is based on a balance of device exploration and preference maximization. We repeat this process until a specified stopping rule or the trial-end is reached. We show simulation studies to evaluate the overall quality of the adaptive trial for various configuration selection strategies and the effects of stopping the trial early.
March 19, 2020 - MSI Skillshare
Shannon McKearnan, Chuyu Deng, Mengli Xiao
This presentation will be on how we use the Minnesota Supercomputing Institute (MSI) for parallelizing our work and how you can too! We will go through an example workflow for beginner users, and we'll also have additional tips and tricks for more experienced users.
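In the same spirit, here is a minimal sketch of parallelizing a simulation in R with the base parallel package (a generic example, not the presenters' exact MSI workflow; on MSI you would typically run this inside a Slurm job script):

```r
library(parallel)

# One simulation replicate (placeholder computation).
sim_once <- function(i) {
  mean(rnorm(1000))
}

# mclapply() forks workers, which is unavailable on Windows; fall back to 1 core there.
n_cores <- if (.Platform$OS.type == "windows") 1 else max(1, detectCores() - 1)

# Run 100 replicates across the available cores.
results <- mclapply(seq_len(100), sim_once, mc.cores = n_cores)

length(results)  # one result per replicate: 100
```

For a cluster, the same pattern scales up by requesting more cores in the job script and letting `detectCores()` (or the scheduler's environment variables) set `mc.cores`.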
April 2, 2020 - Intro to SMART Designs & Missing Data
Aparajita Sur
Adaptive interventions/dynamic treatment regimes provide a method to individualize sequences of treatment. However, critical questions must be answered to develop a quality adaptive intervention (i.e., should a treatment be augmented, switched, or continued if an individual does not initially respond?). Additionally, missing data is a common predicament of many studies, but the nature of missing data in studies with adaptive interventions invites a unique set of challenges. This talk will give a brief introduction to SMART (sequential multiple assignment randomized trial) studies and their role in building adaptive interventions, missing data mechanisms, methods to handle missing data, and the nature of missing data in SMART studies. Given that SMARTs were only introduced into the mainstream less than a decade ago, there is plenty of room for biostatisticians to get to work on this novel area of research!
April 16, 2020 - Early Career Panel
Thomas Murray, Jeff Boatman, Sarah Boatman, Ziyu Ji
Join us at Student Seminar this week for a very exciting early career panel! We will have representatives with experience working in academia and industry with PhD and MS degrees on our Zoom call to talk about their experiences working and applying for jobs post-graduation and answer your burning questions.
April 30, 2020 - Estimating longitudinal causal effects with participant noncompliance and non-normal confounders
Ross Peterson
Participant noncompliance, in which participants do not follow their assigned treatment protocol, often obscures the causal relationship between treatment and treatment effect in randomized trials. In the longitudinal setting, the G-computation algorithm can adjust for confounding to estimate causal effects. Typically, G-computation assumes that compliance is known and that the confounders are normally distributed. We aim to develop a G-computation estimator in the setting where both assumptions are violated. In place of compliance, we substitute in probability weights derived from modeling a biomarker associated with compliance. We specify the joint conditional confounder density as a factorization. To generate random samples of the non-normal confounders, we use predictive mean matching, in which the predicted values of the model fit given randomly generated data are matched with similar observed values. In simulation and application, we compare multiple causal estimators already established in the literature with those derived from our method.
September 24, 2020 - Internship Panel
Zheng Wang, Jiuzhou Wang, Yuan Zhang, Shannon McKearnan and Chuyu Deng
Come find out more about the different summer internships our students have done in the past, including TechData Service, Pfizer, Merck, Takeda, Genentech, Boston Scientific, and HealthPartners! We will have prepared questions for our panelists, followed by an open discussion.
October 8, 2020 - "But why does it work?" A mediation analysis tutorial using marginal structural models
Grace Lyden
Most randomized clinical trials are concerned with whether treatment A has a causal effect on outcome Y, but once that treatment effect is confirmed, many investigators want to know why the treatment works. Enter: Mediation analysis, our statistical friend who helps us uncover mechanisms of causal effects! Sometimes a mediation analysis is as simple as fitting a regression model, but other times we need more sophisticated methods. In this tutorial, I go step-by-step through a mediation analysis from my RA on pregnancy planning and preterm birth, based on the hypothesis that the harmful effects of unplanned pregnancy on preterm birth are partially mediated by oxidative stress. Traditional methods do not work here, for reasons that will be described, and marginal structural models are employed to correctly estimate the causal effects of interest.
October 22, 2020 - A Crash Course in Statistical Genetics: A brief overview of concepts and methods with application examples
Rachel Zilinskas
I will talk mainly about common concepts such as basic definitions and data types, data pre-processing steps, and applications (eQTL analyses, polygenic risk scores, heritability estimates, and causal inference using 2SLS or Mendelian randomization).
November 19, 2020 - A Hierarchical Spike-and-Slab Model for Pan-Cancer Survival Using Pan-Omic Data
Sarah Samorodnitsky
Pan-omics, pan-cancer analysis has advanced our understanding of the molecular heterogeneity of cancer, expanding what was known from single-cancer or single-omics studies. However, present methods are limited in their ability to use information from multiple sources of data (e.g., omics platforms) and multiple sample sets (e.g., cancer types) to predict important clinical outcomes, like overall survival. We address the issue of prediction across multiple high-dimensional sources of data and multiple sample sets by using exploratory results from BIDIFAC+, a method for integrative dimension reduction of bidimensionally-linked matrices, in a predictive model. We propose a Bayesian hierarchical framework with hierarchical spike-and-slab priors that allow for the borrowing of information across naturally clustered data to inform variable selection. We use this method to predict overall patient survival from the Cancer Genome Atlas (TCGA) using data from 29 cancer types and 4 omics sources. Our proposed model selected patterns of variation identified by BIDIFAC+ that differentiate clinical tumor subtypes with markedly different survival outcomes. We also evaluate the performance of our proposed Bayesian model and variations on it using simulations to characterize each model's flexibility in fitting different underlying data-generating frameworks.
December 3, 2020 - MSI with Slurm Tutorial
Shannon McKearnan, Jack Wolf, Chuyu Deng
MSI (the Minnesota Supercomputing Institute) is a high-powered computing resource for parallel processing and a valuable tool for your own research. This presentation will look very similar to the tutorial we gave on MSI last semester; however, MSI has a new job scheduler called Slurm that we'll be introducing and discussing.
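As a small taste of what the tutorial covers, a minimal Slurm submission script might look like the sketch below. The resource requests, module version, and script name are all illustrative placeholders, not MSI-specific recommendations; check `module avail R` and MSI's queue documentation for actual values.

```
#!/bin/bash -l
#SBATCH --time=01:00:00          # wall-clock limit (hh:mm:ss)
#SBATCH --ntasks=1               # one task...
#SBATCH --cpus-per-task=4        # ...with 4 cores for parallel R code
#SBATCH --mem=8g                 # total memory request
#SBATCH --job-name=sim_study
#SBATCH --output=sim_study_%j.out  # %j expands to the job ID

module load R                    # load R on the compute node
Rscript sim_study.R              # hypothetical simulation script
```

You would submit this with `sbatch sim_study.sbatch` and check on it with `squeue --me`.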
February 16, 2021 - Augmenting Analyses of Randomized Controlled Trial Data with External Data
Lillian Haine
Randomized controlled trials (RCTs) are the most rigorous way to obtain unbiased treatment effect estimates. However, they are often criticized as costly in both time and money. As such, researchers often look to leverage external data on one or both trial arms to increase the efficiency of RCTs and decrease the necessary sample size through data borrowing. Data borrowing approaches often operate in the Bayesian paradigm and have been extensively researched in the literature; one such approach is multisource exchangeability modeling (MEM). However, these approaches assume the external data come from a previously conducted trial in the same underlying population as the current trial. They are not flexible enough to apply to an observational external data source, where this assumption is often violated. We look to extend the MEM borrowing approach to incorporate observational data into RCTs through a semi-supervised mixture distribution using multisource exchangeability models (SS-MIX-MEM).
March 2, 2021 - A Gentle Introduction to Bayesian Multiway Distance Weighted Discrimination
Jonathan Kim
This will be a general overview of my research area with an emphasis on accessibility to students who don't have prior knowledge of the subject area. If you're curious about what terms like "multiway" or "distance weighted discrimination" mean, this is the talk for you! This is NOT a formal methods presentation. Instead, the goal is to show where this topic falls in the larger body of statistical knowledge and how I arrived at it as a topic of research.
March 16, 2021 - Survival Probability ROC Curves for Time to Event Data in Clinical Trials
Sandra Castro-Pearson
This will be a brief intro to some basics of survival analysis (Kaplan-Meier) and ROC curves for folks who have not had previous experience with those. I will follow that with my current dissertation research where we use the ROC framework to combine survival curves from two arms of a trial in order to visualize the treatment effect as well as possible effect modifiers. If we have time I'll review the Cox model and how our ROC approach can be used to improve the interpretation of the hazard ratio and interaction terms.
April 13, 2021 - Evaluating the protection of multiple imputation and the need for sensitivity analyses in SMART studies with non-random missingness
Aparajita Sur
Missing data can compromise the validity of inference, particularly when data are missing not at random (MNAR). The complex structure of sequential multiple assignment randomized trials (SMARTs) presents unique challenges when handling missing data. Recently, an adapted multiple imputation (MI) strategy was proposed to analyze incomplete SMART data, but this imputation framework relies on the unverifiable assumption that the data are missing at random (MAR).
While MI may provide some protection when the MAR assumption is violated in some contexts, it is unclear how these violations affect an analysis in a SMART setting. We will discuss the performance of MI when there is non-random missingness in a SMART study and the motivation for (and general idea of) sensitivity analyses.
September 28, 2021 - Biostatistics Committees Fair
Presentation
Come learn about all of the ways that you can get involved in the various committees and groups within the Department of Biostatistics and the School of Public Health!
October 12, 2021 - Internship Panel
Jennifer Proper, Rachel Zilinskas, Jonathan Kim, Lily Haine, and Torri Simon
Come learn about the application process and the experience of current students who have interned at various places in academia, industry, and government.
October 26, 2021 - An Overview of Missing Data
Jonathan Kim
A basic introduction to missing data including commonly used simple approaches and an overview of more sophisticated approaches including multiple imputation and maximum likelihood methods.
November 9, 2021 - An introduction to data.table
Justin Clark
This presentation will give an overview of `data.table`, an R package designed to make manipulating data in R faster and more fun! I will cover the basics of data frames and data tables in R: how they're different, how they're similar, and, hopefully, how using a package like `data.table` can make your code more readable, consistent, and concise!
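Ahead of the talk, here is a minimal sketch of the `DT[i, j, by]` syntax at the heart of `data.table` (the dataset and column choices are just illustrative, not taken from the presentation):

```
library(data.table)

# Convert a data frame to a data.table
dt <- as.data.table(mtcars)

# Filter rows (i), compute summaries (j), grouped by a column (by) --
# all inside a single pair of brackets
dt[mpg > 20, .(mean_hp = mean(hp), n = .N), by = cyl]

# Add a column by reference (no copy made) with the := operator
dt[, kpl := mpg * 0.425]
```

The talk will cover how this compares to the equivalent base-R data frame operations.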
November 30, 2021 - Alumni Career Panel (Tentative)
TBA
Stay tuned for a list of all-star alumni who will dazzle us with their knowledge and experience in industry, academia, and government careers!
December 14, 2021 - Practical Advice for Gaining Teaching Experience at the U and in the Twin Cities
Aparajita Sur and Sarah Samorodnitsky
January 25, 2022 - Oral Prelim Exam Practice
Jennifer Proper
Randomized controlled trials (RCTs) are widely regarded as the gold standard for studying the efficacy of new treatments. Randomly assigning participants to either treatment or control precludes selection bias, provides a sound basis for valid statistical inference, and, on average, balances groups with respect to both known and unknown confounders so that unbiased treatment effect estimates can be obtained. However, in recent years, meta-analysis has become an increasingly popular statistical method for supporting evidence-based medicine and informing decision-making in health care. Meta-analysis (MA) allows for the synthesis of data from multiple RCTs and produces an overall estimate of a treatment effect, which advantageously increases the statistical power of a treatment comparison and provides more definitive answers to scientific questions compared to individual trials, which may present conflicting results. Because traditional MA methods are limited to pairwise comparisons, network meta-analysis was developed to simultaneously compare multiple treatments and strengthen inference through the incorporation of both direct and indirect evidence. This thesis proposal introduces new methods for Bayesian response-adaptive randomization and network meta-analysis.
February 8, 2022 - Research Panel
Jennifer Proper, Sarah Samorodnitsky, Maria Masotti, and Andy Becker
This seminar will involve a research panel consisting of current 4th- and 5th-year students who will discuss their research experiences in our program and what they have learned (or wish they knew sooner). Topics that will be discussed include (but are certainly not limited to) the following: how and when we chose our current research advisor(s), how to conduct productive research meetings, our communication style with our advisor(s), how to structure your time, and how and when to schedule your oral prelim (or plan B).
February 22, 2022 - Planning a Platform Trial with Real World Data Borrowing
Lily Haine
Designing a randomized controlled trial takes a substantial amount of statistical thought and planning. This seminar will look at some aspects of a platform trial design, particularly those that change when trial complexity increases through borrowing from external data. We will break down some of the new statistical areas we now need to think about and options for dealing with these complexities. I am also hoping for a fun discussion with attendees about other ideas they might have and potential trial design aspects I might have missed!
March 15, 2022 - An Overview of the Regulation of Medical Devices
Sarah Leismer
This talk will cover a quick history of the FDA, explanation of the FDA's role in regulating medical devices, definition and basics of medical devices, and the different pathways to study a new device and bring it to market.
March 29, 2022 - R Packages and You
Jack Wolf
R Package development has flourished over the past five years. In 2016 only three new R packages were added to the Comprehensive R Archive Network (CRAN) each day. In 2021 that rate shot up to 17 new daily packages and in 2022 the R community currently averages 36 new packages per day! This increase, while driven by many other factors, owes a lot to the wealth of resources and tools for package development that have recently entered the R atmosphere. While back in the dark ages creating a package would have been a long and arduous task, today’s tools alleviate most of the technical complexities of package development and allow you to focus on what matters. This talk will showcase a selection of the tools that help automate package development while encouraging you to go forth and create a package of your own.
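To hint at what those modern tools look like, a typical development loop with the `usethis` and `devtools` packages might resemble the sketch below (the package path and function name are placeholders, and this is one common workflow rather than the only way to build a package):

```
# Scaffold a new package: creates DESCRIPTION, NAMESPACE, and an R/ directory
usethis::create_package("~/mypkg")

# Create R/greet.R, ready for a new function (name is hypothetical)
usethis::use_r("greet")

# Add a license file to the package
usethis::use_mit_license()

# Build man/ pages from roxygen2 comments in your R files
devtools::document()

# Run R CMD check to catch problems before release
devtools::check()
```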
April 12, 2022 - Working with Medicare Claims Data and Identifying Transfers
Michelle Sonnenberger
This seminar will discuss the basics of the different types of Medicare claims data, how to work with them, examples of SAS code used when working with Medicare data, and how to identify transfers within Medicare claims.
- Last Updated: January 23, 2020