Wei Pan's Reference List for Microarray Data Analysis
(With Wei Pan's comments in parentheses)
Spring 2002
- Introduction: biology of microarray technology
- Brown P and Botstein D. Exploring the new world of the
genome with DNA microarrays. Nature Genetics
Supplement 21:33-37, 1999.
(A general introduction to cDNA array technology and its
applications)
- Duggan DJ, Bittner M, Chen Y, Meltzer P and Trent JM.
Expression profiling using cDNA microarrays.
Nature Genetics Supplement 21:10-14, 1999.
(Intro to cDNA arrays)
- Lipshutz RJ, Fodor SPA, Gingeras TR and Lockhart DJ.
High density synthetic oligonucleotide arrays.
Nature Genetics Supplement 21:20-24, 1999.
(Intro to Affy oligonucleotide arrays)
- Affymetrix Inc. Statistical Algorithms Reference Guide.
(Intro to algorithms being used to summarize gene
expression levels for Affymetrix Microarray Suite version 5.0)
- Many other articles in
Nature Genetics Supplement, 21, 1999.
- Detecting differentially expressed genes
- Chen Y, Dougherty ER and Bittner ML. Ratio-based decisions
and the quantitative analysis of cDNA microarray images.
J Biomedical Optics, 2:364-367, 1997.
(Probably the earliest paper on statistical analysis of
array data; use of the Wilcoxon nonparametric test;
proposed Normal-based parametric models)
- Luo L et al. Gene expression profiles of laser-captured
adjacent neuronal subtypes. Nature Medicine, 5:117-122,
1999.
(t-tests used; see Statistical Analyses section
on p.121)
- Tusher VG, Tibshirani R and Chu G. Significance analysis
of microarrays applied to the ionizing radiation response.
PNAS, 98:5116-5121, 2001.
(Two conditions with replications; SAM:
use permutation-type tests
and FDR to control for multiplicity)
- Lee M-L T, Kuo FC, Whitmore GA and Sklar J. Importance of
replication in microarray gene expression studies:
statistical methods and evidence from repetitive cDNA
hybridizations. PNAS, 97:9834-9839, 2000.
(One condition with replications: A mixture of two normals)
- Newton M et al. On differential variability of expression
ratios: improving statistical inference about gene
expression changes from microarray data. Journal of
Computational Biology, 8:37-52, 2001.
(Online access at U of M)
(Parametric Bayesian approach, w/o replications)
- Lin Y, Nadler ST, Attie AD and Yandell BS. Mining for
low-abundance transcripts in microarray data.
(Nonparametric approach, w/o replications)
- Ideker T, Thorsson V, Siegel AF and Hood LE. Testing
for differentially-expressed genes by maximum likelihood analysis of
microarray data. Journal of Computational Biology, 7:805-817, 2000.
(A Normal-based linear regression approach)
- Kerr, M.K. et al. Statistical analysis of a gene expression
microarray experiment with replication.
- Efron B, Tibshirani R, Goss V and Chu G. Microarrays and
their use in a comparative experiment. 2000.
(Empirical Bayesian and Frequentist approaches, with
replications)
- Thomas JG, Olson JM, Tapscott SJ and Zhao LP. An
efficient and robust statistical modeling approach to discover differentially
expressed genes using genomic expression profiles. Genome Research,
11:1227-1236, 2001.
(A regression approach using the robust/sandwich estimator)
- Zhao LP, Prentice R and Breeden L. Statistical modeling of
large microarray data sets to identify stimulus-response
profiles. PNAS, 98:5631-5636, 2001.
(Statistical modeling of multi-time expressions using GEE)
- Pan W. A Comparative Review of Statistical Methods for Discovering Differentially
Expressed Genes in Replicated Microarray Experiments.
To appear in Bioinformatics. Also
Research Report 2001-028,
Division of Biostatistics, University of Minnesota, 2001.
(Compared the t-test, the Wilcoxon rank test,
the robust regression of Thomas et al, the EB of Efron et al,
the SAM of Tusher et al, and the mixture model of Pan et al)
- Pan W, Lin J and Le C.
A Mixture Model Approach to Detecting Differentially
Expressed Genes with Microarray Data. Research Report 2001-011,
Division of Biostatistics, University of Minnesota, 2001.
(Use of replications to estimate the null distribution, as
in the SAM and EB approaches, but with traditional statistical hypothesis
testing)
- Pavlidis P and Noble WS. Analysis of strain and region variation
in gene expression in mouse brain.
Genome Biology,
2001/2/10/research/0042.
(Normal-based two-way ANOVA for two factors with
possibly more than two categories)
- Data preprocessing and normalization
- Dudoit S, Yang YH, Callow MJ and Speed TP. Statistical
methods for identifying differentially expressed genes
in replicated cDNA microarray experiments.
Tech Rept, Stat Dept, UC-Berkeley, 2000.
(Use of loess curve to center;
permutation test using t-statistic; adjustment for multiple
tests)
- Kerr MK, Martin M and Churchill GA. Analysis of variance
for gene expression microarray data.
Journal of Computational Biology, 7:819-837, 2000.
(Online access at U of M)
(Use of ANOVA model)
- Yang YH, Buckley MJ, Dudoit S and Speed TP. Comparison
of methods for image analysis on cDNA microarray data.
Available at http://www.stat.Berkeley.EDU/users/terry/zarray/Html/papersindex.html
- Wolfinger RD, et al. Assessing gene significance from
cDNA microarray expression data via mixed models.
Journal of Computational Biology, 8:625-637, 2001.
(Use of Normal-based
linear mixed models for normalization and for detecting
differential expression)
- Rocke DM and Durbin B. A model for measurement error
for gene expression arrays.
(A parametric model that captures "higher variability for lower
expression levels" for cDNA arrays)
- Kooperberg C et al. Improved background correction for
spotted DNA microarrays. 2000.
- Li C and Wong WH. Model-based analysis of oligonucleotide
arrays: expression index computation and outlier detection.
PNAS, 98:31-36, 2001.
(Use of a multiplicative model to summarize expression levels
for Affy arrays)
- Li C and Wong WH. Model-based analysis of oligonucleotide
arrays: model validation, design issues and standard error
application.
Genome Biology,
2001/2/8/research/0032.
(Further development...)
- Clustering: hierarchical, K-means, SOM and model-based clustering
- Eisen M, Spellman P, Brown P and Botstein D. Cluster
analysis and display of genome-wide expression patterns.
PNAS, 95:14863-14868, 1998.
(hierarchical clustering)
- Tavazoie et al. Systematic determination of genetic
network architecture. Nature Genetics, 22:281-285,
1999.
(K-means clustering)
- Tamayo et al. Interpreting patterns of gene expression
with self-organizing maps: methods and application to
hematopoietic differentiation.
PNAS, 96:2907-2912, 1999.
(SOM clustering)
- Fraley C and Raftery AE. How many clusters? Which clustering
method? Answers via model-based cluster analysis.
Computer J, 41:578-588, 1998.
(Intro to model-based clustering)
- Yeung et al. Model-Based Clustering and Data Transformations
for Gene Expression Data. TR-396, 2001.
(Application of model-based clustering)
- Kerr MK and Churchill GA. Bootstrapping cluster analysis:
assessing the reliability of conclusions from microarray
experiments. 2000.
- Ghosh D and Chinnaiyan AM. Mixture modelling of gene expression
data from microarray experiments. Bioinformatics,
18:275-286, 2002.
(Model-based clustering of gene expression patterns)
- Pan W, Lin J and Le C. Model-based cluster analysis of microarray
gene expression data.
Genome Biology, 3(2): research0009.1-0009.8, 2002.
(Model-based clustering of t-statistics to
explore differential gene expression)
- Tibshirani et al. Clustering methods for the analysis of
DNA microarray data. 1999.
(Review of some clustering methods)
- Tibshirani R, Walther G and Hastie T. Estimating the number
of clusters in a dataset via the Gap statistic.
JRSS-B, 2001.
- Tibshirani R, Walther G, Botstein D and Brown P.
Cluster validation by prediction strength.
- Hastie T et al. Gene shaving: a new class of clustering
methods for expression arrays. Genome Biology,
2(1): research0003.1-0003.12, 2001.
- Lazzeroni L and Owen AB. Plaid Models for Gene Expression
Data. 2000.
- Heyer LJ, Kruglyak S and Yooseph S. Exploring expression
data: identification and analysis of coexpressed genes.
Genome Research, 9:1106-1115, 1999. (Jackknife
correlation as the distance metric)
- Alter O, Brown PO and Botstein D. Singular value
decomposition for genome-wide expression data processing
and modeling. PNAS, 97:10101-10106, 2000. (SVD)
- Holter NS et al. Fundamental patterns underlying gene
expression profiles: simplicity from complexity. PNAS,
97:8409-8414, 2000. (SVD)
- van der Laan MJ and Bryan JF. Gene expression analysis with
the parametric bootstrap.
To appear in Biostatistics, 2000.
- Zhang K and Zhao H. Assessing reliability of gene clusters
from gene expression data. Funct Integr Genomics,
1:156-173, 2000.
- Dimension reduction: principal component analysis
- Raychaudhuri S, Stuart JM and Altman RB. Principal
components analysis to summarize microarray experiments:
application to sporulation time series. PSB00, 5:452-463.
- Hilsenbeck SG et al. Statistical analysis of array
expression data as applied to the problem of tamoxifen
resistance. J NCI, 91:453-459.
- Wittes J and Friedman HP. Searching for evidence of
altered gene expression: a comment on statistical
analysis of microarray data. J NCI, 91:400-401.
- Wen X et al. Large-scale temporal gene expression mapping
of central nervous system development. PNAS,
95:334-339, 1998. (see their Fig 3(d))
- Classification: discriminant analysis
- Golub T et al. Molecular classification of cancer: class
discovery and class prediction by gene expression
monitoring. Science, 286:531-537, 1999.
(Proposed a weighted voting algorithm)
- Dudoit S, Fridlyand J and Speed TP.
Comparison of Discrimination Methods for the Classification
of Tumors Using Gene Expression Data. June 2000.
- Li W and Yang Y.
How many genes are needed for a discriminant microarray data
analysis? 2001.
- Brown et al. Knowledge-based analysis of microarray gene
expression data by using support vector machines.
PNAS, 97:262-267, 2000.
(Use of support vector machines)
- West M et al. DNA microarray analysis and regression modeling
for genetic expression profiling. 2000.
Discussion paper 00-15.
(Bayesian binary regression)
- Nguyen DV and Rocke DM. Tumor classification by partial
least squares using microarray gene expression data. 2001.
(Using PLS for dimension reduction, then applying LDA and QDA)
- Radmacher MD, McShane LM and Simon R. A paradigm for class
prediction using gene expression profiles.
Technical Report 001, National Cancer Institute.
(Assessing the "significance" of classification results)
- Other topics
- Kerr MK and Churchill GA.
Experimental design for gene expression microarrays.
- Black MA and Doerge RW. Calculation of the minimum number of
replicate spots required for detection of significant gene
expression fold change in microarray experiments.
(Sample size calculations with parametric models
in detecting differential gene expression)
- Pan W, Lin J and Le C.
How Many Replicates of Arrays Are Required to Detect Gene Expression Changes
in Microarray Experiments? A Mixture Model Approach.
Tech Rept, Division of Biostatistics, U of Minnesota, 2001. Available at
http://www.biostat.umn.edu/cgi-bin/rrs?print+2001.
- Butte AJ et al. Discovering functional relationships between
RNA expression and chemotherapeutic susceptibility using relevance
networks. PNAS, 97:12182-12186, 2000.