A System of SAS Macros for Producing Statistical Reports
Greg Grandits, M.S.
Ken Svendsen, M.S.
Division of Biostatistics, University of Minnesota
Monitoring clinical trials
requires periodic generation of statistical reports for Data and Safety
Monitoring Board (DSMB) reviews and other purposes. These reports display comparisons between treatment groups for
specific outcomes and usually consist of summary statistics for each treatment
group and assessment of statistical significance. The summary statistics can be simple descriptive summaries
(counts, means, SD’s, etc.) or summaries from more complicated statistical
analyses (e.g. hazard ratios and confidence intervals from proportional hazards
regression analyses) and often include a combination of both. Existing SAS report procedures are adequate
to produce a report with simple summary statistics but cannot produce
summaries from more complicated analyses or display significance levels. However, with procedure output datasets, the
output delivery system (ODS), and the data-step, SAS has the necessary tools to
produce such a report. This paper
describes a system of macros that use these tools to make statistical reports
easy to generate. With
these macros, information that comes from several SAS procedures can be
placed onto a single report page.
These macros have been used extensively by the Division of Biostatistics
to produce DSMB reports for the dozens of ongoing clinical trials and
observational studies.
Introduction
In clinical trials, periodic
reports are generated to monitor study progress and to compare treatment groups on
variables of interest. Often
these reports are presented to a Data and Safety Monitoring Board (DSMB). These reports vary, but they usually
contain summary statistics (counts, means, SDs, etc.) and often
also contain other statistical information such as p-values, Z-statistics,
hazard ratios, and confidence intervals.
It is therefore important for the statistical coordinating center responsible
for producing these reports to be able to do so using methods that do not
require transcription of numbers, typing, or editing of computer output. SAS reporting procedures, in general, are inadequate
to meet these needs. However, SAS
capabilities in the data-step, the ability of SAS to output datasets from
procedures, and the advent of the output delivery system (ODS) provide the
tools from which macros can be developed to accomplish this task. This paper describes a system of macros that
produce customized statistical reports, are easy to program and modify, and
give complete flexibility in the placement of text and data values on the report
page.
The user first defines
columns across the report page. Text or
data values (summary statistics) are then moved to these columns and to specified
lines using the macros MOVE and NMOVE.
Summary statistics are obtained by calling a macro that runs a
procedure (GLM, PHREG, etc.), outputs statistics to a SAS dataset, and then
compresses the statistics into a one-observation dataset. Statistics are stored under array-style variable names (such as p1-p8),
which can be moved to the report page after a SET statement. Example: %nmove(p1-p8, col=7, line=12L8)
moves 8 p-values to the 7th defined column, on lines 12 through 19.
The description and use of
the macros for moving text and data values to the report page have been given
in an earlier SUGI paper (1). These are
briefly outlined here, followed by a description of the statistical macros
that make available information from several procedures, which can be placed on
the report page using the earlier macros. This system of macros provides a comprehensive package for
generating statistical reports for a variety of research applications.
Report programs are made up of the following statements:
1. %REPORT is used simply as %REPORT, which indicates the start of a new report.
2. %COLSET is used as follows:
%COLSET (column1 size column2 size … )
Example:
%COLSET (25 10 2x 10 2x 10)
This statement sets up 4 columns. The first column is 25 positions long and
the last 3 columns are each 10 positions long.
Two spaces (2x) are placed between the last 3 columns to set their text apart.
3. %MOVE is used as follows:
%MOVE ('string 1':'string 2':…, line=, col=, center=, under=)
This is best illustrated by examples.
'Men':'Women':'Total'    text strings to be placed on report
line = 12 21 33          moves strings to lines 12, 21, and 33
line = 12L3              moves strings to lines 12, 13, and 14
col = 3 4 8              moves strings to defined columns 3, 4, and 8
col = 2-3 4-5 6-7        moves strings to columns formed by combining columns 2-3, 4-5, and 6-7
col = 2.4                moves strings to columns 2 through 4
Example: %MOVE ('Men':'Women':'Total', col=1, line=10L3)
4. %NMOVE is used as follows:
%NMOVE (var1-var(n), line=, col=, fmt=);
The line and col parameters are identical to those in %MOVE. The fmt parameter gives the format applied to the values.
Example: %NMOVE (m1-m20, col = 2 3, line=12L10, fmt=6.2)
This statement moves the values of m1 through m20 to columns 2 and 3, on lines 12 through 21.
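The column and line bookkeeping behind %COLSET and %NMOVE can be illustrated with a short sketch. This is not the authors' macro code. It is a minimal illustration that assumes (a) columns are laid out left to right with no gaps other than the explicit Nx spacers, and (b) values fill the first listed column down the requested lines before moving to the next column, which is the order implied by the Example Program at the end of this paper. The dataset names colpos, placement, and layout are hypothetical.

```sas
/* Hypothetical sketch, not the authors' macros.                        */
/* Step 1: turn a %COLSET-style specification into start positions.     */
data colpos;
  length spec $200 token $8;
  spec = '32 9 9 2x 9 9 9 2x 9';   /* widths; "2x" means 2 blank positions */
  pos  = 1;                        /* next free print position             */
  col  = 0;
  do i = 1 to countw(spec, ' ');
    token = scan(spec, i, ' ');
    if upcase(substr(token, length(token), 1)) = 'X' then
      pos = pos + input(substr(token, 1, length(token) - 1), 8.);  /* spacer */
    else do;
      col   = col + 1;             /* a real column: record where it starts */
      start = pos;
      width = input(token, 8.);
      output;
      pos = pos + width;
    end;
  end;
  keep col start width;
run;

/* Step 2: where would m1-m20 land for col = 2 3, line = 12L10?         */
data placement;
  first_line = 12;
  n_lines    = 10;                          /* line = 12L10 */
  array cols[2] _temporary_ (2 3);          /* col = 2 3    */
  do i = 1 to 20;                           /* m1-m20       */
    col  = cols[ceil(i / n_lines)];         /* assumed column-first fill */
    line = first_line + mod(i - 1, n_lines);
    output;
  end;
  keep i col line;
run;

/* Step 3: attach the start position of each target column.             */
proc sql;
  create table layout as
  select p.i, p.line, p.col, c.start, c.width
  from placement as p inner join colpos as c
    on p.col = c.col
  order by p.i;
quit;

proc print data=layout noobs;
run;
```

Under these assumptions, m1 through m10 map to column 2 on lines 12 through 21, and m11 through m20 map to column 3 on the same lines.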
STATISTIC GENERATING MACROS
Below is a listing of some of the statistical generating macros with the
SAS procedure that is called, the statistics that are available, and a brief
description of the macro.
MACRO    | PROCEDURE | STATISTICS                            | DESCRIPTION
Breakdn  | Summary   | N, mean, SD, etc.                     | Summary statistics by level of class variable
Freqdis  | Summary   | Counts, percents, cumulative percents | Distribution of variable by level of another variable
Glmp     | Glm       | ANOVA p-values                        | Statistics from analysis of variance
Regp     | Reg       | p-values, betas, t-stat, etc.         | Statistics from linear regression
Phregp   | Phreg     | Betas, SEs, RRs, CIs, p-values, etc.  | Statistics from Cox regression
Logistp  | Logist    | Betas, SEs, ORs, CIs, p-values, etc.  | Statistics from logistic regression
Chisqp   | Freq      | CMH p-values                          | Stratified contingency table analyses
Several others are also available, and others can be added as needed.
For illustration, two of the
macros, %BREAKDN and %PHREGP, are
described in more detail. Other macros
have similar syntax.
%BREAKDN (data=, class=, var=, out=, sfirst=);
This macro reads the SAS dataset specified in DATA using PROC SUMMARY and computes summary
statistics for each variable specified in VAR at each level of the variable(s)
specified in CLASS. A one-observation
dataset containing these statistics is written to the dataset specified in OUT.
The statistics
calculated are N, MEAN, MEDIAN, SDEV, SE, SUM, MIN, and MAX. They are stored in the variables N1-N?,
M1-M?, MED1-MED?, S1-S?, SE1-SE?,
SUM1-SUM?, MIN1-MIN?, and MAX1-MAX?, where ? depends on the number of variables
in VAR and the number of levels of the variables in CLASS.
The parameters are illustrated by examples.
class = sex 2T            statistics are stored for both levels of the variable SEX and for the total
class = sex 2 group 6     statistics are stored for each level of SEX crossed with GROUP
var = age dbp sbp chol    list of variables for which statistics are computed
DATA is the SAS dataset to be read; OUT is the one-observation SAS dataset to which the statistics are written;
and SFIRST indicates the order in which the statistics are stored.
Example:
%BREAKDN (class = sex 2T, var = age dbp sbp chol, out = table1, sfirst=VAR)
The n's for the 4 variables where sex = 1 are stored in n1-n4.
The n's for the 4 variables where sex = 2 are stored in n5-n8.
The n's for the 4 variables for men and women combined are stored in n9-n12.
The variables are similarly stored for the other statistics.
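The following sketch suggests how a one-observation dataset with this storage order could be built. It is not the %BREAKDN source; it is a minimal illustration that covers only N and MEAN, uses hypothetical toy data (dataset demo), and assumes the variable index runs fastest within each class level, so that variable j at level k is stored at index 4*(k-1) + j, with the total treated as level 3.

```sas
/* Hypothetical toy data, for illustration only */
data demo;
  input sex age dbp sbp chol;
  datalines;
1 50 80 120 200
1 60 90 140 220
2 40 85 130 210
2 50 75 110 190
2 60 95 150 240
;
run;

/* Without NWAY, PROC SUMMARY writes the overall row (_TYPE_=0)        */
/* followed by one row per level of SEX (_TYPE_=1).                    */
proc summary data=demo;
  class sex;
  var age dbp sbp chol;
  output out=long n=n_age n_dbp n_sbp n_chol
                  mean=m_age m_dbp m_sbp m_chol;
run;

/* Collapse the long output into one observation: n1-n12 and m1-m12.   */
data table1;
  set long end=last;
  array nstat[12] n1-n12;
  array mstat[12] m1-m12;
  array nsrc[4] n_age n_dbp n_sbp n_chol;
  array msrc[4] m_age m_dbp m_sbp m_chol;
  retain n1-n12 m1-m12;
  /* level index: sex=1 -> 1, sex=2 -> 2, total -> 3 */
  if _type_ = 0 then k = 3;
  else k = sex;
  do j = 1 to 4;
    nstat[4*(k - 1) + j] = nsrc[j];
    mstat[4*(k - 1) + j] = msrc[j];
  end;
  if last then output;
  keep n1-n12 m1-m12;
run;

proc print data=table1 noobs;
run;
```

With the toy data, n1-n4 hold the counts for sex = 1, n5-n8 those for sex = 2, and n9-n12 those for both sexes combined, matching the layout described above.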
%PHREGP (parameters)
PHREGP runs PROC PHREG to perform proportional hazards regression and saves results for factors of
interest into SAS datasets.
data   =  SAS dataset to be read.
dlist  =  Dependent variable list. An analysis is done for each variable listed. The variables are event indicators coded as 1 if event, 0 if censored.
ilist  =  Independent variable list. Used for each dependent variable given in dlist.
tlist  =  Failure or censoring time list corresponding to the events in dlist.
factor =  Independent variable(s) for which statistics are obtained.
units  =  Value by which regression coefficients are multiplied before relative risks are computed.
strata =  Optional list of strata variables.
out    =  SAS dataset(s) to which statistics are written.
The statistics and the names of the variables that contain them are as follows:
e1-e?      regression coefficients for factor
se1-se?    standard errors of coefficients
z1-z?      z-statistics for factor
p1-p?      p-values for factor
rr1-rr?    relative risks for factor
u1-u?      upper 95% CI for RR
l1-l?      lower 95% CI for RR
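A rough sketch of how such statistics could be collected is given below. It is not the %PHREGP source. The macro name phsketch, its single tvar= time variable, the simulated dataset mort, and the work datasets _pe1, _pe2, ... are all hypothetical. The sketch assumes a single factor with no other covariates (so each PHREG run yields one row of estimates), a Wald z-statistic with a two-sided normal p-value, and that units is applied to the coefficient and to its confidence limits while e and se are stored unscaled.

```sas
/* Simulated data, for illustration only */
data mort;
  call streaminit(2024);
  do id = 1 to 400;
    trt  = mod(id, 2);
    t    = 5 * rand('exponential');
    xcvd = rand('bernoulli', 0.6);   /* event indicator: 1=event, 0=censored */
    xchd = rand('bernoulli', 0.4);
    output;
  end;
run;

%macro phsketch(data=, dlist=, tvar=, factor=, units=1, out=);
  %local i dep nd;
  %let i = 1;
  %let dep = %scan(&dlist, &i);
  %do %while (%length(&dep) > 0);
    /* capture the parameter estimates table for this outcome */
    ods output ParameterEstimates = _pe&i;
    proc phreg data=&data;
      model &tvar * &dep(0) = &factor;
    run;
    %let i = %eval(&i + 1);
    %let dep = %scan(&dlist, &i);
  %end;
  %let nd = %eval(&i - 1);

  /* one row per outcome in, one observation out */
  data &out;
    set _pe1-_pe&nd end=last;
    array e[&nd];  array se[&nd]; array z[&nd]; array p[&nd];
    array rr[&nd]; array l[&nd];  array u[&nd];
    retain e se z p rr l u;
    e[_n_]  = Estimate;
    se[_n_] = StdErr;
    z[_n_]  = Estimate / StdErr;
    p[_n_]  = 2 * (1 - probnorm(abs(z[_n_])));
    rr[_n_] = exp(&units * Estimate);
    l[_n_]  = exp(&units * (Estimate - 1.96 * StdErr));
    u[_n_]  = exp(&units * (Estimate + 1.96 * StdErr));
    if last then output;
    keep e1-e&nd se1-se&nd z1-z&nd p1-p&nd rr1-rr&nd l1-l&nd u1-u&nd;
  run;
%mend phsketch;

%phsketch(data=mort, dlist=xcvd xchd, tvar=t, factor=trt, out=rr);

proc print data=rr noobs;
run;
```

The resulting dataset rr has a single observation, so its variables become available to the report step after one SET statement.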
Much effort has been made by
SAS and SAS users to make reporting easier.
Although no individual SAS product or procedure is sufficient to provide
the ease and flexibility needed to produce statistical reports, a very useful
reporting system can be developed with the system of macros described here, which
takes advantage of the data step, output from
procedures, and the ODS. The statistical report macros
are simple to use and flexible. Programs are easy to write, understand, and modify. Typical report programs are less than one
page. These macros can also be expanded
to include statistics from other procedures through use of ODS.
The key to making the
numeric moves in the data step is getting the statistics into one-observation
datasets. A single SET statement
is then all that is needed to make the statistics available, without concern for the
implied loop in the data step.
Information from several different sources can easily be included on the
report by using multiple SET/NMOVE statements.
This is what gives the system its flexibility. These macros have proved invaluable to the clinical trials and
other projects monitored by the Division of Biostatistics at the University of
Minnesota, and could be useful for any research organization or pharmaceutical
company producing statistical reports for clinical trials.
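The one-observation idea can be seen in a small sketch that uses plain PUT statements in place of the report macros. The datasets m and rr below are hypothetical stand-ins whose values are copied from the first row of the example report, and the print positions are illustrative assumptions. Because each dataset has exactly one observation, the data step body executes once with all statistics available, then stops when the first SET statement finds no further observations.

```sas
/* Hypothetical one-observation datasets (values from the example report) */
data m;
  sum1  = 1114;    /* events in group A */
  sum13 = 1160;    /* events in group B */
run;

data rr;
  rr1 = 0.96;  l1 = 0.88;  u1 = 1.04;  p1 = 0.302;
run;

data _null_;
  file print;
  set m;     /* makes sum1 and sum13 available   */
  set rr;    /* makes rr1, l1, u1, p1 available  */
  put @1  'All cardiovascular'
      @33 sum1 9.  @42 sum13 9.
      @53 rr1 9.2  @62 l1 9.2  @71 u1 9.2
      @82 p1 9.3;
run;
```

Only one report line is written, since the second iteration of the implied loop ends at the first SET statement.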
Contact Information
Greg Grandits
Division of Biostatistics
2221 University Ave. SE
Suite 200
Minneapolis, MN 55414
Email: grand001@umn.edu
Phone: 612-626-9033
Fax: 612-624-3584
Example Program
* Assume dataset mort contains all needed variables ;
%let phlist = xcvd xchd xami xochd xcd xhhd xoh xcv xcvsub
              xcvint xcvoth xothcvd ;
%breakdn(data=mort, class=group 2, var=&phlist, out=m);
%phregp(data=mort, dlist=&phlist, ilist=trt,
        tlist = t t t t t t t t t t t t,
        factor=trt, strata=clinic, out=rr);
%report;
%colset (32 9 9 2x 9 9 9 2x 9);
%move('Cause of Death By Treatment Group in Study X', col=1-0, line=3);
%move('Cause of Death', col=1, center=n, line=7, u=y);
%move('All cardiovascular':' CHD':' Acute MI':' Other CHD':
      ' Cardiac dysrhythmias':
      ' Hypertensive heart disease':' Other hypertensive':
      ' Cerebrovascular':' Subarachnoid hemorrhage':
      ' Intracerebral hemorrhage':
      ' Other cerebrovascular':' Other cardiovascular',
      col=1, center=n, line=9L12);
%move('Events in Group', col=2-3, line=6);
%move('Hazard', col=4, line=6);
%move('A':'B':'Ratio':'95% LB':'95% UB':'P-value',
      col=2.0, line=7, u=y);
set m;
%nmove(sum1-sum24, col=2 3, line=9L12, fmt=5.0);
set rr;
%nmove(rr1-rr12, col=4, line=9L12, fmt=5.2);
%nmove(l1-l12, col=5, fmt=5.2);
%nmove(u1-u12, col=6, fmt=5.2);
%nmove(p1-p12, col=7, fmt=5.3);
run;
                       Cause of Death By Treatment Group in Study X

                                   Events in Group     Hazard
Cause of Death                          A        B      Ratio   95% LB   95% UB    P-value
-------------------------------   -------  -------    -------  -------  -------    -------
All cardiovascular                   1114     1160       0.96     0.88     1.04      0.302
CHD                                   767      827       0.93     0.84     1.02      0.121
Acute MI                              338      397       0.85     0.73     0.98      0.024
Other CHD                             429      430       1.00     0.87     1.14      0.985
Cardiac dysrhythmias                   35       35       1.00     0.62     1.60      0.995
Hypertensive heart disease             28       30       0.92     0.55     1.54      0.749
Other hypertensive                     13       11       1.16     0.52     2.60      0.712
Cerebrovascular                       107      105       1.02     0.78     1.33      0.887
Subarachnoid hemorrhage                 8       13       0.62     0.26     1.50      0.287
Intracerebral hemorrhage               22       24       0.92     0.51     1.64      0.772
Other cerebrovascular                  77       68       1.13     0.82     1.57      0.457
Other cardiovascular                  164      152       1.07     0.86     1.34      0.526