A System of SAS Macros for Producing Statistical Reports

Greg Grandits, M.S.

Ken Svendsen, M.S.

Division of Biostatistics, University of Minnesota

Abstract

Monitoring clinical trials requires periodic generation of statistical reports for Data and Safety Monitoring Board (DSMB) reviews and other purposes. These reports compare treatment groups on specific outcomes and usually consist of summary statistics for each treatment group together with an assessment of statistical significance. The summary statistics can be simple descriptive summaries (counts, means, SDs, etc.), summaries from more complicated statistical analyses (e.g., hazard ratios and confidence intervals from proportional hazards regression), or, often, a combination of both. Existing SAS report procedures are adequate for reports of simple summary statistics but cannot produce summaries from more complicated analyses or display significance levels. However, with procedure output datasets, the Output Delivery System (ODS), and the DATA step, SAS has the necessary tools to produce such a report. This paper describes a system of macros that uses these tools to make statistical reports easy to generate. With these macros, information from several SAS procedures can be placed on a single report page. The macros have been used extensively by the Division of Biostatistics to produce DSMB reports for dozens of ongoing clinical trials and observational studies.

Introduction

In clinical trials, periodic reports are generated to monitor study progress and to compare treatment groups on relevant variables of interest. Often these reports are presented to a Data and Safety Monitoring Board (DSMB). The reports vary, but they usually contain summary statistics (counts, means, SDs, etc.) and often contain other statistical information such as p-values, Z-statistics, hazard ratios, and confidence intervals. It is therefore important that the statistical coordinating center responsible for the reports be able to produce them without transcribing numbers, typing, or editing computer output. SAS reporting procedures, in general, are inadequate to meet these needs. However, the capabilities of the DATA step, the ability of SAS procedures to write output datasets, and the advent of the Output Delivery System (ODS) provide the tools from which macros can be developed to accomplish this task. This paper describes a system of macros that produce customized statistical reports, are easy to program and modify, and give complete flexibility in placing text and data values on the report page.

The user first defines columns across the report page. Text or data values (summary statistics) are then moved to these columns and to specified lines using the macros MOVE and NMOVE. Summary statistics are obtained by calling a macro that runs a procedure (GLM, PHREG, etc.), outputs statistics to a SAS dataset, and then compresses the statistics into a one-observation dataset. Statistics are stored in array-style variable names (for example, p1-p8), which can be moved to the report page after a SET statement. Example: %nmove(p1-p8, col=7, line=12L8) moves 8 p-values to the 7th defined column, on lines 12 through 19.

The macros for moving text and data values to the report page were described in an earlier SUGI paper (1). They are briefly outlined here, followed by a description of the statistical macros, which make information from several procedures available for placement on the report page using the earlier macros. Together, this system of macros provides a comprehensive package for generating statistical reports for a variety of research applications.

STEPS IN WRITING A REPORT

Report programs are made up of the following statements:

%REPORT statement that indicates a new report is starting

%COLSET statement that defines columns and column widths across the report page.

%MOVE statements which move text to the report. Features include centering, underlining, and repeating text.

SET statement(s) that read in statistical information from a SAS dataset. These one-observation datasets will have been generated by one of the statistic generating macros.

%NMOVE statements which place the statistical information on the report page.

SYNTAX FOR REPORT MACROS

1. %REPORT is written simply as %REPORT; it indicates the start of a new report.

2. %COLSET is used as follows:

%COLSET (column1 size column2 size … )

Example: %COLSET (25 20 2x 10 2x 10)

This statement sets up 4 columns. The first column is 25 positions wide, the second is 20 positions wide, and the last 2 columns are each 10 positions wide. Two blank spaces (2x) are placed between the last 3 columns, which sets off the text in one column from the text in the next.
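The column arithmetic implied by this example can be checked with a short DATA step. This is only a sketch and not the authors' %COLSET macro: it assumes that the first column begins at print position 1 and that a token such as 2x inserts that many blank positions before the next column. The dataset name colpos and the variable names are hypothetical.

```sas
/* Sketch only: not the authors' %COLSET. Assumes the left margin is       */
/* position 1 and that a token ending in x is a run of blank positions.    */
data colpos;
  length token $8;
  spec = '25 20 2x 10 2x 10';
  pos  = 1;                              /* next free print position       */
  col  = 0;                              /* running column number          */
  do i = 1 to countw(spec, ' ');
    token = scan(spec, i, ' ');
    if upcase(substr(token, length(token), 1)) = 'X' then
      /* spacer token: skip that many positions */
      pos = pos + input(substr(token, 1, length(token) - 1), 8.);
    else do;
      col   = col + 1;
      width = input(token, 8.);
      start = pos;
      output;                            /* one row per defined column     */
      pos = pos + width;
    end;
  end;
  keep col start width;
run;

proc print data=colpos noobs;
run;
```

Under these assumptions the four columns start at positions 1, 26, 48, and 60.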

3. %MOVE is used as follows:

%MOVE ('string 1':'string 2':…, line=, col=, center=, under=)

This is best illustrated by examples.

'Men':'Women':'Total'	text strings to be placed on the report
line = 12 21 33	moves strings to lines 12, 21, and 33
line = 12L3	moves strings to lines 12, 13, and 14
col = 3 4 8	moves strings to defined columns 3, 4, and 8
col = 2-3 4-5 6-7	moves strings to columns formed by combining columns 2-3, 4-5, and 6-7
col = 2.4	moves strings to columns 2 through 4

Example: %MOVE ('Men':'Women':'Total', col=1, line=10L3)
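The two forms of the line parameter can be illustrated with a small DATA step that expands each specification into explicit line numbers. This is a sketch of the notation as described above, not code from the macros; the dataset and variable names (lines, spec, first_line, nlines) are hypothetical.

```sas
/* Sketch only: expands line= values into explicit line numbers.           */
data lines;
  length spec $20;
  do spec = '12 21 33', '12L3';
    if index(upcase(spec), 'L') > 0 then do;
      /* form nLk: k consecutive lines starting at line n */
      first_line = input(scan(upcase(spec), 1, 'L'), 8.);
      nlines     = input(scan(upcase(spec), 2, 'L'), 8.);
      do line = first_line to first_line + nlines - 1;
        output;
      end;
    end;
    else do i = 1 to countw(spec, ' ');
      /* plain list: each number is one line */
      line = input(scan(spec, i, ' '), 8.);
      output;
    end;
  end;
  keep spec line;
run;

proc print data=lines noobs;
run;
```

The first specification yields lines 12, 21, and 33; the second yields lines 12, 13, and 14.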

4. %NMOVE is used as follows:

%NMOVE (var1-var(n), line=, col=, fmt=);

The line and col parameters are identical to those in %MOVE. The fmt parameter formats the values.

Example: %NMOVE (m1-m20, col = 2 3, line=12L10, fmt=6.2)

This statement would move the values of m1 through m20 to columns 2 and 3, and from lines 12 through 21.
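For readers unfamiliar with pointer control in the DATA step, the following sketch shows one way a call like the one above could be expressed with ordinary PUT statements. It is not the authors' %NMOVE. It assumes that columns 2 and 3 start at print positions 26 and 48, and that the values fill column 2 from top to bottom before column 3; neither assumption is stated in this paper. The dataset stats and its values are made up for illustration.

```sas
/* Sketch only: a hypothetical stand-in for one %NMOVE call.               */
data stats;                          /* made-up one-observation dataset    */
  array m{20};
  do i = 1 to 20;
    m{i} = i + 0.5;
  end;
  drop i;
run;

data _null_;
  file print n=ps;                   /* N=PS lets #line address the page   */
  set stats;
  array m{20};
  array start{2} _temporary_ (26 48);     /* assumed column positions      */
  do c = 1 to 2;
    do r = 1 to 10;
      line = 11 + r;                      /* lines 12 through 21           */
      pos  = start{c};
      k    = (c - 1) * 10 + r;            /* m1-m10, then m11-m20          */
      put #line @pos m{k} 6.2;
    end;
  end;
run;
```

The N=PS option on the FILE statement is what allows the second pass of the loop to move back up the page from line 21 to line 12.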

STATISTIC GENERATING MACROS

Below is a listing of some of the statistical generating macros with the SAS procedure that is called, the statistics that are available, and a brief description of the macro.

MACRO	PROCEDURE	STATISTICS	DESCRIPTION
Breakdn	Summary	N, mean, SD, etc.	Summary statistics by level of class variable
Freqdis	Summary	Counts, percents, cumulative percents	Distribution of variable by level of another variable
Glmp	Glm	ANOVA p-values	Statistics from analysis of variance
Regp	Reg	p-values, betas, t-stat, etc.	Statistics from linear regression
Phregp	Phreg	Betas, SEs, RRs, CIs, p-values, etc.	Statistics from Cox regression
Logistp	Logist	Betas, SEs, ORs, CIs, p-values, etc.	Statistics from logistic regression
Chisqp	Freq	CMH p-values	Stratified contingency table analyses
Several others are also available, and others can be added as needed.

For illustration, two of the macros, %BREAKDN and %PHREGP, are described in more detail. Other macros have similar syntax.

%BREAKDN ( data=, class=, var=, out=, sfirst= );

This macro reads the SAS dataset specified in DATA using PROC SUMMARY and computes summary statistics for each variable specified in VAR by each level of the variable specified in CLASS. A one observation dataset containing these statistics is written to the dataset specified in OUT.

The statistics calculated are N, MEAN, MEDIAN, SDEV, SE, SUM, MIN, and MAX. They are contained in the variables N1-N?, M1-M?, MED1-MED?, S1-S?, SE1-SE?, SUM1-SUM?, MIN1-MIN?, and MAX1-MAX?, where ? depends on the number of variables in VAR and the number of levels in the variables in CLASS.

The parameters specified are illustrated by examples.

Parameter/Value	Description
class = sex 2T	Statistics are stored for both levels of the variable SEX and for the total.
class = sex 2 group 6	Statistics are stored for each level of SEX crossed with each level of GROUP.
var = age dbp sbp chol	The list of variables for which statistics are computed.

DATA is the SAS dataset to be read. OUT is the SAS dataset to which the statistics are written; it contains one observation. SFIRST indicates the order in which the statistics are stored.

Example:

%BREAKDN (class = sex 2T, var = age dbp sbp chol, out = table1, sfirst=VAR)

The n's for the 4 variables where sex = 1 are stored in n1-n4.

The n's for the 4 variables where sex = 2 are stored in n5-n8.

The n's for the 4 variables for men and women combined are stored in n9-n12.

The variables are similarly stored for the other statistics.
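The storage order in this example can be reproduced with PROC SUMMARY and a DATA step. The sketch below is not the %BREAKDN source. It uses the SASHELP.CLASS sample dataset in place of study data, with three analysis variables rather than four, and only the N and MEAN statistics. The intermediate dataset bylevel and the array names vn and vm are hypothetical.

```sas
/* Sketch only: builds a one-observation dataset in the order described    */
/* above (all variables for level 1, then level 2, then the total).        */
proc summary data=sashelp.class;
  class sex;
  var age height weight;
  output out=bylevel n=n_age n_height n_weight
                     mean=m_age m_height m_weight;
run;

/* _TYPE_=1 rows are the levels of SEX; the _TYPE_=0 row is the total */
proc sort data=bylevel;
  by descending _type_ sex;
run;

data table1;
  set bylevel end=last;
  array n{9};                        /* 3 rows (F, M, total) x 3 variables */
  array m{9};
  retain n1-n9 m1-m9;
  array vn{3} n_age n_height n_weight;
  array vm{3} m_age m_height m_weight;
  do v = 1 to 3;
    n{(_n_ - 1) * 3 + v} = vn{v};
    m{(_n_ - 1) * 3 + v} = vm{v};
  end;
  if last then output;               /* write the single observation       */
  keep n1-n9 m1-m9;
run;
```

Here n1-n3 hold the counts for the first level of SEX, n4-n6 those for the second level, and n7-n9 those for both levels combined.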

%PHREGP ( parameters)

PHREGP runs PROC PHREG to perform proportional hazards regression and saves results for factors of interest into SAS datasets.

Parameter	Description
data =	SAS dataset to be read.
dlist =	Dependent variable list. An analysis is done for each variable listed. The variables are event indicators coded as 1 if event, 0 if censored.
ilist =	Independent variable list. Used for each dependent variable given in dlist.
tlist =	Failure or censoring time list corresponding to the events in dlist.
factor =	Independent variable(s) for which statistics are obtained.
units =	Value by which regression coefficients are multiplied before relative risks are computed.
strata =	Optional list of strata variables.
out =	SAS dataset(s) to which statistics are written.

The statistics and the names of the variables that contain them are as follows:

e1-e? regression coefficients for factor

se1-se? standard errors of coefficients

z1-z? z-statistics for factor

p1-p? p-values for factor

rr1-rr? relative risks for factor

u1-u? upper 95% CI for RR

l1-l? lower 95% CI for RR
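The statistics listed above can all be derived from the parameter estimates table of PROC PHREG. The sketch below is not the %PHREGP source. It assumes the ODS table ParameterEstimates is used, that the relative risk is exp(units x coefficient), and that the 95% limits use the normal multiplier 1.96; these details are not given in this paper. The data are simulated purely so the step can run, and the dataset names sim and pe are hypothetical.

```sas
/* Sketch only: one event, one factor. Data are simulated for illustration. */
data sim;
  call streaminit(2024);
  do id = 1 to 200;
    trt   = mod(id, 2);
    t     = rand('exponential') * (1 + trt);
    event = (t < 2);                 /* administrative censoring at t = 2   */
    if not event then t = 2;
    output;
  end;
run;

ods output ParameterEstimates=pe;    /* capture the estimates as a dataset  */
proc phreg data=sim;
  model t*event(0) = trt;
run;

data rr;
  set pe(where=(upcase(Parameter) = 'TRT'));
  units = 1;                         /* multiplier applied to the coefficient */
  e1  = Estimate;
  se1 = StdErr;
  z1  = Estimate / StdErr;
  p1  = ProbChiSq;
  rr1 = exp(units * Estimate);
  l1  = exp(units * (Estimate - 1.96 * StdErr));   /* assumed 95% limits    */
  u1  = exp(units * (Estimate + 1.96 * StdErr));
  keep e1 se1 z1 p1 rr1 l1 u1;
run;
```

With several events in dlist, a macro would repeat the PROC PHREG step once per event and append the results as e2, se2, and so on, so that the final dataset still has one observation.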

Discussion

Much effort has been made by SAS and SAS users to make reporting easier. No individual SAS product or procedure provides the ease and flexibility needed to produce statistical reports, but the system of macros described here, which takes advantage of the DATA step, output from procedures, and ODS, yields a very useful reporting system. The statistical report macros are simple to use and flexible. Programs are easy to write, understand, and modify, and a typical report program is less than one page long. The macros can also be expanded to include statistics from other procedures through use of ODS.

The key to making the numeric moves in the DATA step is getting the statistics into one-observation datasets. A single SET statement is then all that is needed to make the statistics available, without concern for the implied loop of the DATA step. Information from several different sources can easily be included on the report through multiple SET/NMOVE statements, which is what gives the system its flexibility. These macros have proved invaluable to the clinical trials and other projects monitored by the Division of Biostatistics at the University of Minnesota, and they could be useful to any research organization or pharmaceutical company producing statistical reports for clinical trials.
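The role of one-observation datasets can be seen in a bare DATA step without any of the macros. The sketch below uses two hypothetical datasets, m and rr, with made-up values, and writes them to the print file with ordinary PUT statements. The closing STOP statement is a safeguard added for this sketch so that the step body runs exactly once; the paper does not say how the macros themselves handle this.

```sas
/* Sketch only: made-up values in two one-observation datasets.            */
data m;
  sum1 = 14;
  sum2 = 9;
run;

data rr;
  rr1 = 0.64;
  p1  = 0.31;
run;

data _null_;
  file print;
  put 'Illustrative values only';
  set m;                             /* statistics from the first source   */
  put @1 'Events:' @12 sum1 5.0 @20 sum2 5.0;
  set rr;                            /* statistics from the second source  */
  put @1 'Ratio:' @12 rr1 5.2 @20 'p =' p1 6.3;
  stop;                              /* run the step body exactly once     */
run;
```

Because each dataset contributes a single observation, the variables from both are available together after the second SET statement, and no merging or BY processing is needed.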

Contact Information

Greg Grandits

Division of Biostatistics

2221 University Ave. SE

Suite 200

Minneapolis, MN 55414

Email: grand001@umn.edu

Phone: 612-626-9033

Fax: 612-624-3584

Example Program

* Assume dataset mort contains all needed variables ;

%let phlist = xcvd xchd xami xochd xcd xhhd xoh xcv xcvsub xcvint xcvoth xothcvd ;

%breakdn(data=mort, class=group 2, var=&phlist, out=m);

%phregp(data=mort, dlist=&phlist, ilist=trt,
        tlist = t t t t t t t t t t t t,
        factor=trt, strata=clinic, out=rr);

%report;
%colset (32 9 9 2x 9 9 9 2x 9);

%move('Cause of Death By Treatment Group in Study X', col=1-0, line=3);
%move('Cause of Death', col=1, center=n, line=7, u=y);
%move('All cardiovascular':' CHD':' Acute MI':' Other CHD':
      ' Cardiac dysrhythmias':
      ' Hypertensive heart disease':' Other hypertensive':
      ' Cerebrovascular':' Subarachnoid hemorrhage':
      ' Intracerebral hemorrhage':
      ' Other cerebrovascular':' Other cardiovascular',
      col=1, center=n, line=9L12);
%move('Events in Group', col=2-3, line=6);
%move('Hazard', col=4, line=6);
%move('A':'B':'Ratio':'95% LB':'95% UB':'P-value',
      col=2.0, line=7, u=y);

set m;
%nmove(sum1-sum24, col=2 3, line=9L12, fmt=5.0);

set rr;
%nmove(r1-r12, col=4, line=9L12, fmt=5.2);
%nmove(l1-l12, col=5, fmt=5.2);
%nmove(u1-u12, col=6, fmt=5.2);
%nmove(p1-p12, col=7, fmt=5.3);