SAS MACROS, Continued                                        SPH 5421 notes.023

SAS Macros:  The Basics, Plus ...

     The reason SAS macros are useful is that they give you a way of writing
your own procedures once rather than over and over again.  If you have written
a complex section of code that does what you want for variables x1, x2, x3,
..., and xn, it is nice to be able to use it again without rewriting the whole
thing for variables y1, y2, y3, ..., yn.

     The simple, basic idea of SAS macros is to construct SAS code by a process
of substituting character strings.

     The following is an example of a program with a very simple macro that
calculates the sum of the logarithms of a set of 5 variables:

=================================================================================

options linesize = 80 mprint ;

footnote "prog: /home/walleye/john-c/5421/macro4.sas &sysdate &systime";

* ------------------The Macro Defined-------------------------------;

%macro logsum(t1, t2, t3, t4, t5, sumlogs) ;

     &sumlogs = sum(log(&t1), log(&t2), log(&t3), log(&t4), log(&t5)) ;

%mend ;

* ----------------------End of Macro--------------------------------;

data small ;

     input x1-x5 y1-y5 ;

* ------------Calling the macro-------------------------------------;

     %logsum(x1, x2, x3, x4, x5, xlogsum) ;
     %logsum(y1, y2, y3, y4, y5, ylogsum) ;

     output ;

cards ;
.5  .  .8  1.1  1.7  .4  1.4  .  3.4  4.4
;
run ;

proc print ;
title1 'Print of input and output data from the macro logsum' ;

-------------------------------Proc Print Output---------------------------------

                                 The SAS System                                1
                                                    17:43 Tuesday, March 7, 2000

OBS   X1   X2   X3   X4   X5   Y1   Y2   Y3   Y4   Y5    XLOGSUM    YLOGSUM

  1  0.5    .  0.8  1.1  1.7  0.4  1.4    .  3.4  4.4   -0.29035    2.12556

=================================================================================

This example illustrates several features of macros:

  1.  All macros must start with '%macro macname(variables) ;'.

  2.  All macros must end with '%mend ;'.

  3.  Any number of variables can be used.
      In this example, in the first call of the macro, x1, ..., x5 are the
      input variables and xlogsum is the output variable.

  4.  In the macro, the variables t1, ..., t5 are essentially placeholders
      for the variables that will actually be used in the calculation.
      Note that in the '%macro' line itself, t1, t2, ..., t5 and sumlogs
      are not preceded by &, but in the body of the macro, they occur as
      &t1, &t2, ..., &t5, and &sumlogs.

  5.  The macro can be called any number of times in the main program.  Here
      it is called once with input variables x1, ..., x5 and output xlogsum,
      and then again with input variables y1, ..., y5 and output ylogsum.

  6.  Note that the first line of the program is

           options linesize = 80 mprint ;

      The effect of the 'mprint' option is to print, in the log file, the
      SAS code that results from calling the macro.  This can be very
      helpful for debugging purposes.  It also shows you how the macro
      parameters are used.  The lines containing the code resulting from
      the macro are labelled 'MPRINT(LOGSUM)'.

==================================================================================

                                 The SAS System
                                                    17:43 Tuesday, March 7, 2000

NOTE: Copyright (c) 1989-1996 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software Release 6.12  TS020
      Licensed to UNIVERSITY OF MINNESOTA, Site 0001046017.

This message is contained in the SAS news file, and is presented upon
initialization.  Edit the files "news" in the "misc/base" directory to
display site-specific news and information in the program log.
The command line option "-nonews" will prevent this display.

NOTE: AUTOEXEC processing beginning; file is /net/sas612/autoexec.sas.

NOTE: SAS initialization used:
      real time           0.520 seconds
      cpu time            0.119 seconds

NOTE: AUTOEXEC processing completed.

1          options linesize = 80 mprint ;
2
3          %macro logsum(t1, t2, t3, t4, t5, sumlogs) ;
4
5               &sumlogs = sum(log(&t1), log(&t2), log(&t3),
6                              log(&t4), log(&t5)) ;
7
8          %mend ;
9
10         data small ;
11
12              input x1-x5 y1-y5 ;
13
14              %logsum(x1, x2, x3, x4, x5, xlogsum) ;
MPRINT(LOGSUM):   XLOGSUM = SUM(LOG(X1), LOG(X2), LOG(X3), LOG(X4), LOG(X5)) ;
15              %logsum(y1, y2, y3, y4, y5, ylogsum) ;
MPRINT(LOGSUM):   YLOGSUM = SUM(LOG(Y1), LOG(Y2), LOG(Y3), LOG(Y4), LOG(Y5)) ;
16              output ;
17
18         cards ;

NOTE: Missing values were generated as a result of performing an operation on
      missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 14:33   1 at 15:42
NOTE: The data set WORK.SMALL has 1 observations and 12 variables.
NOTE: DATA statement used:
      real time           0.230 seconds
      cpu time            0.041 seconds

20         ;
21
22         run ;
23
                                 The SAS System
                                                    17:43 Tuesday, March 7, 2000

24         proc print ;
25
26

NOTE: The PROCEDURE PRINT printed page 1.
NOTE: PROCEDURE PRINT used:
      real time           0.140 seconds
      cpu time            0.013 seconds

NOTE: The SAS System used:
      real time           0.990 seconds
      cpu time            0.190 seconds

NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414

==================================================================================

     It is possible to use macros with a variable number of variables.  The
basic reason this is possible is that macros are based on substituting strings
in SAS statements.  Here is an example which exploits features of both macros
and proc iml:

==================================================================================

/*  Example of a program employing a macro with a keyword parameter          */
/*                                                                           */
/*  This program also makes use of proc iml inside the macro,                */
/*  and an output dataset from proc reg.
*/
/*                                                                           */

options linesize = 100 mprint ;

footnote "prog: /home/walleye/john-c/5421/macro5.sas &sysdate &systime";

/*  Beginning of macro                                                       */

%macro doreg(dataset, title, yvar, xvar = ) ;

proc reg data = &dataset outest = outpar covout ;
     model &yvar = &xvar / covb ;

proc print data = outpar ;

/*                                                                           */
/*  Beginning of proc iml inside the macro ...                               */
/*                                                                           */

proc iml ;

     use &dataset ;
     read all var {&xvar} into xvars ;
     nobs = nrow(xvars) ;

/*                                                                           */
/*  The reason for computing nobs is to be able to calculate the             */
/*  error degrees of freedom (edf) later on.                                 */
/*                                                                           */

     use outpar ;
     read all var {intercep &xvar} into xmatx ;

     xlabelx = {intercep &xvar} ;
     p = nrow(xmatx) - 1 ;
     edf = nobs - p ;

/*                                                                           */
/*  Initialization of vectors ...                                            */
/*                                                                           */

     xmeans = j(p, 1, 0) ;
     xserrs = j(p, 1, 0) ;
     xclow  = j(p, 1, 0) ;
     xchigh = j(p, 1, 0) ;
     xtstat = j(p, 1, 0) ;
     tpvalu = j(p, 1, 0) ;

/*                                                                           */
/*  The following do loop computes various stats to be printed ...           */
/*                                                                           */

     do j = 1 to p ;

        xmeans[j] = xmatx[1, j] ;
        xserrs[j] = sqrt(xmatx[j + 1, j]) ;
        xtstat[j] = xmeans[j] / xserrs[j] ;
        tpvalu[j] = 2 * (1 - probt(abs(xtstat[j]), edf)) ;
        t105      = tinv(.975, edf) ;
        xclow[j]  = xmeans[j] - t105 * xserrs[j] ;
        xchigh[j] = xmeans[j] + t105 * xserrs[j] ;

     end ;

     file 'outtable' ;

     current = today() ;
     put @1 current worddate.;
     put ;
     put ;
     put @10 "&title" ;
     put ;
     put ;

     put @1  "Variable"    $8.
         @10 "Coeff. Est." $11.
         @22 "Std. Err."   $10.
         @32 "Lower 95%"   $10.
         @42 "Upper 95%"   $10.
         @52 "t-value"     $10.
         @62 "p-value"     $8. ;

     put @1  "--------"    $8.
         @10 "-----------" $11.
         @22 "---------"   $10.
         @32 "---------"   $10.
         @42 "---------"   $10.
         @52 "-------"     $10.
         @62 "-------"     $8. ;
     put ;

     do ivar = 1 to p ;

        xlab   = xlabelx[ivar] ;
        xcoeff = xmeans[ivar] ;
        xserr  = xserrs[ivar] ;
        xlow   = xclow[ivar] ;
        xhigh  = xchigh[ivar] ;
        xtval  = xtstat[ivar] ;
        xpval  = tpvalu[ivar] ;

        put @1  xlab   $8.
            @10 xcoeff 8.4
            @22 xserr  8.4
            @32 xlow   8.4
            @42 xhigh  8.4
            @52 xtval  6.2
            @62 xpval  6.4 ;

     end ;

quit ;

/*                                                                           */
/*  The end of the macro ...
*/
/*                                                                           */

%mend ;

*----------------------------------------------------------------------------;
/*                                                                           */
/*  The main program ...                                                     */
/*                                                                           */

DATA lhs ;
     infile '/home/walleye/john-c/5421/lhs.data' ;
     INPUT CASENUM  AGE      GENDER   BASECIGS GROUP    RANDDATE DEADDATE
           DEADCODE BODYMASS F31MSTAT
           VPCQUIT1 VPCQUIT2 VPCQUIT3 VPCQUIT4 VPCQUIT5
           CIGSA0   CIGSA1   CIGSA2   CIGSA3   CIGSA4   CIGSA5
           S1MFEV
           S2FEVPRE A1FEVPRE A2FEVPRE A3FEVPRE A4FEVPRE A5FEVPRE
           S2FEVPOS A1FEVPOS A2FEVPOS A3FEVPOS A4FEVPOS A5FEVPOS
           WEIGHT0  WEIGHT1  WEIGHT2  WEIGHT3  WEIGHT4  WEIGHT5 ;
RUN ;

*----------------------------------------------------------------------------;
/*                                                                           */
/*  Call the macro ...  Note that xvar here equals four variables.           */
/*  However, this macro will accept any number of variables as               */
/*  regressors.                                                              */
/*                                                                           */

%doreg (lhs, Regression of Pre-BD FEV1 (Baseline) on Other Variables,
        s2fevpre, xvar = age gender bodymass cigsa0) ;

endsas ;

==================================================================================

                                 The SAS System                                1
                                                   11:31 Friday, March 10, 2000

Model: MODEL1
Dependent Variable: S2FEVPRE

                           Analysis of Variance

                            Sum of         Mean
   Source          DF      Squares       Square      F Value       Prob>F

   Model            4     87.99464     21.99866      157.795       0.0001
   Error          495     69.00937      0.13941
   C Total        499    157.00401

       Root MSE       0.37338     R-square       0.5605
       Dep Mean       2.50042     Adj R-sq       0.5569
       C.V.
                     14.93270

                           Parameter Estimates

                    Parameter      Standard    T for H0:
   Variable  DF      Estimate         Error   Parameter=0    Prob > |T|

   INTERCEP   1      4.395333    0.19401552        22.655        0.0001
   AGE        1     -0.028274    0.00262218       -10.783        0.0001
   GENDER     1     -0.855959    0.03680688       -23.255        0.0001
   BODYMASS   1     -0.006699    0.00494640        -1.354        0.1762
   CIGSA0     1      0.000233    0.00123188         0.189        0.8504

                           Covariance of Estimates

   COVB          INTERCEP           AGE        GENDER      BODYMASS        CIGSA0

   INTERCEP  0.0376420234  -0.000362185  -0.002715788   -0.00061151  -0.000086032
   AGE       -0.000362185  6.8758462E-6  0.0000163609  -3.860528E-7  5.7760704E-7
   GENDER    -0.002715788  0.0000163609  0.0013547466  0.0000469912    6.76923E-6
   BODYMASS   -0.00061151  -3.860528E-7  0.0000469912  0.0000244669  1.3356675E-7
   CIGSA0    -0.000086032  5.7760704E-7    6.76923E-6  1.3356675E-7  1.5175246E-6

               prog: /home/walleye/john-c/5421/macro5.sas 10MAR00 11:31

                                 The SAS System                                2
                                                   11:31 Friday, March 10, 2000

   OBS   _MODEL_   _TYPE_   _NAME_     _DEPVAR_    _RMSE_    INTERCEP

     1   MODEL1    PARMS               S2FEVPRE    0.37338    4.39533
     2   MODEL1    COV      INTERCEP   S2FEVPRE    0.37338    0.03764
     3   MODEL1    COV      AGE        S2FEVPRE    0.37338   -0.00036
     4   MODEL1    COV      GENDER     S2FEVPRE    0.37338   -0.00272
     5   MODEL1    COV      BODYMASS   S2FEVPRE    0.37338   -0.00061
     6   MODEL1    COV      CIGSA0     S2FEVPRE    0.37338   -0.00009

   OBS         AGE      GENDER     BODYMASS        CIGSA0    S2FEVPRE

     1   -0.028274    -0.85596    -.0066991    0.00023250       -1
     2   -0.000362    -0.00272    -.0006115    -.00008603        .
     3    0.000007     0.00002    -.0000004    0.00000058        .
     4    0.000016     0.00135    0.0000470    0.00000677        .
     5   -0.000000     0.00005    0.0000245    0.00000013        .
     6    0.000001     0.00001    0.0000001    0.00000152        .

               prog: /home/walleye/john-c/5421/macro5.sas 10MAR00 11:31

==================================================================================

The following section is the file 'outtable'.

==================================================================================

March 10, 2000


         Regression of Pre-BD FEV1 (Baseline) on Other Variables


Variable Coeff. Est. Std. Err.
                               Lower 95% Upper 95% t-value   p-value
-------- ----------- --------- --------- --------- -------   -------

INTERCEP   4.3953      0.1940    4.0141    4.7765   22.65    0.0000
AGE       -0.0283      0.0026   -0.0334   -0.0231  -10.78    0.0000
GENDER    -0.8560      0.0368   -0.9283   -0.7836  -23.26    0.0000
BODYMASS  -0.0067      0.0049   -0.0164    0.0030   -1.35    0.1762
CIGSA0     0.0002      0.0012   -0.0022    0.0027    0.19    0.8504

==================================================================================

     The above program, macro5.sas, is fairly complex.  It is intended as an
example of how macros with variable numbers of parameters can be used.  It
produces an output table which actually does not add much to what you get from
PROC REG: the only things it adds are 95% confidence limits for the coefficient
estimates.

     There are several points worth noting about the program:

  1.  In the first line of the macro,

           %macro doreg(dataset, title, yvar, xvar = ) ;

      the parameters dataset, title, and yvar are all positional macro
      variables, and xvar is what is called a keyword parameter (see pages
      7 and 196 of the SAS Macro Language reference manual).  What this means
      is that, in the calling program, one can put any number of variables on
      the right side of the equals sign.  They should be separated by spaces
      (NOT commas).  They must be variables that are defined in the calling
      program.  If none are defined, in this case the macro will do the
      regression and compute and print the statistics for a model that
      includes only the intercept term.

  2.  Note that the PROC REG statement includes the option OUTEST.  The output
      dataset is called OUTPAR.  Also the option COVOUT is specified.  This
      causes the covariance matrix of the parameter estimates to be included
      in the output dataset.  The actual contents of the output dataset are
      shown by PROC PRINT which follows the regression.  It is important to
      understand the contents and structure of this output dataset, because
      it is used extensively in the PROC IML section.

      The output dataset includes the variables _MODEL_, _TYPE_, _NAME_,
      _DEPVAR_, _RMSE_, INTERCEP, and data for the 4 regressors specified in
      the call to the macro: AGE, GENDER, BODYMASS, and CIGSA0.  The first
      line of the output dataset includes the regression coefficient
      estimates for these variables.  The next 5 lines include estimates of
      the covariances of these parameter estimates.  For example, in the
      printout, it is indicated that the covariance of the coefficients for
      the INTERCEP term and for AGE is -.00036.

  3.  The PROC IML section starts by reading in the variables specified in
      the 'xvar = ' part of the call to the macro.  These are read in from
      the dataset specified in the macro call.  The only reason here to read
      in these variables is to count the number of observations in the
      dataset.  This is used later to compute the (denominator) degrees of
      freedom for the t-test.

  4.  The next part of the IML section uses the dataset 'outpar', which is
      defined earlier as the output dataset from proc reg.  The variables
      from 'outpar' which are used here are INTERCEP and the variables
      specified as '&xvar'.  In the call to the macro, &xvar is specified as

           xvar = age gender bodymass cigsa0

      The desired contents of 'outpar' are put into the matrix 'xmatx'.
      In this case 'xmatx' has 6 rows and 5 columns.

  5.  The number of coefficients in the regression is computed as
      p = nrow(xmatx) - 1.  Note that 'nrow' is an IML function which returns
      the number of rows of a given matrix.  Here p is 5: one intercept, and
      the 4 other specified regressors.  We could have specified the number
      of regressors as a parameter in the call to the macro.  However, it is
      more elegant to have it computed within the macro.  It also avoids a
      problem that could occur if it were a separate parameter: it could
      disagree with the number of variables specified in the 'xvar = '
      statement.

  6.
      Various statistics (coefficients, standard errors, lower and upper 95%
      confidence limits, t-statistics, and p-values associated with the
      t-statistics) are computed from the contents of 'xmatx'.  The vectors
      which store these statistics are all dimensioned p x 1.

  7.  The title of the table, a macro variable stored in &title, is printed.

  8.  The table is printed on a file called 'outtable', using the 'put'
      command in SAS.  First the date and table title are printed, then the
      column headings (note the specified begin locations and format).  Then
      the content of the table is printed in a 'do' loop.  Note the format
      specifications for each of the numbers printed in the table: format
      8.4, for example, means a total field width of 8 characters (including
      the decimal point and any minus sign), with 4 digits after the decimal
      point.

==================================================================================

PROBLEM 23

  Write a macro which computes the sum of the means of two variables.  It is
  not assumed that the two variables have the same number of valid
  observations.

  The first line of the macro should look like:

       %macro meansum1(x, xname, y, yname) ;

  Here x and y are the two input variables; xname is the printed name of the
  variable x (see below) and yname is the printed name of variable y.

  The output from the program should look like:

       Mean value for age    =  47.7
       Mean value for weight = 143.0

       Sum of mean values    = 190.7

PROBLEM 24

  Write a macro which computes the sum of the means of a variable number of
  variables.
  The first line of this macro should look like:

       %macro meansum2(xvar = ) ;

  The calling statement for the macro should look like:

       %meansum2(xvar = age height weight IQ) ;

  The printed output should look like:

       Mean for age    =  38.5
       Mean for height =  65.5
       Mean for weight = 144.2
       Mean for IQ     = 122.4

       Sum of means    = 370.6

PROBLEM 25

  Write a macro and a program which calls the macro to produce a table which
  looks like the following:

                Quintile Means for Variables in LHS Dataset

                  Mean 1st   Mean 2nd   Mean 3rd   Mean 4th   Mean 5th
   Variable       Quintile   Quintile   Quintile   Quintile   Quintile
   --------       --------   --------   --------   --------   --------
   Age              33.4       41.0       45.6       50.9       57.1
   BMI              20.4       23.3       25.2       27.2       29.6
                      .          .          .          .          .
                      .          .          .          .          .
   [other vars]       .          .          .          .          .
                      .          .          .          .          .
                      .          .          .          .          .
   S2FEVPOS          2.3        3.1        3.3        3.6        4.0
   ------------------------------------------------------------------------------

  The first line of this macro should look like the following:

       %macro quints(dataset, title, xvar = ) ;

PROBLEM 25A

  Write a macro which produces confidence intervals for correlation
  coefficients for random samples from a bivariate normal distribution.

  Input to the macro should be:

       1.  a dataset, D
       2.  n = number of observations
       3.  X and Y, sample values from a bivariate normal distribution on
           the dataset D
       4.  A percentile, e.g., 95, for the confidence level

  Output should be:

       1.  The sample correlation coefficient
       2.  The upper and lower confidence limits corresponding to the
           specified percentile

  Note that you may want to make use of some stat theory regarding the
  distribution of the correlation coefficient.

PROBLEM 25B

  Confidence limits for the median:

  Assume a dataset with n observations of variable X.  Let med(X) be the
  median.  This corresponds to the 50th percentile.  There is a 95% chance
  that the true median is between the percentiles corresponding to

       100*(.5 - 1.96*sqrt(.5*.5/n))   and   100*(.5 + 1.96*sqrt(.5*.5/n)).

  You can find the observations corresponding to these percentiles by sorting
  an array which contains the n X-values.
  This will be a 95% confidence interval for the median.

  Write a macro which computes this confidence interval.  The call to the
  macro should look like:

       %medci (dataset, xvar, xmed, xmedlow, xmedhigh) ;

  Show how your macro works with an array of 250 values chosen from a
  chi-square distribution with 1 degree of freedom.

  [Note: this is an asymptotic approximation to the confidence interval.
  A better choice as indicated in previous notes is to use the bootstrap
  to estimate a 95% confidence interval.]

/home/gnome/john-c/5421/notes.023    Last update: November 22, 2010
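
  Worked arithmetic for Problem 25B:  for the n = 250 case, the percentile
  bounds in the asymptotic formula can be evaluated directly.  The following
  is only a sketch of that arithmetic, not a solution to the problem.  The
  variable names halfwid, plow, and phigh are made up for this illustration
  and are not part of the required macro.

```sas
/* Sketch only: evaluate the percentile bounds from Problem 25B for n = 250. */
/* halfwid, plow and phigh are hypothetical names used for illustration.     */

data _null_ ;

     n = 250 ;

     /* Half-width of the interval on the proportion scale. */
     halfwid = 1.96 * sqrt(.5 * .5 / n) ;

     /* Lower and upper bounds, expressed as percentiles. */
     plow  = 100 * (.5 - halfwid) ;
     phigh = 100 * (.5 + halfwid) ;

     /* Write the two bounds to the log. */
     put plow= 6.2 phigh= 6.2 ;

run ;
```

  With n = 250 the bounds are roughly the 43.8th and 56.2nd percentiles,
  which correspond to sorted positions of about 109.5 and 140.5 out of 250.
  How to round these positions to whole ranks is left open here.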