SAS MACROS, Continued                                    SPH 5421 notes.023

SAS Macros: The Basics, Plus ...


     SAS macros are useful because they give you a way of writing a piece of
code once and then reusing it, rather than writing it over and over again.  If
you have written a complex section of code that does what you want for
variables x1, x2, x3, ..., xn, it is nice to be able to use it again for
variables y1, y2, y3, ..., yn without rewriting the whole thing.

     The simple, basic idea of SAS macros is to construct SAS code by a process of
substituting character strings.
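     Here is a minimal sketch of this substitution.  It uses %let to store
strings in macro variables and %put to write the resolved text to the log.
The macro variable names depvar and regressors are made up for illustration:

```sas
/* Hypothetical macro variables, for illustration only.          */
%let depvar     = s2fevpre ;
%let regressors = age gender bodymass ;

/* Each &name is replaced by the string stored under that name   */
/* before SAS sees the statement.  This writes the line          */
/*    model s2fevpre = age gender bodymass                       */
/* to the log.                                                   */
%put model &depvar = &regressors ;
```

Inside a PROC REG step, the statement  model &depvar = &regressors ;  would
be turned into an ordinary MODEL statement in exactly the same way.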

     The following is an example of a program with a very simple macro that
calculates the sum of the logarithms of a set of 5 variables:

=================================================================================

options  linesize = 80  mprint ;
footnote "prog: /home/walleye/john-c/5421/macro4.sas &sysdate &systime";

* ------------------The Macro Defined-------------------------------;

%macro logsum(t1, t2, t3, t4, t5, sumlogs) ;

   &sumlogs = sum(log(&t1), log(&t2), log(&t3), log(&t4), log(&t5)) ;

%mend ;
* ----------------------End of Macro--------------------------------;

data small ;

     input  x1-x5 y1-y5 ;

* ------------Calling the macro-------------------------------------;

     %logsum(x1, x2, x3, x4, x5, xlogsum) ;
     %logsum(y1, y2, y3, y4, y5, ylogsum) ;
     output ;

     cards ;
     .5  .   .8  1.1  1.7  .4  1.4   .   3.4  4.4
     ;

run ;

proc print ;
title1 'Print of input and output data from the macro logsum' ;
run ;

-------------------------------Proc Print Output---------------------------------

                 The SAS System                                1
                                                  17:43 Tuesday, March 7, 2000

     OBS   X1  X2   X3   X4   X5   Y1   Y2  Y3   Y4   Y5   XLOGSUM  YLOGSUM

      1   0.5   .  0.8  1.1  1.7  0.4  1.4   .  3.4  4.4  -0.29035  2.12556
=================================================================================

     This example illustrates several features of macros:

     1.  A macro definition starts with '%macro macname(parameters) ;'.  A
         macro with no parameters can leave out the parentheses.

     2.  A macro definition ends with '%mend ;'.  The name may be repeated,
         as in '%mend logsum ;', which makes long programs easier to read.

     3.  Any number of parameters can be used.  In this example, in the first
         call of the macro, x1, ..., x5 are the input variables and xlogsum is
         the output variable.

     4.  In the macro, the parameters t1, ..., t5 are essentially placeholders
         for the variables that will actually be used in the calculation.
         Note that in the '%macro' line itself, t1, t2, ..., t5 and sumlogs
         are not preceded by &, but in the body of the macro, they occur as
         &t1, &t2, ..., &t5, and &sumlogs.

     5.  The macro can be called any number of times in the main program.  Here
         it is called once with input variables x1, ..., x5 and output xlogsum, and
         then again with input variables y1, ..., y5 and output ylogsum.

     6.  Note that the first line of the program is

              options  linesize = 80  mprint ;

         The effect of the 'mprint' option is to print, on the log file, the
         SAS code that results from calling the macro.  This can be very helpful
         for debugging purposes.  It also shows you how the macro parameters are
         used.  The lines containing the code resulting from the macro are
         labelled 'MPRINT(LOGSUM):'.

==================================================================================

      The SAS System                               17:43 Tuesday, March 7, 2000

NOTE: Copyright (c) 1989-1996 by SAS Institute Inc., Cary, NC, USA. 
NOTE: SAS (r) Proprietary Software Release 6.12  TS020
      Licensed to UNIVERSITY OF MINNESOTA, Site 0001046017.




NOTE: AUTOEXEC processing beginning; file is /net/sas612/autoexec.sas.

NOTE: SAS initialization used:
      real time           0.520 seconds
      cpu time            0.119 seconds
      

NOTE: AUTOEXEC processing completed.

1          options  linesize = 80  mprint ;
2          
3          %macro logsum(t1, t2, t3, t4, t5, sumlogs) ;
4          
5             &sumlogs = sum(log(&t1), log(&t2), log(&t3),
6                             log(&t4), log(&t5)) ;
7          
8          %mend ;
9          
10         data small ;
11         
12              input  x1-x5 y1-y5 ;
13         
14              %logsum(x1, x2, x3, x4, x5, xlogsum) ;
MPRINT(LOGSUM):   XLOGSUM = SUM(LOG(X1), LOG(X2), LOG(X3), LOG(X4), LOG(X5)) ;
15              %logsum(y1, y2, y3, y4, y5, ylogsum) ;
MPRINT(LOGSUM):   YLOGSUM = SUM(LOG(Y1), LOG(Y2), LOG(Y3), LOG(Y4), LOG(Y5)) ;
16              output ;
17         
18              cards ;

NOTE: Missing values were generated as a result of performing an operation on 
      missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 14:33   1 at 15:42   
NOTE: The data set WORK.SMALL has 1 observations and 12 variables.
NOTE: DATA statement used:
      real time           0.230 seconds
      cpu time            0.041 seconds
      

20              ;
21         
22         run ;
23         
24         proc print ;
25         
26         
NOTE: The PROCEDURE PRINT printed page 1.
NOTE: PROCEDURE PRINT used:
      real time           0.140 seconds
      cpu time            0.013 seconds
      

NOTE: The SAS System used:
      real time           0.990 seconds
      cpu time            0.190 seconds
      
NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
==================================================================================
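     The NOTE about missing values points to a detail of the output.  X2 and
Y3 are missing, so LOG(X2) and LOG(Y3) are missing, yet XLOGSUM and YLOGSUM
are not.  This is because the SUM function ignores missing arguments, while
the + operator makes the whole result missing.  Here is a small sketch
contrasting the two.  The dataset name compare and the variables with_sum and
with_plus are arbitrary:

```sas
data compare ;
   input x1-x5 ;
   /* SUM skips missing arguments: log(.5)+log(.8)+log(1.1)+log(1.7) */
   with_sum  = sum(log(x1), log(x2), log(x3), log(x4), log(x5)) ;
   /* The + operator returns missing if any operand is missing       */
   with_plus = log(x1) + log(x2) + log(x3) + log(x4) + log(x5) ;
   cards ;
.5  .   .8  1.1  1.7
;
run ;

proc print data = compare ;
run ;
```

With these data, with_sum reproduces the XLOGSUM value -0.29035 printed
above, and with_plus is missing.  Which behavior you want depends on the
application.  Note also that SUM returns missing only if all of its
arguments are missing.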



     It is possible to write macros that accept a variable number of
variables.  This works because macros operate by substituting strings into
SAS statements, and a single macro parameter can hold a whole list of
variable names.  Here is an example which exploits features of both macros
and proc iml:


==================================================================================

/*   Example of a program employing a macro with a keyword parameter */
/*                                                                   */
/*   This program also makes use of proc iml inside the macro,       */
/*   and an output dataset from proc reg.                            */
/*                                                                   */
options  linesize = 100  mprint ;
footnote "prog: /home/walleye/john-c/5421/macro5.sas &sysdate &systime";

/*   Beginning of macro                                              */

%macro doreg(dataset, title, yvar, xvar = ) ;

   proc reg data = &dataset  outest = outpar covout ;
        model &yvar = &xvar / covb ;
   run ;

   proc print data = outpar ;
   run ;

/*                                                                   */
/*  Beginning of proc iml inside the macro ...                       */
/*                                                                   */

   proc iml ;

      use &dataset ;
      read all var {&xvar} into xvars ;
      nobs = nrow(xvars) ;
/*                                                                   */
/*  The reason for computing nobs is to be able to calculate the     */
/*  error degrees of freedom (edf) later on.                         */
/*                                                                   */

      use outpar ;

      read all var {intercep &xvar} into xmatx ;

      xlabelx = {intercep &xvar} ;

      p = nrow(xmatx) - 1 ;
      edf = nobs - p ;

/*                                                                   */
/*    Initialization of vectors ...                                  */
/*                                                                   */
      xmeans = j(p, 1, 0) ;
      xserrs = j(p, 1, 0) ;
      xclow  = j(p, 1, 0) ;
      xchigh = j(p, 1, 0) ;
      xtstat = j(p, 1, 0) ;
      tpvalu = j(p, 1, 0) ;

/*                                                                   */
/*    The following do loop computes various stats to be printed ... */
/*                                                                   */

      do j = 1 to p ;

         xmeans[j] = xmatx[1, j] ;
         xserrs[j] = sqrt(xmatx[j + 1, j]) ;
         xtstat[j] = xmeans[j] / xserrs[j] ;
         tpvalu[j] = 2 * (1 - probt(abs(xtstat[j]), edf)) ;
         t105 = tinv(.975, edf) ;

         xclow[j]  = xmeans[j] - t105 * xserrs[j] ;
         xchigh[j] = xmeans[j] + t105 * xserrs[j] ;

      end ;

      file 'outtable' ;

      current = today() ;
      put @1 current worddate.;
      put ; put ;
      put @10 "&title" ;
      put ; put ;

      put @1  "Variable"    $8.
          @10 "Coeff. Est." $11.
          @22 "Std. Err."   $10.
          @32 "Lower 95%"   $10.
          @42 "Upper 95%"   $10.
          @52 "t-value"     $10.
          @62 "p-value"     $8.  ;

      put @1  "--------"    $8.
          @10 "-----------" $11.
          @22 "---------"   $10.
          @32 "---------"   $10.
          @42 "---------"   $10.
          @52 "-------"     $10.
          @62 "-------"     $8.  ;

      put ;

      do ivar = 1 to p ;

             xlab = xlabelx[ivar] ;
             xcoeff = xmeans[ivar] ;
             xserr  = xserrs[ivar] ;
             xlow   = xclow[ivar]  ;
             xhigh  = xchigh[ivar] ;
             xtval  = xtstat[ivar] ;
             xpval  = tpvalu[ivar] ;

         put @1  xlab         $8.
             @10 xcoeff       8.4
             @22 xserr        8.4
             @32 xlow         8.4
             @42 xhigh        8.4
             @52 xtval        6.2
             @62 xpval        6.4 ;

      end ;

    quit ;

/*                                                                   */
/*    The end of the macro ...                                       */
/*                                                                   */

%mend ;

*----------------------------------------------------------------------------;

/*                                                                   */
/*    The main program ...                                           */
/*                                                                   */
 DATA lhs ;
      infile '/home/walleye/john-c/5421/lhs.data' ;


      INPUT CASENUM  AGE GENDER BASECIGS GROUP RANDDATE DEADDATE DEADCODE
            BODYMASS F31MSTAT
            VPCQUIT1 VPCQUIT2 VPCQUIT3  VPCQUIT4 VPCQUIT5
            CIGSA0   CIGSA1   CIGSA2    CIGSA3   CIGSA4   CIGSA5
            S1MFEV   S2FEVPRE  A1FEVPRE  A2FEVPRE A3FEVPRE A4FEVPRE A5FEVPRE
                     S2FEVPOS  A1FEVPOS  A2FEVPOS A3FEVPOS A4FEVPOS A5FEVPOS
                     WEIGHT0   WEIGHT1   WEIGHT2  WEIGHT3  WEIGHT4  WEIGHT5 ;

 RUN ;

*----------------------------------------------------------------------------;


/*                                                                   */
/*    Call the macro ... Note that xvar here equals four variables.  */
/*    However, this macro will accept any number of variables as     */
/*    regressors.                                                    */
/*                                                                   */

%doreg (lhs, Regression of Pre-BD FEV1 (Baseline) on Other Variables,
        s2fevpre, xvar = age gender bodymass cigsa0) ;


endsas ;
==================================================================================

                                 The SAS System                                1
                                                    11:31 Friday, March 10, 2000

Model: MODEL1  
Dependent Variable: S2FEVPRE                                           

                              Analysis of Variance

                                 Sum of         Mean
        Source          DF      Squares       Square      F Value       Prob>F

        Model            4     87.99464     21.99866      157.795       0.0001
        Error          495     69.00937      0.13941
        C Total        499    157.00401

            Root MSE       0.37338     R-square       0.5605
            Dep Mean       2.50042     Adj R-sq       0.5569
            C.V.          14.93270

                              Parameter Estimates

                       Parameter      Standard    T for H0:               
      Variable  DF      Estimate         Error   Parameter=0    Prob > |T|

      INTERCEP   1      4.395333    0.19401552        22.655        0.0001
      AGE        1     -0.028274    0.00262218       -10.783        0.0001
      GENDER     1     -0.855959    0.03680688       -23.255        0.0001
      BODYMASS   1     -0.006699    0.00494640        -1.354        0.1762
      CIGSA0     1      0.000233    0.00123188         0.189        0.8504



                            Covariance of Estimates

 COVB          INTERCEP           AGE        GENDER      BODYMASS        CIGSA0

 INTERCEP  0.0376420234  -0.000362185  -0.002715788   -0.00061151  -0.000086032
 AGE       -0.000362185  6.8758462E-6  0.0000163609  -3.860528E-7  5.7760704E-7
 GENDER    -0.002715788  0.0000163609  0.0013547466  0.0000469912    6.76923E-6
 BODYMASS   -0.00061151  -3.860528E-7  0.0000469912  0.0000244669  1.3356675E-7
 CIGSA0    -0.000086032  5.7760704E-7    6.76923E-6  1.3356675E-7  1.5175246E-6

 
 
 
            prog: /home/walleye/john-c/5421/macro5.sas 10MAR00 11:31



    OBS    _MODEL_    _TYPE_    _NAME_      _DEPVAR_     _RMSE_    INTERCEP

     1     MODEL1     PARMS                 S2FEVPRE    0.37338     4.39533
     2     MODEL1     COV       INTERCEP    S2FEVPRE    0.37338     0.03764
     3     MODEL1     COV       AGE         S2FEVPRE    0.37338    -0.00036
     4     MODEL1     COV       GENDER      S2FEVPRE    0.37338    -0.00272
     5     MODEL1     COV       BODYMASS    S2FEVPRE    0.37338    -0.00061
     6     MODEL1     COV       CIGSA0      S2FEVPRE    0.37338    -0.00009

    OBS       AGE        GENDER      BODYMASS      CIGSA0      S2FEVPRE

     1     -0.028274    -0.85596    -.0066991    0.00023250       -1   
     2     -0.000362    -0.00272    -.0006115    -.00008603        .   
     3      0.000007     0.00002    -.0000004    0.00000058        .   
     4      0.000016     0.00135    0.0000470    0.00000677        .   
     5     -0.000000     0.00005    0.0000245    0.00000013        .   
     6      0.000001     0.00001    0.0000001    0.00000152        .   
 
 
==================================================================================
     The following section is the file 'outtable'.
==================================================================================

    March 10, 2000


         Regression of Pre-BD FEV1 (Baseline) on Other Variables


Variable Coeff. Est. Std. Err. Lower 95% Upper 95% t-value   p-value 
-------- ----------- --------- --------- --------- -------   ------- 

INTERCEP   4.3953      0.1940    4.0141    4.7765   22.65    0.0000
AGE       -0.0283      0.0026   -0.0334   -0.0231  -10.78    0.0000
GENDER    -0.8560      0.0368   -0.9283   -0.7836  -23.26    0.0000
BODYMASS  -0.0067      0.0049   -0.0164    0.0030   -1.35    0.1762
CIGSA0     0.0002      0.0012   -0.0022    0.0027    0.19    0.8504

==================================================================================

     The above program, macro5.sas, is fairly complex.  It is intended as an
example of how a macro parameter can hold a list of variables of any length.
It produces an output table which actually does not add much to what you get
from PROC REG: the only things it adds are 95% confidence limits for the
coefficient estimates.
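     As a check on the arithmetic, the INTERCEP row of the table can be
reproduced in a DATA step.  The step uses the estimate and standard error
printed by PROC REG and the error degrees of freedom (495).  TINV and PROBT
are the same functions used in the IML code.  The numbers typed in below are
copied from the printout rather than taken at full precision, so this is only
a sketch of the calculation:

```sas
data _null_ ;
   /* Values copied from the PROC REG printout, INTERCEP row */
   b   = 4.395333 ;       /* coefficient estimate            */
   se  = 0.19401552 ;     /* standard error                  */
   edf = 495 ;            /* error degrees of freedom        */

   t975  = tinv(.975, edf) ;                  /* 97.5th percentile of t(edf) */
   lower = b - t975 * se ;                    /* lower 95% limit             */
   upper = b + t975 * se ;                    /* upper 95% limit             */
   tval  = b / se ;                           /* t-statistic                 */
   pval  = 2 * (1 - probt(abs(tval), edf)) ;  /* two-sided p-value           */

   put lower= 8.4  upper= 8.4  tval= 6.2  pval= 6.4 ;
run ;
```

The values written to the log should agree with the INTERCEP row of the
table: 4.0141, 4.7765, 22.65, and 0.0000.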

     There are several points worth noting about the program:

     1.  In the first line of the macro,

         %macro doreg(dataset, title, yvar, xvar = ) ;

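         the parameters dataset, title, and yvar are positional parameters,
         and xvar is what is called a keyword parameter (see pages 7 and 196
         of the SAS Macro Language reference manual).  A keyword parameter
         is given by name in the call ('xvar = ...').  Whatever follows the
         equals sign in the '%macro' line (here, nothing) is its default.

         Like any macro parameter, xvar is just a character string, so in
         the calling program one can put any number of variable names on the
         right side of the equals sign.  They should be separated by spaces
         (NOT commas), because a comma ends a parameter value.  For the same
         reason, a title containing a comma would have to be masked, as in
         %str(FEV1, Baseline).  The names must be variables in the dataset
         given as the first parameter.  If xvar is left empty, PROC REG fits
         a model that includes only the intercept term.  Note, however, that
         the IML step counts observations by reading the &xvar variables, so
         with an empty list that count (and hence edf) is not computed
         correctly.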

     2.  Note that the PROC REG statement includes the option OUTEST.  The output
         dataset is called OUTPAR.  Also the option COVOUT is specified.  This
         causes the covariance matrix of the parameter estimates to be included
         in the output dataset.

         The actual contents of the output dataset are shown by PROC PRINT
         which follows the regression.  It is important to understand the
         contents and structure of this output dataset, because it is used
         extensively in the PROC IML section.

         The output dataset includes the variables _MODEL_, _TYPE_, _NAME_,
         _DEPVAR_, _RMSE_, INTERCEP, and data for the 4 regressors specified in
         the call to the macro, AGE, GENDER, BODYMASS, and CIGSA0.  The first
         line of the output dataset includes the regression coefficient
         estimates for these variables.  The next 5 lines include estimates of
         the covariances of these parameter estimates.  For example, in the
         printout, it is indicated that the covariance of the coefficients
         for the INTERCEP term and for AGE is -.00036.

     3.  The PROC IML section starts by reading in the variables specified in
         the 'xvar = ' part of the call to the macro.  These are read in from
         the dataset specified in the macro call.  The only reason to read in
         these variables here is to count the number of observations in the
         dataset.  This count is used later to compute the error degrees of
         freedom for the t-tests.  The count is correct only if no
         observation has a missing value for yvar or any of the regressors,
         because PROC REG drops such observations from the fit.

     4.  The next part of the IML section uses the dataset 'outpar', which is
         defined earlier as the output dataset from proc reg.  The variables
         from 'outpar' which are used here are INTERCEP and the variables
         specified as '&xvar'.  In the call to the macro, &xvar is specified as

                 xvar = age gender bodymass cigsa0

         The desired contents of 'outpar' are put into the matrix 'xmatx'.  In
         this case 'xmatx' has 6 rows and 5 columns: the row of coefficient
         estimates followed by the 5 x 5 covariance matrix of the estimates.

     5.  The number of parameters in the regression is computed as

                 p = nrow(xmatx) - 1.

         Note that 'nrow' is an IML function which computes the number of
         rows of a given matrix.  Here the number of parameters in the
         regression is 5: the intercept and the 4 other specified regressors.

         We could have specified the number of regressors as a parameter
         in the call to the macro.  However, it is more elegant to have it
         computed within the macro.  This also avoids a problem that could
         occur if it were a separate parameter: it could disagree with the
         number of variables specified in the 'xvar = ' statement.

     6.  Various statistics (coefficients, standard errors, lower and upper
         95% confidence limits, t-statistics, and p-values associated with
         the t-statistics) are computed from the contents of 'xmatx'.
         The vectors which store these statistics are all dimensioned  p x 1.

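     7.  The title of the table, passed in the macro parameter title, is
         printed.  The PUT statement refers to it as "&title" in double
         quotes, because macro references are not resolved inside single
         quotes.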

     8.  The table is written to a file called 'outtable', using the IML
         FILE and PUT statements.  First the date and table title are
         printed, then the column headings (note the specified starting
         columns and formats).  Then the body of the table is printed in a
         'do' loop.  Note the format specifications for each of the numbers
         printed in the table.  For example, format 8.4 means a field 8
         columns wide (counting the decimal point and any minus sign) with 4
         digits after the decimal point.
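     Point 3 counts every row of the input dataset.  The following sketch
counts only the rows PROC REG would use when some values are missing.  It
reads the dependent variable as well as the regressors and keeps only rows
with no missing values.  It is shown on a small made-up dataset (demo, with
variables y, x1, x2).  Inside doreg, the READ statement would instead be
read all var {&yvar &xvar} into allvars ;  (an untested adaptation):

```sas
/* Hypothetical example data: rows 2 and 4 each have a missing value */
data demo ;
   input y x1 x2 ;
   cards ;
1.2  3  4
2.5  .  1
3.1  5  2
0.7  2  .
;
run ;

proc iml ;
   use demo ;
   read all var {y x1 x2} into allvars ;   /* dependent + regressors   */
   close demo ;

   ismiss   = (allvars = .) ;      /* 1 where a value is missing, else 0  */
   nmissrow = ismiss[, +] ;        /* number of missing values in each row */
   nobs     = sum(nmissrow = 0) ;  /* rows with nothing missing: 2 here    */
   print nobs ;
quit ;
```

The [, +] subscript reduction sums across the columns of each row.  With
nobs computed this way, edf = nobs - p matches the error degrees of freedom
that PROC REG reports.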
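     The device used for xvar (a keyword parameter whose value is a list of
names separated by spaces) can also turn logsum into a macro that accepts any
number of variables.  This sketch assumes SAS 9.2 or later, because it uses
the COUNTW function, which release 6.12 (used for the logs above) lacks.  The
names logsumn, vars, out, and small2 are made up for illustration:

```sas
/* Hypothetical generalization of logsum: sum of the logs of any  */
/* number of variables.  vars = names separated by blanks (at     */
/* least one); out = name of the variable that receives the sum.  */
%macro logsumn(vars = , out = ) ;
   %local i n args ;
   %let n    = %sysfunc(countw(&vars, %str( ))) ;  /* number of names */
   %let args = log(%scan(&vars, 1, %str( ))) ;    /* first term      */
   %do i = 2 %to &n ;
      /* append ", log(next name)" to the argument list */
      %let args = &args, log(%scan(&vars, &i, %str( ))) ;
   %end ;
   &out = sum(&args) ;
%mend logsumn ;

data small2 ;
   input x1-x5 ;
   %logsumn(vars = x1 x2 x3 x4 x5, out = xlogsum) ;
   cards ;
.5  .   .8  1.1  1.7
;
run ;
```

With options mprint, the log shows the generated assignment statement.  Since
the data are the same as in the first example, xlogsum should equal -0.29035.
The macro assumes vars holds at least one name.  With an empty list it would
generate log() and fail.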

==================================================================================

PROBLEM 23

  Write a macro which computes the sum of the means of two variables.  It is
  not assumed that the two variables have the same number of valid observations.

  The first line of the macro should look like:

      %macro meansum1(x, xname, y, yname) ;

  Here x and y are the two input variables; xname is the printed name of the
  variable x (see below) and yname is the printed name of variable y.

  The output from the program should look like:

      Mean value for  age    =  47.7
      Mean value for  weight = 143.0
      Sum of mean values     = 190.7


PROBLEM 24

  Write a macro which computes the means of a variable number of variables
  and the sum of those means.

  The first line of this macro should look like:

      %macro meansum2(xvar = ) ;

  The calling statement for the macro should look like:

      %meansum2(xvar = age height weight IQ) ;

  The printed output should look like:

      Mean for   age  =  38.5
      Mean for height =  65.5
      Mean for weight = 144.2
      Mean for IQ     = 122.4

      Sum of means    = 370.6


PROBLEM 25

  Write a macro and a program which calls the macro to produce a table
  which looks like the following:

                    Quintile Means for Variables in LHS Dataset


               Mean 1st      Mean 2nd      Mean 3rd     Mean 4th     Mean 5th
  Variable     Quintile      Quintile      Quintile     Quintile     Quintile
  --------     --------      --------      --------     --------     --------

  Age            33.4          41.0          45.6         50.9         57.1
  BMI            20.4          23.3          25.2         27.2         29.6
                  .             .             .            .            .
                  .             .             .            .            .
  [other vars]    .             .             .            .            .
                  .             .             .            .            .
                  .             .             .            .            .

  S2FEVPOS        2.3           3.1           3.3          3.6          4.0

  ------------------------------------------------------------------------------


       The first line of this macro should look like the following:

           %macro quints(dataset, title, xvar = ) ;


PROBLEM 25A:

   Write a macro which produces confidence intervals for
correlation coefficients for random samples from a
bivariate normal distribution.  Input to the macro should be:

   1.  a dataset, D
   2.  n = number of observations
   3.  X and Y, sample values from a bivariate normal
       distribution, in the dataset D
   4.  a confidence level in percent, e.g., 95


   Output should be:

   1.  The sample correlation coefficient
   2.  The upper and lower confidence limits corresponding
       to the specified confidence level

Note that you may want to make use of some stat theory regarding the
distribution of the correlation coefficient.


PROBLEM 25B

   Confidence limits for the median:

   Assume a dataset with N observations of variable X.  Let med(X) be
the median.  This corresponds to the 50th percentile.  There is
approximately a 95% chance that the true median is between the percentiles
corresponding to

       100*(.5 - 1.96*sqrt(.5*.5/N))  and  100*(.5 + 1.96*sqrt(.5*.5/N)).

You can find the observations corresponding to these percentiles by
sorting an array which contains the N X-values.  This will be an
approximate 95% confidence interval for the median.

   Write a macro which computes this confidence interval.  The call
to the macro should look like:

   %medci (dataset, xvar, xmed, xmedlow, xmedhigh) ;

   Show how your macro works with an array of 250 values chosen from
a chi-square distribution with 1 degree of freedom.
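   For N = 250, the two percentile levels in the formula work out as shown
below.  This DATA step sketch covers the arithmetic only.  It does not draw
the sample or sort it, which is left to the problem:

```sas
data _null_ ;
   n     = 250 ;
   half  = 1.96 * sqrt(.5 * .5 / n) ;   /* 1.96 standard errors of a proportion */
   plow  = 100 * (.5 - half) ;          /* lower percentile level, about 43.80  */
   phigh = 100 * (.5 + half) ;          /* upper percentile level, about 56.20  */
   put plow= 6.2  phigh= 6.2 ;
run ;
```

In terms of ranks, .438*250 = 109.5 and .562*250 = 140.5, so the limits fall
near the 110th and 141st of the 250 sorted values.  The exact choice depends
on the percentile definition used.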


   [Note: this is an asymptotic approximation to the confidence
    interval.  A better choice as indicated in previous notes is
    to use the bootstrap to estimate a 95% confidence interval.]

/home/gnome/john-c/5421/notes.023    Last update: November 22, 2010