SAS MACROS, Continued                                    SPH 5421 notes.024

SAS Macros - SAS/GRAPH


     SAS/GRAPH is a versatile graphics package.  However, its versatility is
offset by the fact that it can be difficult to learn and use.  For this reason,
it is useful to write macros that produce graphs with the general appearance
that you want but which can be customized easily for different variables,
titles and scales.

     As is often the case, a good way to learn SAS/GRAPH is to study examples.
Below is a series of problems, each followed by a solution that uses a macro
and SAS/GRAPH:

     1.  Make a scatterplot of variable  Y  versus variable  X.  Input to the
         macro will be: the dataset which contains X and Y, and titles for the
         graph: macro gmac01.sas

         Solution:  The appropriate SAS/GRAPH procedure is PROC GPLOT.  If you
         want X on the horizontal axis and Y on the vertical axis, the appropriate
         PLOT statement is:  plot y * x.

         The macro gmac01.sas shows how the problem can be solved with X = WEIGHTKG
         and Y = S2FEVPOS on the LHS data file.  The graphics output is a
         PostScript file which can be examined using ghostview.
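
         The file gmac01.sas is not reproduced in these notes.  As a rough
         illustration only, a macro of this kind might look like the sketch
         below.  The macro name scat01 and the SYMBOL1 settings are
         assumptions made for this example, and the call assumes that a
         dataset lhs containing weightkg and s2fevpos already exists.

```sas
/* Hypothetical sketch; not the actual gmac01.sas.                   */
%macro scat01 (dataset, x, y, title1, title2) ;

   title1 "&title1" ;
   title2 "&title2" ;

   /* Plot points only: circles, no connecting line.                 */
   symbol1 v = circle i = none c = black ;

   proc gplot data = &dataset ;
        plot &y * &x ;        /* &y vertical, &x horizontal          */
   run ;
   quit ;

%mend scat01 ;

/* Example call, using the variables named in the text.              */
%scat01 (lhs, weightkg, s2fevpos,
         LHS Data: FEV1 (liters) vs Weight (kg),
         Sketch of a scatterplot macro) ;
```

         Passing the variable names and titles as parameters is what lets
         the same code be reused for other datasets.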

     2.  Add the regression line and 95% confidence-of-prediction curves
         to the plot:  macro gmac02.sas

         Note the use of the overlay option after the PLOT statement: this
         permits multiple dependent variables to be plotted against one
         independent variable on the same axes.

         Note also the use of multiple SYMBOL statements in gmac02.sas,
         preceding the PROC GPLOT.  These are used in the order of the
         graphs in the PLOT statement.  Thus, SYMBOL1 refers to the first
         plot, which in this case is &y versus &x.  The plotting symbol is
         'o'.  This produces the scatterplot of &y versus &x.

         SYMBOL2 refers to the second plot, which is the predicted regression
         line of &y on &x.  Note the option l = 1 for SYMBOL2: this refers to
         line type: 1 = solid line.  In SYMBOL3 and SYMBOL4, for the lower and
         upper 95% predicted values, l = 2 (dashed line).

         Also note in this example that the lower and upper 95% prediction
         limits are written to the output dataset by the OUTPUT statement
         of PROC REG.
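
         The steps described for gmac02.sas can be sketched as follows.  This
         is not the actual gmac02.sas: the macro name scat02, the dataset name
         regout, and the variable names yhat, lower95 and upper95 are all
         assumptions made for illustration.

```sas
/* Hypothetical sketch; not the actual gmac02.sas.                   */
%macro scat02 (dataset, x, y, title1, title2) ;

   /* Fit the regression; save predicted values and the 95% limits   */
   /* for an individual prediction (LCL and UCL).                    */
   proc reg data = &dataset noprint ;
        model &y = &x ;
        output out = regout p = yhat lcl = lower95 ucl = upper95 ;
   run ;
   quit ;

   /* Sort by x so that joined lines are drawn from left to right.   */
   proc sort data = regout ;
        by &x ;
   run ;

   symbol1 v = circle i = none c = black ;          /* scatterplot   */
   symbol2 v = none   i = join l = 1 c = black ;    /* fitted line   */
   symbol3 v = none   i = join l = 2 c = black ;    /* lower limit   */
   symbol4 v = none   i = join l = 2 c = black ;    /* upper limit   */

   title1 "&title1" ;
   title2 "&title2" ;

   proc gplot data = regout ;
        plot &y * &x  yhat * &x  lower95 * &x  upper95 * &x / overlay ;
   run ;
   quit ;

%mend scat02 ;
```

         The four plot requests are matched to SYMBOL1 through SYMBOL4 in
         the order in which they appear in the PLOT statement.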

     3.  Add labels to the x- and y-axes of the graph: macro gmac03.sas

         In gmac03.sas, labels for the x-axis and y-axis have been added
         using the AXIS1 and AXIS2 statements.  Note the options used after
         the PLOT statement:

                 / overlay haxis = axis1  vaxis = axis2 ;

         This means that the AXIS1 statement refers to the horizontal axis (haxis)
         and AXIS2 refers to the vertical axis (vaxis).

         AXIS1 specifies that the label be printed in black (c = black),
         the font is swissb (f = swissb), the font size is 2 (h = 2), and
         the label content is &xlabel.

         AXIS2 is similar except it includes the statement 'a = 90'.  This
         means that the label will be rotated through an angle of 90 degrees
         before it is printed.
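
         A minimal sketch of the two AXIS statements described above, used
         outside a macro.  The %let values stand in for the macro parameters
         &xlabel and &ylabel, and the dataset lhs is assumed to exist; none
         of this is copied from gmac03.sas.

```sas
/* Stand-ins for the macro parameters (assumed values).              */
%let xlabel = Weight in Kg ;
%let ylabel = FEV1 in Liters ;

/* Horizontal axis label: black, swissb font, height 2.              */
axis1 label = (c = black f = swissb h = 2 "&xlabel") ;

/* Vertical axis label: the same, but rotated through 90 degrees.    */
axis2 label = (c = black f = swissb h = 2 a = 90 "&ylabel") ;

proc gplot data = lhs ;
     plot s2fevpos * weightkg / haxis = axis1 vaxis = axis2 ;
run ;
quit ;
```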

     4.  Add the equation of the regression line to the graph: macro gmac04.sas.

         This macro makes use of the ANNOTATE option in PROC GPLOT.
         Basically, an ANNOTATE dataset specifies text or boxes or other
         graphic elements, and the location of these elements.  This is a
         complex and rather difficult option.

         In the 'data equation' datastep:

              length text $40 textb0 $8  textb1 $8 ;

         specifies the lengths of text strings.  Here textb0 will be a character
         variable of length 8 which will contain the value of the intercept term.
         Similarly textb1 will contain the slope (coefficient of &x).  The
         character variable 'text' contains the entire equation.

         The statements

              xsys = '1'; ysys = '1'; size = 2; hsys = '1';

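         say the following: x and y locations will be interpreted as percents
         of the data area, that is, the region bounded by the axes (a value
         of '3' would mean percent of the whole graphics output area).  The
         statement hsys = '1' specifies that the coordinate system for 'size'
         is also percent of the data area.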

         The statements

              textb0 = right(put(intercep, 7.3));
              textb1 = right(put(&x,       7.3));

         put text equalling the value of the intercept coefficient and the
         coefficient of &x (from the output dataset parest, from PROC REG)
         into the two specified text variables.

         The statement

              text = "&y" || ' = ' || textb0 || ' + ' || textb1 || '* ' || "&x" ;

         creates the text for the equation, putting together the name of the
         y-variable, an equals sign, the coefficient b0, a '+' sign, the
         coefficient b1, a multiplication sign, and the name of the x-variable
         all in one string.  The vertical bars indicate concatenation of
         character variables.  In this example, after everything is assembled,

              text = 's2fevpos = 0.987 + 0.022 * weightkg'

         The statements

              x = 2; y = 90;

         tell where the text will be printed: at 2% of the horizontal
         distance from the left side, and at 90% of the vertical distance from
         the bottom of the graph.

         The statements

              function = 'label'; position = '>';

         say that the thing that is created here is a 'label' (text object),
         and that the position will be to the right of the specified x and y.

         Finally, the statement

              output ;

         puts the information on the dataset equation.

         Note that in this case the dataset equation has only one observation,
         because only one label is drawn.

         The dataset equation is later referenced in the PROC GPLOT statement as
         follows:

              proc gplot data = gmac04ot annotate = equation ;
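
         Putting the fragments above together, the annotate step might be
         assembled as in the sketch below.  This is not the actual gmac04.sas.
         It assumes a dataset lhs with s2fevpos and weightkg, writes the
         variable names directly instead of using &x and &y, and assumes that
         PROC REG names the intercept variable Intercept (the notes use
         intercep); check the name on the parest dataset before relying on it.

```sas
/* Hypothetical sketch; not the actual gmac04.sas.                   */
proc reg data = lhs outest = parest noprint ;
     model s2fevpos = weightkg ;
run ;
quit ;

data equation ;
     set parest ;
     length function $8 position $1 text $40 textb0 $8 textb1 $8 ;

     /* Coordinate systems for location and text height.             */
     xsys = '1' ; ysys = '1' ; hsys = '1' ; size = 2 ;

     /* Assumption: the intercept is stored in a variable named      */
     /* Intercept; the slope is stored under the name of x.          */
     textb0 = right(put(intercept, 7.3)) ;
     textb1 = right(put(weightkg,  7.3)) ;
     text   = 's2fevpos = ' || textb0 || ' + ' || textb1 || '* weightkg' ;

     x = 2 ; y = 90 ;
     function = 'label' ; position = '>' ;
     output ;

     keep function position text xsys ysys hsys size x y ;
run ;

proc gplot data = lhs annotate = equation ;
     plot s2fevpos * weightkg ;
run ;
quit ;
```

         The KEEP statement is optional; it only drops the PROC REG variables
         that the annotate facility does not use.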



==================================================================================
/*   Example of a SAS/GRAPH macro for scatterplots.                  */

FILENAME GRAPH 'gsas.grf' ;
LIBNAME  loc v8 '.' ;

OPTIONS  LINESIZE = 80 MPRINT ;

GOPTIONS
         RESET = GLOBAL
         ROTATE = PORTRAIT
         FTEXT = SWISSB
         DEVICE = PSCOLOR
         GACCESS = SASGASTD
         GSFNAME = GRAPH
         GSFMODE = REPLACE
         GUNIT = PCT BORDER
         CBACK = WHITE
         HTITLE = 2 HTEXT = 1 ;

*===================================================================== ;        

footnote "prog: /home/gnome/john-c/5421/macro6.sas &sysdate &systime";

 DATA lhs ;
      infile '/home/gnome/john-c/5421/lhs.data' ;
      retain nobs 0 ;

      INPUT CASENUM  AGE GENDER BASECIGS GROUP RANDDATE DEADDATE DEADCODE
            BODYMASS F31MSTAT
            VPCQUIT1 VPCQUIT2 VPCQUIT3  VPCQUIT4 VPCQUIT5
            CIGSA0   CIGSA1   CIGSA2    CIGSA3   CIGSA4   CIGSA5
            S1MFEV   S2FEVPRE  A1FEVPRE  A2FEVPRE A3FEVPRE A4FEVPRE A5FEVPRE
                     S2FEVPOS  A1FEVPOS  A2FEVPOS A3FEVPOS A4FEVPOS A5FEVPOS
                     WEIGHT0   WEIGHT1   WEIGHT2  WEIGHT3  WEIGHT4  WEIGHT5 ;

nobs = nobs + 1 ;

if nobs gt 500 then delete ;

weightkg = weight0 ;

 RUN ;


*===================================================================== ;        

run ;

 %include '/home/walleye/john-c/5421/gmac01.sas' ;
 %include '/home/walleye/john-c/5421/gmac02.sas' ;
 %include '/home/walleye/john-c/5421/gmac03.sas' ;
 %include '/home/walleye/john-c/5421/gmac04.sas' ;

%gmac01 (LHS, weightkg, s2fevpos,
         LHS Data: FEV1 (liters) vs Weight (kg) ,
         Example of Macro gmac01.sas, loc.cat) ;

goptions  gsfmode = append ;

%gmac02 (LHS, weightkg, s2fevpos,
         LHS Data: FEV1 (liters) vs Weight (kg) ,
         Example of gmac02.sas: regression line + 95% CI included,
         loc.cat) ;

%gmac03 (LHS, weightkg, Weight in Kg, s2fevpos, FEV1 in Liters,
         LHS Data: FEV1 (liters) vs Weight (kg) ,
         Example of gmac03.sas: axis labels included., loc.cat) ;

%gmac04 (LHS, weightkg, Weight in Kg, s2fevpos, FEV1 in Liters,
         LHS Data: FEV1 (liters) vs Weight (kg) ,
         Example of gmac04.sas: axis labels + equation included.,
         loc.cat) ;

filename gsf 'ps.grf' ;

/*  The following procedure puts all 4 graphs on one page.           */
/*  The output file is ps.grf.                                       */

proc greplay NOFS IGOUT = loc.cat  TC = sashelp.templt ;
     template = L2R2S ;
     list IGOUT ;
     treplay 1:1 2:2 3:3 4:4 ;

x rm cat.sas7bcat ;

endsas ;

*===================================================================== ;        

     As shown, it is possible to put all 4 graphs on the same page 
using PROC GREPLAY in SAS.  This is accomplished as follows:

     1.  In the main program (macro6.sas):

         a)  Include a LIBNAME statement before the macros are called:

                libname loc v8 '.' ;

         b)  Include a parameter 'loc' in the calls to the macros,
             as follows:

             %gmac01 (LHS, weightkg, s2fevpos,
                      LHS Data: FEV1 (liters) vs Weight (kg) ,
                      Example of Macro gmac01.sas, loc.cat) ;


         c)  Include the following lines after all the macros are
             called:

              *---------------------------------------------------------------;
                proc greplay NOFS IGOUT = loc.cat  TC = sashelp.templt ;
                     template = L2R2S ;
                     list IGOUT ;
                     treplay 1:1 2:2 3:3 4:4 ;
                x rm cat.sas7bcat ;
              *---------------------------------------------------------------;

     2.  In each of the macros:

         a)  Include a parameter called 'loc' in the macros:

             %macro gmac01 (dataset, x, y, title1, title2, loc) ;

         b)  In the PROC GPLOT line,  include the phrase 'gout = &loc':

             proc gplot data = gmac01ot gout = &loc ;
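
         As an illustration of steps 2a and 2b, a scatterplot macro that
         accepts the extra parameter might look like the sketch below.  The
         macro name scat01g, the titles in the call, and the SYMBOL1 settings
         are assumptions; this is not the actual gmac01.sas, and the dataset
         lhs is assumed to exist.

```sas
/* Hypothetical sketch; not the actual gmac01.sas.                   */
%macro scat01g (dataset, x, y, title1, title2, loc) ;

   title1 "&title1" ;
   title2 "&title2" ;
   symbol1 v = circle i = none c = black ;

   /* gout = &loc also stores the graph as an entry in the catalog   */
   /* named by the loc parameter, for example loc.cat.               */
   proc gplot data = &dataset gout = &loc ;
        plot &y * &x ;
   run ;
   quit ;

%mend scat01g ;

libname loc v8 '.' ;

%scat01g (lhs, weightkg, s2fevpos,
          First title, Second title, loc.cat) ;
```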



     The idea behind this rather mysterious code is the following.  The four graphs
are originally stored on the output file gsas.grf, in PostScript format.  The
purpose of PROC GREPLAY is to put all the graphs together so that they will be
printed on one piece of paper.  This means that they must be resized, and the way in
which they are laid out on the page must be specified.  The layout is determined by
a 'template', which here is taken from the catalog sashelp.templt (the TC = option).

     In this case, the template is L2R2S, which means: print the first two graphs on
the left side of the paper, the second beneath the first, print the next two on
the right side of the paper, and leave a space between the graphs (that is the
reason for the 'S' in L2R2S).

     All of the graphs appear on the file gsas.grf.  The first 4 pages are the
graphs with their original size, and the last page is all 4 graphs on one page.

     PROC GREPLAY expects the graphs it puts together to be in a catalog within a
SAS library.  That is the reason for the "libname loc v8 '.' ;" line.  SAS names
objects within a library in a somewhat obscure way.  In the two-level name loc.cat,
loc is the libref defined by the LIBNAME statement and cat is the name of the
catalog.  The phrase 'gout = &loc' in the PROC GPLOT lines within the macros causes
the graphs to be put in the catalog loc.cat.  However, the catalog is actually
stored in a file called 'cat.sas7bcat' in the directory that loc points to (here,
the current directory).  This naming convention can be confusing.  In any case, the
catalog is removed at the end of the program by the command

     x rm cat.sas7bcat ;
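
     As an aside, the catalog can also be inspected and deleted from within SAS
rather than with a Unix command.  The sketch below is an alternative, not what
macro6.sas does; it assumes the libref loc and the catalog name cat used above,
and that the catalog already holds the graphs.

```sas
libname loc v8 '.' ;

/* List the graph entries (type GRSEG) stored in the catalog loc.cat. */
proc catalog catalog = loc.cat entrytype = grseg ;
     contents ;
run ;
quit ;

/* Delete the catalog from within SAS instead of using 'x rm'.        */
proc datasets library = loc memtype = catalog nolist ;
     delete cat ;
run ;
quit ;
```

     Deleting the catalog through the libref avoids having to know the name of
the underlying file.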

*===================================================================== ;        

PROBLEM 26

    Write a SAS macro which produces a scatterplot of y versus x, and
which graphs the curves for the expected value of y as a quadratic and
cubic function of x.  That is, assume the models

           Y = b0 + b1*X + b2*X^2 + error, and

           Y = b0 + b1*X + b2*X^2 + b3*X^3 + error

and graph the two predicted curves on the same axes as the scatterplot.
The call to the macro should look like:

           %gqc (dataset, x, xlabel, y, ylabel, title1, title2) ;

     To illustrate how your macro works, generate random data for which the
expected value of Y is X - X^3, with normally distributed errors with
standard deviation .5, and X is chosen randomly with a uniform distribution
between -2 and +2.  (Generate about 100 points)
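
     The data-generation step alone (not the macro, which is the exercise) could
be sketched as below.  The dataset name sim and the seed 5421 are arbitrary
choices made for this illustration.

```sas
/* Sketch of the simulated data only; the macro %gqc is not shown.   */
data sim ;
     do i = 1 to 100 ;
        x = -2 + 4 * ranuni(5421) ;           /* uniform on (-2, 2)  */
        y = x - x**3 + 0.5 * rannor(5421) ;   /* mean X - X^3, sd .5 */
        output ;
     end ;
     drop i ;
run ;
```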


*===================================================================== ;        

/home/walleye/john-c/5421/notes.024    Last update: November 29, 2010