SAS MACROS, Continued                                 SPH 5421 notes.024

SAS Macros - SAS/GRAPH

     SAS/GRAPH is a versatile graphics package.  However, its versatility
is offset by the fact that it can be difficult to learn and use.  For this
reason, it is good to be able to construct macros that construct graphs
which have the general appearance that you want but which can be customized
easily for different variables, titles and scales.

     As is often the case, a good way to learn SAS/GRAPH is to study
examples.  Below is a series of problems, followed by a macro/SASGRAPH
solution:

1.  Make a scatterplot of variable Y versus variable X.  Input to the macro
    will be: the dataset which contains X and Y, and titles for the graph:
    macro gmac01.sas

    Solution:  The appropriate SASGRAPH procedure is PROC GPLOT.  If you
    want X on the horizontal axis and Y on the vertical axis, the
    appropriate PLOT statement is:  plot y * x.

    The macro gmac01.sas shows how the problem can be solved with
    X = WEIGHTKG and Y = S2FEVPOS on the LHS datafile.  The graphics output
    is a postscript file which can be examined using ghostview.

2.  Add the regression line and 95% confidence-of-prediction curves to the
    plot:  macro gmac02.sas

    Note the use of the overlay option after the PLOT statement: this
    permits multiple dependent variables to be plotted against one
    independent variable on the same axes.

    Note also the use of multiple SYMBOL statements in gmac02.sas,
    preceding the PROC GPLOT.  These are used in the order of the graphs
    in the PLOT statement.  Thus, SYMBOL1 refers to the first plot, which
    in this case is &y versus &x.  The plotting symbol is 'o'.  This
    produces the scatterplot of &y versus &x.  SYMBOL2 refers to the second
    plot, which is the predicted regression line of &y on &x.  Note the
    option l = 1 for SYMBOL2: this refers to line type: 1 = solid line.
    In SYMBOL3 and SYMBOL4, for the lower and upper 95% predicted values,
    l = 2 (dotted line).
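
    The file gmac02.sas itself is not reproduced in these notes.  The
    following is only a minimal sketch of how such a macro might be put
    together from the pieces described above.  The macro name gsketch02,
    the dataset name reg02ot, and the variable names pred, lower and upper
    are assumptions made for illustration and are not taken from
    gmac02.sas.  The sketch also assumes that the prediction limits are
    requested with the LCL= and UCL= keywords of the OUTPUT statement.

```sas
/* Hypothetical sketch only; this is NOT the course file gmac02.sas.   */
/* Assumed names: gsketch02, reg02ot, pred, lower, upper.              */

%macro gsketch02 (dataset, x, y, title1, title2) ;

   /* Fit the regression of &y on &x.  Save the predicted values and   */
   /* the 95% limits for an individual prediction on reg02ot.          */
   proc reg data = &dataset noprint ;
      model &y = &x ;
      output out = reg02ot p = pred lcl = lower ucl = upper ;
   run ;
   quit ;

   /* The lines are drawn by joining points in dataset order, so the   */
   /* data are sorted by &x before plotting.                           */
   proc sort data = reg02ot ;
      by &x ;
   run ;

   /* One SYMBOL statement per plot request, in the order of the PLOT  */
   /* statement.  A color is given on each one so that SAS moves on    */
   /* to the next SYMBOL definition for each successive plot.          */
   symbol1 v = circle i = none c = black ;         /* scatterplot      */
   symbol2 v = none   i = join l = 1 c = black ;   /* regression line  */
   symbol3 v = none   i = join l = 2 c = black ;   /* lower 95% limit  */
   symbol4 v = none   i = join l = 2 c = black ;   /* upper 95% limit  */

   title1 "&title1" ;
   title2 "&title2" ;

   /* The overlay option puts all four plots on the same axes.         */
   proc gplot data = reg02ot ;
      plot &y * &x  pred * &x  lower * &x  upper * &x / overlay ;
   run ;
   quit ;

%mend gsketch02 ;
```

    Under these assumptions, a call might look like:

       %gsketch02 (LHS, weightkg, s2fevpos,
                   LHS Data: FEV1 (liters) vs Weight (kg) ,
                   Sketch: regression line + 95% limits) ;
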
    Also note in this example that the OUTPUT statement in PROC REG puts
    the upper and lower 95% predicted curves on the output dataset.

3.  Add labels to the x- and y-axes of the graph:  macro gmac03.sas

    In gmac03.sas, labels for the x-axis and y-axis have been added using
    the AXIS1 and AXIS2 statements.  Note the options used after the PLOT
    statement:

         / overlay haxis = axis1 vaxis = axis2 ;

    This means that the AXIS1 statement refers to the horizontal axis
    (haxis) and AXIS2 refers to the vertical axis (vaxis).  AXIS1 specifies
    that the label be printed in black (c = black), the font is swissb
    (f = swissb), the font size is 2 (h = 2), and the label content is
    &xlabel.  AXIS2 is similar except it includes the option 'a = 90'.
    This means that the label will be rotated through an angle of 90
    degrees before it is printed.

4.  Add the equation of the regression line to the graph:
    macro gmac04.sas.

    This macro makes use of the ANNOTATE option in PROC GPLOT.  Basically,
    an ANNOTATE dataset specifies text or boxes or other graphic elements,
    and the location of these elements.  This is a complex and rather
    difficult option.

    In the 'data equation' datastep:

         length text $40 textb0 $8 textb1 $8

    specifies the lengths of text strings.  Here textb0 will be a
    character variable of length 8 which will contain the value of the
    intercept term.  Similarly textb1 will contain the slope (coefficient
    of &x).  The character variable 'text' contains the entire equation.

    The statements

         xsys = '1'; ysys = '1'; size = 2; hsys = '1';

    say the following: x and y locations will be interpreted as
    percentages of the data area, that is, the region bounded by the
    axes (coordinate system '1').  The statement hsys = '1' specifies
    that the coordinate system for 'size' is also percent of the data
    area.

    The statements

         textb0 = right(put(intercep, 7.3));
         textb1 = right(put(&x, 7.3));

    convert the intercept coefficient and the coefficient of &x (from the
    output dataset parest, from PROC REG) to text, and store the results
    in the two specified text variables.
    The statement

         text = "&y" || ' = ' || textb0 || ' + ' || textb1 || '* ' || "&x" ;

    creates the text for the equation, putting together the name of the
    y-variable, an equals sign, the coefficient b0, a '+' sign, the
    coefficient b1, a multiplication sign, and the name of the x-variable
    all in one string.  The vertical bars indicate concatenation of
    character variables.  In this example, after everything is assembled,

         text = 's2fevpos = .987 + 0.022 * weightkg'

    The statements

         x = 2; y = 90;

    tell where the text will be printed: at 2% of the horizontal distance
    from the left side, and at 90% of the vertical distance from the
    bottom of the graph.

    The statements

         function = 'label'; position = '>';

    say that the thing that is created here is a 'label' (text object),
    and that the position will be to the right of the specified x and y.

    Finally, the statement

         output ;

    puts the information on the dataset equation.  Note that in this case
    the dataset equation has only one observation.

    The dataset equation is later referenced in the PROC GPLOT statement
    as follows:

         proc gplot data = gmac04ot annotate = equation ;

==================================================================================

/*  Example of a SAS/GRAPH macro for scatterplots.  */

FILENAME GRAPH 'gsas.grf' ;
LIBNAME  loc v8 '.' ;

OPTIONS  LINESIZE = 80 MPRINT ;

GOPTIONS
         RESET = GLOBAL
         ROTATE = PORTRAIT
         FTEXT = SWISSB
         DEVICE = PSCOLOR
         GACCESS = SASGASTD
         GSFNAME = GRAPH
         GSFMODE = REPLACE
         GUNIT = PCT BORDER
         CBACK = WHITE
         HTITLE = 2 HTEXT = 1 ;

*===================================================================== ;

footnote "prog: /home/gnome/john-c/5421/macro6.sas &sysdate &systime";

DATA lhs ;
     infile '/home/gnome/john-c/5421/lhs.data' ;
     retain nobs 0 ;

     INPUT CASENUM  AGE      GENDER   BASECIGS GROUP    RANDDATE
           DEADDATE DEADCODE BODYMASS F31MSTAT
           VPCQUIT1 VPCQUIT2 VPCQUIT3 VPCQUIT4 VPCQUIT5
           CIGSA0   CIGSA1   CIGSA2   CIGSA3   CIGSA4   CIGSA5
           S1MFEV   S2FEVPRE A1FEVPRE A2FEVPRE A3FEVPRE A4FEVPRE A5FEVPRE
           S2FEVPOS A1FEVPOS A2FEVPOS A3FEVPOS A4FEVPOS A5FEVPOS
           WEIGHT0  WEIGHT1  WEIGHT2  WEIGHT3  WEIGHT4  WEIGHT5 ;

     nobs = nobs + 1 ;
     if nobs gt 500 then delete ;

     weightkg = weight0 ;

RUN ;

*===================================================================== ;

run ;

%include '/home/walleye/john-c/5421/gmac01.sas' ;
%include '/home/walleye/john-c/5421/gmac02.sas' ;
%include '/home/walleye/john-c/5421/gmac03.sas' ;
%include '/home/walleye/john-c/5421/gmac04.sas' ;

%gmac01 (LHS, weightkg, s2fevpos,
         LHS Data: FEV1 (liters) vs Weight (kg) ,
         Example of Macro gmac01.sas, loc.cat) ;

goptions gsfmode = append ;

%gmac02 (LHS, weightkg, s2fevpos,
         LHS Data: FEV1 (liters) vs Weight (kg) ,
         Example of gmac02.sas: regression line + 95% CI included,
         loc.cat) ;

%gmac03 (LHS, weightkg, Weight in Kg, s2fevpos, FEV1 in Liters,
         LHS Data: FEV1 (liters) vs Weight (kg) ,
         Example of gmac03.sas: axis labels included., loc.cat) ;

%gmac04 (LHS, weightkg, Weight in Kg, s2fevpos, FEV1 in Liters,
         LHS Data: FEV1 (liters) vs Weight (kg) ,
         Example of gmac04.sas: axis labels + equation included.,
         loc.cat) ;

filename gsf 'ps.grf' ;

/*  The following procedure puts all 4 graphs on one page.  */
/*  The output file is ps.grf.                              */

proc greplay NOFS IGOUT = loc.cat TC = sashelp.templt ;
     template = L2R2S ;
     list IGOUT ;
     treplay 1:1 2:2 3:3 4:4 ;

x rm cat.sas7bcat ;

endsas ;

*===================================================================== ;

     As shown, it is possible to put all 4 graphs on the same page using
PROC GREPLAY in SAS.  This is accomplished as follows:

1.  In the main program (macro6.sas):

    a)  Include a LIBNAME statement before the macros are called:

           libname loc v8 '.' ;

    b)  Include a parameter 'loc' in the calls to the macros, as follows:

           %gmac01 (LHS, weightkg, s2fevpos,
                    LHS Data: FEV1 (liters) vs Weight (kg) ,
                    Example of Macro gmac01.sas, loc.cat) ;

    c)  Include the following lines after all the macros are called:

        *---------------------------------------------------------------;
        proc greplay NOFS IGOUT = loc.cat TC = sashelp.templt ;
             template = L2R2S ;
             list IGOUT ;
             treplay 1:1 2:2 3:3 4:4 ;

        x rm cat.sas7bcat ;
        *---------------------------------------------------------------;

2.  In each of the macros:

    a)  Include a parameter called 'loc' in the macros:

           %macro gmac01 (dataset, x, y, title1, title2, loc) ;

    b)  In the PROC GPLOT line, include the phrase 'gout = &loc':

           proc gplot data = gmac01ot gout = &loc ;

     The idea of all this rather mysterious code is the following.  The
four graphs originally are stored on the output file gsas.grf, in
postscript format.  The purpose of PROC GREPLAY is to put all the graphs
together so that they will be printed on one piece of paper.  This means
that they must be resized and the way in which they are laid out on the
page must be specified.  The layout is determined by a file called a
'template'.  In this case, the template is L2R2S, which means: print the
first two graphs on the left side of the paper, the second beneath the
first, print the next two on the right side of the paper, and leave a
space between the graphs (that is the reason for the 'S' in L2R2S).
All of the graphs appear on the file gsas.grf.
The first 4 pages are the graphs with their original size, and the last
page is all 4 graphs on one page.

     PROC GREPLAY expects the graphs it puts together to be stored in a
catalog within a SAS library.  That is the reason for the
"libname loc v8 '.' ;" line, which defines the library 'loc' as the
current directory.  SAS names objects within a library in a somewhat
obscure way.  The catalog has the internal name loc.cat (that is, the
catalog 'cat' in the library 'loc').  Because the macros are called with
loc.cat as the value of the parameter 'loc', the phrase 'gout = &loc' in
the PROC GPLOT lines within the macros causes the graphs to be put in
that catalog.  However, the graphs written out to the catalog are
actually stored in a file called 'cat.sas7bcat'.  It is not clear why SAS
chose this peculiar and confusing naming convention for files in
libraries.  In any case, the catalog file is removed at the end of the
program by the command

         x rm cat.sas7bcat ;

*===================================================================== ;

PROBLEM 26

     Write a SAS macro which produces a scatterplot of y versus x, and
which graphs the curves for the expected value of y as a quadratic and
cubic function of x.  That is, assume the models

         Y = b0 + b1*X + b2*X^2 + error,   and

         Y = b0 + b1*X + b2*X^2 + b3*X^3 + error

and graph the two predicted curves on the same axes as the scatterplot.

     The call to the macro should look like:

         %gqc (dataset, x, xlabel, y, ylabel, title1, title2) ;

     To illustrate how your macro works, generate random data for which
the expected value of Y is X - X^3, with normally distributed errors with
standard deviation .5, and X is chosen randomly with a uniform
distribution between -2 and +2.  (Generate about 100 points.)

*===================================================================== ;

/home/walleye/john-c/5421/notes.024     Last update: November 29, 2010