SAS MACROS                                              SPH 5421 notes.022

For this segment of the course, a fairly good reference is:

     SAS Macro Language, First Edition (1997).  SAS Institute, Inc.,
     Cary, NC.

Another reference that you may find useful is:

     A. Carpenter: Carpenter's Complete Guide to the SAS Macro Language
     (1998), SAS Institute, Inc., Cary, NC.

Neither of these books tells everything you might want to know, especially
with respect to statistical applications.  As with most of SAS, the best
way to learn the topic is by studying and imitating examples.


SAS Macro Variables

Most SAS variables are defined within DATA steps or procedures.  They are
not defined 'globally' for the whole program.  Macro variables are
character variables which can be defined in a data step and used elsewhere
in subsequent parts of the program.  Consider the following example:

================================================================================
options linesize = 80 ;
footnote "program: /home/walleye/john-c/5421/macro1.sas &sysdate &systime" ;

%let author = Enola Malone ;
%let dataset = atest ;
%let varlist = x y z ;

data atest ;
     input x y z w ;
cards ;
1 3 5 10
2 4 7 15
3 5 9 20
4 6 10 25
5 7 11 30
;
run ;

proc print data = &dataset ;
     var &varlist ;
title1 "Listing of &dataset for &author" ;

proc means n mean stddev min max data = &dataset ;
     var &varlist ;
title1 "Basic descriptive stats of &dataset for &author" ;
========================================================================
                    Listing of atest for Enola Malone                  1
                                         17:14 Sunday, February 27, 2000

                          OBS    X    Y     Z

                           1     1    3     5
                           2     2    4     7
                           3     3    5     9
                           4     4    6    10
                           5     5    7    11

     program: /home/walleye/john-c/5421/macro1.sas 27FEB00 17:14

             Basic descriptive stats of atest for Enola Malone         2
                                         17:14 Sunday, February 27, 2000

Variable  N          Mean       Std Dev       Minimum       Maximum
-------------------------------------------------------------------
X         5     3.0000000     1.5811388     1.0000000     5.0000000
Y         5     5.0000000     1.5811388     3.0000000     7.0000000
Z         5     8.4000000     2.4083189     5.0000000    11.0000000
-------------------------------------------------------------------

     program: /home/walleye/john-c/5421/macro1.sas 27FEB00 17:14
========================================================================

There are several things to note in this example:

1.  Three macro variables are defined: author, dataset, and varlist.
    These are all character or string variables.

2.  The three macro variables are all defined with %let statements.

3.  Note that in the definitions, the macro variables are not preceded by
    & signs.  However, they ARE preceded by & signs when they are
    referenced later in the program.

4.  Note that the %let statements occur outside of any data step.

5.  Note that the macro variable varlist actually includes 3 variables
    from the dataset atest: x, y, and z.  In fact the actual content of
    this macro variable is the string 'x y z'.  The way macro variables
    behave is to substitute the string into SAS code, and then execute
    the SAS code as if it were written that way in the first place.

6.  Note that when macro variables are used in a title, the title must be
    surrounded by double quotes.  Macro variable references inside single
    quotes are not resolved.

7.  Note that two other macro variables are used in this program also: in
    the footnote line, &sysdate and &systime are referenced.  These are
    automatic macro variables, defined by SAS rather than by the program.
    They hold the date and time at which the SAS job or session started.

The SAS Macros manual, Chapter 3 (pages 21-32) is a good reference for
macro variables.  There are a number of automatic macro variables in
addition to &sysdate and &systime: see pages 22-23 for a list.  For
example, &sysday gives the day of the week on which the SAS job or session
started.  It is unlikely you will want to use many of these, but some are
useful.

See also an example of use of macro variables on pages 24.3-24.4 of the
notes (the program on graphing the corrected approximation to the
binomial).

Most of the time you will be using macro variables that you have created.
In the example below, the program reads an external datafile with the
assumption that the length of the datafile is not known in advance.
When the end of the datafile is encountered, a macro variable is created
which stores the number of observations to print.  The next procedure is a
proc print which makes use of the macro variable to print either the first
5% of the file or the first 20 lines, whichever is larger.

==================================================================================
OPTIONS LINESIZE = 80 ;
footnote "program: /home/walleye/john-c/macro2.sas &sysdate &systime" ;

DATA lhs ;
     infile '/home/walleye/john-c/5421/lhs.data' EOF = ENDOFILE ;
     RETAIN OBSCOUNT 0 ;
     INPUT CASENUM  AGE      GENDER   BASECIGS GROUP    RANDDATE DEADDATE
           DEADCODE BODYMASS F31MSTAT VPCQUIT1 VPCQUIT2 VPCQUIT3 VPCQUIT4
           VPCQUIT5 CIGSA0   CIGSA1   CIGSA2   CIGSA3   CIGSA4   CIGSA5
           S1MFEV   S2FEVPRE A1FEVPRE A2FEVPRE A3FEVPRE A4FEVPRE A5FEVPRE
           S2FEVPOS A1FEVPOS A2FEVPOS A3FEVPOS A4FEVPOS A5FEVPOS WEIGHT0
           WEIGHT1  WEIGHT2  WEIGHT3  WEIGHT4  WEIGHT5 ;

     * Count each observation as it is read ;
     OBSCOUNT = OBSCOUNT + 1 ;
     RETURN ;

     * Reached only at end of file: store the print limit in OBSLIM ;
ENDOFILE:
     MOBS0520 = MAX(.05 * OBSCOUNT, 20) ;
     CALL SYMPUT('OBSLIM', TRIM(LEFT(MOBS0520))) ;
RUN ;
*======================================================================;
PROC PRINT ;
     WHERE OBSCOUNT LE &OBSLIM ;
     VAR OBSCOUNT AGE GENDER BASECIGS DEADCODE BODYMASS ;
TITLE1 'USE OF MACRO VARIABLES:' ;
TITLE2 'TEST OF A PROGRAM WHICH PRINTS EITHER THE FIRST 20 OBSERVATIONS' ;
TITLE3 'ON THE FILE, OR THE FIRST 5% OF THE OBSERVATIONS, WHICHEVER IS LARGER';
TITLE4 "THE LIMIT OF OBSERVATIONS TO PRINT HERE IS: &OBSLIM" ;
ENDSAS ;
==================================================================================
                         USE OF MACRO VARIABLES:                       1
     TEST OF A PROGRAM WHICH PRINTS EITHER THE FIRST 20 OBSERVATIONS
  ON THE FILE, OR THE FIRST 5% OF THE OBSERVATIONS, WHICHEVER IS LARGER
              THE LIMIT OF OBSERVATIONS TO PRINT HERE IS: 25
                                         17:17 Monday, February 28, 2000

     OBS    OBSCOUNT    AGE    GENDER    BASECIGS    DEADCODE    BODYMASS

       1        1        51       0         20           .         26.7
       2        2        45       0         60           .         25.3
       3        3        44       0         40           .         31.8
       4        4        54       0         40           1         21.7
       5        5        47       0         35           .         29.0
       6        6        55       0         40           .         20.7
       7        7        53       0         30           .         28.9
       8        8        54       0         40           .         26.5
       9        9        46       1         40           .         22.1
      10       10        47       0         35           .         30.5
      11       11        54       0         30           .         29.9
      12       12        59       0         30           .         20.0
      13       13        54       0         30           .         25.2
      14       14        50       0         40           .         23.9
      15       15        54       0         40           .         29.5
      16       16        50       0         30           .         28.1
      17       17        58       0         35           .         23.4
      18       18        53       0         30           .         21.8
      19       19        54       0         40           .         22.2
      20       20        59       0         30           .         26.7
      21       21        50       0         20           .         29.4
      22       22        44       0         45           .         25.2
      23       23        56       0         20           .         25.1
      24       24        56       1         20           .         33.4
      25       25        57       1         30           .         33.9

     program: /home/walleye/john-c/macro2.sas 28FEB00 17:17
==================================================================================

There are some points you might want to note in this program:

1.  When the end of file is encountered, datastep execution is transferred
    to the label 'ENDOFILE'.  The section of code under ENDOFILE is not
    executed until the end of file is reached, because of the 'RETURN'
    statement (which sends program execution back up to the top of the
    datastep for all observations which precede end of file).

2.  The variable OBSCOUNT is initialized to 0 in a RETAIN statement.  It
    is then incremented by 1 for each observation that is read.

3.  The SYMPUT subroutine is called to put the appropriate value in the
    macro variable &OBSLIM.  SYMPUT makes it possible to create macro
    variables which are data dependent and which can be created in a
    datastep and then used later in the program in other datasteps or in
    procedures.  In this case it is used in the procedure PROC PRINT.

SYMPUT is a very useful routine to know about.  There is an example using
it on page 27 of the SAS Macro Manual, and extensive documentation on pages
226-229.  There are some important limitations to using it.  One is that a
macro variable created using SYMPUT will not be available for use until
AFTER completion of the datastep in which it was created.

You might wonder why, instead of using SYMPUT, one might not instead write
the following:

     %LET OBSLIM = MOBS0520 ;

If you try this, you will find that OBSLIM will be a character variable
which equals the string 'MOBS0520'.  This is not what you want in the PROC
PRINT statement that follows.
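
The difference can be seen directly in the SAS log.  The following is only
a minimal sketch and is not part of macro2.sas: the macro variable names
OBSLIM1 and OBSLIM2 and the record count of 500 are made up for
illustration, and PUT(MOBS0520, 8.) is used here to make the
numeric-to-character conversion explicit, where macro2.sas relies on
automatic conversion.

========================================================================
data _null_ ;
     nobs = 500 ;                     * hypothetical number of records ;
     mobs0520 = max(.05 * nobs, 20) ;
     * SYMPUT stores the VALUE of the data step variable ;
     call symput('obslim1', trim(left(put(mobs0520, 8.)))) ;
run ;

* A LET statement stores only the text to the right of the equals sign ;
%let obslim2 = mobs0520 ;

%put obslim1 is &obslim1 ;
%put obslim2 is &obslim2 ;
========================================================================

Under these assumptions the log should show 'obslim1 is 25' and
'obslim2 is mobs0520'.  The %PUT statements come after the RUN statement
because a macro variable created with SYMPUT is not available until the
datastep has finished.
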
You want the VALUE of MOBS0520, not the name of the variable.

This is a subtle and confusing point about macro variables.  A usual use of
macro variables, as in the program macro1.sas, is to simply provide a way
of substituting in variable names, not the values of the variables.  That
is why SYMPUT is so useful: it makes it possible to substitute the VALUES
of variables in the places where you want them rather than just variable
names.

Problem 22 below asks that you use computed macro variables in conjunction
with the PUT statement.  You should look at the SAS Language manual for a
description of, and examples of, PUT statements.  Here is a short example
that uses PUT statements to write text to a file (but without macro
variables).  Note that the name of the output file is 'outstuff':

========================================================================
options linesize = 80 ;
footnote "prog: /home/walleye/john-c/5421/putexamp &sysdate &systime" ;

data putexamp ;
     file 'outstuff' ;

     put "This is the first line of the file 'outstuff'. " ;
     put " " ;
     put " Today is Thursday, March 2." ;
     put " The sun is shining." ;
     put " It is about 40 degrees Fahrenheit" ;

     x = 13 / 3 ;

     put " " ;
     put " If you divide 13 by 3, you get: " x ;
     put " " ;
     put " This is the end of the output file." ;
run ;
endsas ;
========================================================================
This is the first line of the file 'outstuff'.

 Today is Thursday, March 2.
 The sun is shining.
 It is about 40 degrees Fahrenheit

 If you divide 13 by 3, you get: 4.3333333333

 This is the end of the output file.
========================================================================

PROBLEM 22

Both of the following problems will require using an output data set from
PROC MEANS or PROC SUMMARY.

1.  Write a program that reads the Lung Health Study data file, computes
    (using PROC MEANS or PROC SUMMARY) the mean values and standard
    deviations of AGE, GENDER, BODYMASS, and S2FEVPOS, and stores these
    statistics as macro variables.  Then in a later data step, construct
    PUT statements using these macro variables which create a paragraph of
    text that looks like the following:

         The mean age of LHS participants was 48.6 years (+/- 6.7);
         37% were women.  The mean Body Mass Index was 25.5 kg/m2
         (+/- 2.5), and the mean Screen 2 post-BD FEV1 was 2.88
         (+/- 0.80).

2.  Write a program using the PUT statement and macro variables which will
    produce a table based on the LHS data with the following format:

Date: 03-06-00

          Means and Standard Deviations of Selected Variables,
                    for LHS Participants, by Gender

                                 Men                      Women
                        --------------------    ------------------------
Variable                  N    Mean  Std Dev      N     Mean    Std Dev
--------------------     ---  ------ -------     ---   ------   --------
Age, yrs.                xxx   xx.x    xx.x      xxx    xx.x      xx.x
Body Mass Index          xxx   xx.x    xx.x      xxx    xx.x      xx.x
Cigs/Day at Baseline     xxx   xx.x    xx.x      xxx    xx.x      xx.x
Cigs/Day at Year 1       xxx   xx.x    xx.x      xxx    xx.x      xx.x
Cigs/Day at Year 2       xxx   xx.x    xx.x      xxx    xx.x      xx.x
Cigs/Day at Year 3       xxx   xx.x    xx.x      xxx    xx.x      xx.x
Cigs/Day at Year 4       xxx   xx.x    xx.x      xxx    xx.x      xx.x
Cigs/Day at Year 5       xxx   xx.x    xx.x      xxx    xx.x      xx.x

==================================================================================
/home/walleye/john-c/5421/notes.022          Last update: March 14, 2000