SAS PROC NLIN SPH 5421 notes.026 PROC NLIN is a procedure to find least-squares estimates of coefficients for nonlinear models. The underlying model is of the form Yi = f(Xi; B) + ei, where the ei are assumed to be normally distributed, ei ~ N(0, sig^2), B is a vector of unknown coefficients, and Xi is a vector of covariates. In general the function f is nonlinear in the coefficients B. The randomness in the data is all contained in the ei. In general, we will assume that Xi is a k-vector. The number of unknown parameters may not equal k in general; we shall assume that B is a p x 1 vector. Here is an example: [1] Yi = a * exp(-b * ti) + ei In this case, Xi = ti, and B = (a, b)`. The unknowns to be found are a, b, and sig^2. Note that if this problem were changed just slightly, as follows, [2] Yi = a * exp(-b + ti) + ei, it would no longer be a nonlinear problem, because it could be written as [3] Yi = a + c * ui + ei, where c = exp(-b) and ui = exp(ti). It is thus just an ordinary linear regression problem. You would estimate b by first estimating c and then noting that b = -log(c). Hopefully your estimate of c is a positive number, since otherwise b will be undefined. In general if you can transform a nonlinear problem into a linear one, you should do so and not bother with using PROC NLIN. But the original model [1] is not linear. It cannot be converted to be linear without changing the error structure. Here is how PROC NLIN works. Consider the sum of squares SS = sum_i(Yi - f(Xi; B))^2. The objective is to find coefficients B which minimize this sum of squares. This is equivalent to finding maximum likelihood estimates of the coefficients. You proceed as follows: take the first derivative of SS with respect to each of the components of B = (b1, b2, b3, ... , bp)`, and set these derivatives equal to zero: [4] dSS/dbj = -2 * sum_i(Yi - f(Xi; B)) * df/dbj = 0. Note here that j is the index of the j-th parameter, while i is the index of the i-th observation. [4] is in fact a system of p equations. In general these equations are not linear. PROC NLIN can solve this system of equations by the use of several algorithms, most of which are variants of the Newton-Raphson algorithm. You need to specify the function f(X; B) and its first derivatives, and which algorithm you want PROC NLIN to use. The program below illustrates the use of several methods for PROC NLIN. In the older versions of SAS (through Version 6.12), it was necessary to specify the derivatives of the model with respect to the parameters. This is done in some of the versions listed below, for example: ------------------------------------------------------------------------------ proc nlin method = marquardt ; parms a .50 b .50 ; der.a = 1 - exp(-b * (x - 8)) ; der.b = - (x - 8) * (.49 - a) * exp(-b * (x - 8)) ; model y = a + (.49 - a) * exp(-b * (x - 8)) ; title 'PROC NLIN: Data from Draper and Smith: Method = MARQUARDT ' ; run ; ------------------------------------------------------------------------------ However, later versions of SAS replaced the requirement that you supply the derivative expressions with a method of automatic analytic computation of the derivatives. This means that you can omit the statments like 'der.a = ... ' . Note that this is used in the 'method = newton' example below. It is also possible to have SAS perform an initial grid search for a good starting point for the parameters. This is also incorporated into the 'metho = newton' example: ------------------------------------------------------------------------------ proc nlin method = newton ; parms a .2 to .6 by .05 b .01 to .5 by .05 ; * der.a = 1 - exp(-b * (x - 8)) ; * der.b = - (x - 8) * (.49 - a) * exp(-b * (x - 8)) ; model y = a + (.49 - a) * exp(-b * (x - 8)) ; title 'PROC NLIN: Data from Draper and Smith: Method = NEWTON with Initial Search ' ; run ; ------------------------------------------------------------------------------ Regarding the DUD method: This method uses numerical estimation of the derivatives. As you can see from the printout, this method fails to work even when you incorporate an initial search. It is not recommended. The dataset below is from the textbook, Applied Regression Analysis, by N. Draper and H. Smith. The nonlinear regression problem involves the amount of available chlorine in a product which is used for washing clothes. The amount of available chlorine deteriorates with time. In the following table, X is the time in weeks of storage of the product, and Y is the amount of available chlorine: ------------------------------------------------------------------------------- X = time Y = chlorine -------- ------------ 8 0.49 8 0.49 10 0.48 10 0.47 10 0.48 10 0.47 12 0.46 12 0.46 12 0.45 12 0.43 14 0.45 14 0.43 14 0.43 16 0.44 16 0.43 16 0.43 18 0.46 18 0.45 20 0.42 20 0.42 20 0.43 22 0.41 22 0.41 22 0.40 24 0.42 24 0.40 24 0.40 26 0.41 26 0.40 26 0.41 28 0.41 28 0.40 30 0.40 30 0.40 30 0.38 32 0.41 32 0.40 34 0.40 36 0.41 36 0.38 38 0.40 38 0.40 40 0.39 42 0.39 --------------------------------------------------------------------------------- The assumed model is: Y = a + (.49 - a) * exp(-b * (X - 8)) + e, where e is the error term, assumed N(0, sig2). Coefficients a and b are unknown, and the problem is to estimate them. Note that the coefficient a occurs in two places in the expression on the right. The goal here is to find estimates of a and b and the variance of the error term, sig2. The following is a PROC NLIN program which estimates the coefficients a and b. Note that in the printout, only the results of the procedure with method = marquardt are shown. ================================================================================== /* Program to illustrate the use of PROC NLIN for nonlinear least */ /* squares problems. The data here are from Draper & Smith, */ /* Applied Regression Analysis, Wiley & Sons, 1966 (1st edition).*/ /* Program to illustrate the use of PROC NLIN for nonlinear least */ /* squares problems. The data here are from Draper & Smith, */ /* Applied Regression Analysis, Wiley & Sons, 1966 (1st edition).*/ /* Program to illustrate the use of PROC NLIN for nonlinear least */ /* squares problems. The data here are from Draper & Smith, */ /* Applied Regression Analysis, Wiley & Sons, 1966 (1st edition).*/ options linesize = 80 ; footnote "prog: /home/walleye/john-c/5421/draper.sas &sysdate &systime"; libname sasplus '' ; FILENAME GRAPH 'gsas.grf' ; LIBNAME loc '' ; OPTIONS LINESIZE = 80 MPRINT ; GOPTIONS RESET = GLOBAL ROTATE = PORTRAIT FTEXT = SWISSB DEVICE = PS300 GACCESS = SASGASTD GSFNAME = GRAPH GSFMODE = REPLACE GUNIT = PCT BORDER CBACK = WHITE HTITLE = 2 HTEXT = 1 ; *===================================================================== ; data draper ; infile 'draper.dat' ; input x y ; time = x ; chlorine = y ; run ; data sasplus.drape ; set draper ; time = x ; chlorine = y ; run ; symbol v = 'o' w = 2 h = 1.5 c = black ; proc gplot data = draper ; plot chlorine * time ; title1 h = 2 'Draper data: Chlorine Concentration vs Storage Time' ; run ; proc nlin method = marquardt ; parms a .50 b .50 ; der.a = 1 - exp(-b * (x - 8)) ; der.b = - (x - 8) * (.49 - a) * exp(-b * (x - 8)) ; model y = a + (.49 - a) * exp(-b * (x - 8)) ; title 'PROC NLIN: Data from Draper and Smith: Method = MARQUARDT ' ; run ; proc nlin method = gradient maxiter = 1000 ; parms a .50 b .50 ; der.a = 1 - exp(-b * (x - 8)) ; der.b = - (x - 8) * (.49 - a) * exp(-b * (x - 8)) ; model y = a + (.49 - a) * exp(-b * (x - 8)) ; title 'PROC NLIN: Data from Draper and Smith: Method = GRADIENT ' ; run ; proc nlin method = gauss ; parms a .50 b .50 ; der.a = 1 - exp(-b * (x - 8)) ; der.b = - (x - 8) * (.49 - a) * exp(-b * (x - 8)) ; model y = a + (.49 - a) * exp(-b * (x - 8)) ; title 'PROC NLIN: Data from Draper and Smith: Method = GAUSS ' ; run ; proc nlin method = newton ; parms a .2 to .6 by .05 b .01 to .5 by .05 ; * der.a = 1 - exp(-b * (x - 8)) ; * der.b = - (x - 8) * (.49 - a) * exp(-b * (x - 8)) ; model y = a + (.49 - a) * exp(-b * (x - 8)) ; title 'PROC NLIN: Data from Draper and Smith: Method = NEWTON with Initial Search ' ; run ; title ' ... no title ...' ; proc nlin method = dud ; parms a .2 to .6 by .05 b .01 to .5 by .05 ; model y = a + (.49 - a) * exp(-b * (x - 8)) ; title 'PROC NLIN: Data from Draper and Smith: Method = DUD with Initial Search ' ; run ; proc iml ; use draper ; read all var {x} into x ; read all var {y} into y ; npoint = 0 ; a = j(2500, 1, 0) ; b = j(2500, 1, 0) ; obsn = j(2500, 1, 0) ; sumsq = j(2500, 1, 0) ; do i = 1 to 50 ; do j = 1 to 50 ; npoint = npoint + 1 ; obsn[npoint] = npoint ; a[npoint] = .385 + .01 * i / 50 ; b[npoint] = .095 + .01 * j / 50 ; ai = a[npoint] ; bj = b[npoint] ; sumsq[npoint] = 0 ; do k = 1 to 44 ; predy = ai + (.49 - ai) * exp(-bj * (x[k] - 8)) ; obsdy = y[k] ; sumsq[npoint] = sumsq[npoint] + 10000 * (obsdy - predy)**2 ; end ; end ; end ; V = obsn || a || b || sumsq ; varnames = {'npoint' 'a' 'b' 'sumsq'} ; create likesurf from V [colname = varnames] ; append from V ; quit ; proc print data = likesurf ; where npoint le 2500 ; run ; goptions gsfmode = append ; proc gcontour data = likesurf ; plot a * b = sumsq / levels = 50 to 56 by .10 ; title H = 2 f = swissb 'Sum-of-Squares Surface: Draper-Smith Chlorine Data' ; run ; endsas ; PROC NLIN: Data from Draper and Smith: Method = MARQUARDT 1 16:44 Wednesday, December 7, 2011 The NLIN Procedure Dependent Variable Y Method: Marquardt Iterative Phase Sum of Iter a b Squares 0 0.5000 0.5000 0.2853 1 0.4084 6.2850 0.0383 2 0.4084 4.1490 0.0383 3 0.4084 3.8510 0.0382 4 0.4085 2.2122 0.0377 5 0.4087 1.6061 0.0362 6 0.4105 0.0983 0.0150 7 0.4006 0.1199 0.00588 8 0.3929 0.1072 0.00505 9 0.3904 0.1019 0.00500 10 0.3901 0.1016 0.00500 11 0.3901 0.1016 0.00500 NOTE: Convergence criterion met. Estimation Summary Method Marquardt Iterations 11 Subiterations 14 Average Subiterations 1.272727 R 1.818E-6 PPC(b) 1.549E-6 RPC(b) 0.000067 Object 7.22E-9 Objective 0.005002 Observations Read 44 Observations Used 44 Observations Missing 0 NOTE: An intercept was not specified for this model. Sum of Mean Approx Source DF Squares Square F Value Pr > F Regression 2 7.9820 3.9910 33513.1 <.0001 Residual 42 0.00500 0.000119 Uncorrected Total 44 7.9870 Corrected Total 43 0.0395 PROC NLIN: Data from Draper and Smith: Method = MARQUARDT 2 16:44 Wednesday, December 7, 2011 The NLIN Procedure Approx Approximate 95% Confidence Parameter Estimate Std Error Limits a 0.3901 0.00504 0.3800 0.4003 b 0.1016 0.0134 0.0747 0.1286 Approximate Correlation Matrix a b a 1.0000000 0.8878597 b 0.8878597 1.0000000 PROC NLIN: Data from Draper and Smith: Method = GRADIENT 3 16:44 Wednesday, December 7, 2011 The NLIN Procedure Dependent Variable Y Method: Gradient Iterative Phase Sum of Iter a b Squares 0 0.5000 0.5000 0.2853 1 0.4005 0.4999 0.0290 2 0.4239 0.4990 0.0213 3 0.4163 0.4988 0.0192 4 0.4170 0.4971 0.0192 5 0.4159 0.4965 0.0192 . . . . . . . . . . . . 520 0.3901 0.1016 0.00500 521 0.3901 0.1016 0.00500 522 0.3901 0.1016 0.00500 523 0.3901 0.1016 0.00500 524 0.3901 0.1016 0.00500 525 0.3901 0.1016 0.00500 526 0.3901 0.1016 0.00500 527 0.3901 0.1016 0.00500 PROC NLIN: Data from Draper and Smith: Method = GRADIENT 14 16:44 Wednesday, December 7, 2011 The NLIN Procedure Dependent Variable Y Method: Gradient NOTE: Convergence criterion met. Estimation Summary Method Gradient Iterations 527 Subiterations 1037 Average Subiterations 1.967742 R 8.576E-6 PPC(b) 5.396E-6 RPC(b) 2.161E-6 Object 7.25E-12 Objective 0.005002 Observations Read 44 Observations Used 44 Observations Missing 0 NOTE: An intercept was not specified for this model. Sum of Mean Approx Source DF Squares Square F Value Pr > F Regression 2 7.9820 3.9910 33513.1 <.0001 Residual 42 0.00500 0.000119 Uncorrected Total 44 7.9870 Corrected Total 43 0.0395 Approx Approximate 95% Confidence Parameter Estimate Std Error Limits a 0.3901 0.00504 0.3800 0.4003 b 0.1016 0.0134 0.0747 0.1286 Approximate Correlation Matrix a b a 1.0000000 0.8878586 b 0.8878586 1.0000000 PROC NLIN: Data from Draper and Smith: Method = GAUSS 15 16:44 Wednesday, December 7, 2011 The NLIN Procedure Dependent Variable Y Method: Gauss-Newton Iterative Phase Sum of Iter a b Squares 0 0.5000 0.5000 0.2853 1 0.4084 6.2850 0.0383 2 0.4084 3.1255 0.0382 3 0.4084 3.0969 0.0382 4 0.4084 2.9622 0.0381 5 0.4085 2.4488 0.0379 6 0.4086 1.5364 0.0361 7 0.4092 0.8175 0.0290 8 0.4104 0.1457 0.00877 9 0.3944 0.0948 0.00599 10 0.3903 0.1017 0.00500 11 0.3901 0.1016 0.00500 12 0.3901 0.1016 0.00500 NOTE: Convergence criterion met. Estimation Summary Method Gauss-Newton Iterations 12 Subiterations 15 Average Subiterations 1.25 R 3.608E-7 PPC(b) 3.073E-7 RPC(b) 0.000013 Object 6.34E-10 Objective 0.005002 Observations Read 44 Observations Used 44 Observations Missing 0 NOTE: An intercept was not specified for this model. Sum of Mean Approx Source DF Squares Square F Value Pr > F Regression 2 7.9820 3.9910 33513.1 <.0001 Residual 42 0.00500 0.000119 Uncorrected Total 44 7.9870 Corrected Total 43 0.0395 PROC NLIN: Data from Draper and Smith: Method = GAUSS 16 16:44 Wednesday, December 7, 2011 The NLIN Procedure Approx Approximate 95% Confidence Parameter Estimate Std Error Limits a 0.3901 0.00504 0.3800 0.4003 b 0.1016 0.0134 0.0747 0.1286 Approximate Correlation Matrix a b a 1.0000000 0.8878600 b 0.8878600 1.0000000 PROC NLIN: Data from Draper and Smith: Method = NEWTON with Initial Search 17 16:44 Wednesday, December 7, 2011 The NLIN Procedure Dependent Variable Y Grid Search Sum of a b Squares 0.2000 0.0100 0.0427 0.2500 0.0100 0.0618 0.3000 0.0100 0.0861 0.3500 0.0100 0.1156 0.4000 0.0100 0.1502 0.4500 0.0100 0.1899 0.5000 0.0100 0.2348 . . . . . . . . . 0.4000 0.4600 0.0277 0.4500 0.4600 0.0622 0.5000 0.4600 0.2851 0.5500 0.4600 0.6963 0.6000 0.4600 1.2960 PROC NLIN: Data from Draper and Smith: Method = NEWTON with Initial Search 19 16:44 Wednesday, December 7, 2011 The NLIN Procedure Dependent Variable Y Method: Newton Iterative Phase Sum of Iter a b Squares 0 0.4000 0.1100 0.00628 1 0.3925 0.1090 0.00504 2 0.3901 0.1011 0.00500 3 0.3901 0.1016 0.00500 4 0.3901 0.1016 0.00500 NOTE: Convergence criterion met. Estimation Summary Method Newton Iterations 4 R 1.655E-7 PPC(b) 8.322E-8 RPC(b) 0.000239 Object 7.735E-8 Objective 0.005002 Observations Read 44 Observations Used 44 Observations Missing 0 NOTE: An intercept was not specified for this model. Sum of Mean Approx Source DF Squares Square F Value Pr > F Regression 2 7.9820 3.9910 33513.1 <.0001 Residual 42 0.00500 0.000119 Uncorrected Total 44 7.9870 Corrected Total 43 0.0395 Approx Approximate 95% Confidence Parameter Estimate Std Error Limits a 0.3901 0.00509 0.3799 0.4004 b 0.1016 0.0135 0.0744 0.1289 Approximate Correlation Matrix a b a 1.0000000 0.8899899 b 0.8899899 1.0000000 PROC NLIN: Data from Draper and Smith: Method = DUD with Initial Search 20 16:44 Wednesday, December 7, 2011 The NLIN Procedure Dependent Variable Y Grid Search Sum of a b Squares 0.2000 0.0100 0.0427 0.2500 0.0100 0.0618 0.3000 0.0100 0.0861 0.3500 0.0100 0.1156 0.4000 0.0100 0.1502 0.4500 0.0100 0.1899 0.5000 0.0100 0.2348 . . . . . . . . . 0.4000 0.4600 0.0277 0.4500 0.4600 0.0622 0.5000 0.4600 0.2851 0.5500 0.4600 0.6963 0.6000 0.4600 1.2960 PROC NLIN: Data from Draper and Smith: Method = DUD with Initial Search 22 16:44 Wednesday, December 7, 2011 The NLIN Procedure Dependent Variable Y DUD Initialization Sum of DUD a b Squares -3 0.4000 0.1100 0.00628 -2 0.4400 0.1100 0.0571 -1 0.4000 0.1210 0.00571 Iterative Phase Sum of Iter a b Squares 0 0.4000 0.1210 0.00571 1 0.3906 0.1010 0.00501 2 0.3902 0.1020 0.00500 3 0.3902 0.1018 0.00500 4 0.3902 0.1017 0.00500 5 0.3901 0.1016 0.00500 6 0.3901 0.1016 0.00500 NOTE: Convergence criterion met. Estimation Summary Method DUD Iterations 6 Object 1.035E-9 Objective 0.005002 Observations Read 44 Observations Used 44 Observations Missing 0 NOTE: An intercept was not specified for this model. Sum of Mean Approx Source DF Squares Square F Value Pr > F Regression 2 7.9820 3.9910 33513.1 <.0001 Residual 42 0.00500 0.000119 Uncorrected Total 44 7.9870 Corrected Total 43 0.0395 PROC NLIN: Data from Draper and Smith: Method = DUD with Initial Search 23 16:44 Wednesday, December 7, 2011 The NLIN Procedure Approx Approximate 95% Confidence Parameter Estimate Std Error Limits a 0.3901 0.00504 0.3800 0.4003 b 0.1016 0.0134 0.0747 0.1286 Approximate Correlation Matrix a b a 1.0000000 0.8878553 b 0.8878553 1.0000000 PROC NLIN: Data from Draper and Smith: Method = DUD with Initial Search 24 16:44 Wednesday, December 7, 2011 Obs npoint a b sumsq 1 1 0.3852 0.0952 51.7570 2 2 0.3852 0.0954 51.8282 3 3 0.3852 0.0956 51.9023 4 4 0.3852 0.0958 51.9794 5 5 0.3852 0.0960 52.0594 . . . . . . . . . . . . . . . 2497 2497 0.395 0.1044 53.5922 2498 2498 0.395 0.1046 53.4921 2499 2499 0.395 0.1048 53.3945 2500 2500 0.395 0.1050 53.2993 ------------------------------------------------------------------------ PROBLEM 28 Given the dataset below of 50 observations, assume the model Yi = a * cos(c + b * ti) + err, where err ~ N(0, sig2). 1. Use PROC NLIN with method = gauss to find estimates of a, b, and c. Also provide an estimate of sig2. You will probably need to graph the data to decide what the starting values for a, b, and c should be. 2. Using a test based on -2 * log likelihood, test the hypothesis that c = .9. Dataset: ------------------------ Obs ti yi 1 0.2 0.20427 2 0.4 -0.07374 3 0.6 -0.02483 4 0.8 -0.16959 5 1.0 -0.43232 6 1.2 -0.53239 7 1.4 -0.97995 8 1.6 -0.66267 9 1.8 -1.12233 10 2.0 -0.94835 11 2.2 -0.78410 12 2.4 -0.79513 13 2.6 -0.54265 14 2.8 -0.59817 15 3.0 -0.71626 16 3.2 -0.50005 17 3.4 -0.11546 18 3.6 -0.09870 19 3.8 -0.29911 20 4.0 0.38076 21 4.2 0.40349 22 4.4 1.07876 23 4.6 0.84918 24 4.8 0.60890 25 5.0 0.66975 26 5.2 0.53559 27 5.4 1.32738 28 5.6 1.01013 29 5.8 0.97087 30 6.0 0.99035 31 6.2 0.72526 32 6.4 0.55154 33 6.6 0.35486 34 6.8 0.36963 35 7.0 0.04066 36 7.2 -0.30633 37 7.4 -0.53084 38 7.6 -0.88813 39 7.8 -0.81896 40 8.0 -0.72137 41 8.2 -0.85303 42 8.4 -0.83439 43 8.6 -1.05811 44 8.8 -1.02931 45 9.0 -0.94405 46 9.2 -0.38597 47 9.4 -0.72289 48 9.6 -0.56342 49 9.8 -0.72862 50 10.0 0.04955 ------------------------------------------------------------------------ SPLUS MINIMIZATION FUNCTIONS Splus (and R) also have procedures which can minimize functions. The simplest of these is nlmin. As with the Simplex method, one need only define the function to be minimized; derivatives need not be specified. An example of nlmin is shown below. Note that nlmin does not produce estimates of derivatives of the function being minimized or estimated standard errors. The Splus documentation for nlmin says: "The optimizer is based on a quasi-Newton method using the double dogleg step with the BFGS update to the Hessian." References are given in the Splus Reference Manual. ------------------------------------------------------------------------ # This Splus program plots chlorine vs. time from the # Draper dataset (generated by SAS program sastos.draper.sas), # and does least-squares regression using the S+ function nlmin. # To run this program in Splus, say: source("draper.s"). # Then to see the results, say: m postscript("draper.ps", horizontal = F, onefile = T) par(mfrow = c(2, 1), oma = c(2, 2, 2, 2)) rm(sasdat) sasdat <- sas.get(library = "/home/gnome/john-c/5421", mem = "draper") time <- sasdat[, "time"] chlorine <- sasdat[, "chlorine"] hist(chlorine, nclass = 6, main = "SPH 5421 Draper data: chlorine.") par(new = F) plot(time, chlorine) b <- c(.5, .5) f1 <- function(v) { w <- exp(-v[2] * (time - 8)) f <- v[1] + (.49 - v[1]) * w ssq <- sum((chlorine - f)^2) ssq } m <- nlmin(f1, b, max.iter = 1000) # rm(time, chlorine, sasdat, f, ssq, b, w) dev.off() ------------------------------------------------------------------------ PROBLEM 28b: Use nlmin in R to find parameter estimates for the data and model specified in Problem 28. /home/walleye/john-c/5421/notes.026 Last update: December 7, 2011