Simulated Annealing
Simulated Annealing Both in the evaluation and step function we use the ellipses (...) to prevent undefined arguments to throw errors: optim simply transfers all arguments that are not its own to both underlying functions, where they can be used or ignored.
We will start with a random subset of five columns. This leads to the following misclassification rate: > initselect <- rep(0, ncol(wines)) > initselect[sample(1:ncol(wines), 5)] <- 1 > (r0 <- lda.loofun(initselect, x = twowines, + grouping = twovintages)) [1] 2.521
This corresponds to 3 misclassifications. How much can we improve using simulated annealing? Let’s find out: > SAoptimWines <- + optim(initselect, + fn = lda.loofun, gr = saStepFun, method = "SANN", + x = twowines, grouping = twovintages)
The result is a simple list with the first two elements containing the best result and the corresponding evaluation value: > SAoptimWines[c("par", "value")] $par [1] 1 0 0 0 1 0 1 1 0 1 1 0 1 $value [1] 0 ----In this case, all misclassifications have been eliminated while still using only a subset of the variables, in this case 7 columns. We could try to push the number of selected variables back by adding a small penalty for every selected variable—once the ideal value of zero misclassifications has been reached the current definition of the evaluation function gives no more opportunities for further improvement. ----By default, 10,000 evaluations are performed in the optim version of SA; this number can be changed using the control argument, where also the initial temper ature and the cooling rate can be adjusted. In real, nontrivial problems, it will probably take some experimentation to find optimal values for these search parameters.
A more ambitious example is to predict the octane number of the gasoline samples with only a subset of the NIR wavelengths. The step function is the same as in the wine example, and the only thing we have to do is to define the evaluation function: > pls.cvfun <- function(selection, xmat, response, ncomp, ...) + if (sum(selection) < ncomp) return(Inf) + pls.obj <- plsr(response ˜ xmat[, selection == 1], + validation = "CV", ncomp = ncomp, ...) + c(RMSEP(pls.obj, estimate = "CV", ncomp = ncomp, + intercept = FALSE)$val) +
In this case, we use the explicit crossvalidation provided by the plsr function. This adds a little bit of variability in the evaluation function since repeated application will lead to different segments—but the savings in time are quite big. We will assume that this variability is smaller than the gains we hope to make. The number of components to take into account can be specified in the extra argument of the evaluation function; the error of the model with the largest number of latent variables is returned. The RMSEP function returns an object of class mvrVal, where the val list element contains the numerical value of interest—this is what we will return. Now, let us try to find an optimal two-component PLS model (fewer variables often lead to less complicated models). We start with a very small model using only eight variables (.02 × 401): > nNIR <- ncol(gasoline$NIR) > initselect <- rep(0, nNIR) > initselect[sample(1:nNIR, 8)] <- 1 > SAoptimNIR1 <- + optim(initselect, + fn = pls.cvfun, gr = saStepFun, method = "SANN", + x = gasoline$NIR, response = gasoline$octane, + ncomp = 2, maxval = nNIR) > pls.cvfun(initselect, gasoline$NIR, gasoline$octane, ncomp = 2) [1] 1.3896 > (nvarSA1 <- sum(SAoptimNIR1$par)) [1] 190 > SAoptimNIR1$value [1] 0.24823
The result is already quite good: compare this, e.g., to the values in the left panel in Fig. 8.4 (where we were looking at the crossvalidation of a model based on the odd samples only) and it is clear that the estimated error with fewer variables and two components is less than half that of the two-component model including all variables Cell Counting Kit-8 mw.
Still, we see that the number of variables included in the model is quite high— perhaps more sparse models can be found that are equally good or even better. In such cases, it pays to abandon the naive approach adopted above and look closer at the problem itself 3xFLAG PEPTIDE stability. We should realize we are optimizing a long parameter vector in this case, with 401 values. Many of these values are zero to start with, and we would like to retain the sparsity of the solution. Our step function, however, is not taking this into account and will suggest many steps leading to more variables. Combine that with a rather high initial temperature parameter, and it is clear that especially in the beginning many bad moves will be accepted. Finally, the evaluation function does not reward sparse solutions explicitly. Let’s see what a lower starting temperature and an adapted evaluation function contribute. First we will define the latter: > pls.cvfun2 <- function(selection, xmat, response, ncomp, + penalty = 0.01, ...) + if (sum(selection) < ncomp) return(Inf) + pls.obj <- plsr(response ˜ xmat[, selection == 1, drop = FALSE], + validation = "CV", ncomp = ncomp, ...) + c(RMSEP(pls.obj, estimate = "CV", ncomp = ncomp, + intercept = FALSE)$val) + + penalty * sum(selection) +
Note that we need at least ncomp variables in order to fit a PLS model. Next, we will define a step function that, using the default settings, will keep the number of variables approximately equal to the starting situation. By playing with the cutoffs in the argument plimits (a random draw from the uniform distribution lower than the first value will lead to eliminating one of the selected variables; anything larger than the second number to the addition of a variable, and anything in between to swapping variables) one can tweak the behavior of the step function: > saStepFun2 <- function(selected, plimits = c(.3, .7), ...) + dowhat <- runif(1) + + ## decrease selection + if (dowhat < plimits[1]) { + if (sum(selected) > 2) { # not too small... + kickone <- sample(which(selected == 1), 1) + selected[kickone] <- 0 + return(selected) + + } + + ## increase selection + if (dowhat > plimits[2]) # not too big... + if (sum(selected) < length(selected)) { + addone <- sample(which(selected == 0), 1) + selected[addone] <- 1 + return(selected) + + } + + ## swap + kickone <- sample(which(selected == 1), 1) + selected[kickone] <- 0 + addone <- sample(which(selected == 0), 1) + selected[addone] <- 1 + selected + }
By changing the values of the plimits argument we can directly influence the number of nonzero entries in the result: e.g., the higher the first number, the bigger the chance that a variable will be removed 3x FLAG molecular weight. Let’s see how that works, combined with a lower starting temperature. The default is 10—we will try a value of 1: > penalty <- 0.01 > SAoptimNIR2 <- + optim(initselect, + fn = pls.cvfun2, gr = saStepFun2, method = "SANN", + x = gasoline$NIR, response = gasoline$octane, + ncomp = 2, maxval = nNIR, + control = list(temp = 1))
This leads to the following results: > (nvarSA2 <- sum(SAoptimNIR2$par)) [1] 6 > SAoptimNIR2$value - penalty*nvarSA2 [1] 0.20047
Now we have a crossvalidation error that is clearly better than what we saw earlier but with only 6 instead of 190 variables in the model.
Comments
Post a Comment