Penalized regression is a family of modern statistical methods that has gained much attention over the last decade, as researchers in many fields are able to measure far more variables than ever before. Ordinary linear regression suffers in two important ways as the number of predictors becomes large. First, overfitting may occur, meaning that the fitted model does not reliably generalize beyond the particular data observed. Second, the fitted models become difficult to interpret. Penalized regression methods address both of these issues by “shrinking” some of the regression parameter estimates towards zero in order to identify a small number of predictors on which a reliable model can be built.
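Concretely, these methods replace the ordinary least-squares criterion with a penalized one. One common way to write the objective (the notation here is ours, and software packages scale the terms slightly differently) is

    \hat{\beta} \;=\; \arg\min_{\beta}\; \sum_{i=1}^{n} \bigl( y_i - x_i^{\top}\beta \bigr)^2 \;+\; \lambda \Bigl[ (1-\alpha)\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1 \Bigr]

where \lambda \ge 0 controls the overall strength of the shrinkage, and \alpha selects the penalty: \alpha = 0 gives ridge regression, \alpha = 1 gives the lasso, and values in between give the elastic net.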
In this workshop, we will
· introduce the challenges of building models with large numbers of variables
· give a conceptual explanation of ridge regression, the lasso, and the elastic net
· demonstrate how these methods can be performed on an example dataset (a brief code sketch follows this list)
· explain how to interpret the standard plots and outputs associated with penalized regression
· explain how cross-validation is used within the context of penalized regression (see the second sketch below)
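To give a flavor of the demonstrations, here is a minimal sketch using the glmnet package. The simulated data, variable names, and settings below are our own illustration, not the workshop's actual example dataset:

    # install.packages("glmnet")  # if not already installed
    library(glmnet)

    # Simulate data in which only 3 of 20 predictors truly matter
    set.seed(1)
    n <- 100; p <- 20
    x <- matrix(rnorm(n * p), n, p)
    beta <- c(3, -2, 1.5, rep(0, p - 3))
    y <- drop(x %*% beta) + rnorm(n)

    # The alpha argument selects the penalty
    fit_ridge <- glmnet(x, y, alpha = 0)    # ridge regression
    fit_lasso <- glmnet(x, y, alpha = 1)    # the lasso
    fit_enet  <- glmnet(x, y, alpha = 0.5)  # elastic net

    # Coefficient paths: how each estimate shrinks toward zero as lambda grows
    plot(fit_lasso, xvar = "lambda", label = TRUE)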
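Cross-validation is then used to choose the penalty strength lambda by estimating out-of-sample prediction error. Continuing the sketch above:

    # 10-fold cross-validation for the lasso
    cvfit <- cv.glmnet(x, y, alpha = 1, nfolds = 10)

    plot(cvfit)                    # CV error curve across the lambda sequence
    cvfit$lambda.min               # lambda with the smallest CV error
    cvfit$lambda.1se               # largest lambda within 1 SE of the minimum
    coef(cvfit, s = "lambda.1se")  # sparse coefficients at that lambda

Choosing between lambda.min and lambda.1se trades a small amount of apparent accuracy for a sparser, more interpretable model.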
We will assume familiarity with linear regression. The code demonstrations will be in R; however, the workshop should still be useful to those who do not use R, and we will briefly discuss how penalized regression can be carried out in other common statistical packages as well.