Variable Selection Using the Lasso Technique

The Lasso is a modern statistical method that has gained much attention over the last decade, as researchers in many fields have become able to measure far more variables than ever before. Linear regression suffers in two important ways as the number of predictors becomes large: first, overfitting may occur, meaning that the fitted model does not reliably generalize beyond the particular data observed; second, the fitted model becomes difficult to interpret. The Lasso addresses both of these issues by identifying a small number of predictors on which a reliable model can be built.

In this workshop, we will

  • introduce the challenges of building models with large numbers of variables
  • give a conceptual explanation of the Lasso
  • demonstrate how the Lasso can be performed on an example dataset
  • explain how to interpret the standard plots and outputs associated with the Lasso
  • explain how cross-validation is used within the context of the Lasso
  • discuss related variable selection methods such as stepwise approaches and the Elastic Net

We will assume familiarity with linear regression. The code demonstrations will be performed in R; however, the workshop should be useful to those who do not use R, and we will briefly discuss performing the Lasso in other common statistical packages as well.