Data Wrangling

To answer statistical research questions, data must be arranged in the layout that the appropriate model requires. Data often arrives in a different arrangement, and the researcher must restructure it or combine data from several sources before a model can be fitted. In this workshop we will go over the most common “data wrangling” procedures, including:

  • Subsetting by observations or by variables
  • Creating new variables as functions of existing variables
  • Aggregating/summarizing observations
  • Reshaping (pivoting) from wide to tall or from tall to wide formats
  • Concatenating data from several tables
  • Merging/joining data from several tables
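
As a rough illustration of the six procedures listed above, here is a minimal sketch in plain Python on small made-up tables. Python is used only for illustration; the tables (`scores`, `new_scores`, `ages`) and all variable names are hypothetical and are not part of the workshop material.

```python
from collections import defaultdict

# Hypothetical data: one row per subject, two visit scores (wide format).
scores = [
    {"id": 1, "group": "a", "visit1": 10, "visit2": 14},
    {"id": 2, "group": "a", "visit1": 8, "visit2": 9},
    {"id": 3, "group": "b", "visit1": 12, "visit2": 15},
]
new_scores = [{"id": 4, "group": "b", "visit1": 7, "visit2": 11}]
ages = [{"id": 1, "age": 34}, {"id": 2, "age": 51}, {"id": 4, "age": 29}]

# Concatenate: stack rows from tables that share the same columns.
combined = scores + new_scores

# Subset by observations (rows), then by variables (columns).
group_a = [r for r in combined if r["group"] == "a"]
ids_and_groups = [{"id": r["id"], "group": r["group"]} for r in combined]

# Create a new variable as a function of existing variables.
with_change = [{**r, "change": r["visit2"] - r["visit1"]} for r in combined]

# Aggregate: mean change within each group.
changes_by_group = defaultdict(list)
for r in with_change:
    changes_by_group[r["group"]].append(r["change"])
mean_change = {g: sum(v) / len(v) for g, v in changes_by_group.items()}

# Reshape wide to tall: one row per subject per visit.
tall = [
    {"id": r["id"], "visit": visit, "score": r[f"visit{visit}"]}
    for r in combined
    for visit in (1, 2)
]

# Merge (left join) on id: subjects with no age record get None.
age_by_id = {r["id"]: r["age"] for r in ages}
merged = [{**r, "age": age_by_id.get(r["id"])} for r in combined]
```

Each step leaves the earlier tables unchanged and builds a new one, which makes it easy to inspect the result of a single operation before chaining the next.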

The workshop will emphasize conceptual methods for getting your data into a useful format. We will also discuss how to perform these basic steps in R, SAS, SPSS, Stata and JMP.