Before a statistical model can be applied to a research question, the data must be arranged in the layout that the model expects. Data often come to a researcher in a different arrangement, and the researcher must restructure them or combine data from several sources before the model can be used. In this workshop we will go over the most common “data wrangling” procedures, including:
- Subsetting by observations or by variables
- Creating new variables as functions of existing variables
- Aggregating/summarizing observations
- Reshaping (pivoting) from wide to tall or from tall to wide formats
- Concatenating data from several tables
- Merging/joining data from several tables
The workshop will emphasize conceptual methods for getting your data into a useful format. We will also discuss how to perform these basic steps in R, SAS, SPSS, Stata, and JMP.
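
As a rough, package-neutral illustration of the six procedures listed above, the sketch below applies each one to a tiny made-up table using plain Python. Python is not one of the packages covered in the workshop, and the table contents, variable names (`id`, `group`, `score_1`, `score_2`, `age`), and values are all hypothetical; the point is only to show what each operation does to the shape of the data.

```python
from collections import defaultdict

# Hypothetical wide-format table: one row per subject, one column per visit.
wide = [
    {"id": 1, "group": "A", "score_1": 10, "score_2": 14},
    {"id": 2, "group": "A", "score_1": 8, "score_2": 9},
    {"id": 3, "group": "B", "score_1": 12, "score_2": 15},
]
# Hypothetical second batch of subjects and a separate table of ages.
more_rows = [{"id": 4, "group": "B", "score_1": 7, "score_2": 11}]
ages = [{"id": 1, "age": 34}, {"id": 2, "age": 41}, {"id": 3, "age": 29}]

# 1. Subsetting: keep some observations (rows) or some variables (columns).
group_a = [row for row in wide if row["group"] == "A"]
id_and_first = [{"id": row["id"], "score_1": row["score_1"]} for row in wide]

# 2. New variable as a function of existing variables.
with_change = [{**row, "change": row["score_2"] - row["score_1"]} for row in wide]

# 3. Aggregation: mean change within each group.
changes_by_group = defaultdict(list)
for row in with_change:
    changes_by_group[row["group"]].append(row["change"])
mean_change = {g: sum(v) / len(v) for g, v in changes_by_group.items()}

# 4. Reshaping wide to tall: one row per subject per visit.
tall = [
    {"id": row["id"], "group": row["group"], "visit": visit,
     "score": row[f"score_{visit}"]}
    for row in wide
    for visit in (1, 2)
]

# 5. Concatenating: stack tables that share the same variables.
combined = wide + more_rows

# 6. Merging (a left join on id): add age where a match exists, else None.
age_by_id = {row["id"]: row["age"] for row in ages}
merged = [{**row, "age": age_by_id.get(row["id"])} for row in combined]
```

Each statistical package has its own commands for these steps, but the before-and-after shapes are the same: subsetting and aggregation shrink the table, reshaping trades columns for rows, concatenation adds rows, and merging adds columns.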