Missing Data

Missing data are common in nearly every kind of data set, yet most statistical procedures assume that all data are observed. As a result, most standard statistical software packages automatically drop from the analysis any observation with a missing value, a practice known as listwise deletion. This approach can shrink the sample substantially and can bias the results. This two-part workshop will discuss the mechanisms that cause missing data, the advantages and disadvantages of commonly used approaches to dealing with missing data, some newer and better approaches, and how to implement them with available statistical software.
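To give a rough sense of how quickly dropping incomplete observations can shrink a sample, here is a minimal Python sketch. It is not part of the workshop materials. The function name `complete_case_fraction`, the choice of 10 variables, and the 5% missingness rate are all assumptions made for illustration. Each value is made missing independently of everything else, so the sketch shows only the loss of sample size, not the bias that can arise under other missing data mechanisms.

```python
import random


def complete_case_fraction(n_rows, n_vars, p_missing, seed=0):
    """Simulate a data set and return the share of rows with no missing values.

    Hypothetical helper for illustration: every cell is missing independently
    with probability p_missing, which is an assumption, not a general rule.
    """
    rng = random.Random(seed)
    complete = 0
    for _ in range(n_rows):
        # None marks a missing value; observed values are standard normal draws.
        row = [
            None if rng.random() < p_missing else rng.gauss(0, 1)
            for _ in range(n_vars)
        ]
        if all(value is not None for value in row):
            complete += 1
    return complete / n_rows


# With independent missingness, a row is complete with probability (1 - p)^k.
expected = (1 - 0.05) ** 10
simulated = complete_case_fraction(n_rows=10_000, n_vars=10, p_missing=0.05)

print(f"expected share of complete rows:  {expected:.3f}")
print(f"simulated share of complete rows: {simulated:.3f}")
```

Under these assumptions, only about 60% of rows survive even though each variable is missing just 5% of its values, since 0.95 raised to the 10th power is roughly 0.60.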

The workshop is intended for participants who have the equivalent of two semesters of statistics and some previous experience doing data analysis. It is appropriate for faculty, research staff, and graduate students.

Topics:

Missing data mechanisms and why they are important
An overview of imputation techniques
Multiple Imputation
Maximum Likelihood and the EM Algorithm
Software for implementing these solutions

Participants in the workshop will:

Understand the processes that lead to missing data
Learn approaches for handling missing data and their advantages and disadvantages
Be able to put into practice some of these approaches

In This Section

Files for this workshop are available in Box.

A video of this workshop is available at Cornell’s Video on Demand site.