This workshop is designed as an extension of our Introduction to Classification and Regression Trees workshop. Inference from a single-tree analysis can be severely limited by over-fitting to the data set at hand: minor changes to the data can greatly change the tree structure and the resulting inferences. In addition, the recursive partitioning method used to construct a tree is a “greedy” algorithm; each split is chosen to be locally optimal, so the splits made early in the analysis do not always lead to the tree that best characterizes the entire data set.
A random forest analysis involves growing many trees and combining their results. This reduces the variance of the resulting predictions through bagging (bootstrap aggregation, in which each tree is fit to a resampled copy of the training data) and through decorrelating the trees (randomly subsetting the predictors considered at each split). This workshop will cover the motivation for using random forests, details on how they are assembled, and how to build and assess random forests in both JMP and R.
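As a preview of the R side, the sketch below fits a small forest with the randomForest package on the built-in iris data; the package choice, data set, and tuning values here are illustrative assumptions rather than the workshop's specific examples.

    # Minimal sketch, assuming the randomForest package is installed
    # (install.packages("randomForest")).
    library(randomForest)

    set.seed(42)  # make the bootstrap samples reproducible

    # ntree: number of bagged trees grown from bootstrap resamples
    # mtry:  number of predictors randomly considered at each split,
    #        which decorrelates the individual trees
    fit <- randomForest(Species ~ ., data = iris, ntree = 500, mtry = 2)

    print(fit)        # out-of-bag error estimate and confusion matrix
    importance(fit)   # variable importance scores

Printing the fitted object reports the out-of-bag error, which uses the observations left out of each bootstrap sample as a built-in validation set, so no separate holdout is required to get an honest error estimate.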