Stability Assessment for Trees and other Supervised Statistical Learning Results

Authors: Michel Philipp (UZH Zuerich), Carolin Strobl, Thomas Rusch, Kurt Hornik, Achim Zeileis

Workshop: Psychoco 2017

Classification trees are an example of a statistical learning algorithm that is known to be “unstable”: small changes in the learning data can lead to substantially different trees. Ensemble methods, such as random forests, are more stable but lack the interpretability of a single tree. From a user’s perspective, the question is therefore: when is it safe to interpret a single tree, and when should it be treated with caution? As a first attempt to address this question, this talk illustrates a toolbox of summary statistics and plots for assessing stability, which will be available in the R package stablelearner. We also outline how these ideas can be generalized to a framework for measuring the stability of supervised statistical learning results in general.
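To make the notion of instability concrete, the following minimal sketch refits a one-split tree (a stump) on bootstrap samples of the learning data and records how often each variable is chosen for the split. It is an illustration only and does not reproduce the stablelearner interface: the function names (`gini`, `best_split`, `selection_frequencies`), the use of the Gini criterion, bootstrap resampling as the perturbation scheme, and the toy data are all assumptions made for this example.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(rows):
    """Return (variable index, cutpoint) of the stump with the lowest
    weighted Gini impurity, or None if no split is possible.

    rows is a list of (features, label) pairs with 0/1 labels.
    """
    n = len(rows)
    n_vars = len(rows[0][0])
    best = None  # (impurity, variable index, cutpoint)
    for j in range(n_vars):
        values = sorted({x[j] for x, _ in rows})
        # Candidate cutpoints: midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2.0
            left = [y for x, y in rows if x[j] <= cut]
            right = [y for x, y in rows if x[j] > cut]
            imp = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or imp < best[0]:
                best = (imp, j, cut)
    return None if best is None else (best[1], best[2])


def selection_frequencies(rows, n_boot=100, seed=1):
    """Share of bootstrap samples in which each variable is selected."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        # Bootstrap: draw as many rows as the original data, with replacement.
        sample = [rng.choice(rows) for _ in rows]
        split = best_split(sample)
        if split is not None:
            counts[split[0]] += 1
    return {j: counts[j] / n_boot for j in range(len(rows[0][0]))}


# Hypothetical toy data: the label depends on both variables plus noise,
# so the two variables compete for the split.
rng = random.Random(42)
data = []
for _ in range(60):
    x = (rng.random(), rng.random())
    y = int(x[0] + x[1] + rng.gauss(0.0, 0.3) > 1.0)
    data.append((x, y))

print("split on full data:", best_split(data))
print("selection frequencies:", selection_frequencies(data))
```

If one variable is selected in nearly all resamples, the split found on the full data is comparatively trustworthy. Frequencies spread across several variables suggest that the single tree should be interpreted with caution. The same resampling loop could also record the cutpoints to judge how stable the split location is.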