Machine learning models continue to improve at a wide range of tasks. However, these improvements come with a rapid increase in model complexity, which harms comprehensibility. In this thesis, we make the case for the comprehensibility of a specific predictive model, the Random Forest.
To this end, we present a comprehensive overview of currently available interpretation methods: feature analysis, simpler-model derivation, and structural visualization. We analyze the effectiveness of the most promising methods, as well as various visualization techniques, to effectively convey the choices a model makes.
In the process, we found that explanations of a single prediction (instance-level) are underrepresented in the literature compared to global explanations. We explore the possibilities of local model explanations and develop a local rule extraction technique that is, to the best of our knowledge, a novel approach to obtaining model insights.
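To make the notion of an instance-level rule concrete, the sketch below is a minimal illustration, assuming scikit-learn's tree API; it reads a single tree's decision path as an if-then rule for one instance. It is not the rule extraction technique developed in this thesis, and the helper name extract_local_rule is hypothetical.

```python
# Minimal sketch (illustration only, not the thesis technique): read the decision
# path of one tree in a scikit-learn Random Forest as an if-then rule for a single
# instance, i.e. the conjunction of split conditions that instance satisfies.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

def extract_local_rule(tree_clf, x, feature_names):
    """Return the split conditions the given tree applies to the single instance x."""
    tree = tree_clf.tree_
    node_ids = tree_clf.decision_path(x.reshape(1, -1)).indices  # root-to-leaf path
    conditions = []
    for node_id in node_ids:
        if tree.children_left[node_id] == tree.children_right[node_id]:
            continue  # leaf node: no split condition to record
        feat, thresh = tree.feature[node_id], tree.threshold[node_id]
        op = "<=" if x[feat] <= thresh else ">"
        conditions.append(f"{feature_names[feat]} {op} {thresh:.2f}")
    return " AND ".join(conditions)

iris = load_iris()
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(iris.data, iris.target)
print("Rule applied by tree 0 to instance 0:")
print(extract_local_rule(forest.estimators_[0], iris.data[0], iris.feature_names))
```

A local rule extraction method would aggregate or simplify such per-tree conditions across the ensemble; this sketch only shows the per-tree, per-instance building block.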
We combine various visualization techniques into two dashboards that facilitate a dialogue between users and a given Random Forest model. The first dashboard is centered around features, explaining each feature's contribution to the prediction. The second dashboard takes the possible target classes as a starting point and uses model simplification to present a set of rules that describe the choices the model makes when predicting those classes.
The dashboards are evaluated through a case study at Achmea (one of the leading insurance providers in the Netherlands) in the context of insurance fraud detection. We use the standardized System Usability Scale (SUS) survey to quantify the satisfaction of the fraud team at Achmea and show that the dashboards can aid them in their day-to-day tasks.