Day 36-37: Concerned DALEX
by Danielle Navarro, 02 Jun 2018
I was working on a longer post continuing the metaprogramming series, and realised I wasn’t going to get it done this evening. But it’s been a couple of days since I tried out something new, so I resorted to the twitters to find inspiration. As always, the wonderful twitter rstats folks rose to the occasion:
@_ColinFay) June 2, 2018
Ooh. What is this DALEX package? I am curious. I hope it’s as kind and lovely as the concerned dalek:
TO THE VULNERABLE: CONCERNED DALEK LOVES AND OFFERS COMFORT TO YOU.— Concerned Dalek (@ConcernedDalek) April 11, 2018
TO THE VOICELESS: CONCERNED DALEK LOVES AND LISTENS TO YOU.
TO THE TIRED, THE BELEAGUERED, THE DEPRESSED, THE MOURNING, THE ANXIOUS, THE FRIGHTENED: YOU ARE LOVED AND HAVE A PLACE HERE!
A very brief investigation!
I don’t have a lot of time this evening, but upon checking out the homepage, I discover that DALEX is short for Descriptive mAchine Learning EXplanations. This sounds very lovely, and I think Concerned Dalek would be very concerned to know that DALEX is working hard to help the humans understand what the machine learners are doing. Concerned Dalek would not want us to be worried about these things. All I have time for is a quick run through for two of the examples, but they are nice!
## Welcome to DALEX (version: 0.3.0).
## This is a plain DALEX. Use 'install_dependencies()' to get all required packages.
First we run a linear regression model predicting wine quality as a function of pH, sugar, sulphates and alcohol. Then we imagine seeing a new bottle of wine with known properties, and want to use this regression model (imaginatively named mod) to make a prediction about whether this new wine will be any good:
mod <- lm(quality ~ pH + residual.sugar + sulphates + alcohol, data = wine)
new <- data.frame(
  citric.acid = .35,
  sulphates = .6,
  alcohol = 12.5,
  pH = 3.36,
  residual.sugar = 4.8
)
predict(object = mod, newdata = new)
##        1
## 6.648226
That’s nice, but it isn’t immediately obvious why the model has made that prediction. To help with this, the explain function in DALEX lets me create an explainer object, ex, and then use that to tell us something about the prediction:
ex <- explain(model = mod, data = wine)
pre <- prediction_breakdown(explainer = ex, observation = new)
print(pre)
plot(pre)
##                              variable contribution  variable_name
## 1                         (Intercept)   5.87790935      Intercept
## alcohol              + alcohol = 12.5   0.70174103        alcohol
## pH                        + pH = 3.36   0.05780936             pH
## sulphates           + sulphates = 0.6   0.04865885      sulphates
## citric.acid      + citric.acid = 0.35   0.00000000    citric.acid
## residual.sugar + residual.sugar = 4.8  -0.03789283 residual.sugar
## 11                    final_prognosis   6.64822576
##                variable_value cummulative sign position label
## 1                           1    5.877909    1        1    lm
## alcohol                  12.5    6.579650    1        2    lm
## pH                       3.36    6.637460    1        3    lm
## sulphates                 0.6    6.686119    1        4    lm
## citric.acid              0.35    6.686119    0        5    lm
## residual.sugar            4.8    6.648226   -1        6    lm
## 11                               6.648226    X        7    lm
If I’ve understood this correctly, what the figure is showing me is what happens to the model prediction as I add the predictors in one by one. With just alcohol included, the prediction is just under 6.6. It goes up a little when pH is included, a little more when sulphates are added, and then goes down when residual sugar is considered.
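For a linear model, this kind of breakdown can be sketched by hand. The snippet below is only an illustration under some assumptions: it uses the built-in mtcars data rather than the wine data, the names mod_cars and new_car are made up for the example, and it takes the baseline to be the average prediction over the data, which is my reading of the output above rather than something checked against the DALEX source. Each predictor's contribution is its coefficient times how far the new value sits from that predictor's mean.

```r
# Hand-rolled breakdown for a linear model.
# mtcars, mod_cars and new_car are illustrative assumptions, not from the post.
mod_cars <- lm(mpg ~ wt + hp, data = mtcars)
new_car <- data.frame(wt = 2.5, hp = 110)

# Baseline: the average prediction over the data set
baseline <- mean(predict(mod_cars, newdata = mtcars))

# Contribution of each predictor: coefficient * (new value - predictor mean)
betas <- coef(mod_cars)[c("wt", "hp")]
centres <- colMeans(mtcars[, c("wt", "hp")])
contrib <- betas * (unlist(new_car[, c("wt", "hp")]) - centres)

# Baseline plus contributions should reproduce the ordinary prediction
total <- baseline + sum(contrib)
print(round(c(baseline = baseline, contrib, total = total), 3))
predict(mod_cars, newdata = new_car)
```

Because the model is additive, the order in which the predictors are added makes no difference to the size of each contribution here.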
It is kind of nice
For linear regression models, I could probably have done this myself with only slightly more effort, but most machine learning models are harder to interpret and I always feel very wary of trusting models whose behaviour is opaque. It’s helpful, then, that DALEX also lets you do this for things like random forest models:
library("randomForest", quietly = TRUE)
## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
wine_rf_model4 <- randomForest(quality ~ pH + residual.sugar + sulphates + alcohol, data = wine)
wine_rf_explainer4 <- explain(wine_rf_model4, data = wine, label = "model_rf")
wine_rf_predict4 <- prediction_breakdown(wine_rf_explainer4, observation = new)
print(wine_rf_predict4)
plot(wine_rf_predict4)
##                              variable contribution  variable_name
## 1                         (Intercept)    5.8809581      Intercept
## alcohol              + alcohol = 12.5    0.5920844        alcohol
## sulphates           + sulphates = 0.6    0.2078929      sulphates
## residual.sugar + residual.sugar = 4.8    0.2708172 residual.sugar
## pH                        + pH = 3.36    0.2860189             pH
## citric.acid      + citric.acid = 0.35    0.0000000    citric.acid
## 11                    final_prognosis    7.2377715
##                variable_value cummulative sign position    label
## 1                           1    5.880958    1        1 model_rf
## alcohol                  12.5    6.473043    1        2 model_rf
## sulphates                 0.6    6.680935    1        3 model_rf
## residual.sugar            4.8    6.951753    1        4 model_rf
## pH                       3.36    7.237772    1        5 model_rf
## citric.acid              0.35    7.237772    0        6 model_rf
## 11                               7.237772    X        7 model_rf
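A rough sketch of the general idea, for models that are not additive: fix the predictors one at a time at the new observation's values for every row of the data, and track how the average prediction moves. This is an assumption about the flavour of the method, not a copy of what DALEX does internally. In particular, the order of the variables is fixed by hand here, the function breakdown_by_hand and the objects mod_int and new_car are made up for the example, and an lm with an interaction term on mtcars stands in for the random forest so that the snippet runs in base R.

```r
# Model-agnostic breakdown sketch: works with any model that has a predict() method.
# mtcars, mod_int, new_car and breakdown_by_hand are illustrative assumptions.
mod_int <- lm(mpg ~ wt * hp, data = mtcars)   # interaction term makes it non-additive
new_car <- data.frame(wt = 2.5, hp = 110)

breakdown_by_hand <- function(model, data, observation, vars) {
  current <- data
  running <- mean(predict(model, newdata = current))  # baseline: average prediction
  steps <- c(baseline = running)
  for (v in vars) {
    current[[v]] <- observation[[v]]                  # fix this predictor in every row
    updated <- mean(predict(model, newdata = current))
    steps[v] <- updated - running                     # shift attributed to this predictor
    running <- updated
  }
  steps
}

steps <- breakdown_by_hand(mod_int, mtcars, new_car, vars = c("wt", "hp"))
print(round(steps, 3))
sum(steps)   # equals predict(mod_int, newdata = new_car)
```

With an interaction in the model, swapping the order of vars changes the individual contributions, though they still sum to the same final prediction.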
Yay. I like this. At some stage I’d like to have the time to look into this properly, but for now that will have to do.
YOUR COMMUNITY MAY NOT BE EASY TO FIND, BUT CONCERNED DALEK KNOWS THERE ARE PEOPLE OUT THERE LIKE YOU, WHO WILL SUPPORT AND LOVE YOU THROUGH YOUR CHALLENGES! SEEK THEM OUT! THEY NEED YOU! AND YOU NEED THEM!— Concerned Dalek (@ConcernedDalek) March 28, 2018
I know, kind dalek. I know. ❤️