Last time we looked at model parameters and how to estimate their errors. Today we will take things a step further and look at how to use one of the methods we learned about, bootstrapping, to estimate errors on model predictions. As I previously explained, this is essential when basing business decisions on data. Let’s dive in!

We start off with the COMPAS data we used in the last post. We will fit a model that predicts two-year recidivism from age, the number of juvenile misdemeanors, and the number of prior offenses.

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# `data` is the COMPAS DataFrame loaded in the previous post
features = ['age', 'juv_misd_count', 'priors_count']
target = 'two_year_recid'
X = data[features]
y = data[target]
model = DecisionTreeClassifier(max_depth=5)

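Normally, we would cross-validate to see how well the model does on data it hasn't seen before. By default, cross_val_score splits the data into five folds and, for a classifier, reports the accuracy on each held-out fold.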

cross_val_score(model, X, y)
array([0.6950797 , 0.67290367, 0.63270963, 0.6950797 , 0.67822469])
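Five numbers are easier to reason about when condensed into a mean and a spread. Here is a minimal sketch that takes the fold accuracies printed above and summarizes them with NumPy; the sample standard deviation (`ddof=1`) is one reasonable choice for a handful of folds, not the only one.

```python
import numpy as np

# Fold accuracies copied from the cross_val_score output above
scores = np.array([0.6950797, 0.67290367, 0.63270963, 0.6950797, 0.67822469])

mean_acc = scores.mean()
std_acc = scores.std(ddof=1)  # sample standard deviation across the 5 folds

print(f"mean accuracy: {mean_acc:.3f} +/- {std_acc:.3f}")
```

This works out to roughly 67.5% with a fold-to-fold spread of about 2.6 percentage points. Keep in mind that the folds share training data, so this spread is only a rough indication of uncertainty.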

Looking naively at this, we'd say we're doing a good enough job: the model is correct in roughly 63% to 70% of cases, about 67% on average. Not too bad! But what if we want to know the recidivism probability for an individual? As an example, let's look at a 30-year-old with three juvenile misdemeanors and no priors.

model = DecisionTreeClassifier(max_depth=5).fit(X, y)
# Column 0 is the probability of class 0: no re-offense within two years
model.predict_proba([[30, 3, 0]])[0][0]
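To see where a decision tree's probability comes from, we can find the leaf our individual lands in and count the training rows that ended up there. The sketch below does this on hypothetical synthetic data (the real COMPAS file isn't loaded here, and the made-up outcome model is purely illustrative), using scikit-learn's `apply` to get leaf ids; the names `X_syn`, `y_syn`, and `query` are our own.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic data standing in for the COMPAS columns
rng = np.random.default_rng(0)
n = 2000
X_syn = pd.DataFrame({
    "age": rng.integers(18, 70, n),
    "juv_misd_count": rng.poisson(0.2, n),
    "priors_count": rng.poisson(2.0, n),
})
# Made-up outcome model: more priors and lower age raise the re-offense probability
p_reoffend = 1 / (1 + np.exp(-(-0.5 + 0.3 * X_syn["priors_count"].to_numpy()
                               - 0.03 * (X_syn["age"].to_numpy() - 35))))
y_syn = (rng.random(n) < p_reoffend).astype(int)

tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_syn, y_syn)
query = pd.DataFrame([[30, 3, 0]], columns=X_syn.columns)

query_leaf = tree.apply(query)[0]          # id of the leaf the query falls into
in_leaf = tree.apply(X_syn) == query_leaf  # training rows sharing that leaf

# Share of those training rows with outcome 0 (no re-offense)
leaf_share_0 = (y_syn[in_leaf] == 0).mean()
proba_0 = tree.predict_proba(query)[0][0]
print(in_leaf.sum(), leaf_share_0, proba_0)
```

Without sample or class weights, the two numbers agree. It also shows why a leaf with few training rows can produce a jumpy probability estimate.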

The model gives this individual roughly a 30% chance of not re-offending within two years. A decision tree estimates this probability by finding the leaf node the input ends up in and computing the fraction of training cases in that leaf that belong to each class. So far so good. But should we base a high-stakes decision on this number? How large is its error? We can estimate it with the bootstrap: resample the data with replacement, refit the model on each resample, and look at how much the predictions vary. Let's generate 1000 bootstrap samples.

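import matplotlib.pyplot as plt

models = []
for _ in range(1000):
    # Resample the rows with replacement, keeping the original size
    sample = data.sample(data.shape[0], replace=True)
    model = DecisionTreeClassifier(max_depth=5).fit(sample[features], sample[target])
    models.append(model)

# Distribution of the predicted probability of not re-offending
plt.hist([m.predict_proba([[30, 3, 0]])[0][0] for m in models], bins=20)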
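The histogram can be condensed into a single interval. One common choice is the percentile bootstrap: take the 2.5th and 97.5th percentiles of the bootstrapped predictions as a rough 95% interval. The sketch below wraps the loop in a hypothetical helper, `bootstrap_proba`, and runs it on made-up synthetic data with only 200 replicates to keep it fast; with the real `data` you would pass `X`, `y`, and the 1000 replicates used above.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier


def bootstrap_proba(X, y, query, n_boot=1000, max_depth=5, seed=0):
    """Refit a depth-limited tree on n_boot bootstrap resamples of (X, y) and
    return each refit's predicted probability of class 0 for `query`."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    n = len(X)
    preds = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)  # n row indices drawn with replacement
        tree = DecisionTreeClassifier(max_depth=max_depth).fit(X.iloc[idx], y[idx])
        # Look up class 0 by label in case a resample happens to lack a class
        proba = dict(zip(tree.classes_, tree.predict_proba(query)[0]))
        preds[b] = proba.get(0, 0.0)
    return preds


# Hypothetical synthetic data standing in for the COMPAS columns
rng = np.random.default_rng(1)
n = 2000
X_syn = pd.DataFrame({
    "age": rng.integers(18, 70, n),
    "juv_misd_count": rng.poisson(0.2, n),
    "priors_count": rng.poisson(2.0, n),
})
p_reoffend = 1 / (1 + np.exp(-(-0.5 + 0.3 * X_syn["priors_count"].to_numpy()
                               - 0.03 * (X_syn["age"].to_numpy() - 35))))
y_syn = (rng.random(n) < p_reoffend).astype(int)
query = pd.DataFrame([[30, 3, 0]], columns=X_syn.columns)

preds = bootstrap_proba(X_syn, y_syn, query, n_boot=200)
lower, upper = np.percentile(preds, [2.5, 97.5])  # 95% percentile interval
print(f"median {np.median(preds):.2f}, 95% interval [{lower:.2f}, {upper:.2f}]")
```

The percentile interval is the simplest bootstrap interval; bias-corrected variants exist, but for a first look at how stable a single prediction is, this is usually enough.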


As we can see, the bootstrapped predictions are spread over a sizable range, so the single estimate of roughly 30% comes with a large error bar. What your final decision will be depends on many factors. The potential costs of a false positive and a false negative have to be weighed against each other, as does, among many other factors, how representative you think the data is of your use case. How would you decide?
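One way to make the cost trade-off concrete: flagging someone costs (1 - p) * cost_fp in expectation, where p is the probability of re-offending, and not flagging costs p * cost_fn. Flagging therefore has lower expected cost when p > cost_fp / (cost_fp + cost_fn). The sketch below applies that threshold to each bootstrapped prediction and reports how often the decision would come out as "flag". The helper names and the costs are hypothetical, and the four probabilities in the usage lines are illustrative numbers, not results from the COMPAS data.

```python
import numpy as np


def decision_threshold(cost_fp, cost_fn):
    """Re-offense probability above which flagging has lower expected cost.

    Flagging costs (1 - p) * cost_fp in expectation; not flagging costs p * cost_fn.
    Flagging wins when p * cost_fn > (1 - p) * cost_fp, i.e. p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def share_flagging(p_no_reoffend, cost_fp, cost_fn):
    """Fraction of bootstrap predictions (of NOT re-offending) that would lead to flagging."""
    p_reoffend = 1.0 - np.asarray(p_no_reoffend)
    return float(np.mean(p_reoffend > decision_threshold(cost_fp, cost_fn)))


# Illustrative only: a false positive assumed to be three times as costly as a false negative
threshold = decision_threshold(cost_fp=3, cost_fn=1)
share = share_flagging([0.15, 0.22, 0.30, 0.41], cost_fp=3, cost_fn=1)
print(threshold, share)
```

If the share lands near 0 or 1, the error bar barely matters for this decision; if it sits near one half, the bootstrap is telling us the data cannot settle the question on its own.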

I hope you enjoyed today's data adventure. Stay tuned for more!