One thing I often miss in discussions and courses on data science and machine learning is the interpretation of models. A lot of emphasis is put on predictive power and on which model class is the better one, without stressing what a model can tell you about the data.

Often this is not a big issue. If you build a model to prioritize customers in a call center, the worst thing that can happen is that someone gets annoyed. Not ideal, but the consequence is not as severe as accidentally giving someone a loan which they then fail to repay. Sometimes error bars really matter. Take, for example, the data set I donated earlier, and assume I build a simple linear model like so:
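$$\text{speed} = \beta_0 + \beta_1 \cdot \text{heart rate} + \beta_2 \cdot \text{ascend} + \epsilon,$$

with speed in km/h, heart rate in BPM, ascend in percent, and $\epsilon$ a normally distributed error term.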

The nice thing about linear models like this is that each parameter has a very well-defined meaning. If you fit the above model to your data set, $\beta_1$, for example, is the average change in speed (in kilometers per hour) for each extra BPM in heart rate. If we fit the $\beta_i$ to the data using e.g. OLS (under the assumption that the errors are normally distributed, but let's not worry about that for now), we also get an estimate of the errors on the $\beta_i$. Doing this with statsmodels, we get, for example:

                     Coef.  Std.Err.        t   P>|t|   [0.025   0.975]
Intercept          10.7468    0.5693  18.8784  0.0000   9.6298  11.8639
Heart Rate (bpm)   -0.0078    0.0040  -1.9741  0.0486  -0.0156  -0.0000
Ascend (%)         -0.0122    0.0016  -7.6485  0.0000  -0.0154  -0.0091
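For reference, a coefficient table like the one above could be produced with statsmodels along the following lines. This is only a sketch: the file name and column names are invented and need to be adapted to the actual data set.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file and column names -- adapt to the actual data set.
runs = pd.read_csv("runs.csv")

# speed ~ heart rate + ascend, fitted by ordinary least squares
model = smf.ols("speed_kmh ~ heart_rate_bpm + ascend_pct", data=runs)
results = model.fit()

# Coefficient table with standard errors and 95% confidence intervals
print(results.summary2())
```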


The table tells us that for each extra 10 percent of ascend, I run about 0.12 km/h slower. It also tells us that a lower heart rate goes along with a higher speed. That doesn't seem to make a lot of sense, but it does under the assumptions of the model and given the data: it might simply mean that I tend to run faster downhill, which is exactly when my heart rate goes down. That's the thing about the interpretation of models: they contain your assumptions and see the world only as far as your data describes it. Looking at the error on the heart-rate coefficient reveals that an effect of zero is in fact compatible with the data. Error bars matter!

But what do we do if we have a model that is less well understood theoretically, or just much more complicated? One thing you can do to get your hands on error bars is to run a Markov Chain Monte Carlo (MCMC) simulation.

Markov Chain Monte Carlo

The Metropolis-Hastings Algorithm

You will need:

  • A transition probability $g(x'|x)$ with $g(x'|x) = g(x|x')$, i.e. going from any sample $x$ to another sample $x'$ has the same probability as going back. This can e.g. trivially be satisfied by adding a small random number drawn uniformly from an interval $[-\epsilon, \epsilon]$ to a model parameter.
  • A function $f$ that is proportional to the target distribution $p$ we want to sample from, meaning that we don't even need complete knowledge of $p$. This is a big advantage, as sometimes $p$ is only known up to a normalization constant that is hard to compute (think of a Bayesian posterior, where the normalizing evidence involves an integral over all parameters).

Then, starting from some initial sample $x_0$, repeat the following steps for $t = 0, 1, 2, \ldots$:
  1. Generate a candidate sample $x'$ by drawing from $g(x'|x_t)$.
  2. Calculate $\alpha = f(x') / f(x_t)$.
  3. Draw a random number $0 \leq u \leq 1$.
  4. If $u \leq \alpha$, set $x_{t+1} = x'$, else set $x_{t+1} = x_t$.
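As a minimal sketch (certainly not the only way to implement this), the recipe above could look as follows in Python. The acceptance test is done in log space, i.e. $\log u \leq \log f(x') - \log f(x_t)$, which is equivalent to $u \leq \alpha$ but numerically more stable; the step size $\epsilon$ and the toy target are arbitrary choices for illustration.

```python
import numpy as np

def metropolis_hastings(log_f, x0, n_samples, eps=0.5, seed=0):
    """Draw n_samples from a density proportional to exp(log_f(x)),
    using a symmetric uniform proposal on [x - eps, x + eps]."""
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    log_fx = log_f(x)
    samples = np.empty((n_samples, x.size))
    for t in range(n_samples):
        # 1. candidate sample x' from the symmetric proposal g(x'|x_t)
        x_new = x + rng.uniform(-eps, eps, size=x.size)
        log_fx_new = log_f(x_new)
        # 2.-4. accept with probability min(1, f(x')/f(x_t))
        if np.log(rng.uniform()) <= log_fx_new - log_fx:
            x, log_fx = x_new, log_fx_new
        samples[t] = x
    return samples

# Toy target: unnormalized standard normal, f(x) proportional to exp(-x^2/2).
samples = metropolis_hastings(lambda x: -0.5 * np.sum(x**2), x0=0.0, n_samples=20_000)
print(samples.mean(), samples.std())  # should come out close to 0 and 1
```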

Simulation History
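Continuing from the sketch above (matplotlib assumed), the history of the chain can be inspected with a simple trace plot:

```python
import matplotlib.pyplot as plt

# Trace of the chain produced by the Metropolis-Hastings sketch above
plt.plot(samples[:, 0])
plt.xlabel("iteration $t$")
plt.ylabel("$x_t$")
plt.title("Simulation history")
plt.show()
```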

Parameter Estimates
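Point estimates and error bars then come directly from the chain, for example as the mean and a central percentile interval of the samples after discarding an initial burn-in phase; the burn-in length below is an arbitrary choice.

```python
import numpy as np

burn_in = 1_000                     # discard samples before the chain has settled
chain = samples[burn_in:, 0]

estimate = chain.mean()
lower, upper = np.percentile(chain, [2.5, 97.5])  # central 95% interval
print(f"{estimate:.3f}  [{lower:.3f}, {upper:.3f}]")
```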