# Model Parameters from Markov Chain Monte Carlo

One thing I often miss in discussions and courses on data science and
machine learning is the *interpretation* of models. A lot of emphasis
is placed on predictive power and on which model class is the better one,
without stressing what a model can tell you about the data.

Often this is not a big issue. If you build a model to prioritize customers in a call center, the worst thing that can happen is that someone gets annoyed. Not ideal, but the consequence is not as severe as accidentally giving someone a loan which they then fail to repay. Sometimes error bars really matter. Let’s take for example the data set I donated earlier, and let’s assume I build a simple linear model like so:
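With speed as the target and heart rate and ascend as the predictors, such a model reads (the symbols $v$, $\mathrm{HR}$ and $A$ are my shorthand):

$$v = \beta_0 + \beta_1 \cdot \mathrm{HR} + \beta_2 \cdot A + \epsilon,$$

where $v$ is the running speed in km/h, $\mathrm{HR}$ the heart rate in BPM, $A$ the ascend in percent, and $\epsilon$ a normally distributed error term.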

The nice thing about linear models like this is that each parameter has a well-defined meaning. If you fit the above model to your dataset, $\beta_1$, for example, is the average increase in speed (in kilometers per hour) for each extra BPM in heart rate. If we fit the $\beta_i$ to the data using e.g. OLS (under the assumption that the errors are normally distributed, but let’s not worry about this for now), we also get an estimate of the errors on the $\beta_i$. Doing this using Statsmodels, we e.g. get
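Such a fit can be sketched with the Statsmodels formula API. This is a minimal sketch: the column names and the synthetic stand-in data are assumptions, not the original dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the running data (the real dataset is not reproduced here)
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "heart_rate": rng.normal(140.0, 10.0, n),   # BPM
    "ascend": rng.normal(0.0, 30.0, n),         # percent
})
df["speed"] = 10.7 - 0.008 * df["heart_rate"] - 0.012 * df["ascend"] + rng.normal(0.0, 1.0, n)

# OLS fit; the summary contains Coef., Std.Err., t, P>|t| and the 95% interval
model = smf.ols("speed ~ heart_rate + ascend", data=df).fit()
print(model.summary())
```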

| | Coef. | Std.Err. | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 10.7468 | 0.5693 | 18.8784 | 0.0000 | 9.6298 | 11.8639 |
| Heart Rate (bpm) | -0.0078 | 0.0040 | -1.9741 | 0.0486 | -0.0156 | -0.0000 |
| Ascend (%) | -0.0122 | 0.0016 | -7.6485 | 0.0000 | -0.0154 | -0.0091 |

This tells us that for each extra 10 percent of ascend, I run 0.12
km/h slower. It also tells us that a *lower* heart rate actually means a
*higher* speed. This doesn’t seem to make a lot of sense, but it does
under the assumptions of the model and given the data. It might
just mean that I tend to run faster downhill, which is exactly when my
heart rate goes down. That’s the thing about the interpretation of
models: they contain your assumptions and see the world only as far as
your data describes it. Looking at the error on the heart-rate
coefficient reveals that an effect of zero is actually compatible with
the data. Error bars matter!

But what do we do if we have a model that’s less well understood theoretically, or just much more complicated? One thing you can do to get your hands on error bars is to run a Markov Chain Monte Carlo (MCMC) simulation.

# Markov Chain Monte Carlo

## The Metropolis-Hastings Algorithm

You will need:

- A *transition probability* $g(x'|x)$ with $g(x'|x) = g(x|x')$, i.e. going from any sample $x$ to another sample $x'$ has the same probability as going back. This can e.g. trivially be satisfied by adding a small random number drawn from an interval $[-\epsilon, \epsilon]$ to a model parameter.
- A function $f$ that is *proportional to* the target distribution $p$, meaning that we don’t even have to have complete knowledge of $p$. This is a big advantage, as the normalization constant of $p$ is often hard to compute, and it cancels in the acceptance ratio anyway.

Then, starting from some sample $x_0$, each step $t$ works as follows:

- Generate a candidate sample $x'$ by picking from $g(x'|x_t)$.
- Calculate $\alpha = f(x') / f(x_t)$.
- Draw a random number $0 \leq u \leq 1$.
- If $u \leq \alpha$, set $x_{t+1} = x'$, else $x_{t+1} = x_t$.
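The steps above fit in a few lines of Python. This is a minimal sketch: the uniform random-walk proposal and the example target (an unnormalized standard normal) are my choices for illustration.

```python
import numpy as np

def metropolis_hastings(f, x0, n_samples=10_000, eps=0.5, seed=0):
    """Sample from a distribution proportional to f, using the symmetric
    proposal g(x'|x): add uniform noise from [-eps, eps] to the current x."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        x_new = x + rng.uniform(-eps, eps)   # candidate from g(x'|x_t)
        alpha = f(x_new) / f(x)              # acceptance ratio f(x')/f(x_t)
        if rng.uniform() <= alpha:           # accept with probability min(1, alpha)
            x = x_new
        samples.append(x)                    # on rejection, x_t is repeated
    return np.array(samples)

# Example target: proportional to a standard normal (normalization omitted)
f = lambda x: np.exp(-0.5 * x**2)
samples = metropolis_hastings(f, x0=0.0, n_samples=50_000)
print(samples.mean(), samples.std())  # roughly 0 and 1
```

Note that $f$ is never normalized here; only the ratio $f(x')/f(x_t)$ enters, which is exactly why proportionality to $p$ suffices.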

## Simulation History

## Parameter Estimates