# So I Donated a Dataset

You know we go the extra mile for an interesting data set. I’ve recently been doing so quite literally, logging around five running sessions per week with the help of a GPS running watch. Thanks to some Selenium magic, I was able to easily download the raw CSV files, and I’m now donating them for your analysis pleasure. You can find them on GitHub.

## Running Training Sessions Data Set

Each session is logged at a resolution of one second, containing

- Time
- Heart Rate (bpm)
- Speed (km/h)
- Pace (min/km)
- Cadence (steps/minute)
- Altitude (m)
- Distance (m)

The device used to log the runs (the discontinued Polar M400 running
watch) also creates columns for additional metrics that it doesn’t
log. These are left blank in the files. The data is provided as
gzipped CSV files. *Note that no data cleaning was attempted*. This
means that the data contains outliers and measurement errors. Below
you see a sample plot representing the altitude profile of a
20-kilometer run, including a number of apparent measurement errors.

The data set contains sessions of various distances ranging from a few kilometers to 20+ kilometers.
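
As a starting point, here is a minimal sketch of reading one of the gzipped CSV files with nothing but the Python standard library. It assumes the first line of the file is the column header; the `skip_rows` parameter and the function name `load_session` are illustrative and not part of the data set. Columns that are blank in every row (the metrics the watch doesn’t log) are dropped.

```python
import csv
import gzip


def load_session(path, skip_rows=0):
    """Read one gzipped session CSV into a list of dicts, one per sample.

    Assumption: after `skip_rows` lines, the next line is the column header.
    `skip_rows` is a hypothetical knob for files that carry extra lines
    above the header.
    """
    with gzip.open(path, mode="rt", newline="") as handle:
        for _ in range(skip_rows):
            next(handle)
        reader = csv.DictReader(handle)
        rows = list(reader)
        fieldnames = reader.fieldnames or []

    # Keep only columns that hold a non-empty value in at least one row.
    used = [
        name for name in fieldnames
        if any((row.get(name) or "").strip() for row in rows)
    ]
    return [{name: row[name] for name in used} for row in rows]
```

All values come back as strings, so they still need to be converted to numbers (and checked for outliers) before any analysis.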

## How Slow Do I Go Uphill?

I’ll probably do some more sophisticated analyses on the data later on, but as inspiration for what you could do with these data, let’s find out how much slower I go uphill. I’ve added a column measuring the ascent in percent (labeled `Ascend` in the model output). I fit an ordinary least squares model from the Statsmodels package, predicting speed from heart rate (and its square), seconds run, and ascent (as well as its square). The data is grouped into two-minute sections to reduce autocorrelation. This relatively simple model explains roughly a third of the variance in the data.

| | | | |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.310 |
| Dependent Variable: | Speed (km/h) | AIC: | 2801.2025 |
| Date: | 2018-07-15 13:12 | BIC: | 2830.8959 |
| No. Observations: | 1042 | Log-Likelihood: | -1394.6 |
| Df Model: | 5 | F-statistic: | 94.36 |
| Df Residuals: | 1036 | Prob (F-statistic): | 5.84e-82 |
| R-squared: | 0.313 | Scale: | 0.85614 |

| | Coef. | Std.Err. | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const. | -10.3113 | 3.3817 | -3.0491 | 0.0024 | -16.9472 | -3.6755 |
| HR (bpm) | 0.2789 | 0.0483 | 5.7732 | 0.0000 | 0.1841 | 0.3737 |
| Seconds | 0.0001 | 0.0000 | 4.1677 | 0.0000 | 0.0000 | 0.0001 |
| HR^2 | -0.0010 | 0.0002 | -5.5585 | 0.0000 | -0.0013 | -0.0006 |
| Ascend (%) | -0.0572 | 0.0073 | -7.7989 | 0.0000 | -0.0716 | -0.0428 |
| Ascend^2 | -0.0017 | 0.0002 | -11.6011 | 0.0000 | -0.0020 | -0.0015 |
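
If you want to build similar model inputs yourself, the two-minute grouping and the ascent column could be derived roughly as follows. This is a sketch under assumptions, not the exact code behind the table above: the dictionary keys (`hr`, `speed`, `altitude`, `distance`) are hypothetical names, ascent is taken as rise over run across each window, and "seconds" is the time elapsed at the start of the window.

```python
from statistics import mean


def window_features(samples, window=120):
    """Collapse per-second samples into one row per `window` seconds.

    `samples` is a time-ordered list of dicts with float values under the
    hypothetical keys "hr", "speed", "altitude" and "distance" (altitude
    and distance in metres), one sample per second. A trailing incomplete
    window is dropped.
    """
    features = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        run = chunk[-1]["distance"] - chunk[0]["distance"]
        if run <= 0:
            continue  # no forward movement: ascent in percent is undefined
        rise = chunk[-1]["altitude"] - chunk[0]["altitude"]
        hr = mean(s["hr"] for s in chunk)
        ascent = 100.0 * rise / run  # assumption: rise over run, in percent
        features.append({
            "speed": mean(s["speed"] for s in chunk),
            "hr": hr,
            "hr_sq": hr ** 2,
            "seconds": start,  # assumption: seconds elapsed at window start
            "ascent": ascent,
            "ascent_sq": ascent ** 2,
        })
    return features
```

Each resulting row holds the response (`speed`) and the five predictors, so a list like this could be turned into a data frame and handed to Statsmodels’ `OLS` together with a constant term.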

So how much does running up or down a hill slow me down? The mean ascent over the two-minute windows I’ve used is distributed like this.

So the relevant ascent range is roughly between -20% and 20%. Taking the quadratic effect of the slow-down into account (remember I used the ascent as well as its square in the model), the influence of running up or down a hill looks like the following graph.

On the steep downhill sections, I gain roughly half a km/h, while on the steep uphill sections I lose more than twice that. This makes complete sense, since I try to avoid running downhill too fast to save my knees.
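
These numbers can be checked directly from the two ascent coefficients in the regression table. The short sketch below evaluates the quadratic slope term on its own, with heart rate and elapsed time held fixed; the function name `slope_effect` is just an illustrative choice, and the coefficients are the rounded values from the table.

```python
# Rounded coefficients copied from the regression table above.
COEF_ASCENT = -0.0572
COEF_ASCENT_SQ = -0.0017


def slope_effect(ascent_percent):
    """Predicted change in speed (km/h) relative to flat ground."""
    return COEF_ASCENT * ascent_percent + COEF_ASCENT_SQ * ascent_percent ** 2


for grade in (-20, -10, 0, 10, 20):
    print(f"{grade:+3d}% -> {slope_effect(grade):+.2f} km/h")
```

At -20% this works out to a gain of about 0.46 km/h, and at +20% to a loss of about 1.82 km/h, which matches the asymmetry described above.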

That’s it for now. Have fun with the data, and stay tuned for more data adventures.