Posts

  • Bayesian Estimation of COVID-19's Reproduction Number in Python

    On the last data adventure, we estimated the novel coronavirus’s basic reproduction number $R_0$ using some Python scripting and basic exponential fitting. As much fun as that was, I was wondering if one could gain a more dynamic understanding of the situation. Luckily, I stumbled upon this blog post, based on this journal article, accompanied by this notebook, which attempts to estimate a dynamic version of the reproduction number, called $R_t$. The interesting idea behind $R_t$ is that it gives some indication of how well measures aimed at reducing the spread of COVID-19 work in a given country or region.

  • COVID-19: Estimating R_0 in Python

    Yes, I know, everyone and their brother is talking about the novel coronavirus SARS-CoV-2. There is a lot of bad information and misinformation out there. For this reason I thought it would be a good idea to go on a data adventure together to have a look at the numbers (and how to get them into an easy-to-use format for you to have a look for yourself) and make our very own rough estimation of the basic reproduction number $R_0$ that people keep talking about.

  • Analyzing CloudFront Logs in Metaflow, Finding Hackers Along the Way

    After we had a first look at Metaflow in the last data adventure, today we’ll use Metaflow for something more useful. I’m rolling this blog using Jekyll as my static site generator, uploading the whole thing to an AWS S3 bucket, and serving it through AWS CloudFront. Now, I used Google Analytics for a while, but stopped due to privacy concerns. However, CloudFront offers simple logging functionality that gives the inquisitive admin the option to do some basic analysis of how people use their website.

  • Metaflow: A first look

    In December, Netflix open-sourced Metaflow, a data science production framework. You can think of it as a one-stop shop for your data analytics workflow needs. A corporate suit for your data mess. It is very easy to use, Python-based, and supports many popular libraries such as scikit-learn and PyTorch, as well as the AWS cloud (which comes as no big surprise since Netflix runs on AWS).

  • Exploring the Strava API

    If you, like me, are interested in running and data science, you might be interested in analyzing running performance. You could do this on someone else’s data, e.g. on a donated data set, or you could use your own. You might, for example, be interested in how your average pace/speed and average heart rate are distributed.

  • So I Donated a Dataset

    You know we go the extra mile for an interesting data set. I’ve recently been doing so quite literally and logged around 5 running sessions per week with the help of a GPS running watch. Thanks to some Selenium magic, I’ve been able to easily download the raw CSV files and am now able to donate them for your analysis pleasure. You can find them on GitHub.

  • Fun with Neural Networks Part 2: Autoencoders

    After we familiarized ourselves with Keras in the last post, now is the time to get more serious. Much has been said and written about neural networks, and nobody working in analytics nowadays can really escape the hype. Most of the time, however, you’ll only read about neural networks for classification or regression, that is to say in a supervised learning setting. That is quite interesting and all, but there are exciting things that you can do with unsupervised problems as well.
