• Analyzing CloudFront Logs in Metaflow, Finding Hackers Along the Way

    After a first look at Metaflow in the last data adventure, today we’ll use Metaflow for something more useful. I build this blog with Jekyll as my static site generator, upload the whole thing into an AWS S3 bucket, and serve it through AWS CloudFront. I used Google Analytics for a while, but stopped due to privacy concerns. However, CloudFront offers simple logging functionality that gives the inquisitive admin the option to do some basic analysis of how people use their website.

  • Metaflow: A first look

    In December, Netflix open-sourced Metaflow, a data science production framework. You can think of it as a one-stop shop for your data analytics workflow needs. A corporate suit for your data mess. It is very easy to use, Python-based, and supports many popular libraries such as scikit-learn and PyTorch, as well as the AWS cloud (which comes as no big surprise since Netflix runs on AWS).

  • Exploring the Strava API

    If you, like me, are interested in running and data science, you might be interested in analyzing running performance. You could do this on someone else’s data, e.g. on a donated data set, or you could use your own. You might, for example, want to know how your average pace/speed and average heart rate are distributed.

  • So I Donated a Dataset

    You know we go the extra mile for an interesting data set. I’ve recently been doing so quite literally and logged around 5 running sessions per week with the help of a GPS running watch. Thanks to some Selenium magic, I’ve been able to easily download the raw CSV files and am now able to donate them for your analysis pleasure. You can find them on GitHub.

  • Fun with Neural Networks Part 2: Autoencoders

    After we familiarized ourselves with Keras in the last post, now is the time to get more serious. Much has been said and written about neural networks, and nobody working in analytics nowadays can really escape the hype. Most of the time, however, you’ll only read about neural networks for classification or regression, that is to say in a supervised learning setting. That is quite interesting and all, but there are exciting things that you can do with unsupervised problems as well.

  • Fun with Neural Networks Part 1: First Steps with Keras

    I have wanted to write about Neural Networks for a while, mainly because I saw this as an opportunity to summarize some of the things I learned working with Keras, a high-level Neural Networks API building on top of TensorFlow (among others). I have used and come to like the latter, but always thought it lacks a bit on the usability side. This will be the first of a series of posts on Keras and Neural Networks. We won’t do anything fancy, just introduce a dataset we’ll be working on and showcase the basic usage of Keras.

  • Clustering 101, or: On Fridays, People Bike Differently!

    We have talked about the BABS open data set many times before. It lists bike trips in the San Francisco Bay area, with start and end point, date, time, and some extra information about the rider. What we want to look at in this episode is some basic clustering, and some surprising results from this well-known data set. The plan is to find classes of typical days in terms of bike usage. One would e.g. expect different usage patterns between weekdays and weekends, and we will actually discover some fun things beyond these basics as we go along. Let’s dive right in.
