  • Grouping Conflicts in Africa, Spark Edition

    In the last two posts I did some exploration of a fascinating data set published by the Armed Conflict Location & Event Data Project (ACLED). You can find the code I used on GitHub. The ACLED data lists, in great detail, incidents in armed conflicts all over Africa and in some countries in South and Southeast Asia since 1997.

  • Data Science For a Cause, Grouping Conflicts in Africa

    Last time we started looking at a fascinating data set from the Armed Conflict Location & Event Data Project (ACLED) that lists, in great detail, incidents in armed conflicts all over Africa and in some countries in South and Southeast Asia since 1997. Make sure you have a look. If you like, you can download my code from GitHub to get started.

  • Armed Conflicts in Africa, Illustrated in R

    Sometimes (most of the time), a data scientist’s life may seem like fun and games. But sometimes, we have to deal with the graver topics in life. Like armed conflicts.

  • A Data Scientist's Toolbox, Part 3: Mock Your Data!

    Now that we’ve discussed the importance of testing, you probably have the feeling that it’s a good idea on general grounds, but your code is really tough to test. You have all these dependencies, your app writes to a database, and you have to load your data. So, yeah. Not doable. Or so you think.

  • A Data Scientist's Toolbox, Part 2: Testing your code.

    In the last post, I talked about the usefulness of REPLs, which is indeed hard to overstate. Exploratory data analysis would be a lot of hassle without the read-evaluate-print loop. We had a closer look at Jupyter in particular, and first attempts at analyzing the data from the post on food and inflation can be found on GitHub.

  • A Data Scientist's Toolbox, Part 1: REPLs

    What a strange animal a data scientist is, somewhere in between a craftsman and an artisan, working in code, on data, visualizing, modeling, tinkering. Most of you will agree though that much of what we do is more craft than art, and as with every good craftsman, we need good tools. Sure, a truly skilled worker can create beautiful things with sub-par tools, but he or she won’t be nearly as efficient as with high-quality tools. So what are those? For me, everything starts with the very basic needs of navigating, finding, and modifying files, for which I use shell tools and a great editor, such as vim or emacs. But this post is not about those basic building blocks. Not that they aren’t important, on the contrary, but that’s probably a topic for another post.

  • Food and Inflation

    After my last post a friend said: “What?! You blogged about baby names? Again?” This was when I realized that I really needed to kick my obsession and move on to a new data set. The good people at Open Data for Africa came to my rescue with a great data set from the Food and Agriculture Organization containing producer prices for primary crops, live animals, and livestock primary products, altogether 200 commodities, collected in 130 countries between 1900 and 2014. They claim that the data they collect represents 97% of the world’s gross agricultural produce, which of course makes it an absolute treat for a data scientist’s lazy Sunday afternoon.
