After we’ve discussed the importance of testing, you probably have the feeling that it’s a good idea, on general grounds, but your code is really tough to test. You have all these dependencies and your app writes to a database, and you have to load your data. So, yea. Not doable. Or so you think.

Well, there are a few tricks you can apply. As for reading data, it’s rather easy, actually. You can use the StringIO package to mock a file object and use that as an argument to the function that reads your data. Let’s assume you have a

.csv

file that’s actually not _C_SV, but rather separated with the vertical bar “|” (or Sheffer Stroke). You want to make sure this is taken into account across the project, so you want to test for it (good idea!). It’s as easy as this:

import pandas as pd
import unittest
import StringIO

def readData(data_file):
    return pd.read_csv(data_file, sep="\|")

class TestReader(unittest.TestCase):
    def test_gets_separator_right(self):
        """Make sure we have '\|' as separator."""
        mockData = StringIO.StringIO("a\|b\n1\|2\n3\|4")
        df = readData(mockData)
        self.assertTrue(all(df.columns.values == ["a", "b"]))

That thing we just did? The one where we used a file object as input rather than a string and then open the file object inside the function? That’s a very good idea when writing code. It’s called flexibility. You don’t really care what object you get as an argument. It just needs to have a

read

function. You could even get input from a REST API or some form of database in the future. This is one of the neat things about testing. Making you code testable usually makes it more modular and flexible.

Speaking of databases. What if we want to mock a database instead of a file? We have to work a bit harder. Sometimes one can just use a local instance of the database. But often it’s better to make a mock object, one that simulates the database (or whatever you need to mock) with limited functionality. This way your tests can run on any machine, even if it has no access to a database of the right kind.

One instructive way to go is to hack your own. Let’s assume you want to store configuration in a MongoDB instance and need to make sure that the

update

function of the right database in the right collection is called. You can do this e.g. thusly.

import unittest
from collections import defaultdict

class MockDB(object):
    def __init__(self):
        self.objects = []
        self.calls = defaultdict(int)
    def update(self, search, newvlas, upsert):
        self.calls['update'] += 1

class MockMongo(object):
    def __init__(self, dictgen):
        self.d = dictgen
    def __getattr__(self, name):
        return self.d[name]
    def __contains__(self, item):
        return item in self.d

def saveOptions(mongo_client, **opts):
    for k in opts:
        mongo_client.myapp.settings.update(
            {'key': k},
            {'$set': {'value': opts[k]}},
            True)

class SaveOptionsTest(unittest.TestCase):
    def setUp(self):
        self.db = MockMongo(defaultdict(
            lambda: MockMongo(
                defaultdict(lambda: MockDB()))))
    def test_uses_right_collection(self):
        saveOptions(self.db, model='svm')
        self.assertTrue('myapp' in self.db)
    def test_uses_right_db(self):
        saveOptions(self.db, model='svm')
        self.assertTrue('settings' in self.db.myapp)
    def test_calls_update(self):
        saveOptions(self.db, model='svm', S=5)
        self.assertEqual(self.db.myapp.settings.calls['update'], 2)

You see that we have to jump through a bunch of hoops to get the member lookup right, but well. That’s how MongoDB behaves. In a real-world application you would probably want something more industry-grade. You can check the excellent selection of mock frameworks on the python homepage. A mock framework makes the creation of mock objects easy and often gives them some basic functionality, like watching and counting function calls. Happy mocking!