Testing databases like object oriented code

A few years ago, I was working on rebuilding a complex system for rewarding users for participating in on-site promotions. The results of these promotions were calculated in batch, and this logic was almost exclusively held within database procedures. When given a blank slate and a tight deadline to rewrite the application to allow for some substantial new business requirements, one of the core components I was most concerned with getting right, were these database procedures.

There was a consensus that CI/CD was something for software development teams, and couldn’t be applied to the way data teams work

I had been a big proponent of adding continuous integration and continuous deployment into our development processes at the time, but I felt some pushback about this. There was a consensus that CI/CD was something for software development teams, and couldn’t be applied to the way data teams work, that it was just too different. I felt strongly that it wasn’t just possible, but it would be awesome! But there hadn’t been the right opportunity to start doing it, until now.

In the years prior to this situation, we had developed these procedures and functions slowly, over a long period of time, with a long and tedious manual QA testing process. Now we had little time, but a lot more of resources to throw at the problem, a little like the COVID vaccine I suppose — we need it now and we need it right, so good thought leadership must prevail. I knew the way we were used to working previously, wouldn’t be able to scale easily to allow so many developers to work simultaneously on so many areas — while other people were panicking about the amount of work and the timeframes, I could see clearly that no amount of developers could get this done in time with current methods, so we needed to drag our database development processes into modern times, and quickly.

Breaking monolithic database procedures into composable chunks

We depended heavily (although not exclusively) on DB2. Honestly, I’m yet to talk to another developer outside the company that uses DB2, and although sometimes it was quite frustrating, and the documentation sometimes left a lot to be desired as a younger developer, this database is actually quite fantastic, and can handle huge scales for both OLAP and OLTP workloads.

At the core, we had maybe 12–16 frequently used procedures to account for the vast majority of the work we were doing. But some of these were not so pretty — database procedures standing at 3000 lines long can be difficult to reason about, it’s a lot of logic to hold in your memory when trying to modify, and even if you’re confident about your change, your team will obviously want that carefully checked at tested.

In comparison to object-oriented programming (OOP) code, this isn’t so different from refactoring. Taking a large function and extracting some commonly used low-level details out, is something most mid-level devs should be fairly familiar with. One aspect I like to carefully consider when writing clean code is to try to make sure everything in a function is at the same level of abstraction (ie. don’t have a function that adds a user to a list, which needs to do some detailed comparisons of users in the list, which needs to perform a regex on multiple fields — this crosses multiple levels of abstraction, making it difficult to read from line to line). This was a key area I knew we could improve.

An example starting point

Let’s look at some before and after examples, but I’ll add some comments to try to cut down 3000 lines to get the gist of how things were done.

The following function takes a few parameters as inputs, and outputs a table of results. This works a lot like a view in a database, but with parameters. (It’s all in lowercase, because a friend once told me it takes twice as long to read uppercase sentences, and ever since then, my SQL is always lowercase. I think he was right, after a while it’s better). I have vastly oversimplified, but I’ll try to explain what makes this complex.

note: it’s probable that none of these code snippets will actually compile, but they’re close enough to get the gist.

What’s undesirable about this? (bear in mind it’s now ~20x shorter to simplify things)

  • there are many subqueries, and reading the code forces the user to jump around in various levels of abstraction — considering how financial records should match up to game configuration is a low-level detail, whereas calculating the amount they should receive as a reward, given all the other credentials, is a more high-level detail. Frequently jumping between levels of abstraction is an easy way to get confused and create errors.
  • None of this is reusable: I can’t image any worthwhile promotion that wouldn’t at least use some of this information again, like financial details, promotion opt in, excluding certain abusive accounts. All mechanics will have some aspects of this.
  • This would be difficult to test — just like testing a huge OOP function becomes difficult with so many edge cases, this would be difficult to test.
  • The code reaches across schemas to get data from other places. There’s very little concept of dependency management, and isolating the areas we want to test.

Smaller chunks

We tackled this by splitting the logic into smaller domains. Here’s an example:

A few points to note:

  • all of the items referenced belong to the same schema — we’re no longer reaching across schemas to fetch data from other places. This will become important later, and we’ll dig into how this is managed in a test vs. production environment
  • this new function itself uses composition, and passes some of its input variables as arguments to a smaller function, “get_applicable_games” (we’ll ignore this right now in testing, and address it in a more complex scenario later).
  • this function focuses on a fairly small domain — we should be able to test this with relative ease.
  • it performs a low level function of joins, filtering and formatting, to return a higher level result set, that can be used at a higher-level function more intuitively.

Testing a single function

Here’s an example of the kind of test we’d run to prove this does what we think it does:

This works like so:

  • create 2 tables with the same structure as the function output
  • insert some data we want to use for testing, into a dependant table. This effectively mocks the real table that will be used in production.
  • call our function with some inputs, and store the results in one of those tables.
  • manually calculate the expected results, and store those in the other table.
  • use the testing framework, db2unit, to compare the 2 tables, which should be identical in content.

This is not unsimilar at all to some OOP style tests, it has some setup, some variables, execution of the object under test, and comparison of results — in this case it’s like comparing 2 arrays of objects and expecting them to be the same. Now imagine we have 4 tests for this under different scenarios. And now imaging we have 10–15 of these kinds of utility functions we can use as we please (ignoring for now how this will actually get created and executed).

Fetching data across schemas

The test above inserts data into a schema called dependencies, but the function definition uses an object in the schema promotions_and_rewards. This works via an alias. In the test environment, this alias points at this dependencies schema, but in the production environment, this alias instead points at the actual object in the other schema where our production data lives. These objects are all deployed using liquibase, an excellent database change management tool. We use an environment variable when running liquibase to decide whether to use the dev alias or live alias, like so:

This works like so:

  • We have a context applied, either live or not live
  • if not live, we create a dummy version of the table, and the alias points at this dummy version
  • if live, the alias simply points at the real version

Through this mechanism, we eventually uncovered all the dependencies of our schema, and could more easily identify changes being made outside out schema, and if they would have an affect on us.

Stubbing dependencies

In OOP, if you have a high level function calling a smaller function, in a test environment, you can simply “stub” that function, to do whatever you please in it’s place. This was exactly what I had in mind when I wanted to treat database tests like OOP tests, to get a highly productive development environment. This was not immediately straightforward, and took some head scratching to get it “right”, but I was sure the end result was essential for our ability to test effectively and do work in small batches with confidence.

Let’s assess where we are:

  • We’ve got our high level database function/procedure, which depends on some of our smaller utility functions.
  • We’ve also got other testing utilities available through db2unit, such as setup and teardown functions — this is common in OOP testing too.
  • We do NOT have a native “stubbing” function, (How would this realistically work in a general fashion in the testing framework?). We need a mechanism to do this.

Here’s my solution:

  • when we’re provisioning our database environment, we also create a “stub” table for each of our functions
  • we create a method to “stub” the actual function. This “saves” a copy of the current definition of the function (retreived from system tables) in a table, and replace the function with a dummy version, which simply returns the contents of the stub table.
  • we “revert” the stubbed function after every test invocation, by fetching that definition back from the table, and executing it. This is done in the test tear down function:

Now, we’ve got the ability to “mock” our function, just as we would in OOP style tests. Let’s see what this looks like in the context of a larger test. Again, this may not compile, but the point remains. The general structure of the test is similar to the previous test, but the important part here is the stubbing of other functions:

Now that we know for sure what this lower-level function will return when called, we can test the logic of our higher level function, without needing to test the logic of our lower level function, or insert data into those low-level tables. We can simply assume that members have met or not met some defined criteria, and check that our high level function stitching it all together, is doing it’s job properly.

In this way, each function is now massively reduced in size. I put a limit in place to say that as a rule of thumb, no function body should be greater than 100 lines long — and this was surprisingly easy to stick by!

CI/CD processes

If we were writing software, the first thing we’d do with those unit tests is stick them into a CI system — in DevOps style, we’d like to fail fast, especially since the business has been under a tight deadline the whole time I’ve been explaining this! I’ve now got 4 other developers working full steam ahead on database changes, all of which might overlap with one another, and I’m in the position of reviewing it all, whilst still doing a lot of other work related to the project. We absolutely needed a CI system to allow me to unblock developers as fast as possible, so they’re not sitting around waiting for me to review something.

developers get feedback for their changes in minutes

I created a CI pipeline to run on every pull request commit, to deploy the entire schema and it’s dependencies into a temporary db2 database in a container (easier said than done!). It then deploys all the unit tests, and executes those unit tests, finally report back to our version control system, the success status or any failure details. This means developers get rapid feedback for their changes within minutes, and we’re able to rapidly boost our productivity in this area.

We moved from scheduled deployments every 2 weeks, to several deployments per day, while increasing stability

And what comes after a CI pipeline? Obviously we followed up with continuous deployment pipelines for our database. This resolved some long standing issues caused by environment drift, because of the way we had deployed things in the past. We also shifted from scheduled deployments every 2 weeks, (not-so-)carefully managed by a deployment manager (a human), to several automated deployments per day, which increased deployment stability.


We were able to rapidly onboard developers to this area of those codebase, who had previously shied away from this area of our warehouse. This business function turned from a scary area, where any given procedure could turn out to be 1000’s of lines of mess, into the fastest moving area of our warehouse with modern techniques. The rest of the project was bouyed by the fact that this central piece was going so smoothly and on time.

Most importantly, we set the standard for what was possible with our current environment. Everyone hated our environments because they had so many hacks, historical oddities and quirks, development was slow, deployment was error prone and full of delays, the environments weren’t suitably isolated, and weren’t suitably live-like — everyone was very keen to get away from this, and onto the cloud. But I think the real reason we hated the environment, is because nobody ever loved it.

It certainly took some time to get this process narrowed down, but this time spent paid itself back over and over. If your data warehouse environments look horrifying, don’t despair — some effort in this area can go a long way to getting a data team moving on the path towards high performance development. I later demo’d how quick we could turn around other areas of our warehouse to use this kind of standard, and the first step of version controlling schemas and automating deployments, covered 1% of our warehouse, with a single developer effort over 2 days. This was enough to get the ball rolling and turn heads within the management level. I’d 100% recommend giving this a go, it’s one of the areas I’m most proud to have worked on.