Why we stopped using Vault & GoCD

I’m a huge fan of HashiCorp Vault. It’s really well designed, and its dynamic credential capabilities are excellent, a real step forward in credential management. GoCD was the tool of choice for years, and provided plenty of advanced functionality, like Value Stream Mapping, inter-pipeline dependencies, and elastic agents. But towards the end of my tenure at Livescore, we decided to stop using them both. Here’s how and why that came about.

Shortly after I joined Livescore’s newly formed data department, we needed some new infrastructure. At a minimum, we needed:

  • CI/CD tools — we used GoCD, as the few engineers we had were familiar with it, and everyone was comfortable
  • Airflow for automating our new ETL processes
  • Elasticsearch and Kibana to collect logs from our production applications and feed them back to ops teams
  • A few pieces of custom software we had developed for complex data reconciliation
  • A few processes deployed via Cloud Run into our cluster
  • Somewhere secure to store our passwords and API tokens for various services

Having good hands-on experience with Kubernetes and GCP, and being in the fortunate position of being almost entirely cloud native, I was thrilled to deploy all of this on a fancy autoscaling GKE cluster, which I felt comfortable depending on after load testing it.

I set up authentication with Okta and Kubernetes, and installed the Vault secret auto-injector into the various clusters we were running, to make it easier for developers to get secrets injected into their pods. I even created a CI/CD pipeline to manage Vault policies, roles, and permissions from a git repo.

We were set up for the latest and greatest in secret management, and had advanced capabilities for CI/CD. So why did we abandon them?

GitHub Actions

At first, I wasn’t massively sold on the idea of GitHub Actions. They looked fine, but we were reasonably happy with GoCD: we knew how to use it to do what we wanted, and had tons of pipelines in there already, so moving would require a fair bit of effort. But the developer experience was excellent. I created my first pipeline entirely in the UI, even authenticating with GCP, and it worked first time. I was shocked at how easy it was, and decided it should be our go-to option from then on. Once we had agreed on this, I personally migrated all pipelines from GoCD to GitHub Actions, and we decommissioned GoCD. What I liked so much about this CI/CD tool was that it was built into our version control system, so there was no complexity in joining the tools together.

With this changeover, we were no longer using Vault for pipeline secrets, since GitHub Actions had a primitive capability for storing and using secrets. It wasn’t amazing, but our pipelines weren’t amazingly complex with large Value Stream Maps, so it just about managed for us.

Running everything cloud native

  • For little things, we used Cloud Functions as the glue between workflows
  • For serving APIs, we used App Engine or Cloud Run
  • For other software that wasn’t so easy to deploy in these models, we sometimes used Compute Engine
  • For major data processing, we used Dataflow
  • For workflow orchestration, we had Google Cloud Composer
  • For deployment, we now had GitHub Actions

After departing Gamesys’ on-premise GitHub, we had no ties left to physical infrastructure for on-premise workloads.

But developers didn’t really take to Vault, even though I ran sessions explaining the what, why and how, and shared Confluence articles about our setup. Developers would ask about secret management, I’d explain about Vault, and their faces would drop into an expression that I interpreted as “but I just wanted to deploy my thing, not learn all this other stuff”.

We eventually decided to move to a managed version of Elasticsearch, as we didn’t have the necessary skills to manage it, and didn’t want to invest in them.

Google Cloud Secret Manager

It turns out that when you run your software on Dataflow, Cloud Functions, or other managed services, it already comes with a strong identity in the form of a GCP service account, and it’s very easy to fetch secrets using Google’s Secret Manager client library. We can also manage IAM permissions on individual secrets.
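As a rough illustration of how little code this takes, here is a minimal Python sketch using the google-cloud-secret-manager client library. The project and secret names in the usage note are hypothetical placeholders rather than our real setup, and the sketch assumes it runs somewhere with a service account already attached, which is why no credentials appear in it.

```python
def secret_version_name(project_id: str, secret_id: str, version: str = "latest") -> str:
    """Build the resource name Secret Manager expects for a secret version."""
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version}"


def read_secret(project_id: str, secret_id: str, version: str = "latest") -> str:
    """Fetch a secret using whatever identity the runtime already has."""
    # Imported lazily so the helper above works without the library installed.
    # Requires: pip install google-cloud-secret-manager
    from google.cloud import secretmanager

    # No key file or token is passed in: the client picks up the service
    # account attached to the Cloud Function, Dataflow worker, and so on.
    client = secretmanager.SecretManagerServiceClient()
    response = client.access_secret_version(
        request={"name": secret_version_name(project_id, secret_id, version)}
    )
    return response.payload.data.decode("utf-8")
```

Calling `read_secret("my-project", "partner-api-token")` (both names made up) would return the latest version of that secret as a string. For the call to succeed, the service account needs the Secret Manager Secret Accessor role (`roles/secretmanager.secretAccessor`), which can be granted on a single secret rather than on the whole project.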

We lost the ability to use dynamic secrets, but the environments we were using meant we didn’t need that dynamic aspect. We didn’t have database passwords, because we mostly used serverless Firestore, which depends upon GCP’s IAM functionality rather than handing out credentials.
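To show what having no database password looks like in practice, here is a hedged Python sketch using the google-cloud-firestore library. The function names are my own, and any project, collection, or document names you pass in are up to you; nothing here is taken from our actual codebase. The point is only that the client is built without any credential being handed to it.

```python
from typing import Any, Optional


def make_client(project_id: str):
    """Create a Firestore client that authenticates as the runtime's service account."""
    # Imported lazily so load_document can be exercised without the library.
    # Requires: pip install google-cloud-firestore
    from google.cloud import firestore

    # No username or password: access is decided by the IAM roles
    # granted to the service account the code is running as.
    return firestore.Client(project=project_id)


def load_document(client: Any, collection: str, doc_id: str) -> Optional[dict]:
    """Return the document as a dict, or None if it does not exist."""
    snapshot = client.collection(collection).document(doc_id).get()
    return snapshot.to_dict() if snapshot.exists else None
```

With this shape there is simply nothing for a dynamic secret engine to rotate: revoking access means removing an IAM role from the service account.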

Reduced Workload on our central cluster

Conclusion

The close integration of GitHub Actions with the codebase, being able to see pull request feedback in seconds in the same UI, was incredibly helpful. It was one less tool to onboard developers onto, making their experience easier.

The ease of Google Cloud Secret Manager, compared with the steeper learning curve of Vault, was much better suited to our team, who were using GCP day in, day out, and to our computing environment.

There are still things I miss about GoCD and Vault that can’t be quite so easily achieved with these simpler tools, but I wouldn't revert the developer experience gains we made with this switch.
