I’m a huge fan of HashiCorp Vault: it’s really well designed, and its dynamic credential capabilities are excellent and a real step forward in credential management. GoCD was my CI/CD tool of choice for years, and provided plenty of advanced functionality, like Value Stream Mapping, inter-pipeline dependencies, and elastic agents. But towards the end of my tenure at Livescore, we decided to stop using them both. Here’s how and why that came about.
Shortly after I joined Livescore’s newly formed data department, we needed some new infrastructure. We needed at least:
- CI/CD tools — we used GoCD, as the few engineers we had were familiar with it, and everyone was comfortable
- Airflow for automating our new ETL processes
- Elasticsearch and Kibana to collect logs from our production applications and feed back to ops teams
- A few pieces of custom software we had developed for complex data reconciliation
- A few processes deployed via Cloud Run into our cluster
- Somewhere secure to store our passwords and API tokens for various services
Having good hands-on experience with Kubernetes and GCP, and being in the fortunate position of being almost entirely cloud native, I was thrilled to get to deploy all of this on a fancy autoscaling GKE cluster, which I felt comfortable depending on after load testing it.
I set up authentication with Okta and Kubernetes, and installed the Vault secret auto-injector into the various clusters we were running, to make it easier for developers to get secrets injected into their pods. I even created a CI/CD pipeline to manage Vault policies, roles, and permissions from a git repo.
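To give a feel for what the injector asked of developers, here is a minimal sketch of the pod annotations it reads. The annotation keys are the ones the Vault Agent injector documents; the role name `etl-worker`, the secret name `db-creds`, and the path `secret/data/etl/db` are hypothetical placeholders, not values from our setup.

```python
import json


def vault_injector_annotations(role, secrets):
    """Build pod annotations that ask the Vault Agent injector to render secrets.

    role:    name of a Vault Kubernetes auth role (hypothetical in this sketch)
    secrets: mapping of rendered file name -> Vault secret path (hypothetical)
    """
    annotations = {
        # Opt this pod in to sidecar injection.
        "vault.hashicorp.com/agent-inject": "true",
        # Vault role the pod's service account authenticates as.
        "vault.hashicorp.com/role": role,
    }
    for name, path in secrets.items():
        # Each secret is rendered to a file at /vault/secrets/<name> in the pod.
        annotations[f"vault.hashicorp.com/agent-inject-secret-{name}"] = path
    return annotations


if __name__ == "__main__":
    annotations = vault_injector_annotations(
        "etl-worker", {"db-creds": "secret/data/etl/db"}
    )
    # Print a fragment that could be merged into a pod template's metadata.
    print(json.dumps({"metadata": {"annotations": annotations}}, indent=2))
```

The application then reads its secret from a file rather than calling Vault directly, which is convenient, but developers still need to know about roles, policies, and secret paths.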
We were set up for the latest and greatest in secret management, and had advanced capabilities for CI/CD. So why did we abandon them?
Not long after splitting Livescore’s infrastructure from Gamesys, we moved from an on-premise GitHub environment to GitHub’s managed offering. This was great for developer experience, as we were always up to date with the latest features, like multi-line comments and suggestions in pull requests. We also got a look at GitHub Actions.
At first, I wasn’t massively sold on the idea of GitHub Actions. They looked fine, but we were reasonably happy with GoCD: we knew how to use it to do what we wanted, and we had tons of pipelines in there already, so moving would require a fair bit of effort. But the developer experience was again excellent. I created my first pipeline entirely in the UI, even authenticating with GCP, and it worked first time. I was shocked at how easy it was, and decided that it should be our go-to option from then on. Once we had agreed on this, I personally migrated all pipelines from GoCD to GitHub Actions, and we decommissioned GoCD. What I liked so much about this CI/CD tool was that it was built into our version control system, so there was no complexity in joining the tools together.
With this changeover, we were no longer using Vault for pipeline secrets, as GitHub Actions had a primitive capability for storing and using secrets. It wasn’t amazing, but our pipelines weren’t amazingly complex with large Value Stream Maps, so it just about managed for us.
Running everything cloud native
As the team grew and progressed, we really did deploy everything in a cloud-native fashion:
- For little things, we used Cloud Functions as the glue between workflows
- For serving APIs, we used App Engine or Cloud Run
- For other software that wasn’t so easy to deploy in these models, we sometimes used Compute Engine
- For major data processing, we used Dataflow
- For workflow orchestration, we had Google Cloud Composer
- For deployment, we now had GitHub Actions
Having left Gamesys’ on-premise GitHub, we now had no ties to physical, on-premise infrastructure.
But developers didn’t really take to Vault, even though I ran sessions explaining the what, why and how, and shared Confluence articles about our setup. Developers would ask about secret management, I’d explain about Vault, and their faces would drop into an expression that I interpreted as “but I just wanted to deploy my thing, not learn all this other stuff”.
We eventually decided to move to a managed version of Elasticsearch, as we didn’t have the necessary skills to manage it, and didn’t want to invest in them.
Google Cloud Secret Manager
Some time after we had Vault up and running and were using it for a few things (although half of this was just the UI to store and retrieve secrets), Google announced Google Cloud Secret Manager. When I first saw it, I did a little double take: “but it’s so basic?! what can that do?”. Of course, I was comparing it to Vault, a truly enterprise-scale secret management solution. I hadn’t considered the uses for small, autonomous teams.
It turns out that when you run your software on Dataflow, Cloud Functions, or other managed services, it already comes with a strong identity in the form of a GCP service account, and it’s very easy to fetch secrets using the Secret Manager client library. We can also manage IAM permissions on individual secrets.
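As a rough illustration of how little code this takes, here is a minimal sketch using the `google-cloud-secret-manager` Python client. The project and secret names are hypothetical, and it assumes the code runs somewhere with Application Default Credentials, such as a Cloud Function, where the runtime service account has been granted access to that one secret.

```python
def secret_version_path(project_id, secret_id, version="latest"):
    """Return the resource name Secret Manager expects for a secret version."""
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version}"


def read_secret(project_id, secret_id, version="latest"):
    """Fetch a secret's payload as text.

    No credentials are passed in: the client picks up the identity of the
    service account the workload is running as.
    """
    # Imported here so the helper above works without the library installed.
    # Requires: pip install google-cloud-secret-manager
    from google.cloud import secretmanager

    client = secretmanager.SecretManagerServiceClient()
    name = secret_version_path(project_id, secret_id, version)
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("UTF-8")


if __name__ == "__main__":
    # "my-project" and "api-token" are placeholder names for illustration.
    print(secret_version_path("my-project", "api-token"))
```

Calling `read_secret("my-project", "api-token")` from a managed service is the whole integration: there is no login step, no token renewal, and no sidecar to configure.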
We lost the ability to use dynamic secrets, but the environments we were using meant we didn’t need that dynamic aspect. We didn’t have database passwords, because we mostly used serverless Firestore, which relies on GCP’s IAM functionality rather than handing out credentials.
Reduced workload on our central cluster
We no longer needed to run GoCD, Vault, or Elastic Stack components in our central cluster. We then moved any miscellaneous items to other computing models, or abandoned them, and we were able to turn off our cluster, saving around $1,000 a month.
We stopped using the more enterprise-focused tools, even though they were already at our disposal, because the close integration provided by simpler products was more suited to our small autonomous team.
The close integration of GitHub Actions with the codebase, with pull request feedback visible in seconds in the same UI, was incredibly helpful. It was one less tool to onboard developers with, making their experience easier.
The ease of Google Cloud Secret Manager, compared with the steeper learning curve of Vault, was much better suited to our team, who were using GCP day in, day out, and to our computing environment.
There are still things I miss about GoCD and Vault that can’t be quite so easily achieved with these simpler tools, but I wouldn't revert the developer experience gains we made with this switch.