Shaving some savings from our BigQuery bill

Cost spikes, for just one project!

Reducing total data stored is key

  • Fire this employee (We’d never do that, we’re not a soulless org!) — but we’d simply end up hiring a replacement with no guarantee they won’t turn round and do the same thing.
  • Just remove the user’s access: they’ll get someone else to run the query. Possibly better, but the problem persists.
  • Educate the user in best practice — certainly a worthwhile idea, which we’ll do, BUT — we’ll eventually hire more people, people forget things, and people make mistakes. Education is great, but it doesn’t wholly solve the problem: the problem isn’t this user, the problem is that anyone is capable of making this mistake.
  • Put guardrails in place to prevent this kind of abuse — I suggested using custom quotas, but unfortunately this was viewed as unnecessary and too restrictive.
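To make the guardrail idea concrete, here is a minimal client-side sketch of a per-query scan limit. It is not the custom quotas feature itself, and the names `QueryTooExpensive` and `enforce_scan_budget` are hypothetical. It assumes the byte estimate comes from somewhere like a BigQuery dry run, which reports the bytes a query would process without actually running it.

```python
class QueryTooExpensive(Exception):
    """Raised when a query would scan more data than the budget allows."""


def enforce_scan_budget(estimated_bytes, max_bytes):
    """Return the estimate if it fits the budget, otherwise raise.

    estimated_bytes is assumed to come from a dry run of the query;
    max_bytes is whatever per-query limit the team agrees on.
    """
    if estimated_bytes > max_bytes:
        raise QueryTooExpensive(
            f"query would scan {estimated_bytes / 1e12:.2f} TB, "
            f"limit is {max_bytes / 1e12:.2f} TB"
        )
    return estimated_bytes
```

A check like this only helps if everyone runs queries through it. BigQuery's `maximum_bytes_billed` job setting applies a similar cap on the server side for an individual query.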

Deleting duplicate GA360 export data

  • size of GA exports: 58TB
  • growing at a rate of ~0.12TB/day
  • For the last 90 days, that’s ~11TB of data, stored in short-term storage, or $220/month
  • That leaves ~47TB in long-term storage, or $470/month.
  • Total monthly bill for storage of this data is $690
  • We decided we’d keep 30 days’ worth of data for reconciliation purposes: 0.12TB * 30 days = 3.6TB in short-term storage, or $72/month.
  • With some effort, we can save ~ $600 per month.
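The arithmetic in this list can be checked with a few lines of Python. This is a rough sketch, not an official pricing calculator. The per-TB prices are inferred from the figures above ($220 for ~11TB, $470 for ~47TB), and the function names are made up for illustration. It also assumes the export grows at a constant rate, so that everything older than 90 days has moved to long-term storage.

```python
# Prices per TB per month, inferred from the figures in the list above
# (an assumption, not taken from a price sheet).
SHORT_TERM_USD_PER_TB = 20.0
LONG_TERM_USD_PER_TB = 10.0
LONG_TERM_AFTER_DAYS = 90  # data unmodified this long is billed at the long-term rate


def monthly_storage_cost(total_tb, growth_tb_per_day):
    """Return (short_term_usd, long_term_usd) for a steadily growing export."""
    short_tb = min(total_tb, growth_tb_per_day * LONG_TERM_AFTER_DAYS)
    long_tb = total_tb - short_tb
    return short_tb * SHORT_TERM_USD_PER_TB, long_tb * LONG_TERM_USD_PER_TB


def monthly_saving(total_tb, growth_tb_per_day, keep_days):
    """Monthly saving from keeping only the newest keep_days of data."""
    current = sum(monthly_storage_cost(total_tb, growth_tb_per_day))
    kept_tb = min(total_tb, growth_tb_per_day * keep_days)
    after = sum(monthly_storage_cost(kept_tb, growth_tb_per_day))
    return current - after


if __name__ == "__main__":
    # GA360 export figures from the list above.
    print(monthly_storage_cost(58, 0.12))
    print(monthly_saving(58, 0.12, keep_days=30))
```

For the GA360 numbers (58TB, 0.12TB/day, keeping 30 days) this gives a saving a little over $600 per month, in line with the rounded estimate above.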

Deleting data exported from Firebase

  • size of Firebase exports: 224TB
  • growing at a rate of 0.35TB/day
  • for the last 90 days, that means we’ve got 31.5TB in short term storage, at ~ $630/month.
  • leaving ~192.5TB in long-term storage, costing $1,925/month
  • Total monthly bill for storage is $2,555
  • We decided again to keep 30 days’ worth of data for reconciliation purposes: 0.35TB * 30 days = 10.5TB stored, or $210/month.
  • With some effort, we can save ~$2,300 / month.
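Keeping only 30 days of data means working out which tables fall outside the retention window. The sketch below assumes the export is date-sharded, with table names ending in `_YYYYMMDD` (for example `events_20220101`). That naming and the helper name `expired_shards` are assumptions for illustration, not details from our setup.

```python
from datetime import date, datetime, timedelta


def expired_shards(table_names, today, keep_days=30):
    """Return date-sharded tables whose date suffix is older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in table_names:
        prefix, _, suffix = name.rpartition("_")
        # Only consider names of the form <prefix>_YYYYMMDD.
        if not prefix or len(suffix) != 8 or not suffix.isdigit():
            continue
        try:
            shard_date = datetime.strptime(suffix, "%Y%m%d").date()
        except ValueError:
            continue  # eight digits, but not a real calendar date
        if shard_date < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    tables = ["events_20220101", "events_20220120", "users"]
    print(expired_shards(tables, today=date(2022, 2, 1)))
```

The returned names could then be passed to a delete call, such as the BigQuery Python client's `delete_table`. A default table expiration on the dataset is another way to get the same effect for newly created tables.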

Deleting staging tables

Deleting Stackdriver exports

Why didn’t you do this in the first place?

  • they weren't a problem initially
  • it either wasn’t evident that it would become a problem, or the solution required more focus on the implementation, and it wasn't the right time to put that effort in
  • these costs grew quite slowly over the course of a few years: you could’ve looked at the numbers month to month and never noticed a significant increase. Only when we looked at the total and asked whether this was money well spent did it become worth investigating.

What was the outcome of this?

Mark McCracken
