A few years ago, I was getting some help from a mentor who worked in one of my company’s infrastructure teams, around some work we were doing on our on-premise private cloud. At one point we had to stop and ask each other the question, “What do you think the cloud is?”
He said “it’s elastic computing infrastructure”. He was a lot more experienced than me, so I just shut up and looked funny, until he asked me what I thought the cloud meant. I said “it’s all those completely managed services tied together, so you don’t have to think about the infrastructure, it’s just done for you”. It was an interesting comparison of how we had each developed different definitions for what “cloud” meant.
I think some better terminology might have helped.
cloud computing (noun)
the practice of using a network of remote servers hosted on the internet to store, manage, and process data, rather than a local server or a personal computer.
The public cloud is defined as computing services offered by third-party providers over the public Internet, making them available to anyone who wants to use or purchase them. They may be free of charge or sold on demand, allowing customers to only pay per usage for the CPU cycles, storage or bandwidth they consume.
If you’re working in IT and haven’t heard of the cloud, I’d love to know how! Thousands of companies and many governments are in some form of transition to migrate their infrastructure to a public cloud provider. The term “cloud-native” has become commonplace, which can very loosely be taken to mean applications that are designed with the “cloud” model in mind. But we’ve just seen there are 2 definitions of the cloud — does that mean there are 2 definitions of “cloud-native”? Which one do you want?
As my experience evolved, I delved deeper into my version of what “cloud-native” meant: heavy reliance on my cloud provider of choice, using their cloud services without thinking about instances or infrastructure. I chose serverless products, like App Engine, Cloud Run, Firestore, BigQuery, Pub/Sub and more, and didn’t need any setup or fine-tuning. Google offers tools like these to enable cloud-native app development.
But recently I had a conversation with someone who said they were “cloud-native”, but they were scarcely using any of the features of their cloud provider whatsoever: just Kubernetes and that’s it. So what is the other definition of cloud-native? The Cloud Native Computing Foundation (CNCF) sounds authoritatively named to define this. Here’s their take:
Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach.
The CNCF is part of the Linux Foundation, and was kickstarted with the Kubernetes project, which productised 10 years of experience at Google running fault-tolerant container orchestration.
They specifically mention a few aspects:
- hybrid clouds and service meshes
- immutable infrastructure and declarative APIs
I didn’t have any of these things when doing my version of “cloud-native”, but this certainly raises the question, why would we want 2 definitions?
Vendor-specific features vs. Portability
Just like you might be “all-in” on Apple products because they offer a coherent experience across devices, if you’re all-in with Google Cloud, there are certainly good integrations available between products to make things a little more seamless. For example, their Cloud Operations suite offers easy logging, monitoring and alerting, regardless of whether you’re running networks, compute infrastructure, databases or serverless functions.
But if you’re all-in with one public cloud provider, you’re placing a lot of trust in their hands. If they have an outage, deprecate a product, or increase their prices, there’s virtually nothing you can do.
Enter our CNCF cloud-native definition…
The essential idea is that you can run Kubernetes as your application platform in many places: on-premise, and in multiple public clouds (optionally, you can connect them with a service mesh). Then, since your application is “cloud-native” to Kubernetes, you can shift it from on-premise to the cloud, or between cloud providers, with considerably less effort than if you were “all-in” with one of these cloud providers. This portability helps protect you against those problems with a single provider.
So which model should you go for?
Honestly, implementing the CNCF model of “cloud-native” can be a huge challenge: your team needs a lot of knowledge about Kubernetes, and you need a hybrid-cloud or multi-cloud infrastructure set up. That takes many teams working together to get up to speed. Senior engineers with Kubernetes and cloud experience don’t come cheap, so why would a company pay for this?
For large businesses, lock-in to a single vendor (whether their own on-premise infrastructure or a public cloud offering) creates a single point of failure, and the business has decided the potential downtime is too big a risk.
But there are also enterprise architecture concerns coming into play. For a large organisation, standardising on a single provider may not be sufficient for so many teams. If there are hundreds of developers across multiple domains, requiring every team to use the same provider may not be pragmatic, but asking all teams to focus on Kubernetes provides a common platform across many cloud locations. The major cloud providers are aware of this, and know many customers want a presence in multiple clouds; in the last few years, the major providers have all created solutions to extend their clouds. Google offers managed Kubernetes in other cloud providers using Anthos, Microsoft has Azure Arc to do something similar, and AWS has recently released EKS Anywhere. Cloud providers are offering this version of cloud-native to extend to all your locations (and charging a pretty penny for it!).
But for smaller businesses, this may not be a concern. GCP offers at least 99.9% uptime for most of its services, equating to less than 9 hours of downtime per year. Decreasing this number may not be a high priority for a company desperately trying to develop a product, generate revenue and grow. They also don’t need to think about the complexities of developing a multi-cloud strategy or Kubernetes expertise.
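As a rough check on that downtime figure, here’s a minimal sketch of the arithmetic. It assumes a 365-day year and treats the uptime percentage as a single yearly number (real SLAs are usually defined per service and per billing month). The function name is my own, purely for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def max_downtime_hours(availability_percent: float) -> float:
    """Hours per year a service can be down while still meeting the given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Compare a few common availability targets.
for pct in (99.0, 99.9, 99.99):
    hours = max_downtime_hours(pct)
    print(f"{pct}% uptime allows about {hours:.2f} hours of downtime per year")
```

At 99.9%, this works out to roughly 8.76 hours a year, which is where the “less than 9 hours” comes from. Each extra nine cuts the allowance by a factor of ten, which gives a feel for how quickly the engineering cost of chasing more uptime can outgrow the benefit for a small team.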
I think the question of when you might want a more portable cloud model ultimately boils down to the maturity and risk requirements of your organisation and teams.
Enjoyed this? I’m planning on writing regularly. My next article will be about improving the quality of the software you write: a combination of personal experience and a review of 2 books, Clean Code and Refactoring: Improving the Design of Existing Code. It will be published by the end of next week.