A question we are often asked is “who is responsible for what inside Kubernetes?” Whilst the answer really depends on your organizational structure and how segmented your teams are, I will describe the most common structure we see.
In general, there are four roles that span the operational responsibilities of managing Kubernetes: the Infrastructure Team (for on-premises deployments), the Cloud/Platform Team, the DevOps Team, and Developers.
The infrastructure team only exists if you are using on-premises hardware, as somebody needs to handle the physical server/storage/networking equipment.
The Cloud/Platform team is responsible for the creation, upgrading, and scaling of Kubernetes / Docker Clusters. It is also responsible for triaging any performance or availability issues that arise with the platform (but not the applications). Fundamentally, it holds the internal OLA to ensure the system delivers acceptable SLAs.
The Cloud/Platform team is responsible for:
- creating any automation for the deployment and configuration of the clusters (Infrastructure as Code),
- ensuring the platform complies with internal security requirements/policies (such as authentication, activity logging, policy brokers, and configuring role-based access control),
- installing any required DevOps tooling, since they are the holders of the “cluster-admin” privilege. This includes tools such as Portainer, and any other observability, logging, and access tooling.
Combined, the Infrastructure Team and the Cloud/Platform Team are commonly known as the “Ops” Team.
The DevOps team is responsible for:
- getting (and keeping) applications running in production and non-production environments. This often includes writing (or at least helping to write) the Dockerfiles that create container images from developer code.
- creating the application deployment manifests, and configuring any CI/CD automation pipelines that ensure the application is built and deployed as expected.
- being on call to support any issues with the applications in production. They need to be able to triage application performance and availability issues, and so are consumers of the observability and logging tools that run in the cluster.
The Dev team is responsible for:
- writing the application code and testing that the code works locally in their development environments,
- creating Dockerfiles and local deployment manifests (compose files). They are also consumers of the CI/CD pipelines, through automated image builds for their committed code.
- supporting their applications in production.
Combined, the Devs and DevOps teams are often known as the “Development” team.
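To make this split a little more concrete, here is a minimal sketch of how the three cluster-facing roles might map onto Kubernetes RBAC. It is an illustration under assumptions, not a recommended policy: the group names (`platform-team`, `devops-team`, `dev-team`) and the `shop-prod` namespace are hypothetical, and giving DevOps the built-in `edit` role and Devs the built-in `view` role in a production namespace is just one plausible reading of the responsibilities above. The only names taken from Kubernetes itself are the `rbac.authorization.k8s.io/v1` API and the default `cluster-admin`, `edit`, and `view` ClusterRoles.

```python
import json

RBAC_API_GROUP = "rbac.authorization.k8s.io"

# Hypothetical group names; in practice these come from your identity provider.
TEAM_GROUPS = {
    "platform": "platform-team",
    "devops": "devops-team",
    "dev": "dev-team",
}


def _group_subject(group):
    """Subject entry that refers to a group of users."""
    return {"apiGroup": RBAC_API_GROUP, "kind": "Group", "name": group}


def cluster_admin_binding(group):
    """ClusterRoleBinding giving a group the built-in cluster-admin role cluster-wide."""
    return {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": f"{group}-cluster-admin"},
        "roleRef": {
            "apiGroup": RBAC_API_GROUP,
            "kind": "ClusterRole",
            "name": "cluster-admin",
        },
        "subjects": [_group_subject(group)],
    }


def namespace_binding(group, namespace, cluster_role):
    """RoleBinding giving a group a built-in ClusterRole inside one namespace only."""
    return {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{group}-{cluster_role}", "namespace": namespace},
        "roleRef": {
            "apiGroup": RBAC_API_GROUP,
            "kind": "ClusterRole",
            "name": cluster_role,
        },
        "subjects": [_group_subject(group)],
    }


def build_bindings(namespace):
    """Assumed mapping: platform = cluster-admin, DevOps = edit, Devs = view."""
    return [
        cluster_admin_binding(TEAM_GROUPS["platform"]),
        namespace_binding(TEAM_GROUPS["devops"], namespace, "edit"),
        namespace_binding(TEAM_GROUPS["dev"], namespace, "view"),
    ]


if __name__ == "__main__":
    # "shop-prod" is a made-up namespace used only for this example.
    manifest = {"apiVersion": "v1", "kind": "List", "items": build_bindings("shop-prod")}
    print(json.dumps(manifest, indent=2))
```

The script only prints JSON manifests and does not touch a cluster. Because `kubectl` accepts JSON as well as YAML, the output could be reviewed and then applied by someone who already holds cluster-admin, which in this structure means the Cloud/Platform team.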
For larger organisations, there is also likely to be an SRE team, whose sole focus is to improve system reliability through continuous improvement: recommending adjustments to deployment configurations, implementing more reliable rolling update policies, monitoring load distribution and multi-geo deployments, and so on. When an SRE team is in play, they are the team ultimately responsible for application performance and availability.
For smaller organisations, it’s common for the Infrastructure Team, the Cloud/Platform Team, and the DevOps team to be one and the same group of people.
So, this is the structure we see most often in organizations today… but is it a panacea? Only time will tell. One thing is for sure though: where your organization obtains differentiated value from your digital assets (customer-facing software), you should have your Devs focussed on improving software, not operating the platform that the apps run on.