DevOps Engineer’s Guide to Kubernetes Operating Systems

So you’ve decided to use Kubernetes (or you’re thinking about exploring some Kubernetes deployments). There are several compelling reasons for this, as you are surely aware: Kubernetes manages containers, schedules workloads onto a cluster, handles scalability and redundancy, and automates rollouts and rollbacks.

Kubernetes is an infrastructure-agnostic system that drives the components it manages towards their intended state, using declarative statements that define the state your systems and applications should be in. The result is a more manageable, powerful, and extensible system.

Of course, there is a learning curve associated with this “ease of administration,” but the benefits of modern container-based software development, on infrastructure that provides both scalability and portability, are well worth it.

While Kubernetes provides operational scalability and administration for containers, it does not directly help you manage the infrastructure upon which Kubernetes itself runs. Kubernetes is an application (or a collection of applications) in its own right, and these applications must run somewhere.

Despite popular belief, Kubernetes is not an operating system and requires the installation of Linux (or Windows) on the nodes. Kubernetes may run on cloud providers such as AWS or GCE, virtualization platforms such as VMware, laptops using Docker, or bare metal server hardware using Sidero – but all of them require the installation of an operating system first.

(Some, such as AWS EKS, eliminate the requirement to manage control plane nodes but still need the installation of Linux servers for worker nodes.)

On the operational level, the focus is on Kubernetes and the workloads it supports (as it should be!), but this leads to a problem that is prevalent in Kubernetes deployments. While Kubernetes may be patched and upgraded on a regular basis (although frequently it is not, leaving it in an insecure state), the maintenance, updating, securing, and operation of the underlying operating systems are frequently overlooked or neglected, at least until it’s time for a security audit! I’ve heard a lot of SREs and systems administrators complain that managing Linux as well as Kubernetes is like having a second job.

Kubernetes, just like Linux, requires patching, updates, security, user access control, and so on. However, just because such activities are completed at the Kubernetes level does not mean they can be overlooked at the OS level. The right underlying operating system distribution, though, can go a long way towards lowering the effort of maintaining the OS and mitigating the consequences of not staying current.

So, given that you’ll need to install Linux first in order to run Kubernetes, and that the choice of underlying OS has ramifications, which Linux is ideal for Kubernetes? There are many options available, but they typically fall into one of two categories: container-optimized OSs or general-purpose OSs.

General Purpose Linux Operating Systems

These are the “standard” Linux distributions.

Many people are familiar with Ubuntu, Debian, CentOS, Red Hat Enterprise Linux (RHEL), or Fedora, which are all general-purpose Linux operating systems. One of the key benefits of having a general-purpose OS underneath your Kubernetes cluster is that your systems administrators will already know how to install, update, and secure such Linux distributions. Existing toolsets for launching servers, installing the operating system, and configuring it for basic security can be reused. Even with Kubernetes running on top of these platforms, existing patch management and vulnerability detection tools should continue to work.

However…

With a general-purpose Linux system comes the overhead of general-purpose Linux management. This means that user account management, patch management, kernel updates, service firewalling, SSH security, disabling root login, disabling unneeded daemons, kernel tuning, and other tasks must all be completed and kept up to date. As previously stated, many of these activities can be performed using existing tools (Ansible, Chef, Puppet, etc.) that may already be used to manage other servers, but modifying the manifests or control files so that the server profiles are suitable for Kubernetes master and worker nodes is… to say the least, non-trivial.

The synchronisation of operating system updates with Kubernetes maintenance is another issue. There is frequently a lack of coordination, resulting in the operating system being left untouched after installation. Kubernetes will (ideally) be upgraded over time, but the underlying operating system may stay as it was, progressively accumulating known CVEs (Common Vulnerabilities and Exposures) in its various packages and kernel.

In an ideal world, the automation platform (such as Ansible or Puppet) would work in tandem with Kubernetes, allowing the nodes’ operating systems to be upgraded without affecting Kubernetes operations. This means the system must:

  • Cordon the node so no new workloads are scheduled on the node
  • Drain the node so all of the running pods are moved to other nodes
  • Update and patch the node
  • Uncordon the node
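
The four steps above can be sketched as an ordered command plan. This is a minimal illustration rather than a complete tool: the kubectl subcommands (cordon, drain with --ignore-daemonsets, uncordon) are standard, but the function name, the node name, and the patch command are assumptions made up for the example, and a real run would also need error handling and health checks between steps.

```python
from typing import List


def node_update_commands(node: str, patch_command: List[str]) -> List[List[str]]:
    """Return the ordered commands for updating one node without disrupting workloads.

    `patch_command` is whatever your own tooling uses to patch the OS
    (hypothetical here); it is passed through unchanged.
    """
    return [
        # 1. Stop new pods from being scheduled onto the node.
        ["kubectl", "cordon", node],
        # 2. Evict running pods so they are rescheduled elsewhere.
        #    DaemonSet pods cannot be evicted, so they are ignored.
        ["kubectl", "drain", node, "--ignore-daemonsets"],
        # 3. Update and patch the node's operating system.
        patch_command,
        # 4. Allow scheduling on the node again.
        ["kubectl", "uncordon", node],
    ]


if __name__ == "__main__":
    # "worker-1" and the ssh/apt invocation are assumptions for illustration.
    plan = node_update_commands(
        "worker-1", ["ssh", "worker-1", "sudo apt-get -y upgrade"]
    )
    for command in plan:
        print(" ".join(command))
```

Returning the commands as data instead of executing them keeps the sketch safe to run anywhere, and lets an automation tool log or review the plan before acting on it.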

Of course, the system must ensure that not too many nodes are updated at the same time, or the cluster’s workload capacity will suffer. Nor should it update too few at a time, or a large cluster will be patched more slowly than patches and updates are released. To reduce reboots and disruption, you may wish to synchronise OS upgrades with Kubernetes updates, but you will also need to handle critical OS changes on short notice.
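
One way to picture the "not too many, not too few" constraint is to split the nodes into batches capped at a fraction of the cluster. The sketch below is only an assumption about how such a planner might look: the function name, the 25% default, and the node names are all hypothetical, and a real planner would also account for pod disruption budgets and node roles.

```python
import math
from typing import List


def plan_batches(nodes: List[str], max_unavailable_fraction: float = 0.25) -> List[List[str]]:
    """Split nodes into update batches so only a bounded share is down at once.

    At least one node is always updated per batch, so small clusters
    still make progress.
    """
    if not 0 < max_unavailable_fraction <= 1:
        raise ValueError("max_unavailable_fraction must be in (0, 1]")
    batch_size = max(1, math.floor(len(nodes) * max_unavailable_fraction))
    return [nodes[i:i + batch_size] for i in range(0, len(nodes), batch_size)]


if __name__ == "__main__":
    # Hypothetical ten-node cluster.
    cluster = [f"worker-{i}" for i in range(10)]
    for number, batch in enumerate(plan_batches(cluster), start=1):
        print(number, batch)
```

A larger fraction finishes the rollout sooner at the cost of capacity, while a smaller one protects capacity but stretches the rollout; the right value depends on how much headroom the cluster has.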

The experience that personnel will have with a general-purpose Linux OS is a significant benefit. It means they’ll be comfortable with both deployment and troubleshooting procedures. They can use (and install, if not already installed) standard operating system utilities like tcpdump, strace, and lsof, among others. Configurations can easily be adjusted to fix mistakes and test alternatives (which is both a benefit and a drawback!). The downside is the increased effort required to secure the platforms, as well as the need to coordinate changes with Kubernetes infrastructure and operations.

Container Specific Operating Systems

The National Institute of Standards and Technology offers an excellent description of the benefits of a Container Specific Operating System:

“A container-specific host OS is a minimalist OS explicitly designed to only run containers, with all other services and functionality disabled, and with read-only file systems and other hardening practices employed. When using a container-specific host OS, attack surfaces are typically much smaller than they would be with a general-purpose host OS, so there are fewer opportunities to attack and compromise a container-specific host OS. Accordingly, whenever possible, organizations should use container-specific host OSs to reduce their risk.” https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-190.pdf

NIST Special Publication 800-190 Application Container Security Guide

To recap, the fewer packages and the less software an OS runs, the less there is to attack, and the fewer vulnerabilities there will be. This makes container-specific operating systems far more secure from the outset, even if they aren’t patched frequently.

Other security measures, such as making the root file system (or, ideally, all file systems!) read-only, may be used by container-specific operating systems to mitigate the effects of any vulnerability.

Container-specific operating systems often do not run (or even support) package managers. This decreases the chances of a package installation or upgrade causing a conflict that prevents a node or service from operating. The absence of management tools like Chef and Puppet further decreases the risk of configuration changes or unfinished runs compromising the system’s operational reliability. Instead, a complete OS image with all updates and customizations is installed to a secondary boot partition and booted into at the next reboot, with a fallback to the last known good image. This means the nodes’ configuration is known exactly at all times, and any version held in the version control system can be reverted to.
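
The image-based update with fallback described above can be modelled in a few lines. This is a toy simulation under stated assumptions, not how any particular OS implements it: the class name, the slot labels "A" and "B", and the boot_ok flag standing in for a boot health check are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DualSlotNode:
    """Toy model of a node with two bootable image slots."""

    slots: Dict[str, str]          # slot name -> installed image version
    active: str = "A"              # slot the node is currently running from
    pending: Optional[str] = None  # slot staged for the next reboot, if any

    def _inactive(self) -> str:
        return "B" if self.active == "A" else "A"

    def stage(self, version: str) -> None:
        """Write a complete new image to the inactive slot.

        The running system is never modified in place.
        """
        target = self._inactive()
        self.slots[target] = version
        self.pending = target

    def reboot(self, boot_ok: bool) -> str:
        """Try the staged slot; keep the last known good slot if it fails."""
        if self.pending is not None:
            if boot_ok:
                self.active = self.pending
            self.pending = None
        return self.slots[self.active]


if __name__ == "__main__":
    node = DualSlotNode(slots={"A": "v1", "B": "v0"})
    node.stage("v2")
    print(node.reboot(boot_ok=False))  # failed boot: still on the old image
    node.stage("v2")
    print(node.reboot(boot_ok=True))   # successful boot: now on the new image
```

Because the previous image is left untouched in the other slot, a bad update costs one reboot rather than a rebuild, which is what makes the upgrade effectively atomic.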

Some Container Specific Operating Systems are closer to general-purpose Linux distributions, such as VMware’s PhotonOS, which has fewer packages installed than a typical Linux distribution but still has a package manager and SSH access, and does not mount file systems as read-only. People are occasionally confused by the fact that “cloud optimised” versions of general-purpose Linux systems remain general-purpose Linux systems. Ubuntu, for example, distributes “cloud images” that have been “customised by Ubuntu engineering to operate on public clouds.” Nonetheless, they are still full-fledged Linux distributions with all of the packages, plus the cloud-init package to make it easier to configure them at boot without human interaction.

CoreOS was the first widely used container-specific operating system, popularising the concept of running all processes in containers for increased security and isolation. CoreOS also ditched the package manager in favour of rebooting into one of two read-only /usr partitions, to ensure atomic upgrades that could be rolled back. However, CoreOS has since been retired following its acquisition by Red Hat.

Current Container Specific OSs all take the approach of being minimal (with only a few packages installed); locked down (to some extent); running processes in containers (for improved security, stability, and service isolation); and offering atomic updates (by booting from one bootable partition while updating the other).

These are some examples:

  • Google’s “Container-Optimized OS”, which supports a read-only root fs, but allows SSH and only runs in GCP.
  • RancherOS, which runs SSH and does not use a read-only file system to protect root.
  • k3OS, also by Rancher, which does not run a full vanilla K8s distribution. Management is via kubectl, but SSH is supported.
  • AWS Bottlerocket, another OS with an immutable root fs and SSH support, which is, at least initially, focussed on AWS workloads.

Talos OS, the most opinionated of the Container Specific Operating Systems, stands apart. Like the others, Talos OS is stripped down, with no package manager, read-only file systems (except /var and /etc/kubernetes, and one or two specific writeable but ephemeral (reset on reboot) files like /etc/resolv.conf), and integration with K8s through an upgrade controller. However, Talos OS takes the notion of immutable infrastructure a step further than the other operating systems by eliminating all SSH and console access and replacing it with API-based access and administration.

There are API methods for everything you’d want to do on a Kubernetes node (list all the containers, inspect the network setup, and so on) but none for things you shouldn’t do on a node, such as unmounting a file system. Talos also decided to completely rewrite the Linux init system to do only one thing: launch Kubernetes. There is no way to manage user-defined services (they should all be managed through Kubernetes). This reduces security exposure (no SSH, no console), maintenance (no users, no patching), and the impact of any CVE (as file systems are immutable and ephemeral).

You might not think that giving up SSH access, limiting SRE activities, and forcing nodes to be totally immutable is a good idea, but the same was said about immutable containers not long ago, so it’s worth considering. Having an API-managed OS also lends itself nicely to large-scale operations and management: it’s the same API call, with different arguments, to inspect the logs for a certain container on one node, one class of nodes, or all nodes.
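
To illustrate the "same call, different arguments" idea, the sketch below builds one log request and only varies the list of target nodes. It assumes the talosctl client with its --nodes option and logs subcommand; the exact CLI may differ between Talos versions, and the inventory dictionary, role labels, addresses, and function names are hypothetical.

```python
from typing import Dict, List, Optional


def select_nodes(inventory: Dict[str, str], role: Optional[str] = None) -> List[str]:
    """Pick node addresses from a hypothetical {address: role} inventory.

    With no role given, every node is selected.
    """
    return [addr for addr, node_role in inventory.items() if role in (None, node_role)]


def logs_command(service: str, node_addresses: List[str]) -> List[str]:
    """Build one log request; only the target list changes with scale."""
    if not node_addresses:
        raise ValueError("at least one node address is required")
    # Assumes talosctl's global --nodes option accepts a comma-separated list.
    return ["talosctl", "--nodes", ",".join(node_addresses), "logs", service]


if __name__ == "__main__":
    # Addresses and roles are made up for illustration.
    inventory = {
        "10.0.0.2": "controlplane",
        "10.0.0.3": "worker",
        "10.0.0.4": "worker",
    }
    print(logs_command("kubelet", ["10.0.0.3"]))                         # one node
    print(logs_command("kubelet", select_nodes(inventory, "worker")))    # one class
    print(logs_command("kubelet", select_nodes(inventory)))              # all nodes
```

Nothing about the request itself changes between the three cases, which is why an API-managed OS scales from one node to a whole fleet without a separate tool for each.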

Summary

If you’ve taken the cattle-not-pets approach to container management, which involves destroying a container and launching a new version whenever an update or repair is needed, it’s only natural to take the same approach to the infrastructure that supports the containers.

It may take some time to get used to the idea that your nodes should be maintained in the same way as containers are, being destroyed and re-provisioned for each update rather than patched in place, but using a Container Specific OS may help boost adoption, minimise administrative overhead, and improve security. Without the possibility for a sysadmin or developer to adjust a configuration to “just get it working,” the risk of human mistakes or misconfigurations that break the next upgrade is reduced.

Given that many businesses are still in the early stages of adopting Kubernetes, now is an excellent moment to learn about this new generation of operating systems. By tightly integrating the OS with Kubernetes, it becomes feasible to treat the entire Kubernetes cluster as a single computer, reduce overhead, and improve security. This keeps the focus on the workloads and the value that the computational infrastructure provides, and it’s another step towards an API-driven datacenter.
