Part II: Kubernetes - High Availability

Before You Begin

Please review this document thoroughly before executing any of the instructions. Many errors and troubleshooting details are provided that may save you time and excessive effort.

High Availability (HA)

One way Kubernetes offers high availability for workloads and applications is by using multiple master (control plane) nodes with multiple worker nodes so there is no single point of failure. A load balancer in front of the master nodes distributes traffic across all control planes, and if one of the master nodes fails, all traffic is directed to another master.

Another way to achieve high availability is to place one or more master node components, for instance the cluster database (etcd), on multiple servers and establish communication among all etcd processes so cluster data can be replicated between them. If one etcd process fails, no cluster configuration is lost.

Both of these methods can be used together to provide increased reliability.

Two Workers and a Master Node

To get hands-on experience with high availability, these instructions detail how to set up a master node with two worker nodes using Ubuntu servers. This document does not provide steps for creating those servers. Local virtualization software or cloud provider compute instances work.

A minimum of two vCPUs and 2 GB of RAM is required, but plan to use more resources for better performance. The cluster will not start if it detects less CPU or memory than that.
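Before installing anything, it can help to confirm that a server actually meets the minimum. The following is a minimal sketch, not part of the original procedure. The function names are made up for illustration, and the 1700 MB default is an assumed threshold chosen because a "2 GB" server usually reports a little less than 2048 MB of usable memory.

```shell
# meets_minimum CPUS MEM_KB [MIN_CPUS] [MIN_MEM_MB]
# Succeeds (exit 0) when both values reach the thresholds.
# The 1700 MB default is an assumed threshold, not a value from this article.
meets_minimum() {
    cpus=$1
    mem_mb=$(( $2 / 1024 ))
    min_cpus=${3:-2}
    min_mem_mb=${4:-1700}
    [ "$cpus" -ge "$min_cpus" ] && [ "$mem_mb" -ge "$min_mem_mb" ]
}

# report_resources prints what this host has and whether it passes the check.
report_resources() {
    cpus=$(nproc)
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    if meets_minimum "$cpus" "$mem_kb"; then
        echo "OK: $cpus vCPUs, $(( mem_kb / 1024 )) MB RAM"
    else
        echo "Too small: $cpus vCPUs, $(( mem_kb / 1024 )) MB RAM"
    fi
}
```

Run report_resources on each server you plan to use. Pass different thresholds to meets_minimum if you want a stricter check.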

Configure the Master Node

Log into the server that will serve as the control plane with an account that has sudo access.

Follow these installation procedures.

Install Docker

sudo apt update -y && sudo apt upgrade -y
sudo apt-get install -y docker.io
sudo apt-get install -y apt-transport-https curl

Install Kubernetes Components

sudo curl https://packages.cloud.google.com/apt/doc/apt-key.gpg --output /etc/apt/trusted.gpg.d/k8s-apt-key.gpg
sudo apt-add-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main" -y
sudo apt update
sudo apt install kubeadm kubelet kubectl -y

Errors Received While Installing the Kubernetes Components

I received these errors every time I attempted to install the kube* tools.

Repository: 'deb http://apt.kubernetes.io/ kubernetes-xenial main'
Description:
Archive for codename: kubernetes-xenial components: main
E: Conflicting values set for option Trusted regarding source http://apt.kubernetes.io/ kubernetes-xenial
E: The list of sources could not be read.
E: Conflicting values set for option Trusted regarding source http://apt.kubernetes.io/ kubernetes-xenial
E: The list of sources could not be read.
E: Conflicting values set for option Trusted regarding source http://apt.kubernetes.io/ kubernetes-xenial
E: The list of sources could not be read.
Failed to install kubeadm kubelet kubectl. Please retry

To resolve this issue, remove the contents of these two files, and reinstall.

/etc/apt/sources.list.d/kubernetes.list
/etc/apt/sources.list.d/archive_uri-http_apt_kubernetes_io_-jammy.list
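One way to empty those files without opening an editor is sketched below. This is an illustration, not the exact method I used. The helper name empty_files is hypothetical, and it only touches files that already exist.

```shell
# empty_files FILE...
# Truncates each named file that exists to zero length and prints its path.
# Files that do not exist are skipped silently.
empty_files() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            : > "$f"
            echo "emptied $f"
        fi
    done
}
```

The files under /etc/apt/sources.list.d are owned by root, so the function has to run in a root shell to work on them. For a single file, sudo truncate -s 0 followed by the path does the same thing.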

Disable and Turn Off Swap

sudo swapoff -a && sudo sed -i '/swap/d' /etc/fstab
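To confirm swap really is off and will stay off after a reboot, something like the following sketch can be used. The function names are hypothetical. The first reads /proc/swaps, which lists active swap areas under a one-line header, and the second looks for uncommented swap entries left in /etc/fstab.

```shell
# swap_entries [FILE]
# Prints the number of active swap areas listed in FILE (default /proc/swaps).
# The first line of that file is a header, so it is skipped.
swap_entries() {
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "${1:-/proc/swaps}"
}

# fstab_swap_lines [FILE]
# Prints uncommented lines in FILE (default /etc/fstab) that mention swap.
fstab_swap_lines() {
    grep -v '^[[:space:]]*#' "${1:-/etc/fstab}" | grep 'swap' || true
}
```

After disabling swap, swap_entries should print 0 and fstab_swap_lines should print nothing.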

Initialize the Cluster

Documentation on how to create a cluster with kubeadm can be found at this link.

Initialize a cluster using the following command.

sudo kubeadm init --apiserver-advertise-address=<master node IP address> --pod-network-cidr=<pod network ip>

where apiserver-advertise-address is the IP address of the master node (use the default network IP address).

The "--pod-network-cidr" and "--apiserver-advertise-address" flags are optional.
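Since both flags are optional, here is a small sketch of how the command line changes depending on which values you supply. The helper build_init_cmd is hypothetical, and it only prints the command rather than running it. In the usage note, 10.0.0.10 is a placeholder address and 192.168.0.0/16 is the pod range I understand Calico's manifest to default to, so treat both as assumptions.

```shell
# build_init_cmd [ADVERTISE_IP] [POD_CIDR]
# Prints a kubeadm init command line, leaving out any flag whose value is empty.
build_init_cmd() {
    cmd="sudo kubeadm init"
    if [ -n "${1:-}" ]; then
        cmd="$cmd --apiserver-advertise-address=$1"
    fi
    if [ -n "${2:-}" ]; then
        cmd="$cmd --pod-network-cidr=$2"
    fi
    echo "$cmd"
}
```

For example, build_init_cmd 10.0.0.10 192.168.0.0/16 prints the full command with both flags, and build_init_cmd with no arguments prints plain sudo kubeadm init.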

Make a note of the kubeadm join command that is listed at the end of the command’s output. It will be needed to join worker nodes to the cluster.

Execute these three commands to complete the installation.

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Troubleshooting Note

If these errors are generated,

[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR Port-2379]: Port 2379 is in use
[ERROR Port-2380]: Port 2380 is in use

reset the cluster.

sudo kubeadm reset

Run the kubeadm init command again.
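The "already exists" errors come from manifest files left behind by an earlier kubeadm init. A quick way to see whether any are still present, before or after the reset, is sketched below. The function name is hypothetical, and the file list is taken from the error messages above.

```shell
# leftover_manifests [DIR]
# Prints the control plane manifest files from an earlier kubeadm init
# that still exist in DIR (default /etc/kubernetes/manifests).
leftover_manifests() {
    dir=${1:-/etc/kubernetes/manifests}
    for name in kube-apiserver kube-controller-manager kube-scheduler etcd; do
        if [ -e "$dir/$name.yaml" ]; then
            echo "$dir/$name.yaml"
        fi
    done
}
```

If leftover_manifests prints nothing, the file-related errors should not come back on the next kubeadm init.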

If you are using the root account, run this command.

export KUBECONFIG=/etc/kubernetes/admin.conf

A High Level Look at the Kubernetes Network Model

Detailed information about Services, Load Balancing, and Networking can be found at this link.

The documentation found at that link notes:

* pods can communicate with all other pods on any other node without NAT

* agents on a node (e.g. system daemons, kubelet) can communicate with all pods on that node

It also states:

Kubernetes networking addresses four concerns:

Containers within a Pod use networking to communicate via loopback.

Cluster networking provides communication between different Pods.

The Service resource lets you expose an application running in Pods to be reachable from outside your cluster.

You can also use Services to publish services only for consumption inside your cluster.

This document focuses on establishing network connectivity for pods running on different worker nodes by using a Container Network Interface (CNI) plugin, specifically Calico.

Calico is “an open source networking and network security solution for containers, virtual machines, and native host-based workloads”.

For full instructions on how to install Calico on public cloud, managed public cloud, or on-premises, use this link.

Install the Calico network plugin.

kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml

List the pods in all namespaces, and verify that every pod has the "Running" status.

kubectl get pods --all-namespaces
Something is not quite right.
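With more than a handful of pods, it is easy to miss the one that is not running. The sketch below filters the pod list down to the problem cases. The function name is hypothetical, and it assumes the usual column order of kubectl get pods --all-namespaces --no-headers, where STATUS is the fourth column.

```shell
# pods_not_ready reads the output of
#   kubectl get pods --all-namespaces --no-headers
# on standard input and prints "namespace/name status" for every pod whose
# STATUS column (the fourth) is neither Running nor Completed.
pods_not_ready() {
    awk 'NF >= 4 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}
```

Pipe the kubectl output into pods_not_ready. An empty result means every pod is either Running or Completed.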

Configuring the Worker Nodes

Log into the first worker node.

Install Docker

sudo apt update -y && sudo apt upgrade -y
sudo apt-get install -y docker.io
sudo apt-get install -y apt-transport-https curl

Install Kubernetes Components

sudo curl https://packages.cloud.google.com/apt/doc/apt-key.gpg --output /etc/apt/trusted.gpg.d/k8s-apt-key.gpg # This line was apt-key
sudo apt-add-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main" -y
sudo apt update
sudo apt install kubeadm kubelet kubectl -y
sudo apt-mark hold kubelet kubeadm kubectl

A Quick Note

If you are using cloud compute instances, take a snapshot of these servers in case a need arises to revert to the original configuration.

Join the Worker Node to the Cluster

This requires the kubeadm join command that was printed at the end of the kubeadm init output.

Find the kubeadm join command, and execute it on the worker node.

sudo kubeadm join 172.31.8.229:6443 --token 84k88a.0vtnmokcliaeoobv \
--discovery-token-ca-cert-hash sha256:a1f63387a5992688f729958d2571415dc7a38a5cbd1ae75b363e17253ee913be

If the original token provided at the end of the cluster installation is not available, create a new token using this command.

sudo kubeadm token create --print-join-command

Repeat the above steps on each worker node.

On the master node, verify the worker nodes have joined the cluster.

kubectl get nodes
Workers eventually enter the ‘Ready’ status
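Nodes can take a little while to become Ready, so it is handy to list only the ones that are not there yet. This is a minimal sketch with a hypothetical function name. It assumes the column order of kubectl get nodes --no-headers, where STATUS is the second column and starts with "Ready" for a healthy node.

```shell
# nodes_not_ready reads the output of
#   kubectl get nodes --no-headers
# on standard input and prints the name of every node whose STATUS column
# (the second) does not start with "Ready".
nodes_not_ready() {
    awk 'NF >= 2 && $2 !~ /^Ready/ { print $1 }'
}
```

Pipe the kubectl output into nodes_not_ready and rerun it until it prints nothing. As an alternative, kubectl wait --for=condition=Ready nodes --all --timeout=300s blocks until all nodes report Ready or the timeout expires.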

Time for a Reality Check

This article has been delayed for several days.

Why?

My cluster would start, be responsive, and then stop running. After creating multiple new environments and spending hours researching and troubleshooting, there was still no progress.

The current /var/log/syslog file was full of errors.

Here is a small sample of the errors.

Jun 16 15:31:25 localhost kubelet[453]: E0616 15:31:25.400055     453 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-flannel\" with CrashLoopBackOff: \"back-off 1m20s restarting failed container=kube-flannel pod=kube-flannel-ds-lg5q5_kube-system(d1365521-0b9c-43bb-b6b1-5f7324ba5466)\"" pod="kube-system/kube-flannel-ds-lg5q5" podUID=d1365521-0b9c-43bb-b6b1-5f7324ba5466
Jun 16 15:31:26 localhost containerd[492]: time="2022-06-16T15:31:26.399268850Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:coredns-6d4b75cb6d-vsv27,Uid:66fee7e4-47ac-4e6e-b461-2058192f5ea3,Namespace:kube-system,Attempt:0,}"
Jun 16 15:31:26 localhost containerd[492]: time="2022-06-16T15:31:26.448132869Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:coredns-6d4b75cb6d-vsv27,Uid:66fee7e4-47ac-4e6e-b461-2058192f5ea3,Namespace:kube-system,Attempt:0,} failed, error" error="failed to setup network for sandbox \"09dac84b2502cb73358e9822ae55b71cb45e2417696b12f70ad8ba47b547f539\": open /run/flannel/subnet.env: no such file or directory"
Jun 16 15:31:26 localhost kubelet[453]: E0616 15:31:26.449344     453 remote_runtime.go:212] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to setup network for sandbox \"09dac84b2502cb73358e9822ae55b71cb45e2417696b12f70ad8ba47b547f539\": open /run/flannel/subnet.env: no such file or directory"
Jun 16 15:31:26 localhost kubelet[453]: E0616 15:31:26.449417     453 kuberuntime_sandbox.go:70] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to setup network for sandbox \"09dac84b2502cb73358e9822ae55b71cb45e2417696b12f70ad8ba47b547f539\": open /run/flannel/subnet.env: no such file or directory" pod="kube-system/coredns-6d4b75cb6d-vsv27"
Jun 16 15:31:26 localhost kubelet[453]: E0616 15:31:26.449449     453 kuberuntime_manager.go:815] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to setup network for sandbox \"09dac84b2502cb73358e9822ae55b71cb45e2417696b12f70ad8ba47b547f539\": open /run/flannel/subnet.env: no such file or directory" pod="kube-system/coredns-6d4b75cb6d-vsv27"
Jun 16 15:31:26 localhost kubelet[453]: E0616 15:31:26.449520     453 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"coredns-6d4b75cb6d-vsv27_kube-system(66fee7e4-47ac-4e6e-b461-2058192f5ea3)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"coredns-6d4b75cb6d-vsv27_kube-system(66fee7e4-47ac-4e6e-b461-2058192f5ea3)\\\": rpc error: code = Unknown desc = failed to setup network for sandbox \\\"09dac84b2502cb73358e9822ae55b71cb45e2417696b12f70ad8ba47b547f539\\\": open /run/flannel/subnet.env: no such file or directory\"" pod="kube-system/coredns-6d4b75cb6d-vsv27" podUID=66fee7e4-47ac-4e6e-b461-2058192f5ea3

The Kubernetes component log files are found in /var/log/containers.

These are the files you need to explore to determine what the problem is, and there is plenty of information online to assist in finding a resolution.

etcd-kubernetes-master_kube-system_etcd-*
kube-apiserver-kubernetes-master_kube-system_kube-apiserver-*
kube-controller-manager-kubernetes-master_kube-system_kube-controller-manager-*
kube-proxy-r47t4_kube-system_kube-proxy-*
kube-scheduler-kubernetes-master_kube-system_kube-scheduler-*
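Rather than opening each file in turn, a rough first pass is to count how many lines in each log mention an error. The sketch below does that with a hypothetical helper. It assumes the log files end in .log and that a case-insensitive match on "error" is a good enough starting filter.

```shell
# files_with_errors [DIR]
# Prints each *.log file in DIR (default /var/log/containers) followed by the
# number of lines in it that contain "error", ignoring case.
# Files with no matching lines are left out.
files_with_errors() {
    dir=${1:-/var/log/containers}
    for f in "$dir"/*.log; do
        [ -e "$f" ] || continue
        n=$(grep -ci 'error' "$f" || true)
        if [ "$n" -gt 0 ]; then
            echo "$f $n"
        fi
    done
}
```

Reading those logs usually requires root, so run it from a root shell. The files with the highest counts are a reasonable place to start reading.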

Examining these logs gave insight but no resolution was found.

SUMMARY — REALITY CHECK!!!

Several days have passed, and the current configuration is still not error-free.

A new direction is needed.

In an effort to save time, the following series of photos acts as a summary of this author’s thoughts and feelings regarding how incredibly valuable and rewarding troubleshooting this cluster has been.

HOW DARE YOU FRUSTRATION ME WITH ALL YOUR “INSTALL THIS” AND “USE KUBECTL AND KUBEADM COMMANDS TO DO THIS”, AND “GO GET SOME COFFEE AND COME BACK TO TROUBLESHOOT SOME MORE!!! TRUST ME IT WILL WORK THIS TIME”! DIDN’T YOUR MOTHER TEACH YOU BETTER THAN THIS?!?!?!
My personal, heartfelt message to this particular Kubernetes cluster.
KUBERNETES, YOUR MOTHER’S A DONKEY!!
K8s thrilled me so much I could contain neither my excitement nor my joy of orchestrating clusters.
Another kube-proxy error? How wonderful! Let’s research to find the answer!
Let’s call everyone, and tell them how much Kubernetes knowledge we’ve gained!
Here’s an idea. Let’s build a new K8s cluster! It will be fun!
Doctor approving a return home after a lengthy stay in his medical facility’s padded room “resort”. Good Times!

!! News Flash !!

A fully functional Kubernetes cluster was successfully created. As of this moment, it has been running for the past three hours without issue.

How was it done?

Find out in the next write-up, which is tentatively titled,

Redemption: Return of the Orchestrator
